Mensing.AI


Learnings & Musings on AI, ML, Data Science & Python

Data Sets Get Lawyered 👩‍⚖️

The LawyerBot 3000 might soon be a reality thanks to Harvard. They have digitized over 6 million cases to aid in the development of AI systems for the legal sector. So fire up your NLP and get ready to object! ⚖️ Src: Caselaw Access Project


How Many Words Is This Dataset Worth? 🤯

Google recently released version 4 of the Open Images dataset and it’s quite large. We’re talking nine-followed-by-six-zeroes large, with image-level labels and labeled bounding boxes. Happy training! 📦 Src: Google


Fast.DataSets 🔣

fast.ai and AWS have teamed up to make some of the most popular deep learning datasets “available in a single place, using standard formats, on reliable and fast infrastructure.” Woo! 🙌 MNIST, CIFAR, IMDb, Wikitext, and more! Check ‘em out. Src: fast.ai


Facebook Can Read Photos 🖼️

Facebook has rolled out a tool called Rosetta that can scan photos for text, extract the text it finds, and then “understand” that text. 👁️‍🗨️ This is huge: the platform can now increase accessibility by reading photos, pull out information from photos of menus and street signs, and monitor memes and images … Read More


ELMo Really Does Know His Words 👹

I’m super interested in the world of NLP (natural language processing), so the news that performance increased dramatically with ELMo piqued my interest. 💡 The biggest benefit in my eyes is that this method doesn’t require labeled data, which means the world of written word is our oyster. 🐚 Yeah, yeah, word embeddings don’t require labeled data either. ELMo can … Read More


(Compute) Size Doesn’t Matter 📏

Fast.ai was recently part of a team that set some new speed benchmarks on the ImageNet image recognition data set. Why is this noteworthy? Because they did it on an AWS instance that cost $40 total. 🏅 We entered this competition because we wanted to show that you don’t have to have huge resources to be at the cutting edge … Read More


The Deciding Tree 🌳

This is a really great description of decision trees with some lovely visuals. It also contains a good overview of overfitting. 👌 Decision trees might not seem as sexy as other algorithmic approaches, but it’s hard to argue with the results. It also strikes me how similar this process seems to the way humans approach a lot of experience-based decision … Read More


When Gradients Explode (or Vanish) 💥

This is a nice quick read on how to combat exploding or vanishing gradients, a problem that can wreak havoc on your deep learning model. 👹 My TL;DR: Exploding gradient? Use gradient clipping. It sets a ceiling on your gradient’s magnitude but keeps the direction. ✂️ Vanishing gradient? If you’re using an RNN, use an LSTM. ✔️ Src: Learn.Love.AI.
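
To make the clipping half of that TL;DR concrete, here is a minimal sketch of clipping by L2 norm in plain Python. It is my own illustration, not code from the linked article. The function name `clip_by_norm`, the `max_norm` parameter, and the toy gradient values are all assumptions made up for the example. Deep learning frameworks ship their own clipping utilities, so treat this as a picture of the idea, not a drop-in replacement.

```python
import math


def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm.

    Every component is multiplied by the same factor, so the
    direction of the gradient is preserved.
    (Hypothetical helper for illustration only.)
    """
    norm = math.sqrt(sum(g * g for g in grad))
    # Small (or zero) gradients pass through untouched.
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# Toy "exploding" gradient: its L2 norm is 50, well above the ceiling of 5.
exploding = [30.0, 40.0]
clipped = clip_by_norm(exploding, max_norm=5.0)

print("before:", exploding, "norm =", math.hypot(*exploding))
print("after: ", clipped, "norm =", math.hypot(*clipped))
```

The ratio between the components is the same before and after, which is the "keeps the direction" part. Only the length of the vector gets capped.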


Dataset Database 🗄

What does ML want? Data! When does it want it? All the time! But specifically, whenever you are going to train, test, and deploy a model. Where do you get this data? I’m glad you asked! 😃 Here is a collection of datasets I’ve come across. I’ll update it as I find more. ➕ Computer Vision Open Images V4 from … Read More


Unbiased Faces 👶🏻👩🏽👴🏿

IBM will be releasing a data set of faces across all ethnicities, genders, and ages to both avoid bias in future facial recognition systems and test existing systems for bias. Simply put, this is awesome. 🙌 It’s also interesting to see how ethics, fairness, and openness are being used as positive differentiators by major competitors in this new tech race. … Read More