In my last post, I did some natural language processing and sentiment analysis for Jane Austen’s most well-known novel, Pride and Prejudice. It was just so much fun that I wanted to extend some of that work and compare across her body of writing. I decided to make an...
Seven Python Kernels from Kaggle You Need to See Right Now
The ability to post and share kernels is probably my favorite thing about Kaggle. Learning from other users’ kernels has often provided inspiration for a number of my own projects. I also appreciate the attention to detail and the descriptions some users provide in their code. This...
Stupid word games
Today, Jeroen Ooms announced the appearance on CRAN of an R package for language detection, wrapping the “CLD2” compact language detector. Obviously, given a tool like that on a holiday long weekend, my first reaction was to try to confuse it. Two fun games to play with a language detector:...
Machine Learning: An In-Depth Guide – Overview, Goals, Learning Types, and Algorithms
Articles in this series: Overview, goals, learning types, and algorithms; Data selection, preparation, and modeling; Model evaluation, validation, complexity, and improvement; Model performance and error analysis; Unsupervised learning, related fields, and machine learning in practice. Introduction: Welcome! This is the first article of a five-part series about machine learning. Machine learning is...
This blog post is on song lyric sentiment. Feel free to fork the code from GitHub. Sentiment analysis is one of the techniques of natural language processing (NLP) and is part of natural language understanding (NLU). It allows us to classify the sentiment of a text, positive or negative,...
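As a rough illustration of classifying a text as positive or negative, here is a minimal lexicon-based sketch. It is not the post's GitHub code: the word scores in `LEXICON` are made-up placeholders (published sentiment lexicons are far larger), and mapping a zero score to a "neutral" label is an added assumption.

```python
import re

# Hypothetical toy lexicon: positive words score > 0, negative words < 0.
LEXICON = {
    "love": 2, "happy": 2, "good": 1, "shine": 1,
    "hate": -2, "sad": -2, "bad": -1, "cry": -1,
}


def sentiment(text):
    """Sum the lexicon scores of the words in `text` and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(LEXICON.get(w, 0) for w in words)  # unknown words count as 0
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"  # assumption: ties and unknown-only lines are neutral
    return score, label


for line in ["I love the way you shine", "I cry when I'm sad", "Walking down the street"]:
    print(line, "->", sentiment(line))
```

Scoring a whole song would just mean running the same function over every lyric line (or over the full text) and summing or averaging the scores.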
Last Saturday, in the UEFA Champions League final (think of it as Europe’s Super Bowl), Spanish giants Real Madrid defeated their Italian counterparts Juventus FC 4-1. It was a thrilling match that saw both sides staking an equal claim to victory in the first half, with Madrid eventually prevailing...
Topic Modeling with LDA: Introduction
Suppose you have the following set of sentences: I eat fish and vegetables. Fish are pets. My kitten eats fish. Latent Dirichlet allocation (LDA) is a technique that automatically discovers the topics these documents contain. Given the above sentences, LDA might classify the red words under Topic F, which we...
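To make "automatically discovers topics" concrete, the sketch below fits LDA to the three example sentences with a small collapsed Gibbs sampler in plain Python. This is one standard way to do LDA inference, not necessarily the one the post uses (scikit-learn's `LatentDirichletAllocation`, for instance, uses variational Bayes). The stop-word list, two topics, `alpha`, `beta`, iteration count, and seed are all illustrative assumptions, and with only three short sentences the resulting topics are noisy.

```python
import random
from collections import defaultdict

DOCS = [
    "I eat fish and vegetables.",
    "Fish are pets.",
    "My kitten eats fish.",
]
# Hypothetical minimal stop-word list for this toy corpus.
STOP_WORDS = {"i", "and", "are", "my"}


def tokenize(text):
    """Lowercase, replace punctuation with spaces, and drop stop words."""
    cleaned = "".join(c.lower() if c.isalpha() or c.isspace() else " " for c in text)
    return [w for w in cleaned.split() if w not in STOP_WORDS]


def lda_gibbs(texts, n_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return vocab, doc-topic mix, topic-word counts."""
    rng = random.Random(seed)
    corpus = [tokenize(t) for t in texts]
    vocab = sorted({w for doc in corpus for w in doc})
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in corpus]              # n_{d,k}
    topic_word = [defaultdict(int) for _ in range(n_topics)]  # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    z = []                                                    # topic of each token

    # Random initial topic for every token.
    for d, doc in enumerate(corpus):
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Take the token out of the counts...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...weight each topic j by p(z = j | all other assignments)...
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta)
                    / (topic_total[j] + V * beta)
                    for j in range(n_topics)
                ]
                # ...then resample its topic and put it back.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions (each row sums to 1).
    theta = [
        [(row[k] + alpha) / (sum(row) + n_topics * alpha) for k in range(n_topics)]
        for row in doc_topic
    ]
    return vocab, theta, topic_word


vocab, theta, topic_word = lda_gibbs(DOCS)
for k, counts in enumerate(topic_word):
    top = [w for w in sorted(counts, key=counts.get, reverse=True) if counts[w] > 0][:3]
    print(f"Topic {k}: {top}")
for text, mix in zip(DOCS, theta):
    print(text, [round(p, 2) for p in mix])
```

Each sentence ends up as a mixture over topics (`theta`), and each topic is described by the words most often assigned to it, which is the kind of output the "Topic F" labelling above refers to. The topic names themselves are chosen by a human after inspecting those word lists.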
Deciphering the Neural Language Model
Recently, I have been working on the Neural Networks for Machine Learning course offered by Coursera and taught by Geoffrey Hinton. Overall, it is a nice course and provides an introduction to some of the modern topics in deep learning. However, there are instances where the student has to do...
Why the most influential business AIs will look like spellcheckers (and a toy example of how to build one)
Forget voice-controlled assistants. At work, AIs will turn everybody into functional cyborgs through squiggly red lines under everything you type. Let’s look at a toy example I just built (mostly to play with deep learning along the way). For the data set, I chose Patrick Martinchek’s collection of Facebook...
A survey of cross-lingual embedding models
In past blog posts, we discussed different models, objective functions, and hyperparameter choices that allow us to learn accurate word embeddings. However, these models are generally restricted to capturing representations of words in the language they were trained on. The availability of resources, training data, and benchmarks in English...