You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing
It is a truth universally acknowledged that sentiment analysis is super fun, and Pride and Prejudice is probably my very favorite book in all of literature, so let’s do some Jane Austen natural language processing. Project Gutenberg makes e-texts available for many, many books, including Pride and Prejudice which... Read more
Hello all, and welcome to the second post in the series – NLP with NLTK. The first post can be found here, in case you missed it. In this article we will cover basic NLP concepts and use NLTK to implement them. Contents: Corpus, Tokenization/Segmentation, Frequency Distribution... Read more
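As a rough illustration of two of the concepts listed in that teaser, tokenization and frequency distribution, here is a minimal sketch in plain Python. It does not use NLTK itself: the regex tokenizer and the `tokenize` and `frequency_distribution` helpers are stand-ins assumed for illustration, and the sample sentence is the opening line of Pride and Prejudice.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a simple regex stand-in, not NLTK's tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_distribution(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


# A tiny one-sentence "corpus" for demonstration purposes.
corpus = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

tokens = tokenize(corpus)
freq = frequency_distribution(tokens)

# Show the three most frequent tokens with their counts.
print(freq.most_common(3))
```

A real pipeline would likely use a proper tokenizer and a much larger corpus; the point here is only that a frequency distribution is a mapping from token to count.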
On word embeddings – Part 2: Approximating the Softmax
Table of contents: Softmax-based Approaches, Hierarchical Softmax, Differentiated Softmax, CNN-Softmax, Sampling-based Approaches, Importance Sampling, Adaptive Importance Sampling, Target Sampling, Noise Contrastive Estimation, Negative Sampling, Self-Normalisation, Infrequent Normalisation, Other Approaches, Which Approach to Choose?, Conclusion. This is the second post in a series on word embeddings and representation learning. In... Read more
ftfy (fixes text for you) 4.4 and 5.0
ftfy is Luminoso’s open-source Unicode-fixing library for Python. Luminoso’s biggest open-source project is ConceptNet, but we also use this blog to provide updates on our other open-source projects. And among these projects, ftfy is certainly the most widely used. It solves a problem a lot of people have with... Read more
On word embeddings – Part 1
Table of contents: A brief history of word embeddings, Word embedding models, A note on language modelling, Classic neural language model, C&W model, Word2Vec, CBOW, Skip-gram. Unsupervisedly learned word embeddings have been exceptionally successful in many NLP tasks and are frequently seen as something akin to a silver bullet.... Read more
Implementing a CNN for Text Classification in Tensorflow
The full code is available on GitHub. In this post we will implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become... Read more
Attention and Memory in Deep Learning and NLP
Attention Mechanisms are a recent trend in Deep Learning. In an interview, Ilya Sutskever, now the research director of OpenAI, mentioned that Attention Mechanisms are one of the most exciting advancements, and that they are here to stay. That sounds exciting. But what are Attention Mechanisms? Attention Mechanisms in... Read more
Deutsch Credit Future Telling: part 2
To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1. Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy while the other two are still around 75%. Where there were three on this road, there... Read more
Classification tasks come up frequently in Data Science, but the hardest are those with unbalanced classes. From biology to finance, real-life examples are numerous. Before balancing your errors, establishing a baseline that always predicts the most frequent class can give you over 90% accuracy right off the bat. The question of... Read more
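To make the baseline idea in that teaser concrete, here is a minimal sketch of a majority-class baseline. The label names and the 95/5 split are invented for illustration and are not taken from the post; `majority_baseline_accuracy` is a hypothetical helper.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy of a classifier that always predicts the most frequent label."""
    counts = Counter(labels)
    _, majority_count = counts.most_common(1)[0]
    return majority_count / len(labels)


# Hypothetical unbalanced dataset: 95 negatives and 5 positives.
labels = ["ok"] * 95 + ["fraud"] * 5

acc = majority_baseline_accuracy(labels)
print(f"Majority-class baseline accuracy: {acc:.2f}")
```

On data this skewed, a model has to beat the baseline rather than merely score a high accuracy, which is why accuracy alone tends to mislead on unbalanced classes.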
The Sentiment Behind The Declaration of Independence
The American political season often conjures numerous references to the country’s origins from either side of the aisle. What better way to join in than by looking at the country’s birth using Data Science, the field that will dictate much of its future. I’ll do this by leveraging a... Read more