ftfy (fixes text for you) 4.4 and 5.0
ftfy is Luminoso’s open-source Unicode-fixing library for Python. Luminoso’s biggest open-source project is ConceptNet, but we also use this blog to provide updates on our other open-source projects. And among these projects, ftfy is certainly the most widely used. It solves a problem a lot of people have with... Read more
On word embeddings – Part 1
Table of contents: A brief history of word embeddings; Word embedding models; A note on language modelling; Classic neural language model; C&W model; Word2Vec; CBOW; Skip-gram. Unsupervisedly learned word embeddings have been exceptionally successful in many NLP tasks and are frequently seen as something akin to a silver bullet.... Read more
Implementing a CNN for Text Classification in TensorFlow
The full code is available on GitHub. In this post we will implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become... Read more
Attention and Memory in Deep Learning and NLP
Attention Mechanisms are a recent trend in Deep Learning. In an interview, Ilya Sutskever, now the research director of OpenAI, mentioned that Attention Mechanisms are one of the most exciting advancements, and that they are here to stay. That sounds exciting. But what are Attention Mechanisms? Attention Mechanisms in... Read more
Deutsch Credit Future Telling: part 2
To continue on this first path, it’s logical to proceed with hyperparameter tuning on the three algorithms previously mentioned in part 1. Here the Random Forest Classifier (R.F.C) pulls ahead with 77% accuracy while the other two are still around 75%. Where there were three on this road, there... Read more
Classification tasks come up frequently in Data Science, but the hardest are those with unbalanced classes. Real-life examples are numerous, from biology to finance. Before balancing your errors, establishing a baseline with the most frequent occurrence can give you over 90% accuracy right off the bat. The question of... Read more
The Sentiment Behind The Declaration of Independence
The American political season often conjures numerous references to the country’s origins from either side of the aisle. What better way to join in than by looking at the country’s birth using Data Science, the field that will dictate much of its future? I’ll do this by leveraging a... Read more
Google’s TensorFlow framework spread like wildfire upon its release. The slew of tutorials and extensions made an already robust ecosystem even more so. Recently, Google released one of their own extensions. It’s called SyntaxNet, a TensorFlow-based syntactic parser for Natural Language Understanding. SyntaxNet uses neural networks to model... Read more
Identifying Hate Speech
All the beauty of the internet age comes with its fair share of ugliness. Recently, a deluge of articles highlighting the dark side of Twitter has raised concerns for its future. As great as it is to engage with others on a variety of topics, as of late it’s... Read more
Social media has fundamentally changed the way in which we interact with each other, and with the World Wide Web. Our web activities are now inherently social. We can keep in touch with close friends on Facebook without ever needing to pick up a phone or get on a... Read more