
NLP with NLTK – Part 1

Introduction: The idea of using a structured programming language to interact with computers is being challenged by Natural Language Processing (NLP) and Natural Language Understanding methods. NLP holds great promise for making computer interfaces accessible to a wide range of audiences, as humans would be able to talk to computers in their own native […]

Processing the Language of Pitchfork Part 2: Word Count

In the second part of this three-part ODSC series on analyzing Pitchfork album reviews, we’ll introduce the Natural Language Toolkit library to discover patterns, trends, and other interesting things hidden in the words of album reviews. For this article I found the most commonly used words and adjectives/adverbs in my collection of 17,000 reviews. I also […]

Processing The Language of Pitchfork Part 1

Pitchfork is the web’s premier site for music criticism and news. Their album reviews are famous for their overt detail, astute prose, and cutting wit. They are often credited for the popularity of indie music in the 00s and 10s and for “breaking” bands such as Animal Collective, Bon Iver, and Grizzly Bear. A good […]

The Sentiment Behind The Declaration of Independence

The American political season often conjures numerous references to the country’s origins from either side of the aisle. What better way to join in than by looking at the country’s birth using Data Science, the field that will dictate much of its future? I’ll do this by leveraging a subset of Natural Language Processing (NLP) […]

Naive Bayes and Spam Detection

In natural language processing, text classification techniques are used to assign a class to a given text. For example, in spam detection, the classifier decides whether an email belongs to the spam or non-spam (ham) class. Deciding what the topic of a news article is, or whether a movie review is positive or negative, Authorship […]

Identifying Hate Speech

All the beauty of the internet age comes with its fair share of ugliness. Recently, a deluge of articles highlighting the dark side of Twitter has raised concerns for its future. As great as it is to engage with others on a variety of topics, as of late it’s the bad eggs that seem to […]

The Influence of Tongues

The Global Language Network is a project of the MIT Media Lab in collaboration with Aix-Marseille Université, Northeastern University, and Harvard University. Drawing from publicly available data sources (Twitter, books, and Wikipedia), the GLN project is an analysis of the global influence of languages. By looking at the structure of the networks connecting multilingual speakers […]

What Your Google Search History Says About You

By now we are all pretty much used to the fact that Google knows everything about us: what we do, where we go, our interests, etc. Ari Morcos set out to find what could be inferred from his own Googling by downloading two years’ worth of search data. This is a very nice exploration in Python […]

Evan Schnidman & Bill McMillian – “Hybrid Analytics: Sentiment from Communications”

Abstract: Sentiment analysis has too often relied on traditional data science techniques to attempt to glean nuanced sentiment information from complex documents. Traditional methods allow us to get an overview of the forest of data that exists, but deep domain expertise is crucial to getting a more nuanced view of the trees. This talk examines […]