Processing the Language of Pitchfork Part 2: Word Count

In the second part of this three-part ODSC series on analyzing Pitchfork album reviews, we’ll introduce the Natural Language Toolkit library to discover patterns, trends, and other interesting things hidden in the words of album reviews. For this article I found the most commonly used words and adjectives/adverbs in my collection of 17,000 reviews. I also […]

Processing The Language of Pitchfork Part 1

Pitchfork.com is the web’s premier site for music criticism and news. Their album reviews are famous for their overt detail, astute prose, and cutting wit. They are often credited for the popularity of indie music in the 00s and 10s and for “breaking” bands such as Animal Collective, Bon Iver, and Grizzly Bear. A good […]

The Sentiment Behind The Declaration of Independence

The American political season often conjures numerous references to the country’s origins from either side of the aisle. What better way to join in than by looking at the country’s birth using Data Science, the field that will dictate much of its future? I’ll do this by leveraging a subset of Natural Language Processing (NLP) […]

Naive Bayes and Spam Detection

In natural language processing, text classification techniques are used to assign a class to a given text. For example, in spam detection, the classifier decides whether an email belongs to the spam or non-spam (ham) class. Deciding what the topic of a news article is, or whether a movie review is positive or negative, Authorship […]

Identifying Hate Speech

All the beauty of the internet age comes with its fair share of ugliness. Recently, a deluge of articles highlighting the dark side of Twitter has raised concerns for its future. As great as it is to engage with others on a variety of topics, as of late it’s the bad eggs that seem to […]

The Influence of Tongues

The Global Language Network is a project of the MIT Media Lab in collaboration with Aix-Marseille Université, Northeastern University, and Harvard University. Drawing from publicly available data sources (Twitter, books, and Wikipedia), the GLN project is an analysis of the global influence of languages. By looking at the structure of the networks connecting multilingual speakers […]

What Your Google Search History Says About You

By now we are all pretty much used to the fact that Google knows everything about us: what we do, where we go, our interests, and so on. Ari Morcos set out to find what could be inferred from his own Googling by downloading two years’ worth of search data. This is a very nice exploration in Python […]

Evan Schnidman & Bill McMillian – “Hybrid Analytics: Sentiment from Communications”

Abstract: Sentiment analysis has too often relied on traditional data science techniques to attempt to glean nuanced sentiment information from complex documents. Traditional methods allow us to get an overview of the forest of data that exists, but deep domain expertise is crucial to getting a more nuanced view of the trees. This talk examines […]

Dissecting the Presidential Debates with an NLP Scalpel

The recent Republican and Democratic debates drew unprecedented numbers of viewers and the usual lot of controversies and soundbites in the media. Each debate deeply impacted future polls, subsequent fundraising, and the composition of the race. In our polarized media landscape, ensuing political analysis always suffers from political bias. Whether you trust MSNBC or Fox […]