Processing The Language of Pitchfork Part 1

Pitchfork is the web’s premier site for music criticism and news. Their album reviews are famous for their meticulous detail, astute prose, and cutting wit. They are often credited with popularizing indie music in the 00s and 10s and with “breaking” bands such as Animal Collective, Bon Iver, and Grizzly Bear. A good review from Pitchfork can shoot any no-name band into indie stardom. Whenever they assign an album a perfect 10.0 rating (they use a 0-10.0 rating scale), it makes headlines and sends shockwaves throughout the online music community. They are the gold standard of music writing in the internet age, and writing for them is the pinnacle of a career in music criticism.


In a two-part series for ODSC, we’ll be conducting a Natural Language Processing project on 17,000 Pitchfork album reviews from 1999 to today. Pitchfork is an excellent subject for an NLP project due to the richness and complexity of their album reviews.


In this first part, we’ll be doing an exploratory data analysis to detect trends in the scores in relation to genre, year, artist, and author. We’ll also be incorporating the TextStat Python library, which calculates statistics from text. Using TextStat, we’ll calculate each album review’s Flesch Reading Ease score. This metric “uses the sentence length (number of words per sentence) and the number of syllables per word in an equation to calculate the reading ease.” Scores fall on a 0-100 scale, with higher scores meaning easier to read and lower scores meaning more difficult to read. This graph shows you how to interpret the score.
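TextStat exposes this metric as `textstat.flesch_reading_ease(text)`. The formula itself is simple enough to sketch by hand; note that the vowel-group syllable counter below is a rough heuristic for illustration, not TextStat’s actual implementation:

```python
import re

def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    Syllables are estimated by counting vowel groups per word,
    which is only an approximation of a real syllable counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(
        max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words
    )
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short, monosyllabic sentences score high (very easy), while long sentences full of polysyllabic words can score near or below zero.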


In the second part, we’ll ramp up our analysis by using NLP and ML tools to discover trends in bigrams, cluster texts, and attempt to predict review scores from their text.


Data Analysis

The first step in our process is to analyze the scores of albums, which is easy for us to do because of Pitchfork’s 0-10.0 rating scale. Without knowing the distribution of the scores, it’d be easy to assume that the median and mean would be around 5 and that the majority of scores would fall between the upper 3s and lower 6s. Whenever a survey asks you to rate something on a 0-10 scale, you most likely treat a 5 as neutral and/or mediocre.


Mean 6.98
Standard Deviation 1.3
10% Quantile 5.3
25% Quantile 6.4
40% Quantile 7.0
50% Quantile (Median) 7.2
60% Quantile 7.5
75% Quantile 7.8
90% Quantile 8.3
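The summary statistics above come straight out of pandas; here is a minimal sketch, assuming the reviews live in a DataFrame with a `score` column (the ten rows below are a toy stand-in for the real 17,000-row dataset):

```python
import pandas as pd

# Toy stand-in for the real reviews DataFrame (~17,000 rows, "score" on 0-10.0).
reviews = pd.DataFrame(
    {"score": [4.1, 5.3, 6.4, 7.0, 7.1, 7.2, 7.5, 7.8, 8.3, 9.0]}
)

mean = reviews["score"].mean()
std = reviews["score"].std()
# Same quantile cut-points as the table above.
quantiles = reviews["score"].quantile([0.10, 0.25, 0.40, 0.50, 0.60, 0.75, 0.90])
```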


Based on these scores, we can see that a “5” is definitely not a neutral score. In fact, if you’re a band and you get a “5.0”, 92% of albums rated by Pitchfork received the same or better scores than you did. In addition, a 7.2 would most likely be considered favorable in most 0-10.0 rating systems, but according to Pitchfork that score is pretty ordinary.


Flesch Reading Ease Score


As a regular reader of Pitchfork for the last decade, I’m well aware of the complexity of their writing, and I oftentimes find myself with a dictionary open while I read them. This is why I decided to use the TextStat library: I wanted to translate the complexity of the reviews into numbers.



Mean 52.8
Standard Deviation 9
10% Quantile 41
25% Quantile 47.12
40% Quantile 51.2
50% Quantile (Median) 53.55
60% Quantile 55.58
75% Quantile 58.62
90% Quantile 64.04


As we can see, the FRE scores follow a normal distribution with a very low standard deviation. This graph and table tell us that the majority of reviews are at an early-college or upper-high-school reading level, and 35% of reviews are at a college reading level or higher. Lastly, if you’re curious, there is absolutely no correlation between FRE score and review score.
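One way to check that last claim is a simple Pearson correlation between the two columns; here is a sketch with made-up values (the real analysis would run over all ~17,000 reviews):

```python
import pandas as pd

# Hypothetical per-review values, invented for illustration only.
df = pd.DataFrame({
    "score": [7.2, 6.8, 8.1, 5.0, 7.5, 6.2, 7.9, 6.5],
    "fre":   [53.5, 48.0, 60.1, 55.2, 41.3, 58.8, 52.0, 49.7],
})

# Pearson's r; values near 0 indicate no linear relationship.
r = df["score"].corr(df["fre"])
```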




Though Pitchfork covers music of all types, its coverage is biased toward rock, primarily indie rock. We were able to get the genre for about 14,500 of the approximately 17,000 album reviews in our data, and of those 14,500, roughly 60% are rock music.


Here are the percentages for each genre.


Rock 62%
Electronic 12.4%
Rap 9.1%
Pop/R&B 6.6%
Folk/Country 3.6%
Experimental 3%
Jazz 2.2%
Metal 2.1%
Global 1.1%


Now let’s take a look at the median scores for each genre.


Global 7.7
Jazz 7.6
Experimental 7.6
Metal 7.5
Folk/Country 7.4
Electronic 7.3
Rock 7.2
Rap 7.1
Pop/R&B 7.1
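Both genre tables above come from standard grouped operations; here is a minimal sketch with toy rows standing in for the ~14,500 genre-labeled reviews:

```python
import pandas as pd

# Toy stand-in for the genre-labeled reviews.
df = pd.DataFrame({
    "genre": ["Rock", "Rock", "Rap", "Electronic", "Rock", "Jazz"],
    "score": [7.2, 6.9, 7.1, 7.3, 8.0, 7.6],
})

# Percentage of reviews per genre.
share = df["genre"].value_counts(normalize=True) * 100
# Median review score per genre, highest first.
median_by_genre = df.groupby("genre")["score"].median().sort_values(ascending=False)
```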


Though rock is the biggest recipient of Pitchfork’s coverage, that has not translated into better reviews for the genre. And though global and jazz sit at the top of this table, that could be due to their relative lack of reviews; the two of them combine for just 3.3% of reviews.


Now let’s look at the median FRE scores for each genre.


Rap 56.59
Rock 53.55
Pop/R&B 53.55
Metal 53.55
Jazz 52.53
Global 52.53
Electronic 52.53
Folk/Country 51.52
Experimental 51.18


Though it doesn’t look like there’s much of a difference among these scores, the low standard deviation of the FRE scores tells us that isn’t the case. A review at rap’s median FRE score is easier to read than 65% of Pitchfork’s album reviews, while a review at experimental’s median FRE score is harder to read than 60% of album reviews.




In this next part, we’ll do a light time-series analysis to find out which years were great for music and which were bad.


Here is a table of years from 1999-2016 and the median scores of album reviews in those years.


1999 7.1
2000 7.2
2001 7.5
2002 7.2
2003 7.3
2004 7.5
2005 7.3
2006 7.2
2007 7.1
2008 7.2
2009 7.0
2010 7.2
2011 7.3
2012 7.2
2013 7.2
2014 7.2
2015 7.2
2016 7.3


2004 and 2001 were the best years for music: the median score of albums from those years ranks higher than 63% of all reviews. On the flip side, in last place, the median score from 2009 is lower than 60% of all reviews.


One hypothesis I had regarding the relationship between year and FRE score was that perhaps reviews in the early days of Pitchfork were easier to read but became increasingly difficult as the site grew.


The following plot somewhat validates my hypothesis.


The year with the easiest reading score was Pitchfork’s first, followed by slight increases in the complexity of their reviews. After 2004, reading scores have not changed significantly.

I wasn’t satisfied with the results of this bar chart, so I decided to plot the rolling mean of scores over the years.



This graphic provides a much better understanding of scores in relation to time. We see three hills and two troughs in the data. Scores are highest at the beginning and drop significantly over the next couple of years. Then, from about 2003 to 2007, scores swing upward to a local maximum of about 7.25, followed by a recession that sees scores dip to the very low 6s in 2009 and early 2010. The good times come back in 2012 and 2013, with average scores climbing above 7.0. Since then, scores have cooled, and reviews currently average in the upper 6s.
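A rolling mean like the one plotted here can be computed directly with pandas; the sketch below rolls a 3-year window over the yearly medians from the table above (a simplification: the original plot likely rolls over individual reviews, not yearly aggregates):

```python
import pandas as pd

# Yearly median scores, taken from the 1999-2016 table above.
medians = pd.Series(
    [7.1, 7.2, 7.5, 7.2, 7.3, 7.5, 7.3, 7.2, 7.1, 7.2,
     7.0, 7.2, 7.3, 7.2, 7.2, 7.2, 7.2, 7.3],
    index=range(1999, 2017),
)

# 3-year rolling mean; min_periods=1 keeps the earliest years in the output.
rolling = medians.rolling(window=3, min_periods=1).mean()
```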




Next up on our to-do list is finding out Pitchfork’s most preferred and most disliked artists. The following table shows the ten artists with the lowest average scores and the ten with the highest, among artists who’ve released two or more reviewed works.


Lowest-Rated Artist Avg Score Highest-Rated Artist Avg Score
Jet 1.85 Elvis Costello & The Attractions 9.75
Mumford & Sons 2.05 The Velvet Underground 9.533
Wolfie 2.4 John Coltrane 9.5
Statistics 2.5 Glenn Branca 9.35
Asteroid No. 4 2.8 Prince; The Revolution 9.3
Whirlwind Heat 2.8 Billie Holiday 9.3
The Ting Tings 2.8 Bob Wills and His Texas Playboys 9.15
Louis XIV 2.833 Talking Heads 9.15
Metallica 2.85 Ornette Coleman 9.1
Northern State 2.9 Gas 9.1
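These rankings boil down to a grouped mean plus a minimum-review filter; here is a sketch with toy rows (the per-album scores are invented for illustration, chosen so Jet’s average matches the 1.85 in the table):

```python
import pandas as pd

# Toy rows; per-album scores here are invented for illustration.
df = pd.DataFrame({
    "artist": ["Jet", "Jet", "Gas", "Gas", "One-Timer"],
    "score": [1.7, 2.0, 9.0, 9.2, 8.0],
})

# Average score and review count per artist.
stats = df.groupby("artist")["score"].agg(["mean", "count"])
# Keep only artists with two or more reviews, lowest average first.
eligible = stats[stats["count"] >= 2].sort_values("mean")
```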

Much like your parents, Pitchfork believes that older music is the best music. Even though the site was not around when these acts’ works were released, it does review those albums when they are reissued. Every artist/band in the top ten released the majority of their work before Pitchfork began. On the other side of the spectrum, Pitchfork is a big hater of more contemporary and pop acts like The Ting Tings, Louis XIV, and Mumford & Sons. Jet takes the crown of most hated band: for one of their albums, instead of writing a review, Pitchfork posted a video of a monkey peeing into its own mouth.


Loyal readers of Pitchfork often become fans of the site’s writers, which is not surprising, since the site publishes some of the most eloquent writers around, many of whom have their own distinctive style. In this section, we’ll find out who the so-called nicest and meanest writers are, along with the easiest and hardest to read.

The following table shows the top ten “nicest” and “meanest” writers for writers who’ve published five or more reviews.


Meanest Author Avg Score Nicest Author Avg Score
John O’Connor 5.36 Barry Walters 8.92
Matthew Wellins 5.58 Luke Buckman 8.2
Steven Byrd 5.66 Jeff Weiss 8.08
Beatty & Garrett 5.69 Mike McGonigal 7.99
Michael Sandlin 5.74 Brent S. Sirota 7.98
Maud Deitch 5.74 Jenn Pelly 7.97
Evan McGarvey 5.74 kris ex 7.78
Judson Picco 5.77 Erin MacLeod 7.77
Alison Fields 5.81 Anupa Mistry 7.72
Kyle Reiter 5.87 Scott Plagenhoef 7.72


Here’s the table for the hardest and easiest writers to read.


Hardest Author Avg FRE Easiest Author Avg FRE
Stuart Berman 36.45 Chip Chanko 82.23
David Moore 36.57 Jason Josephes 73.29
Patrick Bowman 36.97 Dan Kilian 70.61
Matthew Murphy 37.81 Samir Khan 70.24
Abby Garnett 38.05 James P. Wisdom 69.72
Tim Finney 38.25 Cory Byrom 69.72
Amanda Petrusich 38.88 Ernest Wilkins 64.42
Emilie Friedlander 40.40 Ryan Kearney 64.24
Hartley Goldstein 40.92 Beatty & Garrett 64.23
D. Shawn Bosler 41.37 Matt Kallman 64.04


Record Labels

In the final section of this part, we’ll look at record labels and see which ones Pitchfork holds in the highest and lowest regard.

Here are the ten lowest- and highest-rated record labels, among labels that have released five or more reviewed works.

Lowest-Rated Label Avg Score Highest-Rated Label Avg Score
Fiction 4.76 Dust-to-Digital 8.64
Instinct 4.97 Flydaddy 8.62
Ultra 5.06 Stax 8.58
Interscope; Shady; Aftermath 5.26 Analog Africa 8.52
Kinetic 5.34 Hip-O Select 8.45
Interscope 5.38 ReR 8.38
429 5.40 Tumult 8.24
Ed Banger 5.44 Cloud 8.14
Dovecote 5.46 Elektra; Rhino 8.12
Artemis 5.53 Mo’Wax 8.12

This concludes the first part of the series. I know we’ve covered a lot of ground, but there’s more to come. Next up, we’ll discover trends in the most popular words and phrases. We’ll answer questions like “Which adjectives are used most often to describe rap albums?” and “What were the most common two- and three-word phrases in 2007?”

© ODSC 2016