Dissecting the Presidential Debates with an NLP Scalpel
The recent Republican and Democratic debates drew unprecedented numbers of viewers and the usual lot of controversies and soundbites in the media. Each debate deeply impacted subsequent polls, fundraising, and the composition of the race.
In our polarized media landscape, the ensuing political analysis always suffers from political bias. Whether you trust MSNBC or Fox News, you will get a very different take on what was said.
Furthermore, the fact that some of the debates were only available on cable restricted their reach.
Natural Language Processing (NLP) tools and methods can help bring some objectivity to better understand the current political discourse.
Using different state-of-the-art NLP Python libraries and R packages, we will try to answer questions about:
- Debate Dynamics: What can be inferred about the debaters' performances?
- Sentiment Analysis: How do the candidates feel about certain issues?
- Summary Extraction: Can we summarize the candidates' interventions? (tl;dr)
- and Topic Modeling:
- What did the candidates really talk about?
- Is there a liberal vs. conservative polarity on certain issues?
- What was the most important subject for each candidate?
The transcripts of the 2015 presidential debates can easily be found online. As of this writing, we have a corpus composed of the first Democratic debate and the three Republican ones. All transcripts are available as raw text files, CSV files, or Python lists in this repository.
Words in Pretty Clouds
Although not very scientific, wordclouds satisfy our brain's thirst for visual pattern discovery. We use Andreas Mueller's wordcloud Python library; a few lines of code suffice to generate the wordcloud images.
To make the representation more meaningful, we carried out part-of-speech tagging using the TextBlob Python library and kept only the nouns from the corpus. The resulting wordclouds are far more telling.
For brevity's sake, we only show here wordclouds for the three main candidates of each party. The Python script can easily be modified to produce clouds for other candidates.
There’s truly a lot of talk about the country from both parties, which makes makes sense in the context. However, you would be hard pressed to identify the speaker behind each of these wordclouds. Except maybe for Trump’s “wall.” These pretty images can be interpreted in many ways, and although pleasant to review are not sufficient to extract unbiased meaning.
Different analyses of the transcripts can be carried out by simply counting words, word frequencies, sentence lengths, frequencies of certain pronouns (e.g., I, I'm), etc. We chose to focus on one measure, each candidate's influence in a debate, gauged by how many times he or she took a turn speaking.
In the following figure each candidate’s number of interventions has been normalized with respect to the total number of interventions in the debate. No correction was made for the number of speakers, which varies greatly between parties.
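The counting and normalization can be sketched as follows, assuming each transcript has been parsed into (speaker, text) turns. The `intervention_shares` helper is hypothetical, not part of the original scripts.

```python
from collections import Counter

def intervention_shares(turns):
    """turns: list of (speaker, text) tuples, one per speaking turn.
    Returns each speaker's share of the total number of interventions."""
    counts = Counter(speaker for speaker, _ in turns)
    total = sum(counts.values())
    return {speaker: n / total for speaker, n in counts.items()}

# Toy example with three turns: CLINTON gets 2/3 of the turns, SANDERS 1/3.
turns = [("CLINTON", "..."), ("SANDERS", "..."), ("CLINTON", "...")]
print(intervention_shares(turns))
```

Normalizing by the debate's total number of interventions makes the shares comparable across debates of different lengths, though not across parties with different numbers of speakers.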
On the left side, the Democratic debate was mostly a conversation between Clinton and Sanders with O’Malley stepping in, while Webb and Chafee had trouble getting heard.
On the right side, the three Republican debates:
- Trump dominated the first two debates, but was less imposing in the 3rd. (I couldn’t help but notice the similarity of Trump’s data with the Trump Tower.)
- Bush follows the same pattern as Trump, more imposing in the second debate. The two probably had some intense exchanges in that debate.
- Trying to boost their low poll ratings, Cruz, Huckabee and Rubio intervened more in the third debate.
- Paul and Christie lost their drive from the first to the third debate.
Knowing who ruled over each debate is interesting, but can we find how the candidates really feel about certain current topics?
Sentiment analysis, a.k.a. opinion mining, has been widely used for nearly a decade to analyze product and movie reviews, brand images, and political and financial trends, among many other things. The core idea is to derive the polarity of a given text at the document, sentence, or feature level.
There are many sentiment analysis libraries. We will use TextBlob, a fast and easy-to-use Python NLP library.
Given a text, TextBlob can, often with just one line of code, extract entities (sentences, noun_phrases, …), carry out part-of-speech tagging, language translation, tokenization, stemming, and so forth.
TextBlob can also do classification and sentiment analysis out of the box. See also the Pattern library.
Sentiment Analysis calculations in TextBlob are not complicated. As analyzed in the article TextBlob Sentiment: Calculating Polarity and Subjectivity, each word in the English lexicon is given a score for:
- 1) polarity: negative vs. positive (-1.0 => +1.0)
- 2) subjectivity: objective vs. subjective (+0.0 => +1.0)
- 3) intensity: modifies next word (x0.5 => x2.0)
Given the simplicity of the TextBlob method and the complexity of human language, let’s first validate that the method gives relevant results.
It would be an understatement to say that Republicans have a negative view of the President while the Democrats have a favorable one. One word that is therefore bound to generate opposite reactions across party lines is the President’s name “Obama.” By aggregating all the sentences mentioning the President’s name for each candidate we should be able to verify that the polarity is negative for the Republicans and positive for the Democrats.
There are 34 sentences mentioning "Obama" during the Democratic debate and 16 in the second Republican debate. The resulting "Obama" polarity is: Democrats: 0.07; Republicans: 0.11. This would mean that Republicans have a slightly better opinion of the President than the Democrats. Definitely not the result we expected!
Taking a closer look at each candidate's sentiment score on "Obama," we see that Sanders has a rather negative opinion of the President (-0.33) while Bush's opinion is quite positive (0.22). Sanders cites the President's name only twice during the debate, and one of his sentences has a very negative score: -0.66
The Republican party, since I’ve been in the Senate, and since President Obama has been in office, has played a terrible, terrible role of being total obstructionists.
Note that Sanders's sentence would score the same on the word "Republican" as it does on the word "Obama."
Similarly, the following sentence by Bush scores positively [0.5]:
Six million more people are living in poverty than the day that Barack Obama got elected president.
This type of unsupervised sentiment analysis, based on a bag-of-words approach with word frequencies and averaged word polarities, requires much larger corpora to be relevant. Thousands of tweets captured over time will yield better polarity calculations than a few dozen complex sentences from a political debate.
However, this polarity calculation is still interesting to look at on a speaker/debate basis. In fact, it complements the previous debate dynamics analysis.
We can clearly see that:
- Sanders is the least positive of the Democrats, an observation which is in line with his message.
- Bush and Trump lose their drive from the first to the third debate. That may well correlate with Bush’s campaign troubles.
- Carson stays the same across the three debates (he was also stable across the three debates in terms of interventions).
- Cruz and Rubio shot through the roof in positivity in the third debate, which correlates with their increased interventions.
Sentiment Analysis does give us some insight into the debaters' frame of mind. However, for the people who were not able to watch the debates, is it possible to summarize what was really said in an unbiased way?
Automatic Summary Extraction has also been a hot NLP field of study for years. The idea is to automatically summarize a text by extracting the most meaningful sentences to build a summary. There are many different variants but all of them rely on a three-step process:
- Create an intermediate representation of the text
- Compare and score the sentences
- Build a summary
The Summa Python library offers summary extraction features out of the box. It is based on TextRank, a graph-based ranking model for text processing.
From the seminal paper TextRank: Bringing Order into Texts:
[TextRank is] a graph-based ranking model for text processing. Graph-based ranking algorithms are essentially a way of deciding the importance of a vertex [a sentence in our context] within a graph, based on global information recursively drawn from the entire graph.
Here are the results when limiting the summary to 200 words for each candidate (~5-10% of the original text). For brevity's sake, we only show the results for Sanders and Trump (second debate).
The resulting summaries are quite relevant.
Sanders (second debate)
Those are some of the principles that I believe in, and I think we should look to countries like Denmark, like Sweden and Norway, and learn from what they have accomplished for their working people. I think there is a vast majority in this country who want to do the right thing, and I intend to lead the country in bringing our people together.
I believe that the power of corporate America, the power of Wall Street, the power of the drug companies, the power of the corporate media is so great that the only way we really transform America and do the things that the middle class and working class desperately need is through a political revolution when millions of people begin to come together and stand up and say: Our government is going to work for all of us, not just a handful of billionaires.
Now, at the end of our day, here is the truth that very few candidates will say, is that nobody up here, certainly no Republican, can address the major crises facing our country unless millions of people begin to stand up to the billionaire class that has so much power over our economy and our political life.
Trump (second debate)
I say not in a braggadocious way, I’ve made billions and billions of dollars dealing with people all over the world, and I want to put whatever that talent is to work for this country so we have great trade deals, we make our country rich again, we make it great again.
Second of all, we have a lot of really bad dudes in this country from outside, and I think Chris knows that, maybe as well as anybody. We’ve had many people over the years, for many, many years, saying the same thing. Many of the great business people that you know and Carl Icon (ph) is going to work with me on making great deals for this country.
What I’d like to do, and I’ll be putting in the plan in about two weeks, and I think people are going to like it, it’s a major reduction in taxes. I will know more about the problems of this world by the time I sit, and you look at what’s going in this world right now by people that supposedly know, this world is a mess.
The core message of each candidate seems to be properly conveyed by these summaries. Is it possible to dive deeper and discover what each one really talked about?
Latent Topic Modeling is an unsupervised technique for topic discovery in large document collections. With a bit of preparation, we can apply topic modeling to the debate transcripts.
The two main Topic Modeling approaches are Latent Semantic Analysis (LSA), a deterministic dimension reduction method, and Latent Dirichlet Allocation (LDA), a Bayesian probabilistic model. See this article for a comparison.
The practitioner usually faces two main problems in topic analysis:
- Naming the topic once the topic words are known. Often the words associated with a topic do not lend themselves to a clear identification of the topic itself.
- LSA and LDA rely on an a priori known number of topics. Determining the optimal number of topics can be a time-consuming endeavor, although some solutions exist. See also the hLDA algorithm.
In this article we will use an alternative to LDA, the recent Structural Topic Model (STM) R package. The STM algorithm is a mixed-membership topic model (like Latent Dirichlet Allocation) with extensions that facilitate the inclusion of document-level metadata. This means that you can add external variables to the model and measure their influence on the topics.
The STM package also facilitates selecting the optimal number of topics with a very convenient grid search function and several scoring mechanisms (exclusivity, semantic coherence, likelihood, …).
Naming the actual topic associated with a set of words is also made much easier by the fact that several sets of words, based on different weighting techniques, are available for each topic: frequent and exclusive (FREX), List (textir), Score (LDA), and Highest probability.
Finally, the STM package has a very practical and effective set of methods to carry out standard NLP pre-processing (tokenization, stemming, stop-word removal, …).
The following figure illustrates the workflow with the STM package.
- Select the optimal number of topics by visually assessing several metrics: Held Out Likelihood, Semantic Coherence, Exclusivity, etc. In the figure below the optimal number of topics appears to be eight.
- Run a model selection method that discards models with low likelihood values and select the best remaining model according to its semantic coherence and exclusivity. (top right figure below)
- View the most frequent and exclusive words per topic and assess each topic quality with respect to Exclusivity and Semantic Coherence.
The R script is available here.
We end up with the following eight topics:
- Topic 1: Elections
- Topic 2: Climate Change
- Topic 3: Planned Parenthood
- Topic 4: Middle East
- Topic 5: Marijuana
- Topic 6: Business (mostly Trump’s)
- Topic 7: Social security
- Topic 8: Immigration
Note: We have split each candidate's interventions into chunks of 1,000 words (approximately three pages) to boost the number of documents and limit the number of potential topics per document.
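The chunking step might look like this. This is a sketch of our own; the original preprocessing script is not shown in the article.

```python
def chunk_words(text, size=1000):
    """Split a candidate's concatenated interventions into ~`size`-word
    documents, boosting the document count for topic modeling."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# A 2,500-word text yields three documents: 1,000 + 1,000 + 500 words.
docs = chunk_words("word " * 2500)
print([len(d.split()) for d in docs])
```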
By adding the candidate and party covariates into the model we are able to observe the polarity of the party (GOP vs Dems) for certain topics.
By adding the candidate as an external variable, we can also visualize which candidate talked the most about a given topic. The figure below is obtained through the stmBrowser companion package that generates an interactive D3 view of the topics and their relations with the metadata.
The topic at hand is climate change, and we see that this subject is more prevalent among Democrats, particularly Chafee, Sanders, and Clinton. The colors correspond to the party variable (GOP vs. Dem), the X-axis to the candidates, and the Y-axis and radius to the Climate Change topic.
Using standard off-the-shelf Natural Language Processing methods and libraries, we were able to answer many questions about the debates and the debaters:
- The evolution of their drive as well as debate dynamics through data exploration and sentiment analysis
- A relevant summary of the debaters' messages
- The main topics effectively addressed in the debates
- and the difference between the parties and the candidates in addressing these topics.
There are many other methods and tools available to further these investigations. We have chosen some of them and only carried out light optimizations. This case study shows that there is a true potential in using these methods to generate meaningful and unbiased analysis of the political discourse.
The STM R package stands out among the existing tools for topic analysis as being easy to use and solving many problems usually encountered in topic extraction.
On Debate Analysis
Other analyses of the 2015 presidential debates include:
- James Pennebaker's analysis of the Democratic candidates' psychology
- Dave Goodsmith's study of the candidates' linguistic signatures, based on deriving a list of exclusive words through tf-idf
- And this in-depth network-based analysis on the Wolfram Alpha site
On Structural Topic Modeling
There are not many online examples of STM usage; most available resources are papers.
- The STM package documentation
- Political Visualization Sentiment and the project
- Structural topic models for open-ended survey responses
- The stmBrowser D3 plotting package
On Summary Extraction
- A Survey of Text Summarization Techniques in Mining Text Data
- TextRank: Bringing Order into Texts
- Automatic Summarization – Ani Nenkova, Kathleen McKeown
TextBlob and Sentiment Analysis
Transcripts and Code
You can read more from Alex Perrier on his blog.