Last Saturday, in the UEFA Champions League final (think of it as Europe’s Super Bowl), Spanish giants Real Madrid defeated their Italian counterparts Juventus FC 4-1. It was a thrilling match in which both sides staked an equal claim to victory in the first half, before Madrid eventually prevailed in the second.
As has become customary here at opendatascience.com, we analyzed the tweets posted during the match. Using Twitter’s Streaming API, we downloaded hundreds of thousands of tweets matching more than a dozen terms related to the match. We graphed the rate of tweets and their sentiment over the course of the match. We also looked at which players were the most popular (spoiler alert: Ronaldo came in first) and at the most popular emojis. Lastly, for the first time in our Twitter analysis series, we’ve plotted every geo-tagged tweet onto a map so you can see how soccer fans around the world reacted to the match.
If you’re interested in streaming tweets for your own project, here’s the code we used to stream tweets, retrieve the relevant information, and load it into a pandas dataframe.
    # Imports and setting up access to the Twitter API
    # (written against tweepy 3.x; StreamListener was removed in tweepy 4.0)
    import json
    import pandas as pd
    import tweepy
    from tweepy import Stream, OAuthHandler
    from tweepy.streaming import StreamListener

    consumer_key = "consumer_key"
    consumer_secret = "consumer_secret"
    access_token = "access_token"
    access_secret = "access_token_secret"

    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth)

    # Create streamer class that takes tweets and appends them to a JSON file
    class MyListener(StreamListener):

        def on_data(self, data):
            try:
                with open('tweets_file.json', 'a') as f:
                    f.write(data)
                return True
            except Exception as e:
                print("Error on_data: %s" % str(e))
            return True

        def on_error(self, status):
            print(status)
            return True

    # Launch streamer (this call blocks until the stream is stopped)
    terms = ["list of terms/hashtags/words that you want to download"]
    twitter_stream = Stream(auth, MyListener())
    twitter_stream.filter(track=terms)

    # Retrieve tweets from the JSON file and place them into a list
    json_tweets = []
    with open('tweets_file.json', 'r') as f:
        for line in f:
            if line.strip():  # skip blank lines
                tweet = json.loads(line)
                json_tweets.append(tweet)

    # Extract the information that you want from each tweet and place it into a nested list
    tweets = []
    for i in json_tweets:
        try:
            line = []
            line.append(i["text"])
            line.append(i["id"])
            line.append(i["user"]["screen_name"])
            line.append(i["user"]["name"])
            line.append(i["favorite_count"])
            line.append(i["retweet_count"])
            line.append(i["retweeted"])
            line.append(i["created_at"])
            line.append(i["user"]["followers_count"])
            line.append(i["geo"])
            line.append(i["lang"])
            tweets.append(line)
        except KeyError:
            # Skip stream messages that are not tweets (e.g. limit notices)
            pass

    # Create list of column names that correspond to the information you extracted
    cols = ["tweet", "ID", "handle", "display_name", "favorite_count",
            "retweet_count", "is_retweeted", "time", "follower_count",
            "geo", "language"]

    # Pass the nested list with column names into a pandas dataframe object
    df = pd.DataFrame(tweets, columns=cols)
This is our standard procedure for our Twitter analysis articles. For more information on using Twitter’s API and analyzing its data, check out this tutorial from Marco Bonzanini, author of the excellent book “Mastering Social Media Mining with Python.”
One thing I’ve noticed about following sporting events on Twitter is that users react sharply and instantly to every significant event in the match. Fans enjoy games with their fingers on their keyboards ready to post their reactions and jokes.
Below is the rate of tweets (per minute) about the Champions League final over the course of two and a half hours. Significant events such as goals and the start/end of halves have been marked in the plot.
- No surprise, Mario Mandzukic’s brilliant strike in the 27th minute claimed the mantle of the “Most Tweeted About Event” of the match. Mandzukic’s back-to-goal scissor-kick equalizer sent Twitter into a frenzy, with some people unable to control their emotions.
- Though I’m not sure, the first peak after the match begins is most likely Miralem Pjanic’s laser-like shot on target that was saved by Madrid keeper Keylor Navas.
- As is common for events on Twitter, there’s a spike in tweets right before the event starts.
- Unsurprisingly, goals elicited huge torrents of tweets. The two Madrid goals in the 60th and 63rd minutes caused an uptick in tweets.
- The second highest peak in tweets occurred when Marco Asensio scored his game-clinching goal in the 90th minute. It was followed by a steep drop in tweeting once the match concluded.
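For readers who want to reproduce a plot like this one, here is a minimal sketch of how the per-minute tweet rate could be computed from the dataframe built earlier. It assumes the `time` column still holds Twitter’s raw `created_at` strings; the helper name `tweets_per_minute` and the sample rows are made up for illustration and are not part of our original code.

```python
import pandas as pd

# Format of the created_at field in Twitter's v1.1 API,
# e.g. "Sat Jun 03 18:45:12 +0000 2017"
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

def tweets_per_minute(df, time_col="time"):
    """Count tweets in each one-minute bin (hypothetical helper)."""
    timestamps = pd.to_datetime(df[time_col], format=TWITTER_TIME_FORMAT, utc=True)
    # One "1" per tweet, indexed by timestamp, summed per minute.
    # Minutes with no tweets come out as 0 rather than being dropped.
    ones = pd.Series(1, index=pd.DatetimeIndex(timestamps))
    return ones.resample("1min").sum()

# Tiny hand-made sample standing in for the real dataframe
sample = pd.DataFrame({"time": [
    "Sat Jun 03 18:45:12 +0000 2017",
    "Sat Jun 03 18:45:48 +0000 2017",
    "Sat Jun 03 18:47:03 +0000 2017",
]})

rate = tweets_per_minute(sample)
print(rate)
```

The resulting series can be handed straight to a plotting call such as `rate.plot()`, with goals and half-time marked by hand.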
Real Madrid won the match, but did they win the Twitter battle? Let’s look at the rate and sentiments of tweets mentioning both clubs before, during, and after the match.
- Hands down, Real Madrid dominated the conversation on Twitter. There were only a handful of minutes during the 150-minute window when more Twitter users were tweeting about Juventus than Madrid. That’s not a huge shock: Madrid are considered the “biggest club” in the world and have a fanbase in the tens of millions.
- Mandzukic’s wondergoal was barely enough to switch Twitter users’ attention to Juventus, and only for several minutes.
- The two Madrid goals shortly after the 60th minute launched mentions of Real Madrid miles ahead of Juventus, for obvious reasons.
- The largest peaks of tweeting about Real Madrid occurred after Ronaldo’s second goal and right after the match ended.
- After the 10-minute mark, Juventus failed to break the 150 tweets/min mark for the rest of the event.
- Juventus generated happier tweets for the first 60 minutes. Naturally, Madrid went ahead in the sentiment scores right after their two go-ahead goals.
- There’s a sharp downturn in the sentiment of Juventus tweets between the start of the second half and Madrid’s second goal. Perhaps Juventus fans were reacting to their team’s poor performance and could sense that they were going to concede soon.
- Obviously, Real Madrid’s highest sentiment came right after the match ended. However, tweets mentioning Juventus experienced a spike in sentiment as well. My guess is that Juventus fans were expressing pride in their team’s performance and effort.
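The club comparison above boils down to filtering tweets by club, scoring each one, and averaging per minute. The sketch below shows one way that could look. Everything specific in it is an assumption: the search terms in `CLUB_TERMS` are hypothetical, and `toy_sentiment` is a deliberately crude word-list scorer standing in for a real sentiment library, since this article does not specify the tool used.

```python
import re
import pandas as pd

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# Hypothetical search terms per club (not the list used for the article)
CLUB_TERMS = {
    "Real Madrid": ["real madrid", "madrid"],
    "Juventus": ["juventus", "juve"],
}

# Toy lexicon, for illustration only; swap in a proper sentiment library
POSITIVE = {"great", "brilliant", "love", "win", "proud"}
NEGATIVE = {"bad", "awful", "hate", "lose", "poor"}

def toy_sentiment(text):
    """(positive words - negative words) divided by total words."""
    words = text.lower().split()
    if not words:
        return 0.0
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return score / len(words)

def club_timeline(df, club):
    """Per-minute tweet count and mean sentiment for one club's mentions."""
    pattern = "|".join(re.escape(term) for term in CLUB_TERMS[club])
    mentions = df[df["tweet"].str.contains(pattern, case=False, regex=True)].copy()
    mentions["sentiment"] = mentions["tweet"].apply(toy_sentiment)
    mentions.index = pd.DatetimeIndex(
        pd.to_datetime(mentions["time"], format=TWITTER_TIME_FORMAT, utc=True)
    )
    per_minute = mentions["sentiment"].resample("1min")
    return pd.DataFrame({"tweets": per_minute.count(),
                         "sentiment": per_minute.mean()})

# Hand-made sample rows standing in for the real dataframe
sample = pd.DataFrame({
    "tweet": ["Brilliant goal by Juventus", "Madrid win again",
              "Poor defending from Juve", "Love this Madrid side"],
    "time": ["Sat Jun 03 19:12:05 +0000 2017", "Sat Jun 03 19:12:40 +0000 2017",
             "Sat Jun 03 19:12:55 +0000 2017", "Sat Jun 03 19:13:10 +0000 2017"],
})

juve = club_timeline(sample, "Juventus")
madrid = club_timeline(sample, "Real Madrid")
print(juve)
print(madrid)
```

Plotting the `tweets` and `sentiment` columns of the two frames on shared axes gives a side-by-side comparison of the clubs.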
Next up, let’s see which players were the most talked about during the game. The following graphic is a bar plot of how many times a player’s name was tweeted during the event. Juventus players are colored black, while Madrid is in purple.
- Cristiano Ronaldo came in first by a country mile. The forward had an astounding performance in the final, netting two goals. With or without a man-of-the-match display, Ronaldo would likely still have been number one. ESPN recently anointed him the most popular athlete in the world.
- Coming in at second place, and the most popular Juventus player overall, is goalkeeper Gigi Buffon. The Italian shot-stopper has had an illustrious career, but the Champions League trophy has eluded him. Many soccer fans tipped him to finally add the big-eared trophy to his trophy case, but it seems that this was the 39-year-old’s last chance at becoming a champion of Europe.
- Real Madrid captain Sergio Ramos placed third with just over 2000 mentions. He sparked controversy in the last ten minutes of the match by wildly exaggerating his reaction to Juventus winger Juan Cuadrado’s confrontation with him, which led to Cuadrado being sent off for yellow card accumulation. Soccer fans reacted with indignant outrage at Ramos’ antics.
- Mandzukic owes his fourth-place showing entirely to his goal. Without it, he would not have cracked the top ten. 85% of tweets mentioning him occurred right after the goal.
- The forward position claims the most players in the graph, with four. Goalkeeping, defense, and midfield each claim two.
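A player ranking like the one above can be approximated with a simple case-insensitive search over the tweet text. This is only a sketch: the `PLAYERS` list is a hypothetical subset, the function name is made up, and it counts tweets that contain a name at least once rather than every individual occurrence.

```python
from collections import Counter

# Hypothetical subset of player surnames to look for
PLAYERS = ["Ronaldo", "Buffon", "Ramos", "Mandzukic"]

def count_player_mentions(tweets, players=PLAYERS):
    """Count how many tweets mention each player (case-insensitive)."""
    counts = Counter()
    for text in tweets:
        lowered = text.lower()
        for player in players:
            if player.lower() in lowered:
                counts[player] += 1
    return counts

# Hand-made sample; with the real data this would be df["tweet"]
sample_tweets = [
    "Ronaldo again! What a finish",
    "Buffon deserved better tonight",
    "RONALDO and Ramos lifting the trophy",
]

mentions = count_player_mentions(sample_tweets)
print(mentions.most_common())
```

Names with accented spellings (Mandžukić, for instance) would need extra normalization before matching, which this sketch does not attempt.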
Soccer, as you may already know, is the global game. What other sport is popular in countries like Panama, Serbia, Kenya, and Vietnam? So we thought it was only appropriate to construct a map scatter plot of the geotagged tweets from our dataset. In this graphic, hover over a blue dot to see the tweet, the name of the Twitter user, and the time it was posted (all times EST).
Use the scroll on your mouse to zoom in and out of the map. Try to see if you can find any patterns in the geography of tweets.
- Of the two countries represented in the final, Spain was significantly more active on Twitter than Italy. Madrid claimed the prize as the most popular city.
- Juventus’ hometown Turin displayed surprisingly low levels of activity on Twitter.
- The UK produced the second most tweets in Europe, while France and Germany were relatively quiet for such a huge soccer event.
- South America had the second most tweets among continents. Colombia, Brazil, and Argentina (who were all represented in the match) led the way. Colombian cities Bogota and Medellin were among the top twenty cities in tweet output.
- The most remote tweet comes courtesy of kwabena, who enjoyed the match in Canada’s Northwest Territories.
- Mexico produced the most tweets in North America, Nigeria did so for Africa, and Indonesia for Asia.
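Only a small share of tweets carry location data, so the first step toward a map like this is pulling out the rows whose `geo` field is populated. Here is a rough sketch of that step. It assumes the `geo` value has the shape used by Twitter’s v1.1 API, a dict with `"type": "Point"` and `"coordinates"` in latitude, longitude order, and that rows are plain dicts such as those produced by `df.to_dict("records")`. The function name and sample rows are illustrative.

```python
def extract_geotagged(rows):
    """Return map-ready points for rows that carry point coordinates."""
    points = []
    for row in rows:
        geo = row.get("geo")
        # Most tweets have no geo information at all (None)
        if not geo or geo.get("type") != "Point":
            continue
        lat, lon = geo["coordinates"]  # v1.1 `geo` is [latitude, longitude]
        points.append({
            "lat": lat,
            "lon": lon,
            "handle": row["handle"],
            "tweet": row["tweet"],
            "time": row["time"],
        })
    return points

# Hand-made sample rows using the dataframe's column names
sample_rows = [
    {"tweet": "Hala Madrid!", "handle": "fan_one",
     "time": "Sat Jun 03 20:40:00 +0000 2017",
     "geo": {"type": "Point", "coordinates": [40.4168, -3.7038]}},
    {"tweet": "What a game", "handle": "fan_two",
     "time": "Sat Jun 03 20:41:00 +0000 2017",
     "geo": None},
]

points = extract_geotagged(sample_rows)
print(points)
```

The resulting list of latitude/longitude points, along with the hover text fields, can be passed to whichever interactive mapping library you prefer.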
And as a bonus for this Twitter analysis project, here are the most popular emojis from the gameday.
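If you want to count emojis in your own tweet collection, a regular expression over a few Unicode ranges gets you most of the way. The sketch below is an approximation under stated assumptions: the ranges in `EMOJI_PATTERN` cover many common emojis but are not exhaustive (country flags, for example, fall outside them), and the function name and sample tweets are made up.

```python
import re
from collections import Counter

# Rough approximation of "emoji": main pictograph blocks plus the
# miscellaneous symbols and dingbats ranges. Not an exhaustive list.
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\U00002600-\U000027BF]")

def count_emojis(tweets):
    """Tally every emoji character found across the given tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(EMOJI_PATTERN.findall(text))
    return counts

# Hand-made sample; with the real data this would be df["tweet"]
sample_tweets = ["GOAL ⚽🔥", "What a strike 🔥🔥", "Champions 🏆"]

emoji_counts = count_emojis(sample_tweets)
print(emoji_counts.most_common())
```

Because each code point is counted separately, emojis built from several code points (such as those with skin-tone modifiers) are split into their parts by this approach.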
I'm a journalist turned data scientist/journalist hybrid. Looking for opportunities in data science and/or journalism. Impossibly curious and passionate about learning new things. Before completing the Metis Data Science Bootcamp, I worked as a freelance journalist in San Francisco for Vice, Salon, SF Weekly, San Francisco Magazine, and more. I've referred to myself as a 'Swiss-Army knife' journalist and have written about a variety of topics ranging from tech to music to politics. Before getting into journalism, I graduated from Occidental College with a Bachelor of Arts in Economics. I chose to do the Metis Data Science Bootcamp to pursue my goal of using data science in journalism, which inspired me to focus my final project on being able to better understand the problem of police-related violence in America. Here is the repo with my code and presentation for my final project: https://github.com/GeorgeMcIntire/metis_final_project.