Building a Microservice for Twitter Real-Time Data Collection and Sentiment Analysis. Building a Microservice for Twitter Real-Time Data Collection and Sentiment Analysis.
First of all, I would like to point out that the skill of building MVP and microservices for a data scientist is extremely useful!... Building a Microservice for Twitter Real-Time Data Collection and Sentiment Analysis.

First of all, I would like to point out that the skill of building MVP and microservices for a data scientist is extremely useful! When you can build a prototype and test it in a working environment it just feels so much better and allows you to better understand your final product.

There is a famous Venn diagram made by Drew Conway where hacker skills refer to programming skills. So according to this diagram programming skills are important for data science in general and for a data scientist particularly.

For disclaimer: I don’t think that Data Scientist should build production-ready systems from the ground up, this is work for professional backend developers.

Right, so programming skills are important, what next?

The first thing a data scientist needs is data. Twitter is a treasure trove of data that is publicly available. It’s a unique data set that anyone can access. There are plenty of studies that showed the predictive power of Twitter sentiment.

Here are some examples:

Bollen, J., Mao, H. and Zeng, X., 2011. Twitter mood predicts the stock market. Journal of computational science2(1), pp.1–8.

Pang, B. and Lee, L., 2008. Opinion mining and sentiment analysis.Foundations and Trends in Information Retrieval2(1–2), pp.1–135.

Wen, M., Yang, D. and Rose, C., 2014, July. Sentiment Analysis in MOOC Discussion Forums: What does it tell us?. In Educational data mining 2014.

Practical project

Practical project for this weekend is building a microservice that can collect Twitter Data and perform some basic sentiment analysis.

Source code can be found in my repository

What it does:

  1. It receives POST request with the instruction of which Twitter it needs to collect
  2. Then it opens the socket connection with Twitter API and starts to receive real-time Twitter data
  3. Each tweet assigned with it sentiment score with help of TextBlob
  4. All data then goes to PostgreSQL database

Main features:

  • Because it uses Nameko and RabbitMQ, you can asynchronously run many different tasks and collect different topics.
  • Because it is based on Flask, you can easily connect this with other services that will communicate with this service by POST requests
  • For each tweet, the service assigns sentiment score using TextBlob and stores it to the PostgreSQL, so latter you can perform analysis on that.
  • It scalable and expandable — you can easily add other sentiment analysis tools or create visualization or web interface because it is based on Flask

Example of POST request json

{
"duration": 60,
"query": ["Bitcoin", "BTC", "Cryptocurrency", "Crypto"],
"translate": false
}

where duration is how many minutes you want to collect streaming twitter data, query is a list of search queries that Twitter API will use to send you relevant tweets, translate is boolean in case False will collect only English tweets, otherwise, the service will try machine translation from TextBlob.

What can be done next

  • You can make for realtime visualization system (like on the picture on the cover)
  • You can make models for predicting assets movements (for example Bitcoin price)
  • You can try different sentiment analysis tools like Vader and Watson
  • You can collect data enough for deep exploratory analysis and find some cool insights, like who is “an opinion leader” in some topic

Are you trying to build your MVP or learn something about sentiment analysis? Let me know in the comments, maybe I could help you!

Original article here.

Alexander Osipenko

Alexander Osipenko

Data Scientist with many years of experience in Computational Chemistry and Time-Series analysis. Passionate about Deep Learning and Data Science. I believe that Fourth Industrial Revolution is happening right now with rise of Big Data and Decentralised systems I'm proud to be the Lead of ML team at Cindicator, the company that stands on the edge of global changes. Links: https://medium.com/@subpath , https://twitter.com/subpath email: osipenko@cindicator.com