Identifying Hate Speech


All the beauty of the internet age comes with its fair share of ugliness. Recently, a deluge of articles highlighting the dark side of Twitter has raised concerns for its future. As great as it is to engage with others on a variety of topics, as of late it’s the bad eggs that seem to define Twitter. As seen from Microsoft’s ill-fated AI experiment on Twitter, hate speech is a frequent vehicle of choice in the platform’s darker flight paths.

From a Natural Language Processing perspective, what does or doesn’t qualify as hate speech poses an intriguing problem. Firstly, though, one has to get some data – some labeled data to be more specific. You could collect and label each datapoint yourself, but that will be: 1. quite time consuming and 2. incredibly biased towards your own perspective. How then to be time efficient and less biased?

There are a few options out there, but for ease, it is hard to beat CrowdFlower. The company’s main service allows individuals and organizations to access a large work force for cleaning and labeling data. (Going forward, they are also adding CrowdFlower AI, which will allow users to do Machine Learning on the platform as well.) One cool aspect of the company’s site is the repository of free data sets, one of which concerns hate speech on Twitter. The hate speech identification dataset contains nearly 15 thousand rows with three contributor judgments per tweet. Each contributor analyzed the tweet and said whether it contained hate speech, was offensive but without hate speech, or was not offensive at all.

With that obstacle avoided, it’s time to build a model to identify hate speech.

The dataset was incomplete, but, luckily, the most relevant columns were filled. Unfortunately a code book wasn’t available, but some of the columns are easy enough to understand. I’ll keep three of them: the tweet text, the label saying whether or not it contains hate speech, and the level of confidence in that label. The labels are stored as text in the data set, so for ease of use it’s best to encode them as integers.
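
Here is a minimal sketch of that encoding step. The label strings and the integer codes are my own stand-ins (the name `LABEL_TO_CODE` is hypothetical too), so swap in the exact text that appears in your copy of the data.

```python
# Assumed label strings: stand-ins for whatever text the dataset actually uses.
LABEL_TO_CODE = {
    "not offensive": 0,
    "offensive but not hate speech": 1,
    "hate speech": 2,
}


def encode_labels(labels):
    """Map each text label to its integer code.

    Raises KeyError on an unexpected label, which is a cheap way to
    catch typos or missing values before training.
    """
    return [LABEL_TO_CODE[label] for label in labels]


encoded = encode_labels(["hate speech", "not offensive", "offensive but not hate speech"])
```

Failing loudly on an unknown label is deliberate here, since the dataset is known to have gaps.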

The next step was to process the tweets so they can be used as a feature. For pre-processing I chose to remove URLs, usernames, hashtags, punctuation, and extra whitespace.
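
A rough sketch of that cleaning step using only the standard library might look like the following. The regular expressions and the `clean_tweet` name are my assumptions, and I am assuming hashtags are dropped entirely rather than just losing the `#` sign.

```python
import re
import string

# Assumed patterns: simple approximations, not Twitter's official grammar.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
# string.punctuation covers ASCII punctuation only; curly quotes survive.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Strip URLs, usernames, hashtags, punctuation, and extra whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


cleaned = clean_tweet("@user Check this out!! http://t.co/abc #wow   so cool.")
```

URLs are removed before punctuation so that the slashes and dots inside a link do not leave fragments behind.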

Since the lack of a code book made it hard to tell how the confidence was calculated, I only used the tweets as a predictor. My next steps were to vectorize the tweets, split the data into training and testing sets, and train and evaluate a logistic regression model.
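
Those steps can be sketched with scikit-learn roughly as follows. This is an illustration under assumptions, not the exact code behind the numbers in this post: the TF-IDF vectorizer, the 25% test split, the `train_tweet_classifier` name, and the tiny synthetic data set (with made-up placeholder tokens) are all my own choices.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_tweet_classifier(texts, labels, test_size=0.25, seed=0):
    """Vectorize tweets, hold out a test set, and fit logistic regression."""
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )
    # The pipeline fits the vectorizer on training tweets only, so no
    # vocabulary leaks in from the test set.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


# Synthetic stand-in data: 0 = not offensive, 1 = offensive, 2 = hate speech.
texts = [
    "lovely sunny day at the park",
    "great coffee with friends today",
    "enjoying a quiet sunny morning",
    "friends and coffee make a great day",
    "this rude jerk is so annoying",
    "what an annoying rude driver",
    "stop being such a jerk",
    "annoying jerk cut the line",
    "slurtoken against grouptoken again",
    "grouptoken is slurtoken trash",
    "all grouptoken are slurtoken",
    "slurtoken grouptoken go away",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]

model, accuracy = train_tweet_classifier(texts, labels)
# Class probabilities for a single new tweet, one value per label.
probabilities = model.predict_proba(["lovely sunny picnic"])[0]
```

With a dozen toy rows the accuracy itself means nothing. The point is the shape of the workflow, and `predict_proba` is what you would call to see how confident the model is about any one tweet.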

The result was an accuracy of about 85%. The model was pretty good at identifying tweets that were non-offensive or offensive but not hateful. However, most of its mistakes occurred on tweets that did contain hate speech, which it most often labeled as non-offensive.
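
To see why a decent overall accuracy can hide a weak class, it helps to break the score down per label. The sketch below does that with the standard library. The predictions are invented toy values chosen to mimic the pattern described above, not the model’s real output, and the function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def recall_per_class(y_true, y_pred):
    """Fraction of each true class that the model labeled correctly."""
    counts = confusion_counts(y_true, y_pred)
    totals = Counter(y_true)
    return {label: counts[(label, label)] / totals[label] for label in totals}


# Toy values only: 0 = not offensive, 1 = offensive, 2 = hate speech.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0, 1, 2]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = recall_per_class(y_true, y_pred)
missed_as_non_offensive = confusion_counts(y_true, y_pred)[(2, 0)]
```

In this toy case the overall accuracy is 70%, yet only one of the four hate speech examples is caught, and two of the misses land in the non-offensive class.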

There are a myriad of ways to improve this analysis, and we’ll cover them in a future post that focuses on pre-processing, using another model, or tweaking the classification thresholds. For now, just for fun, let’s see how this average model does, not on the testing set, but on a random made-up tweet.

The model said it was not offensive, with a predicted probability of about 60%. I suppose it’s not wrong.


Editor’s Note – Hey! Share this post at will, but please backlink. Don’t Hate! ©2016 #ODSC