Naive Bayes and Spam Detection
In natural language processing, text classification techniques are used to assign a class to a given text. For example, in spam detection, the classifier decides whether an email belongs to the spam or non-spam (ham) class. Deciding what the topic of a news article is, or whether a movie... Read more
Versatile Spark – Streaming
Distributed computing is the fuel for large-scale processing in modern data pipelines. Hadoop and its open-source competitors tool this system together. In recent years, rival Apache Spark has gained favor due to its versatility. As preference for Apache Spark grows, the software diversifies and its applications increase. Apache Spark is... Read more
Identifying Hate Speech
All the beauty of the internet age comes with its fair share of ugliness. Recently, a deluge of articles highlighting the dark side of Twitter has raised concerns for its future. As great as it is to engage with others on a variety of topics, as of late it’s... Read more
Within soccer’s nascent analytics movement, one metric dominates most discussions. It’s called Expected Goals or xG. Models for calculating xG differ, but the underlying concept is the same. In a nutshell, xG takes a shot’s characteristics – distance from goal, angle from goal, root cause, etc. – and assigns... Read more
Social media has fundamentally changed the way in which we interact with each other, and with the World Wide Web. Our web activities are now inherently social. We can keep in touch with close friends on Facebook without ever needing to pick up a phone or get on a... Read more
Every week we bring you a selection of the best data science articles we find in Cyberspace. We start with high school students’ writings on AI, lessons learned by one of the leading Machine Learning experts, building bots without programming, an intro to probabilistic programming and take a look... Read more
ODSC East US Attendees Visualization
Everyone – well, almost everyone – likes maps, especially maps with interesting data. The final product of analysis hides the mountain of work that goes into its creation. In this case, all the blood, sweat, and tears come from those versed in the intricacies of geospatial data analysis. So,... Read more
The Coolest Natural Language Processing Applications
Natural Language Processing (NLP) is one of the most interesting areas of Data Science. From analysis of the political arena, to organizing meetings, and forming the bedrock of the dream of strong A.I., training computers to truly understand the nuances of human language is part of the yet unreached... Read more
Every week we bring you a selection of the best data science articles we find in Cyberspace. If you want to dig deeper into these Data Science and Machine Learning topics, do not miss the next Open Data Science conference, Boston, May 20-22. With over 100 talks, workshops... Read more
Though Data Science is still a young field, in many ways it is an amalgamation of many roles that have previously existed. The range of backgrounds represented by Data Scientists clearly illustrates this point. (A snapshot of this can be seen in the... Read more

