Intro to Caret: Pre-Processing
Editor’s note: This is the third in a series of posts on the caret package. Contents: Creating Dummy Variables; Zero- and Near Zero-Variance Predictors; Identifying Correlated Predictors; Linear Dependencies; The preProcess Function; Centering and Scaling; Imputation; Transforming Predictors; Putting It All Together; Class Distance Calculations. caret includes several functions to pre-process... Read more
Implementing a CNN for Text Classification in TensorFlow
The full code is available on GitHub. In this post we will implement a model similar to Yoon Kim’s Convolutional Neural Networks for Sentence Classification. The model presented in the paper achieves good classification performance across a range of text classification tasks (like Sentiment Analysis) and has since become... Read more
An Introduction to Object Oriented Data Science in Python
Much of the focus in the data science community is on reducing the complexity and time involved in data gathering, cleaning, and organization. This article discusses how object-oriented design techniques from software engineering can be used to reduce coding overhead and create robust, reusable data acquisition and cleaning... Read more
Google’s TensorFlow framework spread like wildfire upon its release. The slew of tutorials and extensions made an already robust ecosystem even more so. Recently, Google released an extension of its own. It’s called SyntaxNet, a TensorFlow-based syntactic parser for Natural Language Understanding. SyntaxNet uses neural networks to model... Read more
Within soccer’s nascent analytics movement, one metric dominates most discussions. It’s called Expected Goals or xG. Models for calculating xG differ, but the underlying concept is the same. In a nutshell, xG takes a shot’s characteristics – distance from goal, angle from goal, root cause, etc. – and assigns... Read more
Social media has fundamentally changed the way we interact with each other and with the World Wide Web. Our web activities are now inherently social. We can keep in touch with close friends on Facebook without ever needing to pick up a phone or get on a... Read more
The Coolest Natural Language Processing Applications
Natural Language Processing (NLP) is one of the most interesting areas of Data Science. From analysis of the political arena, to organizing meetings, and forming the bedrock of the dream of strong A.I., training computers to truly understand the nuances of human language is part of the yet unreached... Read more
Over time, Python and R have established themselves as the leading languages for Data Science. The rise of both has not been frictionless, though, as the two communities have ‘clashed’ over philosophical differences as each side recruits Data Science newcomers. R users will recommend that R is the better... Read more
Data science is an interdisciplinary endeavor, and it serves the purpose of extracting insight from varying sources of information. Various communities come together at Data Science Conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists at ODSC East... Read more
Amazon Machine Learning: Nice and Easy or Overly Simple?
Can the new Amazon Machine Learning help companies reap the benefits of predictive analytics? Machine Learning as a Service (MLaaS) promises to put data science within the reach of companies. In that context, Amazon Machine Learning is a predictive analytics service with binary/multiclass classification and linear regression features. The service... Read more