The Beginner’s Guide to Scikit-Learn
Scikit-Learn is one of the premier tools in the machine learning community, used by academics and industry professionals alike. At ODSC East 2019, Scikit-Learn author Andreas Mueller will host a training session to give beginners a crash course. As one of the primary contributors to Scikit-Learn, Mueller is one... Read more
Sentiment Analysis in R Made Simple
Sentiment analysis lies at the heart of natural language processing, text mining/analytics, and computational linguistics. It refers to any measurement technique by which subjective information is extracted from textual documents. In other words, it extracts the polarity of the expressed sentiment in a range spanning from positive to... Read more
Organizing Your Next Data Science Project to Minimize Headaches
Call it the data scientist’s curse, but every practitioner has had a project that became unmanageable at some point because of poor organizational choices early on. We’ve all been at our desks at 2 a.m. changing values and re-running our scripts for the 80th time in an hour, asking... Read more
Google Dataset Search Launched to Help Analysts Scour Repositories
Google Dataset Search is a new product in the beta phase that you can use to find datasets published online. The single interface allows you to search repositories worldwide. Imagine you start a new analytics project. For example, let’s say you want to explore numbers pertaining to Boston Public Schools. Before... Read more
An Introduction to Sentence-Level Sentiment Analysis with sentimentr
Sentiment analysis algorithms understand language word by word, estranged from context and word order. But our languages are subtle, nuanced, infinitely complex, and entangled with sentiment. They defy summaries cooked up by tallying the sentiment of constituent words. Unsophisticated sentiment analysis techniques calculate sentiment/polarity by matching words back to a... Read more
All the Best Parts of Pandas for Data Science
Pandas has been hailed by many in the data science community as the missing link between Python and analysis, a tool that can be leveraged to dramatically reduce overhead in data science projects, increase understandability, and speed up workflows. Pandas comes loaded with a wide range... Read more
K-Means Clustering Applied to GIS Data
GIS can be intimidating to data scientists who haven’t tried it before, especially when it comes to analytics. On its face, mapmaking seems like a huge undertaking. Plus, esoteric lingo and strange datafile encodings can create a significant barrier to entry for newbies. There’s a reason why there are experts who... Read more
TensorLayer for Developing Complex Deep Learning Systems
This article describes TensorLayer, a modular Python wrapper library for TensorFlow allowing data scientists to streamline the development of complex deep learning systems. TensorLayer was released in September 2016 with a GitHub repo. A descriptive research paper followed in August 2017: TensorLayer: A Versatile Library for Efficient Deep Learning... Read more
Monthly Summary of Selected Trends, Activities and Insights for R – August 2018
Data for the trends and activities summarized here were obtained from popular websites used by the R community, such as Google, GitHub, StackOverflow, RStudio, METACRAN, and R-Bloggers. StackOverflow: Number of StackOverflow questions tagged R: 4,565 (8% down from July). Number of answers for R questions: 4,630 (3% up from... Read more
Understanding the Hoeffding Inequality
If you read my last post on mathematically defining machine learning problems, then you’ll be familiar with the terminology here. Otherwise, I recommend you read that and then circle back here. The Hoeffding Bound is one of the most important results in machine learning theory, so you’d do well... Read more