The Bayesians are Coming! The Bayesians are Coming, to Time Series
Editor’s note: Aric is a speaker for ODSC West 2020 this October. Check out his talk, “The Bayesians are Coming! The Bayesians are Coming, to Time Series,” there! Forecasting has applications across all industries. From needing to predict future values of sales for a product line,...
Data Imputation: Beyond Mean, Median and Mode
This posting is titled Data Imputation: Beyond Mean, Median, and Mode. Types of Missing Data: 1. Unit Non-Response. Unit non-response refers to entire rows of missing data. An example of this might be people who choose not to fill out the census. Here, we don’t necessarily see...
From Idea to Insight: Using Bayesian Hierarchical Models to Predict Game Outcomes Part 2
What’s the best way to model the probability that one player beats another in a digital game designed by one of your employer’s clients? This is the second of a two-part series in which you’re a data scientist at a fictional mobile game development company that makes...
From Idea to Insight: Using Bayesian Hierarchical Models to Predict Game Outcomes Part 1
Imagine you’re a data scientist at an online mobile multiplayer competition platform. Your bosses have a vested interest in paying people with your skill set to predict game outcomes for a variety of...
The Turf War Between Causality and Correlation In Data Science: Which One Is More Important?
Data scientists have long tried to distinguish causality from correlation. Last month alone, I saw more than 20 posts referencing the catchphrase “correlation is not causality.” What they actually mean is that correlation is not as good as causality...
Regression Discontinuity Design: The Crown Jewel of Causal Inference
Background: In a series of earlier posts, I’ve explained why and how we should run social experiments. However, it’s not always possible to run social experiments, so researchers have to identify causal effects through observational and quasi-experimental methods instead...
A Quick Look Into Bootstrapping
Executive Summary: As a resampling method, bootstrapping allows us to draw statistical inferences about a population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., bagging, which I’ll explain in a follow-up post)...
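The article itself works in R; purely as a language-neutral illustration of the idea in the excerpt, here is a minimal Python sketch of a nonparametric bootstrap for a sample mean, using only the standard library. The sample values and the function names (`bootstrap_means`, `percentile_ci_95`) are hypothetical, and the percentile interval shown is just one of several ways to build a bootstrap confidence interval.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=10_000, seed=0):
    """Return the mean of each bootstrap resample of `sample` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    # Each resample draws n observations *with replacement* from the single
    # sample we actually have; its mean is one draw from the bootstrap distribution.
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_ci_95(values):
    """95% percentile interval from a list of bootstrap statistics."""
    # quantiles(n=40) returns 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]


# Hypothetical data: ten observations from an unknown population.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 3.9]
means = bootstrap_means(sample)
low, high = percentile_ci_95(means)
print(f"sample mean = {statistics.fmean(sample):.3f}")
print(f"95% bootstrap CI for the mean: ({low:.3f}, {high:.3f})")
```

The spread of the resampled means stands in for the sampling variability we could only observe directly by drawing many samples from the population, which is why a single sample is enough to get an interval estimate.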
Discovering 135 Nights of Sleep with Data, Anomaly Detection, and Time Series
In this article, I look at data from 135 nights of sleep and use anomaly detection and time series analysis to understand the results. Three things are certain in life: death, taxes, and sleeping. Here, we’ll talk about the last one. Every night*, we humans, after a...
3 Regression Pitfalls in Business Applications
Regression is a fantastic tool for aiding business decisions. The traditional purpose of a regression model is to estimate the mean value of a dependent variable given a set of independent variables. In business, this purpose should be expanded to include the reduction of uncertainty...
Hierarchical Bayesian Models in R
Hierarchical approaches to statistical modeling are integral to a data scientist’s skill set because hierarchical data is incredibly common. In this article, we’ll go through the advantages of employing hierarchical Bayesian models and work through an exercise building one in R. If you’re unfamiliar with Bayesian...