Data Science’s Role in Anomaly Detection
Anomalies. The Oxford dictionary defines them as things that deviate from what is normal or expected. No matter what field you are in, they seem to pop up without warning. In the realm of data, anomalies can lead to incorrect or out-of-date decisions to be... Read more
Introducing PyMC Labs: Saving the World with Bayesian Modeling
After I left Quantopian in 2020, something interesting happened: various companies contacted me inquiring about consulting to help them with their PyMC3 models. Usually, I don’t hear how people are using PyMC3 — they mostly show up on GitHub or Discourse when something isn’t working right. So, hearing about all these really... Read more
The Bayesians are Coming! The Bayesians are Coming, to Time Series
Editor’s note: Aric is a speaker for ODSC West 2020 this October. Check out his talk, “The Bayesians are Coming! The Bayesians are Coming, to Time Series,” there!  Forecasting has applications across all industries. From needing to predict future values of sales for a product line,... Read more
Data Imputation: Beyond Mean, Median and Mode
Types of Missing Data 1. Unit Non-Response: Unit non-response refers to entire rows of missing data. An example of this might be people who choose not to fill out the census. Here, we don’t necessarily see... Read more
From Idea to Insight: Using Bayesian Hierarchical Models to Predict Game Outcomes Part 2
What’s the best way to model the probability that one player beats another in a digital game a client of your employer designed? This is the second of a two-part series in which you’re a data scientist at a fictional mobile game development company that makes... Read more
From Idea to Insight: Using Bayesian Hierarchical Models to Predict Game Outcomes Part 1
Imagine you’re a data scientist at an online mobile multiplayer competition platform. Your bosses have a vested interest in paying people with your skillset to predict game outcomes for a variety of... Read more
The Turf War Between Causality and Correlation In Data Science: Which One Is More Important?
Data scientists have tried to differentiate causality from correlation. Last month alone, I’ve seen 20+ posts referencing the catchphrase “correlation is not causality.” What they actually want to say is correlation is not as good as causality. [Related Article: Discovering 135 Nights of Sleep with Data,... Read more
Regression Discontinuity Design: The Crown Jewel of Causal Inference
Background: In a series of posts (here, here, here, here and here), I’ve explained why and how we should run social experiments. However, it’s not possible to run social experiments all the time, and researchers have to identify causal effects by other observational and quasi-experimental methods. [Related Article: Causal Inference: An... Read more
A Quick Look Into Bootstrapping
Executive Summary: As a resampling method, bootstrapping allows us to generate statistical inferences about the population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., Bagging, which I’ll explain in a follow-up post). [Related Article: Discovering... Read more
135 Nights of Sleep with Data, Anomaly Detection, and Time Series
In this article, I look at data from 135 nights of sleep and use anomaly detection and time series data to understand the results. Three things are certain in life: death, taxes, and sleeping. Here, we’ll talk about the last. Every night*, we humans, after a... Read more