## The 5 Skills You Need to Start Machine Learning

Career Insights, Featured Post, Math, Statistics | posted by ODSC Team, August 20, 2021

With any new skill, hobby, or career path, you likely have more questions than answers. How do I get started? What skills do I need to focus on first? What sources do I trust to learn all of this? Data science and machine learning are no...

## A Quick Look Into Bootstrapping

Machine Learning, Modeling, R, Statistics, Tools & Languages, bootstrapping | posted by Leihua Ye, December 3, 2019

Executive summary: As a resampling method, bootstrapping lets us draw statistical inferences about a population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., Bagging, which I’ll explain in a follow-up post).
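To make the resampling idea concrete, here is a minimal sketch in Python (the post itself works in R). It assumes a percentile bootstrap for the mean, 10,000 resamples, and a hypothetical exponential sample; the function name `bootstrap_mean_ci` is illustrative and not taken from the post.

```python
import numpy as np

def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Each row of idx indexes one bootstrap resample of size n, drawn with replacement
    idx = rng.integers(0, n, size=(n_resamples, n))
    boot_means = sample[idx].mean(axis=1)
    # Percentile interval: the middle (1 - alpha) share of the bootstrap means
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return sample.mean(), lower, upper

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=50)  # hypothetical skewed sample
est, lo, hi = bootstrap_mean_ci(data)
print(f"mean={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The only inference input is the single observed sample; the spread of the resampled means stands in for the unknown sampling distribution.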

## Hierarchical Bayesian Models in R

Hierarchical approaches to statistical modeling are integral to a data scientist’s skill set because hierarchical data is incredibly common. In this article, we’ll cover the advantages of hierarchical Bayesian models and work through an exercise building one in R. If you’re unfamiliar with Bayesian...

## Why Do Tree Ensembles Work?

Guest contributor, Machine Learning, Modeling, Statistics | posted by Joe Ross, April 10, 2019

Ensembles of decision trees (e.g., the random forest and AdaBoost algorithms) are powerful and well-known methods for classification and regression. We will survey work aimed at understanding the statistical properties of decision tree ensembles, with the goal of explaining why they work. An elementary probabilistic motivation...
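One common elementary motivation (not necessarily the one the post develops) is majority voting: if each of several classifiers is independently correct with probability p > 0.5, their majority vote is correct more often than any single one, and more so as the ensemble grows. The sketch below computes that binomial probability; the independence assumption is idealized, since real trees trained on overlapping data are correlated, and `majority_vote_accuracy` is a hypothetical helper name.

```python
from math import comb

def majority_vote_accuracy(n_trees: int, p: float) -> float:
    """P(a majority of n_trees independent classifiers is correct),
    where each is correct with probability p. Assumes n_trees is odd (no ties)."""
    k_min = n_trees // 2 + 1  # smallest count of correct votes that forms a majority
    return sum(comb(n_trees, k) * p**k * (1 - p) ** (n_trees - k)
               for k in range(k_min, n_trees + 1))

for n in (1, 5, 25, 101):
    print(n, round(majority_vote_accuracy(n, 0.6), 4))
```

With p = 0.6, a single tree is right 60% of the time, while a majority of three independent trees is right 64.8% of the time; correlation between trees erodes this gain, which is why random forests decorrelate them.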

## Confidence Intervals for Data Scientists

Modeling, Statistics | posted by Daniel Gutierrez, ODSC, January 17, 2019

The confidence interval is a basic statistical concept that data scientists use routinely. Without a formal background in statistics, however, some data scientists are unsure what this notion really means. In this article, we’ll review the...
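A quick simulation can show what the "95%" in a 95% confidence interval refers to: the long-run share of intervals, built the same way from repeated samples, that contain the true mean. This is a hedged sketch assuming normally distributed data with a known standard deviation (so a simple z-interval applies); the population parameters and simulation sizes are made up.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
mu, sigma, n, n_reps = 10.0, 2.0, 25, 10_000  # hypothetical population and simulation sizes
z_crit = NormalDist().inv_cdf(0.975)  # about 1.96 for a two-sided 95% interval

# Draw n_reps independent samples of size n and build one interval per sample
samples = rng.normal(mu, sigma, size=(n_reps, n))
means = samples.mean(axis=1)
half_width = z_crit * sigma / np.sqrt(n)  # known-sigma (z) interval half-width
covered = (means - half_width <= mu) & (mu <= means + half_width)
print("share of intervals containing mu:", covered.mean())
```

Any single computed interval either contains `mu` or it does not; the 95% describes the procedure across repetitions.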

## How to Play Fantasy Sports Strategically (and Win)

Daily Fantasy Sports is a multibillion-dollar industry with millions of annual users. Martin Haugh of Imperial College Business School created a framework for beating those users by modeling what they will do and constructing a team in response. Haugh presented his research on how to play...

## Thomas Wiecki of Quantopian on ‘Minding the Gap’ Between Statistics and Machine Learning at ODSC Europe 2018

Conferences, Modeling, Statistics, Machine Learning, ODSC Europe | posted by Daniel Gutierrez, ODSC, November 20, 2018

Key takeaways: Data scientists should understand the so-called “gap” between statistics and machine learning, and recognize that the two fields actually have a lot in common; the difference is largely a matter of perspective. PyMC3 is a very useful probabilistic programming...

## Exploring the Central Limit Theorem in R

Statistics, Central Limit Theorem | posted by Daniel Gutierrez, ODSC, November 9, 2018

The Central Limit Theorem (CLT) is arguably the most important theorem in statistics. It’s certainly a concept that every data scientist should fully understand. In this article, we’ll go over some basic theory of the CLT, explain why it’s important for data scientists, and present some...
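A short simulation illustrates the theorem, here in Python rather than the R the post uses: means of samples drawn from a strongly skewed exponential distribution are approximately normal, centered on the population mean with standard deviation σ/√n. The sample size and number of repetitions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_reps = 40, 20_000  # sample size and number of simulated samples (arbitrary choices)

# Exponential(scale=1) is right-skewed, with mean 1 and standard deviation 1
samples = rng.exponential(scale=1.0, size=(n_reps, n))
sample_means = samples.mean(axis=1)

# CLT: sample means are approximately Normal(mu, sigma / sqrt(n))
print("mean of sample means:", sample_means.mean())       # close to 1.0
print("sd of sample means:  ", sample_means.std(ddof=1))  # close to 1/sqrt(40), about 0.158
# If the normal approximation holds, about 95% of standardized means fall within +/-1.96
z = (sample_means - 1.0) / (1.0 / np.sqrt(n))
print("share within +/-1.96:", np.mean(np.abs(z) < 1.96))
```

Increasing `n` makes the approximation tighter even though each individual observation stays skewed.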

## Mine Like Amazon with Market Basket Analysis

Modeling, Statistics, market basket analysis | posted by Spencer Norris, ODSC, October 12, 2018

Pattern mining is an incredibly simple but powerful technique for discovering co-occurrences in large datasets. The most common approach for finding those patterns is Market Basket Analysis, which is frequently cited as the method Amazon uses for its “users also purchased” feature. Of course, that’s...
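Market basket analysis typically scores candidate rules such as "beer → diapers" by support, confidence, and lift. The sketch below brute-forces item pairs over a small hypothetical set of transactions; it is not the Apriori algorithm (which prunes candidates to scale to large datasets) and makes no claim about Amazon's actual method. `pair_rules` and the 0.4 support threshold are illustrative choices.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions: each set is one shopping basket
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def pair_rules(transactions, min_support=0.4):
    """Return (lhs, rhs, support, confidence, lift) for frequent item pairs, sorted by lift."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):  # score both rule directions
            confidence = count / item_counts[lhs]       # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)  # confidence relative to baseline P(rhs)
            rules.append((lhs, rhs, support, confidence, lift))
    return sorted(rules, key=lambda r: r[4], reverse=True)

for lhs, rhs, s, c, l in pair_rules(transactions):
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

In this toy data every basket with beer also contains diapers, so "beer -> diapers" has confidence 1.00 and lift 1.25; a lift above 1 means the items co-occur more often than independence would predict.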

## Another batch of Think Stats notebooks

Blog, Python, Statistics | posted by Allen Downey, June 15, 2017

Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. When I am done, each chapter will have a notebook that shows the examples from the book along with some small exercises, with more substantial...