Searched for: sampling bias

51 results found
Dewey Defeats Truman: How Sampling Bias can Ruin Your Model
Let’s look at one of the most famous examples of sampling bias, a phenomenon in which the data sample is selected in such a way that it fails to reflect the true underlying distribution. In the 1948 election season, Thomas Dewey faced off against incumbent Harry Truman for the presidency,... Read more
Bias Variance Decompositions using XGBoost
This blog dives into a theoretical machine learning concept called the bias-variance decomposition. This decomposition is a method which examines the expected generalization error for a given learning algorithm and a given data source. This helps us understand questions like: – How can I achieve higher accuracy with my model... Read more
9 Common Mistakes That Lead To Data Bias
Data scientists spend a lot of time with data, which by itself is neutral. It only follows that answers gleaned from the data would be neutral too. Even though data is neutral, our responses to data are sometimes filled with bias that can skew our outcomes. Let’s examine some common... Read more
Tidy Resampling Redux with Agricultural Economics Data
(No statistical graphs in this one. This is what my dog Artemis looks like when she wants my attention during work hours.) Mindy L. Mallory (@ace_prof) wrote a blog post on Machine Learning and Econometrics: Model Selection and Assessment Statistical Learning Style where she has a great description of the variance-bias tradeoff,... Read more
Do Resampling Estimates Have Low Correlation to the Truth?
The Answer May Shock You. One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let’s look at this with some simulated data. While this assertion is often... Read more
What Donald Trump and Biased Polls Can Teach Us About Data
Pity the pollster. As the election cycle mercifully nears its inevitable end, cries of bias from the trailing party will grow louder, and a sport played for well over a hundred years, calling statistics lies, reaches fever pitch. Donald Trump is, of course, correct. Survey polls are biased. Bias... Read more
Why Do Tree Ensembles Work?
Ensembles of decision trees (e.g., the random forest and AdaBoost algorithms) are powerful and well-known methods of classification and regression. We will survey work aimed at understanding the statistical properties of decision tree ensembles, with the goal of explaining why they work. An elementary probabilistic motivation for ensemble methods comes... Read more
Olivier Blais of Moov AI on His Experience as a Speaker at ODSC West 2018
I am back from Open Data Science Conference (ODSC West) in California. What a blast! Not only was I able to present my talk on the democratization of AI, but I have learned a lot of very interesting stuff! I honestly am impressed by the projects and technologies presented throughout... Read more
15 Common Mistakes Made By Newbie Data Scientists
Junior data scientists are flooding the field as more and more people are transitioning from other areas, some very loosely related to data-driven professions. As a result, there often is a disconnect with the skillsets these “newbies” bring to the table. After all, there is only so much that can... Read more
12 Standout Deep Learning Talks Coming to ODSC East this May
Deep learning continues to be a hot topic as increased demands for AI-driven applications, availability of data, and the need for increased explainability are pushing forward. All of this means that deep learning will not only continue to be a critical area of research in development today but will only... Read more