Let’s look at one of the most famous examples of sampling bias, a phenomenon in which the data sample is selected in such a way that it fails to reflect the true underlying distribution. In the 1948 election season, Thomas Dewey faced off against incumbent Harry Truman for the presidency,...
This blog dives into a theoretical machine learning concept called the bias-variance decomposition, a method that examines the expected generalization error for a given learning algorithm and a given data source. It helps us answer questions like: – How can I achieve higher accuracy with my model...
Data scientists spend a lot of time with data, which by itself is neutral. It would seem to follow that answers gleaned from the data are neutral too. But even though data is neutral, our responses to it are sometimes filled with bias that can skew our outcomes. Let’s examine some common...
(No statistical graphs in this one. This is what my dog Artemis looks like when she wants my attention during work hours.) Mindy L. Mallory (@ace_prof) wrote a blog post on Machine Learning and Econometrics: Model Selection and Assessment Statistical Learning Style where she has a great description of the variance-bias tradeoff,...
The Answer May Shock You. One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let’s look at this with some simulated data. While this assertion is often...
Pity the pollster. As the election cycle mercifully nears its inevitable end, cries of bias from the trailing party will grow louder, and a sport played for well over a hundred years, calling statistics lies, reaches fever pitch. Donald Trump is, of course, correct. Survey polls are biased. Bias...
Ensembles of decision trees (e.g., the random forest and AdaBoost algorithms) are powerful and well-known methods of classification and regression. We will survey work aimed at understanding the statistical properties of decision tree ensembles, with the goal of explaining why they work. An elementary probabilistic motivation for ensemble methods comes...
I am back from Open Data Science Conference (ODSC West) in California. What a blast! Not only was I able to present my talk on the democratization of AI, but I have learned a lot of very interesting stuff! I honestly am impressed by the projects and technologies presented throughout...
Junior data scientists are flooding the field as more and more people transition from other areas, some only loosely related to data-driven professions. As a result, there is often a disconnect in the skillsets these “newbies” bring to the table. After all, there is only so much that can...
Deep learning continues to be a hot topic as increased demand for AI-driven applications, the availability of data, and the need for greater explainability push the field forward. All of this means that deep learning will not only continue to be a critical area of research in development today but will only...