The internet offers an abundance of content on how to implement machine learning algorithms. Are you overwhelmed and unsure where to begin? These videos of top-rated Open Data Science Conference talks are a solid introduction to the topic and a good starting point for launching into the field.
General Training Session: Introduction to Machine Learning and Intermediate Machine Learning with scikit-learn
In this beginner’s-level talk, Mueller introduces what machine learning is. He then enumerates some of its applications and introduces practical tools you can use to start building algorithms. Mueller specifically addresses supervised learning, which uses labeled training data, or data with defined input-output pairs. Mueller walks through the entire process, from formalizing a problem to collecting training data to applying and evaluating the algorithm.
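The workflow described above (collect labeled data, fit a model, evaluate on held-out examples) can be sketched in a few lines of plain Python. This is an illustrative toy, not code from the talk: the synthetic two-cluster data, the 1-nearest-neighbor rule, and every function name here are assumptions chosen to keep the example self-contained.

```python
import random

def make_labeled_data(n, seed=0):
    """Generate toy input-output pairs: 2-D points labeled by their source cluster."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.choice([0, 1])
        center = 0.0 if label == 0 else 3.0  # assumed cluster centers for the toy data
        point = (rng.gauss(center, 1.0), rng.gauss(center, 1.0))
        data.append((point, label))
    return data

def predict_1nn(train, point):
    """Return the label of the training example closest to `point`."""
    def sq_dist(example):
        (x, y), _ = example
        return (x - point[0]) ** 2 + (y - point[1]) ** 2
    return min(train, key=sq_dist)[1]

def accuracy(train, test):
    """Fraction of held-out examples whose predicted label matches the true label."""
    correct = sum(predict_1nn(train, p) == label for p, label in test)
    return correct / len(test)

# Collect labeled data, hold part of it out, then evaluate on the held-out part.
data = make_labeled_data(400)
train, test = data[:300], data[300:]
test_accuracy = accuracy(train, test)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

The key habit the sketch illustrates is scoring the model only on examples it never saw during training.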
The intermediate counterpart to this talk is aimed at those who already have some ML experience or who have watched Mueller’s introduction. Mueller dives into Python’s scikit-learn machine learning library and covers some of the package’s more advanced aspects. These include complex pipelines, model evaluation, parameter search and tuning, and out-of-core learning.
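As a rough illustration of two of those topics, the sketch below chains preprocessing and a classifier in a scikit-learn `Pipeline` and tunes one hyperparameter with `GridSearchCV`. The synthetic dataset, the choice of logistic regression, and the grid values are assumptions made for the example, not material taken from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real problem (assumption for illustration).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5,
    n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline refits the scaler inside each cross-validation fold,
# so scaling statistics never come from the fold being scored.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step names prefix parameter names: "clf__C" is the C parameter of the "clf" step.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
test_score = search.score(X_test, y_test)
print(f"best C: {best_C}, held-out accuracy: {test_score:.2f}")
```

Searching over the whole pipeline rather than the bare classifier keeps preprocessing and tuning in one object that can be fitted, scored, and reused as a unit.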
Machine Learning in R, a two-part series
Do you prefer to use the R programming language? This two-part course focuses on methods for implementing machine learning algorithms in R and examines some of the underlying theory. The instruction also covers model quality assessment with traditional measures and cross-validation. Before you watch, be sure you have R and RStudio installed, along with the “glmnet,” “xgboost,” “boot,” “ggplot2,” “UsingR,” and “coefplot” packages.
The Past, Present, and Future of Automated Machine Learning
Automated Machine Learning (AutoML) has been described as a “quiet revolution in AI” that is poised to dramatically change the data science landscape. Academic researchers, startups, and tech giants have begun developing AutoML methods and tools ranging from simple open source prototypes to industry-scale software products. Yet beyond all the hype and vague tech jargon, many are left wondering: What is AutoML, really? In this talk, Randy draws from his AutoML research experience to discuss the benefits of AutoML and highlights some promising future directions of the field.
OS for AI: How Serverless Computing Enables the Next Gen of ML
When you have thousands of model versions in a mix of frameworks, how do you efficiently deploy them as elastic, scalable, secure APIs with 10 ms of latency? In this lecture, Peck shares insight into solutions for problems many programmers face when building, deploying, and especially scaling algorithms. He discusses the need for, and implementations of, an “Operating System for AI,” a common interface for combining and using different algorithms. He also describes a general architecture for serverless machine learning that is discoverable, versioned, scalable, and sharable.
Target Leakage in Machine Learning
According to Guts, target leakage is one of the most difficult problems in developing real-world models. It occurs when training data gets contaminated with information that will not be known at prediction time. Data collection, feature engineering, partitioning, and model validation are all potential sources of data leakage. This talk offers real-life examples of data leakage at different stages of data science projects, discusses countermeasures, and lays out best practices for model validation.
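To make the definition concrete, here is a toy sketch of leakage through a feature that is only recorded after the outcome. It is not one of the talk’s examples: the data generator, the hypothetical “post-outcome” field, and the one-feature threshold classifier (a decision stump) are all assumptions for illustration. The leaky feature looks excellent in validation, because the validation data contains it too, and then fails once the field arrives empty at prediction time.

```python
import random

def make_rows(n, rng):
    """Each row is (legit_feature, leaky_feature, label)."""
    rows = []
    for _ in range(n):
        label = rng.choice([0, 1])
        legit = rng.gauss(1.0 * label, 1.0)  # modest signal known before the outcome
        # Hypothetical field recorded AFTER the outcome: agrees with the label 98% of the time.
        leaky = label if rng.random() < 0.98 else 1 - label
        rows.append((legit, float(leaky), label))
    return rows

def stump_accuracy(rows, col, threshold, sign):
    """Predict 1 when sign * (value - threshold) >= 0, then score against the labels."""
    hits = sum((1 if sign * (r[col] - threshold) >= 0 else 0) == r[2] for r in rows)
    return hits / len(rows)

def fit_stump(rows, col):
    """Pick the threshold and direction on one column that maximize training accuracy."""
    best = (0.0, None, 1)  # (accuracy, threshold, sign)
    for threshold in sorted({r[col] for r in rows}):
        for sign in (1, -1):
            acc = stump_accuracy(rows, col, threshold, sign)
            if acc > best[0]:
                best = (acc, threshold, sign)
    return best[1], best[2]

rng = random.Random(0)
train = make_rows(300, rng)
validation = make_rows(1000, rng)
# At prediction time the post-outcome field does not exist yet, so it arrives empty (0.0).
production = [(legit, 0.0, label) for legit, _, label in make_rows(1000, rng)]

LEGIT, LEAKY = 0, 1
leaky_t, leaky_s = fit_stump(train, LEAKY)
legit_t, legit_s = fit_stump(train, LEGIT)

leaky_validation_acc = stump_accuracy(validation, LEAKY, leaky_t, leaky_s)
leaky_production_acc = stump_accuracy(production, LEAKY, leaky_t, leaky_s)
legit_validation_acc = stump_accuracy(validation, LEGIT, legit_t, legit_s)
legit_production_acc = stump_accuracy(production, LEGIT, legit_t, legit_s)

print(f"leaky feature: validation {leaky_validation_acc:.2f}, production {leaky_production_acc:.2f}")
print(f"legit feature: validation {legit_validation_acc:.2f}, production {legit_production_acc:.2f}")
```

The model built on the legitimate feature scores lower in validation but holds up in production, which is why an unusually strong validation score is often worth treating as a prompt to audit where each feature comes from.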