An Introduction to Active Learning
The current utility and accessibility of machine learning is in part due to the exponential increase in the availability of data over time. While data is abundant, labels that are required for specific supervised machine learning tasks can be difficult to obtain. At ODSC West in...
Watch: Kubeflow and Beyond: Automation of Model Training, Deployment and Testing
Workflows for training models and delivering them to production often involve a lot of manual work, such as building a Docker image and deploying it to a Kubernetes cluster, or packaging the model as a Python package and installing it to...
OS for AI: How Serverless Computing Enables the Next Gen of ML
Jon Peck is a Full Spectrum Developer & Advocate for Algorithmia, an open marketplace for algorithms. At ODSC West 2018, he delivered a talk, “OS for AI,” that discussed how serverless computing enables the next generation of machine learning. The slides for Peck’s presentation can be...
Make Sense of the Universe with Rapids.AI
Classification of astronomical sources in the night sky is important for understanding the universe. It helps us understand the properties of what makes up celestial systems from our solar system to the most distant galaxy and everything in between. The Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) wanted...
Watch: A Manager’s Guide to Starting a Computer Vision Program
In this talk, Ali Vanderveld offers a guide for data science leaders and managers who are considering starting a computer vision program. It will help you think through a sequence of questions you’re likely to encounter along the way...
Image Augmentation for Convolutional Neural Networks
Limited data is a major obstacle in applying deep learning models like convolutional neural networks. Imbalanced classes are often an additional hindrance: while there may be sufficient data for some classes, equally important but undersampled classes will suffer from poor class-specific accuracy. This phenomenon is...
Optimizing Hyperparameters for Random Forest Algorithms in scikit-learn
Optimizing hyperparameters for machine learning models is a key step in making accurate predictions. Hyperparameters define characteristics of the model that can impact model accuracy and computational efficiency. They are typically set prior to fitting the model to the data. In contrast, parameters are values estimated...
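To make the hyperparameter/parameter distinction concrete, here is a minimal sketch using scikit-learn's `GridSearchCV` with a `RandomForestClassifier`. The iris dataset, the grid values, and `cv=3` are illustrative assumptions, not settings from the article. The grid entries are fixed before fitting; the split thresholds inside each fitted tree are parameters estimated from the data.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen before fitting. These values are illustrative, not tuned.
param_grid = {
    "n_estimators": [10, 50],
    "max_depth": [2, None],
    "max_features": ["sqrt", None],
}

# Evaluate every combination in the grid with 3-fold cross-validation,
# then refit the best combination on all of X, y (refit=True by default).
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X, y)

best_forest = search.best_estimator_
print("best hyperparameters:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))

# Parameters: learned during fit, e.g. the split threshold at the root of the first tree.
first_tree = best_forest.estimators_[0]
print("root split threshold:", first_tree.tree_.threshold[0])
```

`GridSearchCV` tries every combination (eight here), which grows quickly with the grid; `RandomizedSearchCV` instead samples a fixed number of combinations and is a common alternative for larger search spaces.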
Transforming Skewed Data for Machine Learning
Skewed data is common in data science; skew is the degree of distortion from a normal distribution. For example, below is a plot of the house prices from Kaggle’s House Price Competition, which are right-skewed, meaning a minority of the values are very large. Why...
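As a rough illustration of measuring skew and reducing it, here is a minimal sketch using only the Python standard library. The data are synthetic log-normal draws standing in for prices (they are not the Kaggle data, and `mu`/`sigma` are arbitrary), and the `log1p` transform is our choice of a common remedy for right skew, not necessarily the one the article uses.

```python
import math
import random


def skewness(values):
    """Fisher-Pearson sample skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Synthetic right-skewed "prices" from a log-normal distribution (assumption:
# mu=12.0, sigma=0.4 are arbitrary; this is NOT the Kaggle House Prices data).
rng = random.Random(42)
prices = [rng.lognormvariate(12.0, 0.4) for _ in range(5000)]

# log1p(x) = log(1 + x) compresses the long right tail and also tolerates zeros.
log_prices = [math.log1p(p) for p in prices]

print(f"skew before: {skewness(prices):.2f}")
print(f"skew after log1p: {skewness(log_prices):.2f}")
```

A positive `g1` indicates a longer right tail; after the log transform the values are close to normally distributed, so the skew should move close to zero.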
Essential Machine Learning with Linear Models in RAPIDS: Part 1 of a Series
This blog is the first in a series about regression analysis in RAPIDS, an open GPU data science platform. There are many varieties of regression techniques, and we’re working to include them all in RAPIDS. In this edition, I use Ordinary Least Squares (OLS) and...
Using RAPIDS with PyTorch
In this post we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data in a format that PyTorch can process so that we can explore the performance of deep learning on...