Make Sense of the Universe with Rapids.AI
Classification of astronomical sources in the night sky is important for understanding the universe. It helps us understand the properties of what makes up celestial systems from our solar system to the most distant galaxy and everything in between. The Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) wanted... Read more
Watch: A Manager’s Guide to Starting a Computer Vision Program
This talk by Ali Vanderveld offers a guide for data science leaders and managers who are thinking about starting a computer vision program. It will help you think through a set of sequential questions that you’re likely to encounter along the way:... Read more
Image Augmentation for Convolutional Neural Networks
Limited data is a major obstacle in applying deep learning models like convolutional neural networks. Often, imbalanced classes can be an additional hindrance; while there may be sufficient data for some classes, equally important, but undersampled classes will suffer from poor class-specific accuracy. This phenomenon is... Read more
Optimizing Hyperparameters for Random Forest Algorithms in scikit-learn
Optimizing hyperparameters for machine learning models is a key step in making accurate predictions. Hyperparameters define characteristics of the model that can impact model accuracy and computational efficiency. They are typically set prior to fitting the model to the data. In contrast, parameters are values estimated... Read more
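To make the hyperparameter/parameter distinction concrete, here is a minimal sketch. It does not use a random forest or scikit-learn; as a deliberately simpler stand-in it fits a one-feature ridge regression with no intercept, where `alpha` (a hypothetical name for the penalty strength) is the hyperparameter chosen before fitting and the slope is the parameter estimated from the data. The function name and the toy data are assumptions made for illustration.

```python
def fit_slope(xs, ys, alpha):
    """Closed-form slope for 1-D ridge regression without an intercept.

    alpha is the hyperparameter: it is fixed before fitting.
    The returned slope is the parameter: it is estimated from the data.
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + alpha)

# Toy data (made up for illustration): y is exactly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# The same data yields a different fitted parameter for each hyperparameter value.
slope_unpenalized = fit_slope(xs, ys, alpha=0.0)
slope_penalized = fit_slope(xs, ys, alpha=10.0)
```

Tuning a model means searching over values such as `alpha` and keeping the one that scores best on held-out data; the slope is never set by hand.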
Transforming Skewed Data for Machine Learning
Skewed data is common in data science; skew is the degree of distortion from a normal distribution. For example, below is a plot of the house prices from Kaggle’s House Price Competition that is right skewed, meaning there is a minority of very large values. Why... Read more
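As a rough illustration of right skew and one common way to reduce it, the sketch below computes the population skewness (third central moment divided by the cubed standard deviation) before and after a log transform. The price values are made up for the example and are not taken from the Kaggle dataset, and the log transform is only one of several possible choices.

```python
import math

def skewness(values):
    """Population skewness: third central moment divided by std**3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical house prices: most are moderate, a few are very large.
prices = [120_000, 135_000, 150_000, 160_000, 175_000,
          190_000, 210_000, 250_000, 480_000, 755_000]

# log1p(x) = log(1 + x) compresses the long right tail.
log_prices = [math.log1p(p) for p in prices]

skew_before = skewness(prices)
skew_after = skewness(log_prices)
```

A positive value indicates a right-skewed distribution; on this toy sample the log-transformed values have a smaller positive skew than the raw prices.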
Essential Machine Learning with Linear Models in RAPIDS: Part 1 of a Series
This blog is the first in a series about regression analysis in RAPIDS, an open GPU data science platform. There are many varieties of regression techniques, and we’re working to include them all in RAPIDS. In this blog edition, I use Ordinary Least Squares (OLS) and... Read more
Using RAPIDS with PyTorch
In this post we take a look at how to use cuDF, the RAPIDS dataframe library, to do some of the preprocessing steps required to get the mortgage data in a format that PyTorch can process so that we can explore the performance of deep learning on... Read more
The Empirical Derivation of the Bayesian Formula
Deep learning has been made practical by modern computing power, but it is not the only technique benefiting from this large increase in power. Bayesian inference is an up-and-coming technique whose recent progress is driven by the same increase. We can explain the... Read more
Using Auto-sklearn for More Efficient Model Training
Applying a machine learning algorithm to any number of data-related tasks can be an enormous time saver, but the variable factors associated with creating an algorithm can be daunting. One must consider a variety of design-related decisions, and the risks surrounding the creation of an accurate... Read more
Strategies for Addressing Class Imbalance
Class imbalance is common in real-world datasets. For example, a dataset with examples of credit card fraud will often have far more records of non-fraudulent activity than of fraudulent cases. In many applications, training your model on imbalanced classes can inhibit model functionality if predictive... Read more
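One widely used way to compensate for imbalance is to weight each class inversely to its frequency. The sketch below is a minimal, standard-library version of that idea, using the heuristic `n_samples / (n_classes * class_count)`. The label names and the 990-to-10 split are assumptions chosen for illustration, not figures from the article.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

# Hypothetical fraud dataset: 990 legitimate records, 10 fraudulent ones.
labels = ["legit"] * 990 + ["fraud"] * 10

weights = balanced_class_weights(labels)
# The rare class receives a much larger weight than the common class.
```

Such weights can then be passed to a loss function or a classifier that accepts per-class weights, so that mistakes on the rare class cost more during training.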