ODSC Meetup: Automated and Interpretable Machine Learning
Last week, ODSC hosted a talk by Dr. Francesca Lazzeri, Senior Machine Learning Scientist at Microsoft, on the automated and interpretable machine learning capabilities of Microsoft Azure. The talk is part of a series covering a variety of data science topics. The talks are great...
When Less is More: A Brief Story About Feature Engineering with XGBoost
I played a minor role launching RAPIDS on Google Dataproc by refining a model that predicts taxi fares in New York City. The geographic locations of passenger pick-ups and drop-offs were columns in the data. These are recorded as longitude and latitude measurements, with precision to many decimal places. One of the...
Interpretable Machine Learning – Fairness, Accountability, and Transparency in ML systems
Editor’s note: Sayak is a speaker for ODSC West in San Francisco this November! Be sure to check out his talk, “Interpretable Machine Learning – Fairness, Accountability and Transparency in ML systems,” there! The problem is that it is much harder to evaluate machine learning systems than to train them...
Make Sense of the Universe with Rapids.AI
Classifying astronomical sources in the night sky is important for understanding the universe. It helps us understand the properties of what makes up celestial systems, from our solar system to the most distant galaxy and everything in between. The Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC) wanted to revolutionize the...
Watch: A Manager’s Guide to Starting a Computer Vision Program
This talk by Ali Vanderveld is a guide for data science leaders and managers who are thinking about starting a computer vision program. It will help you think through a set of sequential questions that you’re likely to encounter along the way...
Image Augmentation for Convolutional Neural Networks
Limited data is a major obstacle in applying deep learning models like convolutional neural networks. Imbalanced classes are often an additional hindrance: while there may be sufficient data for some classes, equally important but undersampled classes will suffer from poor class-specific accuracy. This phenomenon is intuitive. If the...
Using RAPIDS with PyTorch
In this post, we look at how to use cuDF, the RAPIDS dataframe library, to perform some of the preprocessing needed to get the mortgage data into a format PyTorch can process, so that we can explore the performance of deep learning on tabular data and...
The Empirical Derivation of the Bayesian Formula
Editor’s note: James is a speaker for ODSC London this November! Be sure to check out his talk, “The How, Why, and When of Replacing Engineering Work with Compute Power” there. Deep learning has been made practical through modern computing power, but it is not the only technique benefiting...
10 Compelling Machine Learning Dissertations from Ph.D. Students
As a data scientist, I consider keeping current with research coming out of academia an integral part of my work. I frequently scour arXiv.org for late-breaking papers that show trends and fertile areas of research. Other sources of valuable research developments are in the form of...
Using Auto-sklearn for More Efficient Model Training
Applying a machine learning algorithm to any number of data-related tasks can be an enormous time saver, but the many variables involved in building a model can be daunting. One must make a variety of design decisions, and the risks surrounding the creation of an accurate architecture can make...