Machine Learning for Continuous Integration
Editor’s Note: Andrea Frittoli and Kyra Wulffert are presenting their talk “Machine Learning for Continuous Integration” at ODSC 2019 Europe.
Continuous Integration and Data
As more applications move to a DevOps model with CI/CD pipelines, the testing required for this development model to work inevitably generates lots...
The Rise of Notebooks Extended
I recently had the privilege of presenting a workshop at the AI + Education Curiosity Conference 2019. There, I demonstrated to educators, school district staff, researchers, and students how RAPIDS software enables students to learn and iteratively practice data science on full datasets, all within classroom time constraints. Compared to current...
Weapons Of Math Destruction: The Power Of Oversight
Weapons of Math Destruction: How algorithms have the power to alter our liberties and what we should be doing instead. Big data is making decisions about your future behind the scenes, and it’s likely you don’t even know it. If you’ve ever applied for a job...
How To Get Started With Data Lakes
Data has to be stored somewhere. Data warehouses are repositories for your cleaned, processed data, but what about all the unstructured data your organization is starting to notice? Where does it go? Data lakes are the newest old thing on the block, so to speak. The...
25 Excellent Machine Learning Open Datasets
Editor’s note: There is an updated version of this article for 2021. Please read it here for the most up-to-date listing on machine learning datasets! Your machine learning program is only as good as your training sets. Data sets are an integral part of the quality...
Good, Fast, Cheap: How to do Data Science with Missing Data
When doing any sort of data science problem, we will inevitably run into missing data. Let’s say we’re interviewing 100 people and are recording their answers on a piece of paper in front of us. Specifically, one of our questions asks about income. Consider a few...
Confronting the Curse of Dimensionality
Every data scientist eventually confronts the “Curse of Dimensionality,” the difficulty of working with a large number of feature variables. Machine learning shines when analyzing data with many dimensions. Humans, on the other hand, are not good at finding patterns that may be spread out across...
Ten Reasons for Doing Public Data Hacking
As a data scientist, I’m data hungry. I’m always looking for new sources of data. A few years ago, I kept noticing new open data repositories coming online. For instance, I was excited to learn about the opening in 2015 of the Los Angeles Open Data...
Operation Data Liberation
I’ve recently had the opportunity to talk to people in several city governments that are facing a common challenge: how to liberate operational data from a legacy system. This is a challenge that many city governments face, and it strikes me that there are some...