Machine Learning for Continuous Integration
Editor’s Note: Andrea Frittoli and Kyra Wulffert are presenting their talk “Machine Learning for Continuous Integration” at ODSC 2019 Europe. Continuous Integration and Data: As more applications move to a DevOps model with CI/CD pipelines, the testing required for this development model to work inevitably generates lots of data. This... Read more
The Rise of Notebooks Extended
I recently had the privilege of presenting a workshop at the AI + Education Curiosity Conference 2019. There, I demonstrated to educators, school district staff, researchers, and students how RAPIDS software enables students to learn and iteratively practice data science using full datasets all within classroom time constraints. Compared to current methods and workarounds,... Read more
Weapons Of Math Destruction: The Power Of Oversight
Weapons of Math Destruction: How algorithms have the power to alter our liberties and what we should be doing instead. Big data is making decisions about your future behind the scenes, and it’s likely you don’t even know it. If you’ve ever applied for a job and had to... Read more
How To Get Started With Data Lakes
Data has to be stored somewhere. Data warehouses are repositories for your cleaned, processed data, but what about all that unstructured data your organization is starting to notice? Where does it go? Data Lakes are the newest old thing on the block, so to speak. The concept has been... Read more
25 Excellent Machine Learning Open Datasets
Your machine learning program is only as good as your training sets. Data sets are an integral part of the quality of your machine learning, but you may not always have access to data behind closed walls or the budget to purchase (or rent) the key. Don’t despair. There... Read more
Good, Fast, Cheap: How to do Data Science with Missing Data
When working on any sort of data science problem, we will inevitably run into missing data. Let’s say we’re interviewing 100 people and are recording their answers on a piece of paper in front of us. Specifically, one of our questions asks about income. Consider a few examples of missing... Read more
Confronting the Curse of Dimensionality
Every data scientist eventually confronts the “Curse of Dimensionality,” or trying to work with a large number of feature variables. Machine learning shines when analyzing data with many dimensions. Humans, on the other hand, are not good at finding patterns that may be spread out across a large number... Read more
Ten Reasons for Doing Public Data Hacking
As a data scientist, I’m data hungry. I’m always looking for new sources of data. A few years ago, I kept noticing new open data repositories coming online. For instance, I was excited to learn about the opening in 2015 of the Los Angeles Open Data website in my... Read more
Operation Data Liberation
I’ve had the opportunity recently to talk to people in several different city governments that are facing a common challenge — how to liberate operational data from a legacy system. This is a challenge that lots of city governments face, and it strikes me that there are some common lessons that... Read more