Crash Course: Pool-Based Sampling in Active Learning
Active learning is a class of machine learning problems where labeled data isn’t available for supervised algorithms. Let’s take the classic setup as an example. Say we have pictures of birds and want to classify them by type, but the images don’t have labels for what kind of bird... Read more
Classic Regularization Techniques in Neural Networks
Neural networks are notoriously tricky to optimize. There isn’t a way to compute a global optimum for weight parameters, so we’re left fishing around in the dark for acceptable solutions while trying to ensure we don’t overfit the data. This is a quick overview of the most popular model regularization... Read more
How Entertainment and Social Media Giants are Using Machine Learning
Major names in social media didn’t get there by accident. In addition to their excellent products, marketing, and sales strategies, machine learning is a huge part of the backbone that makes many of their processes successful. Facebook, Twitter, and other names you’ve definitely heard of have become the powerhouses... Read more
Exploring Scikit-Learn Further: The Bells and Whistles of Preprocessing
In my previous post, we constructed a simple cross-validated regression model using Scikit-Learn in 35 lines. It’s pretty amazing that we can perform machine learning with so little effort, but we just did the bare minimum in order to get a working model. Frankly, it didn’t even perform that well.... Read more
Three Ways Researchers are Using Data Science for Good
Data experts have long identified marginalization and narrow-minded problem solving as some of the biggest challenges facing data science. When large technology enterprises only seek solutions to problems they face within their company and their communities, it exacerbates inequalities. But companies, nonprofits, and individuals across the globe are making... Read more
What Model Should I Choose for My Data Science Project?
What to ask yourself when you’re balancing model performance, interpretability, and other costs. It might seem silly to bother doing anything other than build the best black box machine learning model possible, as long as it gets good performance. That makes perfect sense on personal projects and Kaggle competitions. But it’s an... Read more
K-Means Clustering Applied to GIS Data
Here, we use k-means clustering with GIS data. GIS can be intimidating to data scientists who haven’t tried it before, especially when it comes to analytics. On its face, mapmaking seems like a huge undertaking. Plus, esoteric lingo and strange data file encodings can create a significant barrier to entry... Read more
Assessment Metrics for Clustering Algorithms
Assessing the quality of your model is one of the most important considerations when deploying any machine learning algorithm. For supervised learning problems, this is easy. There are already labels for every example, so the practitioner can test the model’s performance on a reserved evaluation set. We don’t have... Read more
Three Challenges for Open Data Science
There are three types of lies: lies, damned lies, and ‘big data.’ That’s the message Amazon machine learning director Neil Lawrence began his ODSC Europe 2016 lecture with before laying out the three largest challenges for open data science and our data-centered society. As Lawrence sees it, those challenges... Read more
Comparing Features of 4 Popular Machine Learning Platforms
Machine learning, the term and the technology, has been of paramount importance and relevance in the context of computational applications for years. Arthur Samuel first came up with the term “machine learning” in 1959. Machine learning is basically a part of artificial intelligence that evolves through the fields of... Read more