

What’s New on Kaggle
Posted by George McIntire, ODSC, April 24, 2018

Without a doubt, Kaggle is one of the most important hubs in the data science ecosystem. It has been making news recently with its acquisition by Google and the debut of its new “Learn” platform. The best thing about Kaggle, however, beyond the technology, is its community. Kaggle users are known for their avid participation in competitions, but what resonates most with me personally is the constant willingness of developers, students and others to share code and data. All of this means there is always something new and exciting on Kaggle and for the Kaggle community.
In this post, I’ll highlight some of the most interesting recent datasets and kernels from the Kaggle community.
Datasets
- Fast Food Restaurants
  - This sprawling collection features geographical data on hundreds of thousands of fast food restaurants in the USA. It includes addresses and latitude and longitude coordinates, which makes it a great starting point for a data viz project.
- Gun Violence Data
  - Mass shootings are an ever-present topic in the media, so if you’re looking to do a data journalism piece on the issue, this is the dataset for you. It includes a wealth of information such as news articles, detailed data on the shooter and victims, and congressional district.
- Bag of Words and Popcorn
  - This one is for the film and NLP buffs of the community. It is a corpus of word vectors trained on movie reviews. Word vectors are always fun to play with, so this should be even more fun.
- Animal Shelter outcomes
  - If I had to give one bit of criticism about Kaggle datasets, it’s that there aren’t enough machine learning datasets in the mix. So when I come across a dataset that lets me train a supervised learning model, I jump on it. That is what the animal shelter outcomes dataset is for. With this data, you can try to predict whether or not shelter animals end up getting adopted.
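To make the supervised learning idea from the animal shelter entry concrete, here is a minimal sketch of about the simplest model you could train: a one-feature "decision stump" that learns an age cutoff for predicting adoption. Everything in it is an assumption for illustration. The rows are invented toy data, not taken from the Kaggle dataset, and the names `TOY_ROWS`, `fit_stump` and `predict` are hypothetical.

```python
# Toy rows invented for illustration: (age_in_months, was_adopted).
# They are NOT taken from the Kaggle animal shelter dataset.
TOY_ROWS = [
    (2, True), (4, True), (6, True), (12, True),
    (30, False), (48, False), (60, True), (96, False),
]


def fit_stump(rows):
    """Find the age threshold that best separates adopted from not adopted.

    The learned rule is: predict "adopted" when age <= threshold.
    Returns (threshold, training_accuracy).
    """
    best_threshold, best_accuracy = None, -1.0
    # Try every observed age as a candidate cutoff.
    for threshold in sorted({age for age, _ in rows}):
        correct = sum((age <= threshold) == adopted for age, adopted in rows)
        accuracy = correct / len(rows)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def predict(threshold, age):
    """Apply the learned rule to a new animal's age in months."""
    return age <= threshold


threshold, accuracy = fit_stump(TOY_ROWS)
print(f"threshold={threshold} months, training accuracy={accuracy:.3f}")
```

On the real data you would want many more features, a held-out test set and a proper library model, but the workflow of fitting on labeled rows and then predicting on new ones stays the same.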
Kernels
- XGBoost Housing Prices
  - XGBoost is probably the hottest machine learning technique outside of neural networks right now. I highly recommend checking out this incredibly detailed kernel because it explains how to use the algorithm on a housing prices dataset. And don’t let the fact that it’s in R discourage you; Python users can still learn from the presentation of the data and results.
- Comprehensive EDA
  - A robust exploratory data analysis process is a major key for any machine learning process, so take good notes on this kernel.
- Intro to Ensembling
  - Ensemble methods are what win Kaggle competitions, so if you want to move up into that top 10 percent, this is where you start.
- Introduction to CNN with Keras
  - Deep learning algorithms can be a tough nut to crack. I really appreciate this kernel for that reason: it provides a simple yet comprehensive introduction to convolutional neural nets, the go-to family of models for image processing.
- Kernel of Kernels
  - Time to get super meta. If you’re a true data nerd, then you’ll really appreciate this kernel analyzing kernels on Kaggle.
- Spooky NLP and Topic Modeling
  - An awesome introduction to topic modeling techniques like LDA and NNMF, applied to the scariest dataset you’ll ever come across. Gets bonus points for the solid visuals.
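If the ensembling entry above sounds abstract, here is a minimal sketch of the simplest flavor, hard (majority) voting, using only the standard library. It is not the method from the kernel itself, which I am not reproducing here. The predictions are made-up toy values, and `majority_vote` and `accuracy` are hypothetical helper names.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by hard (majority) voting.

    predictions_per_model: one list of predicted labels per model,
    all of the same length and in the same example order.
    """
    combined = []
    # zip(*...) groups the models' votes for each example together.
    for votes in zip(*predictions_per_model):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions from three models; each is wrong on a
# different example, so the other two outvote its mistake.
actual = [1, 0, 1, 1, 0, 1]
model_a = [1, 0, 0, 1, 0, 1]  # wrong at index 2
model_b = [1, 1, 1, 1, 0, 1]  # wrong at index 1
model_c = [1, 0, 1, 1, 0, 0]  # wrong at index 5

ensemble = majority_vote([model_a, model_b, model_c])
print("single model accuracy:", accuracy(model_a, actual))
print("ensemble accuracy:", accuracy(ensemble, actual))
```

The toy numbers are rigged so the models make different mistakes, which is exactly the situation where voting helps. With three models and two classes there are no ties to worry about; real ensembles often average probabilities or stack models instead.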
Kaggle is always updating its datasets and kernels, so stay tuned for another version of this article in the future.