What’s New on Kaggle

Kaggle is, without a doubt, one of the most important hubs in the data science ecosystem. It has been making news recently with its acquisition by Google and the debut of its new “Learn” platform. The best thing about Kaggle, however, goes beyond technology: its community. Kaggle users are known for their avid participation in competitions, but what resonates most with me personally is the constant willingness of developers, students and others to share code and data. All of this makes for an exciting time on Kaggle and for its community.

In this post, I’ll highlight some of the most interesting recent datasets and kernels from the Kaggle community.

Datasets

  • Fast Food Restaurants

    • This sprawling collection features geographical data on hundreds of thousands of fast food restaurants in the USA. It includes addresses and latitude and longitude coordinates, which makes it a good starting point for a data visualization project.
  • Gun Violence Data

    • Mass shootings are an ever-present topic in the media, so if you’re looking to do a data journalism piece on the issue, this is the dataset for you. It includes a wealth of information, such as news articles, detailed data on the shooter and victims, and the congressional district.
  • Bag of Words and Popcorn

    • This one is for the film and NLP buffs of the community. It is a corpus of word vectors trained on movie reviews. Word vectors are always fun to play with, so this should be even more fun.
  • Animal Shelter Outcomes

    • If I had to give one bit of criticism about Kaggle datasets, it’s that there aren’t enough machine learning datasets in the mix. So when I come across a dataset that lets me train a supervised learning model, I jump on it. That is what the animal shelter outcomes dataset is for. With this data, you can try to predict whether or not shelter animals end up getting adopted.
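
As a small illustration of what latitude and longitude columns like those in the Fast Food Restaurants dataset make possible, here is a minimal sketch that computes the great-circle (haversine) distance between points and finds the nearest record to a location. The restaurant names and coordinates below are made-up placeholders, not rows from the dataset, and the function names are my own.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical records: (name, latitude, longitude). Not real dataset rows.
restaurants = [
    ("Restaurant A", 40.7128, -74.0060),
    ("Restaurant B", 34.0522, -118.2437),
    ("Restaurant C", 41.8781, -87.6298),
]


def nearest(lat, lon, records):
    """Return the record closest to the given point."""
    return min(records, key=lambda r: haversine_km(lat, lon, r[1], r[2]))


if __name__ == "__main__":
    name, lat, lon = nearest(40.0, -75.0, restaurants)
    print(name, round(haversine_km(40.0, -75.0, lat, lon), 1), "km")
```

With the real data, the same distance function could feed a density map or a "nearest restaurant" lookup, assuming the coordinates are loaded from the dataset's CSV.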

Kernels

  • XGBoost Housing Prices

    • XGBoost is probably the hottest machine learning technique outside of neural networks right now. I highly recommend checking out this incredibly detailed kernel because it explains how to use the algorithm on a housing prices dataset. And don’t let the fact that it’s in R discourage you; Python users can still learn from the presentation of the data and results.
  • Comprehensive EDA

    • A robust exploratory data analysis process is key to any machine learning project, so take good notes on this kernel.
  • Intro to Ensembling

    • Ensemble methods are what win Kaggle competitions, so if you want to move up into that top 10 percent, this is where you start.
  • Introduction to CNN with Keras

    • Deep learning algorithms can be a tough nut to crack. I really appreciate this kernel for just that reason: it provides a simple yet comprehensive introduction to convolutional neural networks (CNNs), the architecture most commonly used for image tasks.
  • Kernel of Kernels

    • Time to get super meta. If you’re a true data nerd, then you’ll really appreciate this kernel analyzing kernels on Kaggle.
  • Spooky NLP and Topic Modeling

    • An awesome introduction to topic modeling techniques like LDA and NNMF, applied to the scariest dataset you’ll ever come across. It gets bonus points for the solid visuals.
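
To make the ensembling idea from the Intro to Ensembling entry above a little more concrete, here is a minimal sketch of one of the simplest ensemble methods, majority voting over class predictions. It is not taken from the kernel itself; the three "models" are just hard-coded, hypothetical prediction lists, chosen so that their mistakes do not overlap.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine class predictions from several models by majority vote.

    predictions_per_model: list of equal-length lists, one list per model.
    On a tie, the label seen first among the models for that sample wins.
    """
    combined = []
    for sample_preds in zip(*predictions_per_model):
        combined.append(Counter(sample_preds).most_common(1)[0][0])
    return combined


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical predictions from three models on four samples (toy data).
truth = [1, 0, 1, 1]
model_a = [1, 0, 1, 0]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])

if __name__ == "__main__":
    for name, preds in [("a", model_a), ("b", model_b), ("c", model_c)]:
        print("model", name, accuracy(preds, truth))
    print("ensemble", accuracy(ensemble, truth))
```

In this toy case each model is wrong on at least one sample, but the vote recovers every label because the errors fall on different samples. Real models tend to make correlated mistakes, so the gain is usually smaller.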

Kaggle is always updating its datasets and kernels, so stay tuned for another version of this article in the future.

George McIntire, ODSC

I'm a journalist turned data scientist/journalist hybrid. Looking for opportunities in data science and/or journalism. Impossibly curious and passionate about learning new things. Before completing the Metis Data Science Bootcamp, I worked as a freelance journalist in San Francisco for Vice, Salon, SF Weekly, San Francisco Magazine, and more. I've referred to myself as a 'Swiss-Army knife' journalist and have written about a variety of topics ranging from tech to music to politics. Before getting into journalism, I graduated from Occidental College with a Bachelor of Arts in Economics. I chose to do the Metis Data Science Bootcamp to pursue my goal of using data science in journalism, which inspired me to focus my final project on being able to better understand the problem of police-related violence in America. Here is the repo with my code and presentation for my final project: https://github.com/GeorgeMcIntire/metis_final_project.
