Saving Machine Learning Models
By Damian Mingle, 04/30/2018. Let’s take a look at two conventional ways to save models using scikit-learn: a pickle string, and a pickled model as... Read more
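The two approaches named above can be sketched with Python's standard `pickle` module. This is a minimal sketch, not the post's own code: the iris dataset, the `LogisticRegression` estimator, and the file name `model.pkl` are illustrative assumptions. `pickle.dumps` produces the in-memory pickle string (bytes), and `pickle.dump` writes the pickled model to a file.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Fit a small example model (dataset and estimator are illustrative choices).
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Option 1: serialize the fitted model to an in-memory pickle string (bytes).
model_bytes = pickle.dumps(model)
restored_from_bytes = pickle.loads(model_bytes)

# Option 2: write the pickled model to a file, then load it back.
path = os.path.join(tempfile.mkdtemp(), "model.pkl")  # hypothetical file name
with open(path, "wb") as f:
    pickle.dump(model, f)
with open(path, "rb") as f:
    restored_from_file = pickle.load(f)

# Both restored copies should predict exactly like the original model.
print(np.array_equal(restored_from_bytes.predict(X), model.predict(X)))
print(np.array_equal(restored_from_file.predict(X), model.predict(X)))
```

Only unpickle data you trust: loading a pickle can execute arbitrary code embedded in a malicious file.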
Predicting code bug risk with git metadata
One of the perks of working at Civis is the quarterly ‘Hack Time’. For one week each quarter, you get to explore an offbeat idea of your choice and then present the results to your colleagues. This past quarter I spent my time exploring some off-label uses for the... Read more
LIME Can Make You Better at Machine Learning
LIME (Local Interpretable Model-agnostic Explanations) is a crucial machine learning tool that tackles one of the biggest issues in machine learning: interpretability. You can think of interpretability as explaining how and why a model makes predictions. In this age of the super black box model, it may be... Read more
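The core idea behind LIME is to explain one prediction by fitting a simple, interpretable model around that single input. The sketch below is a simplified, from-scratch illustration of that idea using only NumPy. It is not the `lime` package's API, and it skips parts of real LIME such as interpretable feature representations and feature selection. The black-box function, the Gaussian perturbation scale, and the exponential proximity kernel are all assumptions chosen for illustration.

```python
import numpy as np


def black_box(X):
    # Hypothetical opaque model: a nonlinear function of two features.
    return np.sin(X[:, 0]) + X[:, 1] ** 2


def lime_style_explanation(predict, x, n_samples=5000, scale=0.1,
                           kernel_width=0.25, seed=0):
    """Return local linear feature weights for `predict` around instance `x`."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with small Gaussian noise (assumed sampling scheme).
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed samples.
    y = predict(Z)
    # 3. Weight each sample by its proximity to x (assumed exponential kernel).
    d = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(d ** 2) / kernel_width ** 2)
    # 4. Fit a weighted linear surrogate: intercept plus one coefficient per feature.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[1:]  # drop the intercept; keep the per-feature weights


weights = lime_style_explanation(black_box, np.array([0.0, 1.0]))
print(weights)
```

Around the point (0, 1), the surrogate's weights should come out close to the local slopes of the black box, roughly 1 for the first feature (from `sin`) and 2 for the second (from the square). Those are the kind of "how and why" answers an interpretability tool aims to surface.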
The History of Big Data Processing in 5 Critical Papers
Read the History of Big Data in These 5 Papers. Big data is a multi-faceted area of interest and growth in today’s digital world. While many understand the core concepts of big data, fewer know its history. Individuals interested in big data can brush up on its history... Read more
What’s New on Kaggle
There is no doubt that Kaggle is one of the most important hubs in the data science ecosystem. They’ve been making news recently with their acquisition by Google and the debut of the new “Learn” platform. Beyond the technology, however, the best thing about Kaggle is its community... Read more
Waiting time, load factor, and queueing theory – why you need to cut your systems a bit of slack
I’ve been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup), but it’s turned into my little hammer, and now I see nails everywhere. One particular relationship that turns out to be... Read more
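The excerpt breaks off before naming the relationship it has in mind. As a hedged illustration of the title's point about load factor, consider the textbook M/M/1 queue: Poisson arrivals at rate λ, exponential service at rate μ, and a single server. This model is an assumption here and may not be the one the article uses. With load factor ρ = λ/μ, the mean wait in the queue is W_q = ρ / (μ - λ), and the mean time in the system is W = 1 / (μ - λ). The function name `mm1_wait_times` is hypothetical.

```python
def mm1_wait_times(arrival_rate, service_rate):
    """Return (load factor, mean wait in queue, mean time in system) for M/M/1."""
    if arrival_rate >= service_rate:
        # The queue grows without bound when work arrives as fast as it is served.
        raise ValueError("queue is unstable when arrival_rate >= service_rate")
    rho = arrival_rate / service_rate
    wait_in_queue = rho / (service_rate - arrival_rate)
    time_in_system = 1.0 / (service_rate - arrival_rate)
    return rho, wait_in_queue, time_in_system


# Service rate 1.0 means times are measured in units of the mean service time.
for lam in (0.5, 0.8, 0.9, 0.95, 0.99):
    rho, wq, w = mm1_wait_times(lam, 1.0)
    print(f"load factor={rho:.2f}  mean wait in queue={wq:.1f}")
```

Under these assumptions, pushing utilization from 90% to 99% raises the mean queueing delay from 9 to 99 service times, about an elevenfold increase. This is why a system running near full load needs some slack.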
Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity for continuous learning. Not long ago I took part in a book club that read through Google’s Site Reliability Engineering book. One of the essays in this book addressed various methods for handling overload... Read more
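One common way to handle overload is to tag requests by importance and drop the low-priority ("sheddable") ones first as load climbs. The sketch below is a hypothetical illustration of that idea, not the post's or the book's implementation. The `Request` and `LoadShedder` classes, the single `sheddable` flag, and the two thresholds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    sheddable: bool  # hypothetical flag: True if this request may be dropped under load


class LoadShedder:
    """Admit up to `capacity` in-flight requests; shed sheddable ones past `shed_threshold`."""

    def __init__(self, capacity, shed_threshold):
        self.capacity = capacity
        self.shed_threshold = shed_threshold
        self.in_flight = 0

    def try_admit(self, req):
        if self.in_flight >= self.capacity:
            return False  # fully saturated: reject everything
        if req.sheddable and self.in_flight >= self.shed_threshold:
            return False  # under pressure: drop low-priority work first
        self.in_flight += 1
        return True

    def finish(self):
        # Call when an admitted request completes, freeing its slot.
        self.in_flight -= 1


shedder = LoadShedder(capacity=3, shed_threshold=2)
requests = [Request("a", False), Request("b", False), Request("c", True),
            Request("d", False), Request("e", False)]
results = [shedder.try_admit(r) for r in requests]
print(results)  # [True, True, False, True, False]
```

The sheddable request `c` is turned away while there is still room for the critical request `d`. Reserving that headroom for important traffic is the point of marking some requests as sheddable.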
When shuffling large arrays, how much time can be attributed to random number generation?
It is well known that contemporary computers don’t like to access data in memory randomly and unpredictably. However, not all forms of random access are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int *arr, int i,... Read more
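The article's own code is in C and is cut off after the first line of `swap`. As a rough sketch of the same textbook shuffle (Fisher-Yates, as popularized by Knuth), here is a Python version written for illustration. The function name `fisher_yates_shuffle` is hypothetical, and the swap is done inline rather than through a helper.

```python
import random


def fisher_yates_shuffle(arr, rng=random):
    """Shuffle `arr` in place with the textbook Fisher-Yates algorithm."""
    # Walk from the last index down to 1, swapping each slot with a uniformly
    # chosen index in [0, i]; every permutation is then equally likely.
    for i in range(len(arr) - 1, 0, -1):
        j = rng.randrange(i + 1)  # one random draw per step
        arr[i], arr[j] = arr[j], arr[i]
    return arr


data = list(range(10))
fisher_yates_shuffle(data, random.Random(42))
print(data)
```

The loop makes exactly n - 1 calls to the random number generator, one per swap. That per-step random draw is the cost the article sets out to separate from the cost of the random memory accesses themselves.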
Apache Cassandra and ALLOW FILTERING
Prologue: An aspiring Cassandra engineer-apprentice was fiddling with a Cassandra cluster, trying to fetch the data he needed. For a while, he kept receiving strange responses from the server. But after hacking his way through the CQL, he finally received the response he was looking for. He felt so proud… For a moment... Read more
Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with data sets this diverse is finding a way to quickly visualize the data and do exploratory analysis. While... Read more
Open Data Science - Your News Source for AI, Machine Learning & more