10 Best Data Science Platforms
A data science platform can change the way you work. It’s more than just a tool; it’s a way to wrangle data and turn every member of your team into a high-performing unit, capable of pivoting and scaling without missing a beat. The right one... Read more
Saving Machine Learning Models
Author: Damian Mingle. Date: 04/30/2018. Let’s take a look at two conventional ways to save models using scikit-learn: a pickle string, a... Read more
Predicting code bug risk with git metadata
One of the perks of working at Civis is the quarterly ‘Hack Time’. For one week each quarter, you get to explore an offbeat idea of your choice and then present the results to your colleagues. This past quarter I spent my time exploring some off-label... Read more
LIME Can Make You Better at Machine Learning
LIME is a crucial machine learning tool that can tackle one of the biggest issues in machine learning: interpretability. You can think of interpretability as explaining how and why a model makes predictions. In this age of the super black box model,... Read more
The History of Big Data Processing in 5 Critical Papers
Big data is a multi-faceted area of interest and growth in today’s digital world. While many understand the core concepts of big data, its history is less well known. Individuals interested in big data processing can... Read more
What’s New on Kaggle
There is no doubt that Kaggle is one of the most important hubs in the data science ecosystem. They’ve been making some news recently with their acquisition by Google and the debut of the new “Learn” platform. The best thing, however, beyond technology, about Kaggle... Read more
Waiting time, load factor, and queueing theory – why you need to cut your systems a bit of slack
I’ve been reading up on operations research lately, including queueing theory. It started out as a way to understand the very complex mortgage process (I work at a mortgage startup) but it’s turned into my little hammer and now I see nails everywhere. One particular relationship that turns... Read more
Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity we have for continuous learning. Not long ago I had the opportunity to be involved in a book club which read through Google’s Site Reliability Engineering book. One of the essays in this book addressed various... Read more
When shuffling large arrays, how much time can be attributed to random number generation?
It is well known that contemporary computers don’t like to access data in memory in an unpredictable manner. However, not all forms of random access are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int... Read more
Apache Cassandra and ALLOW FILTERING
Prologue: An aspiring Cassandra engineer-apprentice was fiddling with a Cassandra cluster, trying to fetch the data he needed. For a while, he was receiving strange responses from the server. But after hacking his way through the CQL, he finally received the response he was looking for. He felt so proud…... Read more