Sheddable Requests: The Intersection of Hackweeks, Book Clubs, and Site Reliability Engineering
One of the things I love about working at Civis is the opportunity we have for continuous learning. Not long ago I took part in a book club that read through Google’s Site Reliability Engineering book. One of the essays in this book addressed various methods for handling overload.... Read more
When shuffling large arrays, how much time can be attributed to random number generation?
It is well known that contemporary computers don’t like to randomly access data in an unpredictable manner in memory. However, not all forms of random accesses are equally harmful. To randomly shuffle an array, the textbook algorithm, often attributed to Knuth, is simple enough: void swap(int arr, int i,... Read more
Apache Cassandra and ALLOW FILTERING
Prologue: An aspiring Cassandra engineer-apprentice was fiddling with a Cassandra cluster, trying to fetch the data he needed. For a while, he was receiving strange responses from the server. But after hacking his way through the CQL, he finally received the response he was looking for. He felt so proud… For a moment.... Read more
Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with such diversity in data sets is finding a way to quickly visualize the data and do exploratory analysis. While... Read more
In a previous post, I wrote about Throne AI, a sports prediction platform or “Kaggle for sports.” If you’re a sports fan and interested in using your machine learning abilities to predict the outcome of sports matches, then I highly recommend you sign up for Throne AI. After becoming... Read more
The Curious Case of Algo-Trading Dashboard
For one of our recent internal projects, we needed a quick and easy way to showcase some first insights, do some plotting and interactive storytelling with the data. We also wanted to build a live, working dashboard in front of a (future) product... Read more
Git First-Parent– Have your messy history and eat it too
Intro: The first thing I encountered learning about git: there’s a lot of conflict about whether it’s important to keep a “clean” git history by squashing, rebasing instead of merging, etc. In favor of ‘cleanliness’: (1) git log shows the higher-level history most people will care more about; the one-to-one relationship... Read more
The Facebook algorithm is constantly evolving in order to provide a better experience for users. But few changes to the algorithm have sparked as much interest and conversation as the recent ‘meaningful interactions’ update, in which Facebook said it would be prioritizing posts that create meaningful conversations, especially those from... Read more
Amazon Redshift is one of the hottest databases for data warehousing right now: it’s one of the most cost-effective solutions available and allows for integration with many popular BI tools. Unfortunately, driver compatibility is a little more shaky, but there is a way to make it... Read more
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do... Read more