Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with such diversity in data sets is finding a way to quickly visualize the data and do exploratory analysis. While... Read more
In a previous post, I wrote about Throne AI, a sports prediction platform or “Kaggle for sports.” If you’re a sports fan and interested in using your machine learning abilities to predict the outcome of sports matches, then I highly recommend you sign up for Throne AI. After becoming... Read more
The Curious Case of Algo-Trading Dashboard
For one of our recent internal projects, we needed a quick and easy way to showcase some first insights, do some plotting and interactive storytelling with the data. We also wanted to build a live, working dashboard in front of a (future) product... Read more
Git First-Parent – Have your messy history and eat it too
The first thing I encountered when learning about git: there’s a lot of conflict about whether it’s important to keep a “clean” git history by squashing, rebasing instead of merging, etc. In favor of ‘cleanliness’: (1) git log shows the higher-level history most people will care more about the one-to-one relationship... Read more
The Facebook algorithm is constantly evolving in order to provide a better experience for users. But few changes to the algorithm have sparked as much interest and conversation as the recent ‘meaningful interactions’ update, in which Facebook said it would be prioritizing posts that create meaningful conversations, especially those from... Read more
Amazon Redshift is one of the hottest databases for data warehousing right now. It’s one of the most cost-effective solutions available and integrates with many popular BI tools. Unfortunately, driver compatibility is a little shakier, but there is a way to make it... Read more
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do... Read more
Should You Put Several Event Types in the Same Kafka Topic?
If you adopt a streaming platform such as Apache Kafka, one of the most important questions to answer is: what topics are you going to use? In particular, if you have a bunch of different events that you want to publish to Kafka as messages, do you put them in the same... Read more
Failure to replicate Schwartz-Ziv and Tishby
“Opening the Black Box of Deep Neural Networks via Information” didn’t appear at any conferences, to my knowledge, but it still built up some buzz. It has been difficult to replicate, for both bloggers and academics. I attempted to replicate some aspects, and emailed the authors with the message below in an attempt to... Read more
How To Create Data Products That Are Magical Using Sequence-to-Sequence Models
A tutorial on how to summarize text and generate features from GitHub Issues using deep learning with Keras and TensorFlow. Teaser: training a model to summarize GitHub Issues. Predictions are in rectangular boxes. The above results are randomly selected elements of a holdout set. Keep reading below; there will be a link to many more... Read more