Predicting code bug risk with git metadata
One of the perks of working at Civis is the quarterly ‘Hack Time’. For one week each quarter, you get to explore an offbeat idea of your choice and then present the results to your colleagues. This past quarter I spent my time exploring some off-label uses for the...
Dask Release 0.17.2
This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation. I’m pleased to announce the release of Dask version 0.17.2. This is a minor release with new features and stability improvements. This blogpost outlines notable changes since the 0.17.0 release on February 12th. You can...
Exploratory Analysis – When to Choose R, Python, Tableau or a Combination
Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with such diversity in data sets is finding a way to quickly visualize the data and do exploratory analysis. While...
Craft Minimal Bug Reports
Following up on a post on supporting users in open source, this post lists some suggestions on how to ask a maintainer to help you with a problem. You don’t have to follow these suggestions. They are optional. They make it more likely that a project maintainer will spend time helping...
This work is supported by Continuum Analytics, and the Data Driven Discovery Initiative from the Moore Foundation. This blogpost is about experimental software. The project may change or be abandoned without warning. You should not depend on anything within this blogpost. This week I built a small streaming library for Python. This was...
What Programming Languages Are Used Most on Weekends?
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do...
This blogpost is about topic modeling using data from this blog, opendatascience.com. From this, combined with the most visited articles of the year, we will generate the most popular topics of 2017. Last year, we did something similar with popular articles streamed through Twitter, using Non-Negative Matrix Factorization to determine topics, article...
On Taking Things Too Seriously: Holiday Edition
For some reason Atlanta got a pretty significant amount of snow yesterday, and because of that I’ve been mostly stuck at home. When faced with that kind of time on my hands, sometimes I spend too much time on things that don’t really matter all that much. Recently, I’ve been...
On Machine Learning and Programming Languages
This article was co-written by Mike Innes (Julia Computing), David Barber (UCL), Tim Besard (UGent), James Bradbury (Salesforce Research), Valentin Churavy (MIT), Simon Danisch (MIT), Alan Edelman (MIT), Stefan Karpinski (Julia Computing), Jon Malmaud (MIT), Jarrett Revels (MIT), Viral Shah (Julia Computing), Pontus Stenetorp (UCL) and Deniz Yuret (Koç...
Ripyr: Sampled Metrics on Datasets Using Python’s Asyncio
Today I’d like to introduce a little Python library I’ve toyed around with here and there for the past year or so, ripyr. Originally it was written just as an excuse to try out some newer features in modern Python: asyncio and type hinting. The whole package is type...