Category Encoders V1.2.8 Release
Been a while since a release, but category encoders has continued to advance with the help of lots of great contributors. I’ve just released v1.2.8, with primarily bugfixes, as well as some new features allowing a user to optionally add the category names in the output column names of... Read more
Beyond Numpy Arrays in Python: Preparing the ecosystem for GPU, distributed, and sparse arrays
Executive Summary: In recent years, Python’s array computing ecosystem has grown organically to support GPU, sparse, and distributed arrays. This is wonderful and a great example of the growth that can occur in decentralized open source development. However, to solidify this growth and apply it across the ecosystem we... Read more
Intelligently Assisted Form Fields with Henosis
Filling Out Forms Isn’t Fun: Online forms are the worst. These often long, sometimes multi-page forms can be time-consuming and laborious to fill out. Almost any other task is more enjoyable, even with the occasional prize drawing or other form of incentive. While large forms can and often do... Read more
Predicting code bug risk with git metadata
One of the perks of working at Civis is the quarterly ‘Hack Time’. For one week each quarter, you get to explore an offbeat idea of your choice and then present the results to your colleagues. This past quarter I spent my time exploring some off-label uses for the... Read more
Dask Release 0.17.2
This work is supported by Anaconda Inc. and the Data Driven Discovery Initiative from the Moore Foundation. I’m pleased to announce the release of Dask version 0.17.2. This is a minor release with new features and stability improvements. This blogpost outlines notable changes since the 0.17.0 release on February 12th. You can... Read more
Exploratory Analysis – When to Choose R, Python, Tableau or a Combination
Not all data analysis tools are created equal. Recently, I started looking into data sets to compete in Go Code Colorado (check it out if you live in CO). The problem with such diversity in data sets is finding a way to quickly visualize the data and do exploratory analysis. While... Read more
Craft Minimal Bug Reports
Following up on a post on supporting users in open source, this post lists some suggestions on how to ask a maintainer to help you with a problem. You don’t have to follow these suggestions. They are optional. They make it more likely that a project maintainer will spend time helping... Read more
This work is supported by Continuum Analytics, and the Data Driven Discovery Initiative from the Moore Foundation. This blogpost is about experimental software. The project may change or be abandoned without warning. You should not depend on anything within this blogpost. This week I built a small streaming library for Python. This was... Read more
What Programming Languages Are Used Most on Weekends?
Note: Cross-posted with the Stack Overflow blog. Check out the code for this analysis on Kaggle. For me, the weekends are mostly about spending time with my family, reading for leisure, and working on the open-source projects I am involved in. These weekend projects overlap with the work that I do... Read more
This blogpost is about topic modeling using data from this blog, opendatascience.com. From this, combined with the most visited articles of the year, we will generate the most popular topics of 2017. Last year, we did something similar with popular articles streamed through Twitter, using Non-Negative Matrix Factorization to determine topics, article... Read more
Open Data Science - Your News Source for AI, Machine Learning & more