Exploring Scikit-Learn Further: The Bells and Whistles of Preprocessing
In my previous post, we constructed a simple cross-validated regression model using Scikit-Learn in 35 lines. It’s pretty amazing that we can perform machine learning with so little effort, but we just did the bare minimum in order to get a working model. Frankly, it didn’t even perform that well.... Read more
All the Best Parts of Pandas for Data Science
Pandas has been hailed by many in the data science community as the missing link between Python and analysis, a tool that can be leveraged in order to dramatically reduce overhead in data science projects, increase understandability and speed up workflows. Pandas comes loaded with a wide range... Read more
ODSC Europe 2018 – Open Source Data Science Project Award Winner: PyMC3
Thanks to the efforts of academia, AI labs, and others, significant progress continues to be made in deep learning, machine learning, and data science in general. However, it’s thanks to the open source projects that many of these advances are quickly accessible to data scientists and developers. As such, we... Read more
Snakes in a Package: Combining Python and R with Reticulate
When I first started working as a data scientist (or something like it) I was told to program in C++ and Java. Then R came along and it was liberating; my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to... Read more
Ripyr: Sampled Metrics on Datasets Using Python’s Asyncio
Today I’d like to introduce a little Python library I’ve toyed around with here and there for the past year or so, ripyr. Originally it was written just as an excuse to try out some newer features in modern Python: asyncio and type hinting. The whole package is type... Read more
In part two of my XKCD font saga I was able to separate strokes from the XKCD handwriting dataset into many smaller images. I also handled the easier cases of merging some of the strokes back together – I particularly focussed on “dotty” or “liney” type glyphs, such as... Read more
In part one of XKCD font saga I gave some background on the XKCD handwriting dataset, and took an initial look at image segmentation in order to extract the individual strokes from the scanned image. In this installment, I will apply the technique from part 1, as well as... Read more
Python as a way of thinking
This article contains supporting material for this blog post at Scientific American. The thesis of the post is that modern programming languages (like Python) are qualitatively different from the first generation (like FORTRAN and C), in ways that make them effective tools for teaching, learning, exploring, and thinking. I presented... Read more