The Beginners Guide for Video Processing with OpenCV
Computer vision is a huge part of the data science/AI domain. Sometimes, computer vision engineers have to deal with videos. Here, we aim to shed light on video processing – using Python, of course. This might be obvious for some, but nevertheless, video streaming is not a continuous process,... Read more
Which Conference is Best? — College Hoops, Net Rankings and Python
For college basketball junkies like me, the season is now shifting into high gear as teams begin serious conference play. At the end of the regular season and conference tournaments, 66 D1 teams — 32 league champions and 34 at large — will receive invitations to March’s national championship... Read more
Handling Missing Data in Python/Pandas
Key Takeaways: It’s important to describe missing data and the challenges it poses. You need to clarify a confusing terminology that further adds to the field’s complexity. You should take the time to review methods for handling missing data. You need to learn how to apply robust multiple imputation... Read more
Exploring Scikit-Learn Further: The Bells and Whistles of Preprocessing
In my previous post, we constructed a simple cross-validated regression model using Scikit-Learn in 35 lines. It’s pretty amazing that we can perform machine learning with so little effort, but we just did the bare minimum in order to get a working model. Frankly, it didn’t even perform that well.... Read more
All the Best Parts of Pandas for Data Science
Pandas has been hailed by many in the data science community as the missing link between Python and analysis, a tool that can be leveraged in order to dramatically reduce overhead in data science projects, increase understandability and speed up workflows.   Pandas comes loaded with a wide range... Read more
ODSC Europe 2018 – Open Source Data Science Project Award Winner: PyMC3
Thanks to the efforts of academia, AI labs, and others, significant progress continues to be made in deep learning, machine learning, and data science in general. However, it’s thanks to the open source projects that many of these advances are quickly accessible to data scientist and developers. As such, we... Read more
Snakes in a Package: Combining Python and R with Reticulate
When I first started working as a data scientist (or something like it) I was told to program in C++ and Java. Then R came along and it was liberating; my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to... Read more
Ripyr: Sampled Metrics on Datasets Using Python’s Asuncio
Today I’d like to introduce a little python library I’ve toyed around with here and there for the past year or so, ripyr. Originally it was written just as an excuse to try out some newer features in modern python: asyncio and type hinting. The whole package is type... Read more
In part two of my XKCD font saga I was able to separate strokes from the XKCD handwriting dataset into many smaller images. I also handled the easier cases of merging some of the strokes back together – I particularly focussed on “dotty” or “liney” type glyphs, such as... Read more
In part one of XKCD font saga I gave some background on the XKCD handwriting dataset, and took an initial look at image segmentation in order to extract the individual strokes from the scanned image. In this installment, I will apply the technique from part 1, as well as... Read more