Alexandru Agachi of Empiric Capital on “Handling Missing Data in Python/Pandas” at ODSC Europe 2018
Key Takeaways: It’s important to describe missing data and the challenges it poses. You need to clarify a confusing terminology that further adds to the field’s complexity. You should take the time to review methods for handling missing data. You need to learn how to apply robust multiple imputation... Read more
All the Best Parts of Pandas for Data Science
Pandas has been hailed by many in the data science community as the missing link between Python and analysis, a tool that can be leveraged in order to dramatically reduce overhead in data science projects, increase understandability and speed up workflows.   Pandas comes loaded with a wide range... Read more
Git-Pandas caching for Faster Analysis
Git-pandas is a python library I wrote to help make analysis of git data easier when dealing with collections of repositories.  It makes a ton of cool stuff easier, like cumulative blame plots, but they can be kind of slow, especially with many large repositories. In the past we’ve made that... Read more