Plotting author statistics for Git repos using Git of Theseus
Posted by Erik Bernhardsson, January 17, 2018
I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, a tool (written in Python) that generates statistics about Git repositories. I've written about it previously on this blog. The name is a horrible pun (I'm a dad!) on the Ship of Theseus, a philosophical thought experiment: if you replace every single part of a boat, is it still the same boat ⁉
So anyway, here's one of the plots you can generate for Kubernetes (a somewhat arbitrarily picked repository):
So what's new? I've updated the color scheme a bit, and I've also added the option to plot author statistics:
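Conceptually, author statistics come down to attributing every surviving line of code to whoever last changed it. Here's a minimal sketch of that idea, assuming the `git` CLI is on your PATH. It parses `git blame --line-porcelain`, which repeats an `author <name>` header for every blamed line. This is not how Git of Theseus itself is implemented, the helper names are hypothetical, and it only looks at HEAD rather than tracking counts over time.

```python
import subprocess
from collections import Counter


def count_authors_from_porcelain(porcelain: str) -> Counter:
    """Count lines per author in `git blame --line-porcelain` output.

    Every blamed line gets its own header block with an `author <name>` line;
    the source text itself is prefixed with a tab, so it never matches.
    """
    counts: Counter = Counter()
    for line in porcelain.splitlines():
        if line.startswith("author "):  # skips author-mail, author-time, ...
            counts[line[len("author "):]] += 1
    return counts


def author_line_counts(repo: str) -> Counter:
    """Hypothetical helper: blame every file at HEAD and sum per-author line counts."""
    listing = subprocess.run(
        ["git", "-C", repo, "ls-tree", "-r", "-z", "--name-only", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
    total: Counter = Counter()
    for path in filter(None, listing.split("\0")):
        blame = subprocess.run(
            ["git", "-C", repo, "blame", "--line-porcelain", "HEAD", "--", path],
            capture_output=True, text=True, errors="replace",
        )
        if blame.returncode != 0:  # e.g. submodule entries can't be blamed
            continue
        total += count_authors_from_porcelain(blame.stdout)
    return total
```

Running `author_line_counts(".").most_common(10)` inside a checkout would list the ten authors with the most surviving lines. Note that binary files are blamed too; a real analysis would filter them out.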
And it doesn’t stop there! Here are some other minor updates:
- I published the whole thing to PyPI, which also means that installation is far simpler: just run `pip install git-of-theseus`.
- The pip package also installs binaries that let you run the analyses in a more straightforward way: just run `git-of-theseus-analyze` on the command line.
- By default it now only analyzes files whose extensions indicate source code (determined by leveraging Pygments).
- You can also normalize stats using the `--normalize` flag. See the plot below:
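The source-code filter can be approximated by collecting every filename glob that Pygments' lexers register and matching paths against them. This is a sketch under assumptions, not the tool's actual code: the exclusion set (plain text, Markdown, reStructuredText) is my own illustrative choice, and Pygments is a third-party package (`pip install pygments`). `pygments.lexers.get_all_lexers()` yields `(name, aliases, filename_patterns, mimetypes)` tuples.

```python
import fnmatch
import posixpath
from typing import Iterable, Set

# Illustrative assumption: Pygments also ships lexers for prose formats,
# which we don't want to count as "source code".
EXCLUDED_PATTERNS = {"*.txt", "*.md", "*.rst"}


def pygments_source_patterns(excluded: Iterable[str] = EXCLUDED_PATTERNS) -> Set[str]:
    """Collect filename globs (e.g. '*.py', 'Makefile') claimed by any Pygments lexer."""
    import pygments.lexers  # third-party; imported lazily so the matcher works without it

    patterns: Set[str] = set()
    for _name, _aliases, filenames, _mimetypes in pygments.lexers.get_all_lexers():
        patterns.update(filenames)
    return patterns - set(excluded)


def is_source_file(path: str, patterns: Iterable[str]) -> bool:
    """True if the file's base name matches any glob pattern (case-sensitive)."""
    name = posixpath.basename(path)
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in patterns)
```

With Pygments installed, `is_source_file("pkg/kubelet/kubelet.go", pygments_source_patterns())` should be true, since the Go lexer claims `*.go`. Matching on base names keeps the check independent of where a file lives in the tree.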
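Normalizing turns absolute line counts into each series' share of the total at every point in time, so a stacked plot always sums to 1 (i.e. 100%). Here's a minimal pure-Python sketch of that transformation. I'm assuming this is roughly what `--normalize` does, and the data layout (a dict from a series label, such as an author, to a list of counts sampled at the same timestamps) is hypothetical.

```python
from typing import Dict, List


def normalize(series: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide every value by the total across all series at the same time step.

    Time steps where the total is zero map to 0.0 instead of dividing by zero.
    """
    if not series:
        return {}
    if len({len(values) for values in series.values()}) != 1:
        raise ValueError("all series must have the same number of time steps")
    # zip(*...) walks the series column by column, i.e. one time step at a time
    totals = [sum(column) for column in zip(*series.values())]
    return {
        label: [v / t if t else 0.0 for v, t in zip(values, totals)]
        for label, values in series.items()
    }
```

For example, `normalize({"alice": [1, 3], "bob": [3, 1]})` returns `{"alice": [0.25, 0.75], "bob": [0.75, 0.25]}`: the absolute sizes disappear and only the relative split remains.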
That's it! As I mentioned, I've got more where this came from. Some future blog posts will cover:
- ann-benchmarks, a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all the algorithms.
- convoys, a new tool I built to model and plot time-lagged conversions. Fun stuff with Gamma and Weibull distributions.
- champy, a half-implemented wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I've encountered. Don't hold your breath for this one, though: it's pretty far from being production-grade.
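For a sense of what benchmarking approximate nearest neighbor methods involves: these methods trade accuracy for speed, so ann-benchmarks plots recall against queries per second. Below is a sketch of a simple ID-based recall@k computed against brute-force ground truth. ann-benchmarks' own recall definition may differ in details (for instance how ties at the k-th distance are handled), and the function names are hypothetical. `math.dist` needs Python 3.8+.

```python
import math
from typing import List, Sequence


def exact_knn(data: Sequence[Sequence[float]], query: Sequence[float], k: int) -> List[int]:
    """Brute-force ground truth: indices of the k points closest to query (Euclidean)."""
    return sorted(range(len(data)), key=lambda i: math.dist(data[i], query))[:k]


def recall_at_k(approx_ids: Sequence[int], true_ids: Sequence[int], k: int) -> float:
    """Fraction of the true k nearest neighbors that appear in the approximate top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k
```

Averaging `recall_at_k` over many queries, while timing the approximate search, gives one point on a recall vs. queries-per-second plot.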
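To make "time-lagged conversion" concrete: one common way to model it is to say a user eventually converts with probability c and, if they do, the time until conversion follows a distribution such as a Weibull. The fraction converted by time t is then c · F(t), where F is that distribution's CDF. Here's a small sketch of the curve using the scale/shape form F(t) = 1 − exp(−(t/scale)^shape). The function name is hypothetical, this is not the convoys API, and convoys may parameterize the Weibull differently.

```python
import math


def weibull_conversion(t: float, c: float, scale: float, shape: float) -> float:
    """Fraction of users converted by time t under a Weibull time-to-conversion model.

    c     -- eventual conversion rate (0 <= c <= 1)
    scale -- Weibull scale parameter, in the same units as t (> 0)
    shape -- Weibull shape parameter (> 0)
    """
    if scale <= 0 or shape <= 0:
        raise ValueError("scale and shape must be positive")
    if t <= 0:
        return 0.0
    weibull_cdf = 1.0 - math.exp(-((t / scale) ** shape))
    return c * weibull_cdf
```

The curve starts at 0 and levels off at c rather than 1, which is the whole point: users who haven't converted yet aren't necessarily lost. At t equal to the scale, exactly 1 − 1/e (about 63%) of eventual converters have converted, whatever the shape. A Gamma time-to-conversion model would slot in the same way by swapping the CDF.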
EDIT(2018-01-016): added a few more notes