Plotting author statistics for Git repos using Git of Theseus

I spent a few days during the holidays fixing up a bunch of semi-dormant open source projects, and I have a couple of blog posts in the pipeline about various updates. First up, I made a number of fixes to Git of Theseus, which is a tool (written in Python) that generates statistics about Git repositories. I’ve written about it previously on this blog. The name is a horrible pun (I’m a dad!) on Ship of Theseus, which is a philosophical thought experiment about what happens if you replace every single part of a boat: is it still the same boat?

So anyway, here’s one of the plots you can generate for Kubernetes — a somewhat arbitrarily picked repository.

[Plot: Git of Theseus output for the Kubernetes Git repository]

So what’s new? I’ve updated the color scheme a bit, but also added the option to plot author statistics:

[Plot: author statistics for the Kubernetes Git repository]

And it doesn’t stop there! Here are some other minor updates:

  • I published the whole thing to PyPI which also means that the installation is far simpler: just run pip install git-of-theseus.
  • The pip package also installs binaries that let you run the analyses in a more straightforward way: just run git-of-theseus-analyze on the command line.
  • By default it now only analyzes files whose extensions indicate source code (by leveraging pygments).
  • You can also normalize stats using the --normalize flag. See plot below:

[Plot: normalized statistics for the Git repository]
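
To make the idea behind normalized stats concrete, here is a minimal sketch in plain Python. It assumes that normalizing means rescaling every point in time so that the shares add up to 100%. The function name normalize_stack and the numbers are made up for illustration; this is not the code Git of Theseus itself runs.

```python
def normalize_stack(series):
    """Rescale stacked series so every time point sums to 100.

    `series` is a list of lists: one inner list per cohort (or author),
    holding one line count per time point. Hypothetical helper, for
    illustration only.
    """
    n_points = len(series[0])
    # Total number of lines across all cohorts at each time point.
    totals = [sum(s[t] for s in series) for t in range(n_points)]
    return [
        # Guard against empty time points to avoid dividing by zero.
        [100.0 * s[t] / totals[t] if totals[t] else 0.0 for t in range(n_points)]
        for s in series
    ]


# Made-up line counts per cohort year at three points in time.
cohorts = {
    "2015": [100, 80, 60],
    "2016": [0, 120, 90],
    "2017": [0, 0, 150],
}

normalized = normalize_stack(list(cohorts.values()))
for year, shares in zip(cohorts, normalized):
    print(year, shares)
```

With raw counts, a fast-growing repository makes the older cohorts look like thin slivers at the bottom of the plot. Dividing by the total at each time point shows instead what fraction of the code each cohort accounts for.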

That’s it! As I mentioned, I’ve got more where this came from. Some future blog posts will cover:

  • ann-benchmarks, which is a tool to benchmark approximate nearest neighbor methods. Very niche, but very useful within its niche. I just spent a lot of time precomputing datasets and Dockerizing all algorithms.
  • convoys, a new tool I built to model and plot time-lagged conversion. Fun stuff with Gamma and Weibull distributions.
  • champy, which is a halfway-implemented wrapper that lets you formulate and solve linear programming, mixed integer programming, and constraint programming problems in a much nicer way (IMO) than any other library I’ve encountered. Don’t hold your breath for this one: it’s pretty far from being production-grade.

EDIT (2018-01-16): added a few more notes

 


Erik Bernhardsson


I like to work with smart people and deliver great software. After 5+ years at Spotify, I just left for an exciting new startup in NYC, where I am leading the engineering team. We're hiring like crazy: if you're a serial polyglot and like to build something big from scratch, drop me an email at erik@better.com! At Spotify, I built up and led the team responsible for music recommendations and machine learning. We designed and built many large-scale machine learning algorithms we use to power the recommendation features: the radio feature, the "Discover" page, "Related Artists", and much more. I also authored Luigi, which is a workflow manager in Python with 3,000+ stars on GitHub, used by Foursquare, Quora, Stripe, Asana, etc.