
Erik Bernhardsson, Head of Engineering - One Zero Capital


The eigenvector of “Why we moved from language X to language Y”

I was reading yet another blog post titled “Why our team moved from language X to language Y” (I forgot which one) and I started wondering if you can generalize it a bit. Is it possible to generate an N × N contingency table of moving from language X to language Y? Someone should make an N × N contingency […]
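
A minimal sketch of the kind of analysis the title hints at: treat the “moved from X to Y” counts as a transition matrix and look at its principal eigenvector. The languages and counts below are made up for illustration, not taken from the post.

```python
import numpy as np

# Hypothetical counts of "we moved from <row> to <column>" blog posts.
langs = ["C++", "Python", "Go", "Rust"]
counts = np.array([
    [0, 4, 3, 2],   # C++    -> ...
    [1, 0, 2, 1],   # Python -> ...
    [1, 1, 0, 2],   # Go     -> ...
    [0, 1, 1, 0],   # Rust   -> ...
], dtype=float)

# Row-normalize (with a little smoothing) to get a Markov transition matrix.
P = (counts + 0.1) / (counts + 0.1).sum(axis=1, keepdims=True)

# The stationary distribution is the eigenvector of P.T with eigenvalue 1;
# it ranks languages by where the migrations eventually pile up.
eigvals, eigvecs = np.linalg.eig(P.T)
stationary = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
stationary /= stationary.sum()

for lang, score in sorted(zip(langs, stationary), key=lambda x: -x[1]):
    print(f"{lang}: {score:.2f}")
```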

Language pitch

Here’s a fun analysis that I did of the pitch (aka. frequency) of various languages. Certain languages are simply pronounced with lower or higher pitch. Whether this is a feature of the language or more a cultural thing is a good question, but there are some substantial differences between languages. Hertz (or Hz, or s⁻¹), […]
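
The post measures pitch in Hz; as a rough illustration of how a fundamental frequency can be estimated from a recording, here is a minimal autocorrelation sketch. It assumes you already have a mono waveform `y` sampled at `sr` Hz, and it is not the method used in the post.

```python
import numpy as np

def estimate_pitch(y, sr, fmin=50.0, fmax=400.0):
    """Very rough fundamental-frequency estimate (Hz) via autocorrelation."""
    y = y - np.mean(y)
    corr = np.correlate(y, y, mode="full")[len(y) - 1:]
    # Only search lags that correspond to plausible speech pitch.
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

# Usage: load a mono clip of speech however you like, then
# print(estimate_pitch(y, sr))
```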

Vector Models in Machine Learning Part 2

This is a blog post rewritten from a presentation at NYC Machine Learning on Sep 17. It covers a library called Annoy that I have built that helps you do nearest neighbor queries in high dimensional spaces. In the first part, I went through some examples of why vector models are useful. In the second […]
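
Annoy itself is on PyPI; a minimal sketch of the kind of nearest-neighbor query the post is about, using random vectors as stand-in data:

```python
import random
from annoy import AnnoyIndex  # pip install annoy

f = 40                            # vector dimensionality
index = AnnoyIndex(f, "angular")  # angular = cosine-style distance

# Index some random vectors just to have something to query.
for i in range(1000):
    index.add_item(i, [random.gauss(0, 1) for _ in range(f)])

index.build(10)  # 10 trees; more trees = better recall, bigger index

# The 5 items nearest to item 0, with their distances.
print(index.get_nns_by_item(0, 5, include_distances=True))
```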

Nearest Neighbor Methods and Vector Models – part 1

This is a blog post rewritten from a presentation at NYC Machine Learning. It covers a library called Annoy that I have built that helps you do (approximate) nearest neighbor queries in high dimensional spaces. I will be splitting it into several parts. This first part talks about vector models, how to measure similarity, and why […]
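
Since this first part is about vector models and how to measure similarity, here is a minimal cosine-similarity sketch; the vectors are made up, where in practice they would come from a trained model:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

item_a = np.array([0.3, 1.2, -0.5])
item_b = np.array([0.1, 0.9, -0.2])
print(cosine_similarity(item_a, item_b))
```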

The Half-life of Code & the Ship of Theseus

As a project evolves, does the new code just add on top of the old code? Or does it replace the old code slowly over time? In order to understand this, I built a little thing to analyze Git projects, with help from the formidable GitPython project. The idea is to go back in history […]
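
A minimal sketch of the kind of history walk the post describes, assuming GitPython is installed and you run it inside a local clone: it blames every file at HEAD and buckets surviving lines by the year they were last touched. The actual analysis in the post is more involved.

```python
from collections import Counter
import git  # pip install GitPython

repo = git.Repo(".")  # run from inside any local clone
head = repo.head.commit

# For every file in the current tree, blame each surviving line and bucket
# it by the year it was last touched -- a rough "how old is the code" view.
ages = Counter()
for blob in head.tree.traverse():
    if blob.type != "blob":
        continue
    try:
        for commit, lines in repo.blame("HEAD", blob.path):
            ages[commit.committed_datetime.year] += len(lines)
    except git.GitCommandError:
        continue  # e.g. binary files

for year in sorted(ages):
    print(year, ages[year])
```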

Are Data Sets the New Server Rooms?

This blog post, “Data sets are the new server rooms,” makes the point that a bunch of companies raise a ton of money to go get really proprietary awesome data as a competitive moat. Because once you have the data, you can build a better product, and no one can copy it (at least not […]

Pareto efficiency

Pareto efficiency is a useful concept I like to think about. It often comes up when you compare items on multiple dimensions. Say you want to buy a new TV. To simplify it, let’s assume you only care about two factors: price and quality. We don’t know what you are willing to pay for quality […]
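
A minimal sketch of that two-factor comparison: keep only the TVs that no other TV beats on both price and quality at once. The numbers below are made up.

```python
def pareto_frontier(items):
    """Keep the (price, quality) options not dominated by any other option."""
    frontier = []
    for price, quality in items:
        dominated = any(
            p <= price and q >= quality and (p, q) != (price, quality)
            for p, q in items
        )
        if not dominated:
            frontier.append((price, quality))
    return frontier

tvs = [(300, 5), (350, 6), (400, 7), (450, 9), (500, 7)]
print(pareto_frontier(tvs))  # (500, 7) drops out: (400, 7) is cheaper and just as good
```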

When Machine Learning Matters

I joined Spotify in 2008 to focus on machine learning and music recommendations. It’s easy to forget, but Spotify’s key differentiator back then was the low-latency playback. People would say that it felt like they had the music on their own hard drive. (The other key differentiator was licensing — until early 2009 Spotify basically […]

Approximate Nearest News

As you may know, one of my (very geeky) interests is approximate nearest neighbor methods, and I’m the author of a Python package called Annoy. I’ve also built a benchmark suite called ann-benchmarks to compare different packages. Annoy was the world’s fastest package for a few months, but two things happened. FALCONN (FAst Lookups of Cosine […]