This ODSC West 2018 talk, “Visualizing Vectors: Basics Every Data Scientist Should Know,” presented by Jed Crosby, Head of Data Science at Clari, should be a required learning resource for all new data scientists, because every data scientist should have a firm grasp of the mathematics behind the field, especially machine learning. Just take a moment to thumb through the “machine learning bible,” The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman (three luminary professors from Stanford University), and you’ll quickly understand what I mean. To read this required text for many graduate programs in machine learning, you’ll need to know a lot of linear algebra, and vectors are a big part of linear algebra.
Here’s a short list of common data science techniques you may have heard of: PCA, LDA, SVD, SVM, k-NN, and k-means. What do all of these (and many others) have in common? Vector mathematics! Many people, even data scientists, tend to shy away from mathematics, but fortunately there are some simple methods for visualizing and understanding vectors that can make any data scientist’s life easier. If you’re new to the mathematical side of data science, this talk will arm you with tools for understanding the basic mathematics behind many important algorithms.
The motivation for the talk was to help new and aspiring data scientists develop intuition about the mathematics behind some common algorithms. The talk is at a very basic level, aimed at those who have interest in or experience with data science but little or no experience with concepts from linear algebra.
- What is a vector?
- Using vector mathematics to build a recommender system (using scikit-learn’s cosine_similarity function)
- Two more algorithms that are all about vectors: k-means (unsupervised learning via vector sums, vector averages, and distances between vectors) and k-nearest neighbors (the laziest classification algorithm)
- Linear algebra: vectors and matrices, matrix multiplication, and Principal Component Analysis (PCA) for dimensionality reduction
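To make the k-means item above concrete, here’s a minimal sketch of the vector arithmetic involved; the toy points, cluster count, and iteration count are made up for illustration, not taken from the talk:

```python
import numpy as np

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's-style k-means: assign each point to its nearest
    centroid (a vector distance), then move each centroid to the
    average (vector sum divided by count) of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New centroid = vector average of the points assigned to it.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two obvious clumps of toy 2-D points (made-up data).
pts = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, cents = kmeans(pts, k=2)
```

The whole algorithm is nothing but the vector operations the talk names: sums, averages, and distances.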
The talk includes a small set of movie preference data of the sort that might be gathered from users of a consumer entertainment app. Crosby converts each user’s data into a preference vector and explores the idea that greater “parallelness” between preference vectors indicates greater similarity between the corresponding users’ movie preferences. He also describes the basic geometric nature of a vector, explores two intuitive ways of thinking about parallelness, and builds up to an efficient way of computing the similarity between users. He then uses the same technique to compute similarities between movies, demonstrating the effectiveness of combining a vector representation with a similarity algorithm.
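A rough sketch of the preference-vector idea (the users, movies, and ratings below are invented for illustration; the talk uses its own dataset and scikit-learn’s cosine_similarity, which computes the same quantity):

```python
import numpy as np

# Toy preference matrix (made-up data): one row per user, one column
# per movie; 1 = liked, 0 = didn't watch or didn't like.
prefs = np.array([
    [1, 1, 0, 0],   # user A
    [1, 1, 1, 0],   # user B
    [0, 0, 1, 1],   # user C
], dtype=float)

def cosine_sim(a, b):
    """Cosine of the angle between two vectors: 1.0 means parallel
    (identical tastes up to scale), 0.0 means orthogonal (nothing in
    common). sklearn.metrics.pairwise.cosine_similarity does this
    for whole matrices at once."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

sim_ab = cosine_sim(prefs[0], prefs[1])  # high: overlapping tastes
sim_ac = cosine_sim(prefs[0], prefs[2])  # zero: no movies in common
```

Transposing the matrix and running the same computation on columns gives movie-to-movie similarities, which is the second use Crosby demonstrates.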
Data Science without the Math
Crosby sets the stage for why data scientists need to understand the mathematics under the hood of machine learning by recounting his experience interviewing data science candidates for his group at Clari. He describes candidates who are proficient at plugging data into common algorithms, cranking through the various libraries (scikit-learn, TensorFlow, etc.), and getting good results, but who aren’t familiar with the basic mathematics underlying what they’re working with. You can get far with the basic algorithms and libraries, but to develop intuition for what you’re doing, it helps a lot to understand the math.
To take a deeper dive into the mathematics behind machine learning, check out Jed Crosby’s compelling talk from ODSC West 2018.
- You can go far without knowing any of the underlying mathematics for machine learning, but to truly excel in the field you’ll need a foundation in vector algebra.
- You’ll also need familiarity with mathematics, specifically linear algebra, to keep pace with the research happening in the field (research papers are highly mathematical), not to mention the many great academic texts out there.
- You’ll see how the math works behind common algorithms like k-means clustering and k-nearest neighbors classification.
- You’ll see a brief demonstration of how vectors can be used to build a simple recommender system.
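As a taste of the vector math behind the k-nearest neighbors takeaway above, here is a minimal sketch (the training points, labels, and choice of k are invented, not taken from the talk):

```python
from collections import Counter
import numpy as np

def knn_predict(train_X, train_y, query, k=3):
    """'Lazy' classification: there is no training step at all --
    just measure the vector distance from the query to every stored
    example and take a majority vote among the k closest."""
    dists = np.linalg.norm(train_X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D points: class "a" near the origin, class "b" near (5, 5).
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
              [5.0, 5.0], [4.8, 5.1], [5.2, 4.9]])
y = np.array(["a", "a", "a", "b", "b", "b"])
label = knn_predict(X, y, np.array([0.1, 0.1]))
```

The entire classifier is one distance computation and a vote, which is why k-NN is often called the laziest algorithm in the toolbox.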