### Gaussian Naive Bayes...

Building Gaussian Naive Bayes Classifier in Python. In this post, we are going to implement the Naive Bayes classifier in Python using my favorite machine learning library, scikit-learn. Next, we are going to use the trained Naive Bayes (supervised classification) model to predict the Census Income. As we discussed the Bayes theorem in naive Bayes classifier […]
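As a rough illustration of what a Gaussian Naive Bayes classifier computes, here is a minimal from-scratch sketch using only the standard library. It is not the scikit-learn code from the post: the helper names `fit_gaussian_nb` and `predict_gaussian_nb`, the two features, and the toy rows are all hypothetical stand-ins, not the Census Income data.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Population variance; the small constant keeps it strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(zip(*rows), means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior for one sample."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            # Log of the Gaussian density, features treated as independent.
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


# Hypothetical toy data: [age, hours_per_week] with an income label.
X = [[25, 20], [30, 25], [28, 22], [45, 50], [50, 55], [48, 60]]
y = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]

model = fit_gaussian_nb(X, y)
prediction = predict_gaussian_nb(model, [47, 52])
```

Working in log space avoids numerical underflow when many per-feature densities are multiplied together.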

### Here’s What Tw...

The Patriots’ 34-28 win in Super Bowl 51 was, quite possibly, one of the greatest football games of all time. It had the largest Super Bowl comeback of all time and was the first ever to go to overtime. For data-minded folks, the game exhibited striking parallels to the election. ESPN’s live prediction models were saying that Atlanta was almost certain to […]

### Implementing a Princ...

Sections: Introduction · Principal Component Analysis (PCA) vs. Multiple Discriminant Analysis (MDA) · What is a “good” subspace? · Summarizing the PCA approach · Generating some 3-dimensional sample data · Why are we choosing a 3-dimensional sample? · 1. Taking the whole dataset ignoring the class labels · 2. Computing the d-dimensional mean vector · 3. a) Computing the Scatter Matrix · 3. […]
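The first listed steps (the mean vector and the scatter matrix) can be sketched in plain Python. This is only an assumption-laden illustration: the function names `mean_vector` and `scatter_matrix` and the randomly generated 3-dimensional samples are hypothetical, and the later steps of the post are cut off in the excerpt, so they are not shown.

```python
import random

random.seed(0)
# Hypothetical 3-dimensional sample data: 20 points, class labels ignored.
samples = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]


def mean_vector(data):
    """d-dimensional mean: the average of each coordinate over all samples."""
    n = len(data)
    return [sum(col) / n for col in zip(*data)]


def scatter_matrix(data):
    """Sum over samples of (x - m)(x - m)^T, where m is the mean vector."""
    m = mean_vector(data)
    d = len(m)
    S = [[0.0] * d for _ in range(d)]
    for x in data:
        diff = [xi - mi for xi, mi in zip(x, m)]
        for i in range(d):
            for j in range(d):
                S[i][j] += diff[i] * diff[j]
    return S


m = mean_vector(samples)
S = scatter_matrix(samples)
```

The scatter matrix is symmetric by construction, and dividing it by the number of samples minus one gives the sample covariance matrix.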

### A Budget of Classifi...

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move project to project. We prefer “AUC” early in a project (when you […]

### Beyond One-Hot: An E...

In machine learning, data is king. The algorithms and models used to make predictions with the data are important, and very interesting, but ML is still subject to the idea of garbage-in-garbage-out. With that in mind, let’s look at a little subset of those input data: categorical variables. Categorical variables (wiki) are those that represent a […]
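For reference, plain one-hot encoding, the baseline the title alludes to, might look like the sketch below. The helper name `one_hot_encode` and the color values are hypothetical, and the excerpt does not show which encodings the post itself goes on to cover.

```python
def one_hot_encode(values):
    """Map each distinct category to its own 0/1 indicator column."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one column is set per row
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]
categories, encoded = one_hot_encode(colors)
```

Each row contains a single 1, so the number of columns grows with the number of distinct categories.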

### Better Python Compre...

Problem setting: persistence for big data. Joblib is a powerful Python package for management of computation: parallel computing, caching, and primitives for out-of-core computing. It is handy when working on so-called big data, which can consume more than the available RAM (several GB nowadays). In such situations, objects in the working space must be […]

### Diving Deep into Pyt...

Sections: The C3 class resolution algorithm for multiple class inheritance · Assignment operators and lists – simple-add vs. add-AND operators · True and False in the datetime module · Python reuses objects for small integers – use “==” for equality, “is” for identity · And to illustrate the test for equality (==) vs. identity (is): · Shallow vs. deep […]
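A few of the listed topics can be illustrated with a short sketch. The variable names are made up for this example, and it uses lists rather than small integers because integer object reuse is a CPython implementation detail.

```python
import copy

# Equality (==) compares contents; identity (is) compares objects.
a = [1, 2, 3]
b = list(a)  # a new list with the same contents
same_value = a == b
same_object = a is b

# "+=" (add-AND) mutates the list in place; "x = x + ..." builds a new list.
original = [1, 2]
alias = original
original += [3]            # alias still refers to the same, now longer, list
original = original + [4]  # original is rebound; alias is left untouched

# A shallow copy shares nested objects; a deep copy duplicates them.
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)
nested[0].append(99)       # visible through shallow, not through deep
```

After this runs, `alias` holds three elements while `original` holds four, and only `shallow[0]` reflects the appended value.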

### Introduction to Pyth...

I’ve been trying to learn how to program since I was ten years old. I tried many times – mostly because my dad is a developer and wanted to share the thing he loves – but Java, C, and C++ always looked scary. I couldn’t really get into it. There was too much I had […]