Bio: Algobeans is a blog by two data scientists, Annalyn Ng (University of Cambridge) and Kenneth Soo (Stanford University). Each tutorial covers the important functions and assumptions of a data science technique, without any math or jargon. They also illustrate these techniques with real-world data and examples.

### Principal Component Analysis Tutorial

The Problem: Imagine that you are a nutritionist trying to explore the nutritional content of food. What is the best way to differentiate food items? By vitamin content? Protein levels? Or perhaps a combination of both? Knowing the variables that best differentiate your items has several uses: 1. Visualization. Using the right variables to plot […]

### Predicting Resignation in the Military

In the 2015 hackathon organized by Singapore’s Ministry of Defense, one of the tasks was to predict resignation rates in the military using anonymized data on 23,000 personnel. The data included age, military rank, and years in service, as well as performance indicators such as salary increments and promotions. Our team placed third overall. In this […]

### Artificial Neural Networks (ANN) Introduction

Training a Computer to Recognize Your Handwriting: Take a look at the picture and try to identify what it is. One should be able to tell that it is a giraffe, despite it being strangely fat. We recognize images and objects instantly, even if these images are presented in a form that is […]

### Decision Trees Tutorial

Would you survive a disaster? Certain groups of people, such as women and children, might be entitled to receive help first, granting them a higher chance of survival. Knowing whether you belong to one of these privileged groups would help predict whether you would make it out alive. To identify which groups have higher survival rates, […]

### Topic Modeling with LDA Introduction

Suppose you have the following set of sentences: “I eat fish and vegetables.” “Fish are pets.” “My kitten eats fish.” Latent Dirichlet allocation (LDA) is a technique that automatically discovers the topics that these documents contain. Given the above sentences, LDA might classify the red words under Topic F, which we might label as “food”. Similarly, blue […]

### Time Series Analysis with Generalized Additive Models

Whenever you spot a trend plotted against time, you are looking at a time series. The de facto choice for studying financial market performance and weather forecasts, time series analysis is one of the most pervasive analysis techniques because of its inextricable relation to time: we are always interested in foretelling the future. TEMPORAL DEPENDENT MODELS […]