# Support Vector Machines

**By: Gonzalo Garcia Berrotarán, Machinalis**

### Introduction

Support vector machines is one of the most popular methods of classification in machine learning although they can be used as a *black box*, understanding what’s happening behind scenes can be very useful not to mention interesting.

In an internal learning course, I decided to implement SVMs and my objective with this article to mention some of the difficulties encountered. If you’re planning to explore on how to implement support vector machines, have in mind this issues and the problem will be a little bit more easy to affront.

### Basic idea

To simplify explanations, all along this blog post we will consider an example dataset with 2 features and 2 classes. This way you can imagine your dataset on a cartesian plane:

In many machine learning algorithms, the main task during the training it’s the minimization or maximization of a function. In this case we want to find a line that divides both classes and has **maximum** distance between the points of one class and the other.

**Training**

I won’t get too deep into the training process, but basically the formula of the width of the margin it’s taken into a function of some Lagrange multipliers.

Solving this gives you some non-zero multipliers applied to some of the training vectors (also called the support vectors). This is all you need to store to be able to classify, all the math on the classification is done using this.

**Classification**

To classify an unknown example, you decide which class belongs to based on which side of the divisor line falls.

**Maximization or minimization?**

The first trouble I faced was that the literature sometimes expresses this as a maximization problem and sometimes as a minimization problem. There are a lot of algorithms to find the minimum of a function so if you have to find a maximum, the trick is to find the minimum on the inverse function. In the theory, as we’ve seen, this is a maximization problem but to simplify things some times is presented directly with the formula to minimize so it is important to have this in mind when doing your own implementation.

**Ok, I got it, but how to find this minimum?**

Once you understand what you have to minimize, you have to think on which method use. Many minimization methods needs you to find the derivative (or second derivative) of the function. But this particular problem, can be put in terms of a quadratic programming problem. This means you can use a minimization solver for this kinds of problems such as cvxopt.

**Not all problems are like that**

You might be thinking that not all the problems have such data distribution that can be linearly separable and it’s true, but SVMs have a trick to work with such datasets so the main idea remains: when stuck, switch to another perspective.

### Other sources

- Of course, if you need to use a professional python implementation, take at look at scikit-learn
- There’s a great class of the MIT Artificial Intelligence class available on their youtube channel
- A great blogpost about support vector machines implementation on python

Originally posted at http://www.machinalis.com/blog/