# Scikit-learn Tutorial: Statistical-Learning for Scientific Data Processing

**Zip file for off-line browsing:** https://github.com/GaelVaroquaux/scikit-learn-tutorial/zipball/gh-pages

Statistical learning

Machine learning is growing in importance as the datasets that experimental sciences face grow rapidly in size. The problems it tackles range from building a prediction function that links different observations, to classifying observations, to learning the structure of an unlabeled dataset.

This tutorial explores statistical learning, that is, the use of machine learning techniques with the goal of statistical inference: drawing conclusions from the data at hand.

`scikits.learn` is a Python module that integrates classic machine learning algorithms into the tightly-knit world of scientific Python packages (numpy, scipy, matplotlib).

Note

This document is meant to be used with **scikit-learn version 0.7+**.

Warning

In scikit-learn release 0.9, the import path changed from `scikits.learn` to `sklearn`. To import with cross-version compatibility, use:

```
try:
    # scikit-learn 0.9 and later
    from sklearn import something
except ImportError:
    # older releases
    from scikits.learn import something
```
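The same fallback can be wrapped in a small helper when several modules need it. The sketch below is only an illustration: `import_first_available` is a hypothetical name, not part of scikit-learn, and it relies on the standard-library `importlib.import_module`. It returns the first module that imports successfully and raises `ImportError` if none does.

```python
import importlib


def import_first_available(*module_names):
    """Return the first module in module_names that can be imported.

    Hypothetical helper for illustration; it is not part of scikit-learn.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This name is not importable here; try the next candidate.
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(module_names))


# Prefer the new import path, fall back to the old one.
try:
    learn = import_first_available("sklearn", "scikits.learn")
except ImportError:
    learn = None  # neither package is installed in this environment
```

After this runs, `learn` refers to whichever package is installed, so the rest of a script can use one name regardless of the installed release.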

- 1. Statistical learning: the setting and the estimator object in scikit-learn
  - 1.1. Datasets
  - 1.2. Estimator objects

- 2. Supervised learning: predicting an output variable from high-dimensional observations
  - 2.1. Nearest neighbor and the curse of dimensionality
  - 2.2. Linear model: from regression to sparsity
  - 2.3. Support vector machines (SVMs)

- 3. Model selection: choosing estimators and their parameters
  - 3.1. Score, and cross-validated scores
  - 3.2. Cross-validation generators
  - 3.3. Grid-search and cross-validated estimators

- 4. Unsupervised learning: seeking representations of the data
  - 4.1. Clustering: grouping observations together
  - 4.2. Decompositions: from a signal to components and loadings

- 5. Putting it all together
  - 5.1. Pipelining
  - 5.2. Face recognition with eigenfaces
  - 5.3. Open problem: stock market structure

- 6. Finding help
  - 6.1. The project mailing list
  - 6.2. Q&A communities with Machine Learning practitioners

Originally posted at gael-varoquaux.info/