# 1. Statistical Learning: The Setting and the Estimator Object in Scikit-learn

### 1.1. Datasets

Scikit-learn learns information from one or more datasets that are
represented as 2D arrays. They can be understood as a list of
multi-dimensional observations. We say that the first axis of these
arrays is the samples axis, while the second is the features axis.
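To make the two axes concrete, here is a minimal sketch using NumPy. The array values and variable names are made up for illustration and do not come from any dataset shipped with scikit-learn.

```python
import numpy as np

# Three hypothetical observations (rows), each described by two features (columns).
X = np.array([
    [5.1, 3.5],
    [4.9, 3.0],
    [6.2, 2.9],
])

# First axis: samples. Second axis: features.
n_samples, n_features = X.shape

first_sample = X[0]      # one observation: all of its features
first_feature = X[:, 0]  # one feature: its value for every observation
```

Indexing along the first axis selects an observation, while slicing along the second axis selects a feature across all observations.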

A simple example shipped with scikit-learn: the iris dataset

```
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> data = iris.data
>>> data.shape
(150, 4)
```

It is made of 150 observations of irises, each described by 4
features: their sepal and petal length and width, as detailed in
iris.DESCR.

When the data is not initially in the (n_samples, n_features) shape, it
needs to be preprocessed before scikit-learn can use it.

An example of reshaping data: the digits dataset

The digits dataset is made of 1797 8×8 images of hand-written digits:

```
>>> digits = datasets.load_digits()
>>> digits.images.shape
(1797, 8, 8)
>>> import matplotlib.pyplot as plt
>>> plt.imshow(digits.images[0], cmap=plt.cm.gray_r)
<matplotlib.image.AxesImage object at ...>
```

To use this dataset with scikit-learn, we transform each 8×8 image into a
feature vector of length 64:

```
>>> data = digits.images.reshape((digits.images.shape[0], -1))
```
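The effect of this reshape can be checked on a small synthetic stack of images. The sketch below assumes only NumPy; the `images` array is a hypothetical stand-in for `digits.images`, not real data.

```python
import numpy as np

# Hypothetical stand-in for digits.images: 5 "images" of 8x8 pixels.
images = np.arange(5 * 8 * 8, dtype=float).reshape(5, 8, 8)

# Keep the samples axis and flatten everything else into one features axis.
data = images.reshape((images.shape[0], -1))

# Each row of `data` is one image read row by row (C order).
row_matches_image = np.array_equal(data[0], images[0].ravel())

# The original images can be recovered by reshaping back.
restored = data.reshape((-1, 8, 8))
```

Passing `-1` lets NumPy infer the size of the features axis, so the same line works for images of any fixed size.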

### 1.2. Estimator objects

Fitting data: The core object of scikit-learn is the estimator
object. All estimator objects expose a fit method that takes a
dataset (2D array):

```
>>> estimator.fit(data)
```

Estimator parameters: All the parameters of an estimator can be set
when it is instantiated, or by modifying the corresponding attribute:

```
>>> estimator = Estimator(param1=1, param2=2)
>>> estimator.param1
1
```

Estimated parameters: When an estimator is fitted to data, its
parameters are estimated from the data at hand. All the estimated
parameters are attributes of the estimator object whose names end
with an underscore:

```
>>> estimator.estimated_param_
```
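The three conventions above (a fit method, parameters set at instantiation, and estimated parameters ending with an underscore) can be sketched with a toy class. `MeanEstimator` and its `center` parameter are hypothetical names invented for this illustration; this is not a scikit-learn class, only a minimal imitation of the interface using NumPy.

```python
import numpy as np


class MeanEstimator:
    """Toy estimator (hypothetical, not part of scikit-learn)."""

    def __init__(self, center=True):
        # Estimator parameter: set at instantiation, readable as an attribute.
        self.center = center

    def fit(self, data):
        data = np.asarray(data, dtype=float)
        # Estimated parameter: learned from the data, name ends with an underscore.
        if self.center:
            self.mean_ = data.mean(axis=0)
        else:
            self.mean_ = np.zeros(data.shape[1])
        return self  # returning self allows calls to be chained


# Made-up dataset of shape (n_samples, n_features) = (3, 2).
data = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

estimator = MeanEstimator(center=True)
fitted = estimator.fit(data)
```

After fitting, `estimator.center` still holds the value chosen by the user, while `estimator.mean_` holds the per-feature means computed from `data`.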

Originally posted at gael-varoquaux.info
