# Detecting Outliers

Python, Tools & Languages. Posted by Damian Mingle, August 2, 2018


Outliers are observations that lie far from the other observations in a dataset. Variability in the data you are analyzing can arise for a number of reasons, and outliers may cause serious problems in your work as a data scientist, for example by skewing estimates such as the mean.

| Title | Author | Date |
|---|---|---|
| Detecting Outliers | Damian Mingle | 06/10/2018 |

### Preliminaries

```python
# Load libraries
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs
```

### Create Data

```python
# Simulate data
simulated_data, _ = make_blobs(n_samples=255,
                               n_features=3,
                               centers=1,
                               random_state=1)

# Make extreme values
simulated_data[0, 0] = 99999
simulated_data[0, 1] = 99999
```
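As a quick sanity check, the sketch below rebuilds the same simulated data and compares each column's maximum with its median. The names `column_max` and `column_median` are introduced here for illustration only and are not part of the original post.

```python
import numpy as np
from sklearn.datasets import make_blobs

# Rebuild the simulated data exactly as in the post
simulated_data, _ = make_blobs(n_samples=255,
                               n_features=3,
                               centers=1,
                               random_state=1)
simulated_data[0, 0] = 99999
simulated_data[0, 1] = 99999

# 255 observations with 3 features each
print(simulated_data.shape)

# Illustrative names: per-column maximum and median
column_max = simulated_data.max(axis=0)
column_median = np.median(simulated_data, axis=0)

# The planted value dominates the first two columns, while the medians
# stay near the blob center
print(column_max)
print(column_median)
```

The gap between the maximum and the median in the first two columns is what makes the first row an obvious outlier.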

### Detect Outliers

EllipticEnvelope depends on a `contamination` parameter (the proportion of outliers you think are in the data; scikit-learn defaults to 0.1). Because you rarely know that proportion in advance, this is a significant limitation of the approach.

```python
# Build outlier detector
outlier_detector = EllipticEnvelope(contamination=0.1)

# Fit outlier detector
outlier_detector.fit(simulated_data)

# Predict outliers (-1 = outlier, 1 = inlier)
outlier_detector.predict(simulated_data)
```
```
array([-1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1, -1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1, -1,  1,  1,  1, -1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  1,
        1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,
        1,  1, -1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1,
        1,  1,  1, -1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,
        1,  1,  1,  1,  1, -1, -1,  1,  1,  1,  1, -1,  1, -1,  1,  1,  1,
        1, -1,  1,  1,  1,  1,  1,  1, -1,  1, -1,  1,  1,  1,  1,  1,  1,
        1,  1,  1,  1,  1,  1,  1,  1,  1, -1,  1,  1,  1,  1,  1, -1,  1])
```
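To see why the contamination setting matters, here is a minimal sketch that refits the detector with a few different values and counts how many points are labeled `-1`. The specific values tried, the fixed `random_state=0`, and the names `flagged_counts` and `first_row_flagged` are assumptions made for illustration, not part of the original post.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.datasets import make_blobs

# Rebuild the simulated data exactly as in the post
simulated_data, _ = make_blobs(n_samples=255,
                               n_features=3,
                               centers=1,
                               random_state=1)
simulated_data[0, 0] = 99999
simulated_data[0, 1] = 99999

flagged_counts = {}      # contamination -> number of points labeled -1
first_row_flagged = {}   # contamination -> whether the planted outlier is caught

# The contamination values below are arbitrary choices for illustration
for contamination in (0.01, 0.05, 0.1):
    detector = EllipticEnvelope(contamination=contamination, random_state=0)
    predictions = detector.fit_predict(simulated_data)
    flagged_counts[contamination] = int(np.sum(predictions == -1))
    first_row_flagged[contamination] = bool(predictions[0] == -1)

for contamination, count in flagged_counts.items():
    print(f"contamination={contamination}: {count} of {len(simulated_data)} flagged, "
          f"planted outlier flagged: {first_row_flagged[contamination]}")
```

Only one extreme observation was planted, yet the number of flagged points grows with the contamination value, since the detector labels roughly that share of the data as outliers regardless of how many are really there.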

If you are interested in learning about the manual method of detecting outliers, look at this video:


## Damian Mingle

Damian Mingle is an American businessman, investor, and data scientist. He is the Founder and Chief Data Scientist of LoveToThink.org, a way for skilled professionals to contribute their expertise and empower the world’s social changemakers. Formerly, Damian was the Chief Data Scientist at Intermedix (an R1 company) where he was responsible for leading a team of international data scientists to drive business value. As a leading authority on data science, Damian speaks nationally and internationally on patient safety, global health, and applied data science.
