# Third batch of notebooks for Think Stats

Posted by Allen Downey, June 4, 2017

I am going back through *Think Stats* and updating the Jupyter notebooks. I am done with Chapters 1 through 9 now.

If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running the notebooks on your computer. Or you can read (but not run) the notebooks on GitHub:

Chapter 7 Notebook (Chapter 7 Solutions)

Chapter 8 Notebook (Chapter 8 Solutions)

Chapter 9 Notebook (Chapter 9 Solutions)

I’ll post the next batch soon. In the meantime, here are some of the examples from Chapter 7, which demonstrate the surprising difficulty of making an effective scatter plot, especially with large datasets. In this example, I use data from the Behavioral Risk Factor Surveillance System (BRFSS), which includes data from more than 300,000 respondents.

## Scatter plots

I’ll start with the data from the BRFSS again.

```
import brfss

# nrows=None reads every row of the data file
df = brfss.ReadBrfss(nrows=None)
```

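The following function selects a random subset of a `DataFrame`.

```
import numpy as np

def SampleRows(df, nrows, replace=False):
    """Choose nrows rows of df at random, without replacement by default."""
    indices = np.random.choice(df.index, nrows, replace=replace)
    sample = df.loc[indices]
    return sample
```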
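If you don’t have the BRFSS data handy, here is a minimal sketch of the same sampling idea on a made-up `DataFrame`. The names `sample_rows`, `fake`, and `seed` are my own for illustration, the values are synthetic (only the column names `htm3` and `wtkg2` are borrowed from the real data), and the call to pandas’ built-in `DataFrame.sample` is shown as an alternative, not as what the book’s code does.

```python
import numpy as np
import pandas as pd

def sample_rows(df, nrows, replace=False, seed=None):
    """Return nrows randomly chosen rows of df (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Pick index labels at random, then select the matching rows.
    indices = rng.choice(df.index, nrows, replace=replace)
    return df.loc[indices]

# Synthetic stand-in for the survey data: made-up values, not BRFSS.
rng = np.random.default_rng(17)
fake = pd.DataFrame({
    'htm3': rng.normal(170, 10, size=20000),   # pretend heights in cm
    'wtkg2': rng.normal(80, 18, size=20000),   # pretend weights in kg
})

sample = sample_rows(fake, 5000, seed=1)

# pandas can also do this directly with DataFrame.sample.
builtin = fake.sample(n=5000, replace=False, random_state=1)

print(len(sample), len(builtin))
```

With `replace=False`, no row can appear twice in the sample, which is usually what you want when thinning a dataset for plotting.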

```
sample = SampleRows(df, 5000)
heights, weights = sample.htm3, sample.wtkg2
```

Here’s a simple scatter plot with `alpha=1`, so each data point is fully saturated.

```
import thinkplot

thinkplot.Scatter(heights, weights, alpha=1)
thinkplot.Config(xlabel='Height (cm)',
                 ylabel='Weight (kg)',
                 axis=[140, 210, 20, 200],
                 legend=False)
```
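For readers without `thinkplot` installed, here is a rough equivalent in plain matplotlib. It is a sketch under assumptions: I am assuming `thinkplot.Scatter` behaves much like matplotlib’s `scatter`, the marker size `s=10` is my own choice, and the heights and weights are synthetic values generated on the spot, not BRFSS data.

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic heights (cm) and weights (kg): made-up values, not BRFSS.
rng = np.random.default_rng(42)
heights = rng.normal(170, 10, size=5000)
weights = 80 + 0.9 * (heights - 170) + rng.normal(0, 15, size=5000)

fig, ax = plt.subplots()
# alpha=1 makes every marker fully opaque, so overlapping points
# look the same as a single point.
points = ax.scatter(heights, weights, alpha=1, s=10, edgecolors='none')
ax.set_xlabel('Height (cm)')
ax.set_ylabel('Weight (kg)')
ax.axis([140, 210, 20, 200])  # [xmin, xmax, ymin, ymax]

# fig.savefig('scatter_alpha1.png')  # uncomment to write the figure to disk
```

Because every marker is opaque, the dense middle of the cloud turns into a solid blob, and you can’t tell whether a given spot holds one point or a hundred.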