# New notebooks for Think Stats

Posted by Allen Downey on April 22, 2017

Getting ready to teach Data Science in the spring, I am going back through *Think Stats* and updating the Jupyter notebooks. When I am done, each chapter will have a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.

If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running them on your computer.

Or you can read (but not run) the notebooks on GitHub:

Chapter 1 Notebook (Chapter 1 Solutions)

Chapter 2 Notebook (Chapter 2 Solutions)

Chapter 3 Notebook (Chapter 3 Solutions)

I’ll post more soon, but in the meantime you can see some of the more interesting exercises, and solutions, below.

**Exercise:** Something like the class size paradox appears if you survey children and ask how many children are in their family. Families with many children are more likely to appear in your sample, and families with no children have no chance to be in the sample.

Use the NSFG respondent variable `numkdhh` to construct the actual distribution for the number of children under 18 in the respondents’ households.

Now compute the biased distribution we would see if we surveyed the children and asked them how many children under 18 (including themselves) are in their household.

Plot the actual and biased distributions, and compute their means.

```
resp = nsfg.ReadFemResp()
```

```
# Solution
pmf = thinkstats2.Pmf(resp.numkdhh, label='numkdhh')
```

```
# Solution
thinkplot.Pmf(pmf)
thinkplot.Config(xlabel='Number of children', ylabel='PMF')
```

```
# Solution
biased = BiasPmf(pmf, label='biased')
```
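`BiasPmf` is not defined in this excerpt. The following is a minimal sketch of the reweighting idea, assuming the PMF is a plain dictionary rather than a `thinkstats2.Pmf`. The names `bias_pmf` and `pmf_mean` and the example distribution are hypothetical, chosen for illustration only.

```python
def bias_pmf(pmf):
    """Return the size-biased version of a PMF given as {value: probability}.

    A household with x children contributes x children to the sample,
    so each probability is multiplied by x and the result is renormalized.
    """
    weighted = {x: p * x for x, p in pmf.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("PMF has no mass on positive values")
    return {x: w / total for x, w in weighted.items()}


def pmf_mean(pmf):
    """Mean of a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


# Made-up distribution for illustration, NOT the NSFG data.
actual = {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1}
biased = bias_pmf(actual)

print(pmf_mean(actual), pmf_mean(biased))
```

Households with no children get probability zero in the biased distribution, which matches the observation that they cannot appear in a survey of children.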

```
# Solution
thinkplot.PrePlot(2)
thinkplot.Pmfs([pmf, biased])
thinkplot.Config(xlabel='Number of children', ylabel='PMF')
```

```
# Solution
pmf.Mean()
```

```
# Solution
biased.Mean()
```
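The gap between the two means follows from a standard identity for size-biased sampling. It is given here as a sketch, with notation that is not from the notebook: $p(x)$ is the actual PMF with mean $\mu > 0$ and variance $\sigma^2$, and $p_b(x)$ is the biased PMF.

$$
p_b(x) = \frac{x\,p(x)}{\mu}, \qquad
E_b[X] = \sum_x x\,p_b(x) = \frac{E[X^2]}{\mu} = \mu + \frac{\sigma^2}{\mu}
$$

So the biased mean exceeds the actual mean whenever the number of children varies across households.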

**Exercise:** I started this book with the question, “Are first babies more likely to be late?” To address it, I computed the difference in means between groups of babies, but I ignored the possibility that there might be a difference between first babies and others for the same woman.

To address this version of the question, select respondents who have at least two live births and compute pairwise differences. Does this formulation of the question yield a different result?

Hint: use `nsfg.MakePregMap`:

```
live, firsts, others = first.MakeFrames()
```

```
preg_map = nsfg.MakePregMap(live)
```

```
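# Solution
import numpy as np

hist = thinkstats2.Hist()
for caseid, indices in preg_map.items():
    # Keep respondents with at least two live births.
    if len(indices) >= 2:
        # preg_map was built from `live`, so its indices are labels in `live`.
        pair = live.loc[indices[0:2]].prglngth
        # Second pregnancy length minus the first, in weeks.
        diff = np.diff(pair)[0]
        hist[diff] += 1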
```
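To make the pairing logic easier to follow without the NSFG files, here is a sketch of the same steps in plain Python. It assumes the map groups row positions by `caseid`, in birth order. The records are made-up values, not NSFG data, and `Counter` stands in for `thinkstats2.Hist`.

```python
from collections import Counter, defaultdict

# Made-up records (caseid, prglngth in weeks), in birth order per respondent.
# Illustrative values only, NOT NSFG data.
records = [
    (1, 39), (1, 40),
    (2, 41),
    (3, 38), (3, 39), (3, 37),
    (4, 40), (4, 39),
]

# Map each caseid to the positions of that respondent's live births.
preg_map = defaultdict(list)
for index, (caseid, _) in enumerate(records):
    preg_map[caseid].append(index)

# For respondents with at least two live births, count the difference
# between the second and first pregnancy lengths.
diffs = Counter()
for caseid, indices in preg_map.items():
    if len(indices) >= 2:
        first_len = records[indices[0]][1]
        second_len = records[indices[1]][1]
        diffs[second_len - first_len] += 1

mean_diff = sum(d * n for d, n in diffs.items()) / sum(diffs.values())
print(dict(diffs), mean_diff)
```

A positive difference means the second pregnancy was longer than the first. Respondents with only one live birth, such as `caseid` 2 here, do not contribute a pair.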

```
# Solution
thinkplot.Hist(hist)
```