

New notebooks for Think Stats
Posted by Allen Downey, April 22, 2017

Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. When I am done, each chapter will have a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.
If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running them on your computer.
Or you can read (but not run) the notebooks on GitHub:
Chapter 1 Notebook (Chapter 1 Solutions)
Chapter 2 Notebook (Chapter 2 Solutions)
Chapter 3 Notebook (Chapter 3 Solutions)
I’ll post more soon, but in the meantime you can see some of the more interesting exercises, and solutions, below.
Use the NSFG respondent variable numkdhh to construct the actual distribution for the number of children under 18 in the respondents’ households. Now compute the biased distribution we would see if we surveyed the children and asked them how many children under 18 (including themselves) are in their household.
Plot the actual and biased distributions, and compute their means.
# Modules from the Think Stats code repository
import nsfg
import thinkstats2
import thinkplot

resp = nsfg.ReadFemResp()
# Solution
pmf = thinkstats2.Pmf(resp.numkdhh, label='numkdhh')
# Solution
thinkplot.Pmf(pmf)
thinkplot.Config(xlabel='Number of children', ylabel='PMF')
# Solution
# BiasPmf is not defined in this excerpt; it reweights each value x by x and renormalizes
biased = BiasPmf(pmf, label='biased')
# Solution
thinkplot.PrePlot(2)
thinkplot.Pmfs([pmf, biased])
thinkplot.Config(xlabel='Number of children', ylabel='PMF')
# Solution
pmf.Mean()
# Solution
biased.Mean()
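The solution above calls BiasPmf, whose definition does not appear in this post. Here is a minimal sketch of the idea using plain dictionaries instead of thinkstats2. The names bias_pmf and pmf_mean and the example distribution are made up for illustration; this is not the notebook's code or the NSFG data.

```python
def bias_pmf(pmf):
    """Return the size-biased version of a PMF given as {value: probability}.

    Each value x is reweighted by x, because a household with x children
    is x times as likely to be reached when we sample children.
    """
    weighted = {x: p * x for x, p in pmf.items()}
    total = sum(weighted.values())
    return {x: w / total for x, w in weighted.items()}


def pmf_mean(pmf):
    """Mean of a PMF given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())


# Made-up distribution for illustration only; not the NSFG data.
actual = {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1}
biased = bias_pmf(actual)

print(pmf_mean(actual))
print(pmf_mean(biased))
```

Households with no children drop out of the biased distribution entirely, since no child can report them, so the biased mean is always at least as large as the actual mean.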
To address this version of the question, select respondents who have at least two live births and compute pairwise differences. Does this formulation of the question yield a different result?
Hint: use nsfg.MakePregMap:
live, firsts, others = first.MakeFrames()
preg_map = nsfg.MakePregMap(live)
# Solution
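import numpy as np

hist = thinkstats2.Hist()
for caseid, indices in preg_map.items():
    # Keep only respondents with at least two live births
    if len(indices) >= 2:
        # preg_map was built from live, so its indices are row labels in live
        pair = live.loc[indices[0:2]].prglngth
        # Second birth minus first birth, in weeks
        diff = np.diff(pair)[0]
        hist[diff] += 1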
# Solution
thinkplot.Hist(hist)
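If you want to check the logic without the NSFG files, here is a rough standard-library sketch of the same pairwise computation. The records, the name birth_map, and the tuple layout are all hypothetical; birth_map only imitates the role preg_map plays above.

```python
from collections import Counter, defaultdict

# Hypothetical records for illustration: (caseid, pregnancy length in weeks),
# listed in birth order for each respondent. Not NSFG data.
live_births = [
    (1, 39), (1, 40),
    (2, 41), (2, 39), (2, 38),
    (3, 40),
]

# Map each caseid to the list of row indices of its live births.
birth_map = defaultdict(list)
for index, (caseid, _) in enumerate(live_births):
    birth_map[caseid].append(index)

# For respondents with at least two live births, take second minus first.
diffs = Counter()
for caseid, indices in birth_map.items():
    if len(indices) >= 2:
        first_len = live_births[indices[0]][1]
        second_len = live_births[indices[1]][1]
        diffs[second_len - first_len] += 1

print(sorted(diffs.items()))
```

A positive difference means the second baby had the longer pregnancy. Respondent 3 has only one live birth and so contributes nothing.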