Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. When I am done, each chapter will have a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.
If you are reading the book, you can get the notebooks by cloning this repository on GitHub and running them on your computer.
Or you can read (but not run) the notebooks on GitHub:
I’ll post more soon, but in the meantime you can see some of the more interesting exercises, and solutions, below.
Use the NSFG respondent variable numkdhh to construct the actual distribution for the number of children under 18 in the respondents’ households.
Now compute the biased distribution we would see if we surveyed the children and asked them how many children under 18 (including themselves) are in their household.
Plot the actual and biased distributions, and compute their means.
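Before turning to the NSFG data, the biasing step can be sketched on its own. The example below is a minimal illustration using plain dictionaries rather than the thinkstats2 classes; the helper names `bias_pmf` and `pmf_mean` and the toy probabilities are assumptions made up for this sketch, not part of the book's code.

```python
# Toy illustration of the biased distribution we get by surveying children.
# The probabilities below are made up; they are not NSFG results.

def bias_pmf(pmf):
    """Weight each value x by x, then renormalize so the total is 1."""
    weighted = {x: p * x for x, p in pmf.items()}
    total = sum(weighted.values())
    return {x: w / total for x, w in weighted.items()}

def pmf_mean(pmf):
    """Mean of a PMF stored as a {value: probability} dict."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical distribution of children per household.
actual = {0: 0.5, 1: 0.2, 2: 0.2, 3: 0.1}
biased = bias_pmf(actual)

print('actual mean', pmf_mean(actual))
print('biased mean', pmf_mean(biased))
```

A household with x children is x times as likely to show up when we sample children, so households with no children get zero weight and the biased mean works out to E[X²]/E[X], which is larger than the actual mean whenever the number of children varies.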
resp = nsfg.ReadFemResp()
# Solution
pmf = thinkstats2.Pmf(resp.numkdhh, label='numkdhh')

# Solution
thinkplot.Pmf(pmf)
thinkplot.Config(xlabel='Number of children', ylabel='PMF')

# Solution
biased = BiasPmf(pmf, label='biased')

# Solution
thinkplot.PrePlot(2)
thinkplot.Pmfs([pmf, biased])
thinkplot.Config(xlabel='Number of children', ylabel='PMF')

# Solution
pmf.Mean()

# Solution
biased.Mean()
To address this version of the question, select respondents who have at least two live births and compute pairwise differences. Does this formulation of the question yield a different result?
live, firsts, others = first.MakeFrames()
preg_map = nsfg.MakePregMap(live)
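# Solution
import numpy as np

hist = thinkstats2.Hist()
for caseid, indices in preg_map.items():
    if len(indices) >= 2:
        # preg_map was built from live, so its indices are labels in live
        pair = live.loc[indices[0:2]].prglngth
        # np.diff returns a one-element array; take the scalar to use as a key
        diff = np.diff(pair)[0]
        hist[diff] += 1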
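The pairing logic can also be checked on a small example that does not need the NSFG files. This is only a sketch under stated assumptions: the `records` list is hypothetical data invented for illustration, and a `Counter` stands in for the thinkstats2 histogram.

```python
from collections import Counter, defaultdict

# Hypothetical (caseid, pregnancy length in weeks) records, in birth order.
# These values are made up; they are not NSFG data.
records = [
    (1, 39), (1, 40),
    (2, 41),
    (3, 40), (3, 38), (3, 37),
]

# Group pregnancy lengths by respondent, preserving order.
lengths_by_case = defaultdict(list)
for caseid, prglngth in records:
    lengths_by_case[caseid].append(prglngth)

# For respondents with at least two live births, count the difference
# between the second and first pregnancy lengths (same sign as np.diff).
diffs = Counter()
for caseid, lengths in lengths_by_case.items():
    if len(lengths) >= 2:
        first_len, second_len = lengths[:2]
        diffs[second_len - first_len] += 1

n = sum(diffs.values())
mean_diff = sum(d * count for d, count in diffs.items()) / n
print(diffs, mean_diff)
```

With this sign convention, a positive difference means the second baby arrived later than the first, so a negative mean difference would suggest that first babies tend to arrive later than their younger siblings.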