

Another batch of Think Stats notebooks
Posted by Allen Downey, June 15, 2017

Getting ready to teach Data Science in the spring, I am going back through Think Stats and updating the Jupyter notebooks. When I am done, each chapter will have a notebook that shows the examples from the book along with some small exercises, with more substantial exercises at the end.
If you are reading the book, you can get the notebooks by cloning this repository on GitHub, and running the notebooks on your computer.
Or you can read (but not run) the notebooks on GitHub:
Chapter 10 Notebook (Chapter 10 Solutions)
Chapter 11 Notebook (Chapter 11 Solutions)
Chapter 12 Notebook (Chapter 12 Solutions)
I’ll post the last two soon, but in the meantime you can see some of the more interesting exercises, and solutions, below.
Time series analysis
Load the data from “Price of Weed”.
transactions = pd.read_csv('mj-clean.csv', parse_dates=[5])
transactions.head()
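As a small illustration of what parse_dates=[5] does, here is a sketch that reads a tiny in-memory CSV. The rows and column layout are made up for the example and are only assumed to resemble the real file; the point is that the column at position 5 is parsed as dates rather than left as strings.

```python
import io

import pandas as pd

# Hypothetical miniature CSV; the rows are invented and the real
# mj-clean.csv may have a different set of columns.
csv_text = """city,state,price,amount,quality,date,ppg
Springfield,XX,100,10.0,high,2011-01-01,10.0
Shelbyville,YY,60,30.0,low,2011-01-02,2.0
"""

# parse_dates=[5] asks pandas to parse the column at position 5
# (counting from 0), which is 'date' in this toy layout.
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=[5])
print(sample.dtypes)
```

With the date column parsed, it can be used directly as a grouping key and in date arithmetic.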
The following function takes a DataFrame of transactions and computes daily averages.
def GroupByDay(transactions, func=np.mean):
    """Groups transactions by day and computes the daily mean ppg.

    transactions: DataFrame of transactions

    returns: DataFrame of daily prices
    """
    grouped = transactions[['date', 'ppg']].groupby('date')
    daily = grouped.aggregate(func)

    daily['date'] = daily.index
    start = daily.date.iloc[0]
    one_year = np.timedelta64(1, 'Y')
    daily['years'] = (daily.date - start) / one_year

    return daily
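To see the two steps of this function in isolation (averaging by day, then measuring elapsed time in years), here is a minimal sketch on made-up data. It is not the notebook's code: the toy DataFrame is invented, and the length of a year is assumed to be 365.2425 days.

```python
import pandas as pd

# Made-up transactions: two on the first day, one a year later.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2011-01-01', '2011-01-01', '2012-01-01']),
    'ppg': [10.0, 14.0, 9.0],
})

# One row per day, holding the mean price per gram for that day.
daily = toy.groupby('date')[['ppg']].mean()

# Elapsed time since the first day, expressed in years.
# Assumption: one year is taken to be 365.2425 days.
elapsed = daily.index - daily.index[0]
daily['years'] = elapsed / pd.Timedelta(days=365.2425)
print(daily)
```

The first day averages the two prices to 12.0, and the second row lands just under one year after the start because 2011 has 365 days.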
The following function returns a map from quality name to a DataFrame of daily averages.
def GroupByQualityAndDay(transactions):
    """Divides transactions by quality and computes mean daily price.

    transactions: DataFrame of transactions

    returns: map from quality to time series of ppg
    """
    groups = transactions.groupby('quality')
    dailies = {}
    for name, group in groups:
        dailies[name] = GroupByDay(group)

    return dailies
dailies is the map from quality name to DataFrame.
dailies = GroupByQualityAndDay(transactions)
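The shape of the result may be easier to see on a toy example. The sketch below uses made-up rows and a plain daily mean in place of GroupByDay; it relies only on the fact that iterating over a pandas groupby yields (key, sub-DataFrame) pairs.

```python
import pandas as pd

# Made-up transactions with two quality labels.
toy = pd.DataFrame({
    'quality': ['high', 'high', 'low', 'low'],
    'date': pd.to_datetime(['2011-01-01', '2011-01-01',
                            '2011-01-01', '2011-01-02']),
    'ppg': [12.0, 14.0, 4.0, 6.0],
})

# Build a dict from quality name to a DataFrame of daily mean prices.
toy_dailies = {
    name: group.groupby('date')[['ppg']].mean()
    for name, group in toy.groupby('quality')
}

for name, daily in toy_dailies.items():
    print(name, len(daily), 'day(s)')
```

Each value is indexed by date, so the 'high' entry has a single row (the mean of 12.0 and 14.0) and the 'low' entry has two.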
The following plots the daily average price for each quality.
import matplotlib.pyplot as plt

thinkplot.PrePlot(rows=3)
for i, (name, daily) in enumerate(dailies.items()):
    thinkplot.SubPlot(i+1)
    title = 'Price per gram ($)' if i == 0 else ''
    thinkplot.Config(ylim=[0, 20], title=title)
    thinkplot.Scatter(daily.ppg, s=10, label=name)
    if i == 2:
        plt.xticks(rotation=30)
        thinkplot.Config()
    else:
        thinkplot.Config(xticks=[])
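If thinkplot is not available, a similar layout can be sketched with matplotlib alone. This is an alternative under stated assumptions, not the notebook's method: the data below is made up, and sharing the x-axis stands in for hiding the tick labels on the upper panels.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily averages for three quality labels (not the real data).
dates = pd.date_range('2011-01-01', periods=5, freq='D')
toy_dailies = {
    'high': pd.DataFrame({'ppg': [13.0, 12.5, 13.2, 12.8, 13.1]}, index=dates),
    'medium': pd.DataFrame({'ppg': [9.0, 9.4, 8.8, 9.1, 9.3]}, index=dates),
    'low': pd.DataFrame({'ppg': [5.0, 5.5, 4.8, 5.2, 5.1]}, index=dates),
}

# One panel per quality; a shared x-axis means only the bottom
# panel shows date tick labels.
fig, axes = plt.subplots(nrows=len(toy_dailies), sharex=True, figsize=(6, 8))
for i, (ax, (name, daily)) in enumerate(zip(axes, toy_dailies.items())):
    ax.scatter(daily.index, daily['ppg'], s=10, label=name)
    ax.set_ylim(0, 20)
    ax.legend()
    if i == 0:
        ax.set_title('Price per gram ($)')

# Rotate the date labels on the bottom panel.
axes[-1].tick_params(axis='x', labelrotation=30)
```

In a notebook the figure is displayed when the cell finishes; in a script, a call to plt.show() or fig.savefig() would be needed.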