

Python as a way of thinking
BlogPythonPythonposted by Allen Downey May 21, 2017 Allen Downey

This article contains supporting material for this blog post at Scientific American. The thesis of the post is that modern programming languages (like Python) are qualitatively different from the first generation (like FORTRAN and C), in ways that make them effective tools for teaching, learning, exploring, and thinking.
I presented a longer version of this argument in a talk I presented at Olin College last fall. The slides are here:
Here are Jupyter notebooks with the code examples I mentioned in the talk:
- Breadth-first search in Python
- Using Counters, including the Bayesian update example.
- Introduction to PMFs, including the anagram example.
- Vectors, Frames, and Transforms.
- Cacophony for the Whole Family, an example from Think DSP.
from __future__ import print_function, division
from collections import Counter
import numpy as np
def is_anagram(word1, word2):
""Checks whether the words are anagrams.
word1: string
word2: string
returns: boolean
""
return Counter(word1) == Counter(word2)
is_anagram('tachymetric', 'mccarthyite')
is_anagram('banana', 'peach')
A Counter is a natural representation of a multiset, which is a set where the elements can appear more than once. You can extend Counter with set operations like is_subset:
class Multiset(Counter):
""A multiset is a set where elements can appear more than once.""
def is_subset(self, other):
""Checks whether self is a subset of other.
other: Multiset
returns: boolean
""
for char, count in self.items():
if other[char] < count:
return False
return True
# map the <= operator to is_subset
__le__ = is_subset
def can_spell(word, tiles):
""Checks whether a set of tiles can spell a word.
word: string
tiles: string
returns: boolean
""
return Multiset(word) <= Multiset(tiles)
can_spell('SYZYGY', 'AGSYYYZ')
Probability Mass Functions
You can also extend Counter to represent a probability mass function (PMF).
normalize
computes the total of the frequencies and divides through, yielding probabilities that add to 1.
__add__
enumerates all pairs of value and returns a new Pmf that represents the distribution of the sum.
__hash__
and __id__
make Pmfs hashable; this is not the best way to do it, because they are mutable. So this implementation comes with a warning that if you use a Pmf as a key, you should not modify it. A better alternative would be to define a frozen Pmf.
render
returns the values and probabilities in a form ready for plotting
class Pmf(Counter):
""A Counter with probabilities.""
def normalize(self):
""Normalizes the PMF so the probabilities add to 1.""
total = float(sum(self.values()))
for key in self:
self[key] /= total
def __add__(self, other):
""Adds two distributions.
The result is the distribution of sums of values from the
two distributions.
other: Pmf
returns: new Pmf
""
pmf = Pmf()
for key1, prob1 in self.items():
for key2, prob2 in other.items():
pmf[key1 + key2] += prob1 * prob2
return pmf
def __hash__(self):
""Returns an integer hash value.""
return id(self)
def __eq__(self, other):
return self is other
def render(self):
""Returns values and their probabilities, suitable for plotting.""
return zip(*sorted(self.items()))
d6 = Pmf([1,2,3,4,5,6])
d6.normalize()
d6.name = 'one die'
print(d6)
d6_twice = d6 + d6
d6_twice.name = 'two dice'
for key, prob in d6_twice.items():
print(key, prob)
# if we use the built-in sum we have to provide a Pmf additive identity value
# pmf_ident = Pmf([0])
# d6_thrice = sum([d6]*3, pmf_ident)
# with np.sum, we don't need an identity
d6_thrice = np.sum([d6, d6, d6])
d6_thrice.name = 'three dice'
import matplotlib.pyplot as plt
%matplotlib inline
for die in [d6, d6_twice, d6_thrice]:
xs, ys = die.render()
plt.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)
plt.xlabel('Total')
plt.ylabel('Probability')
plt.legend()
plt.show()