fbpx
Python as a way of thinking Python as a way of thinking
This article contains supporting material for this blog post at Scientific American.  The thesis of the post is that modern programming languages... Python as a way of thinking

This article contains supporting material for this blog post at Scientific American.  The thesis of the post is that modern programming languages (like Python) are qualitatively different from the first generation (like FORTRAN and C), in ways that make them effective tools for teaching, learning, exploring, and thinking.

I presented a longer version of this argument in a talk I presented at Olin College last fall.  The slides are here:

Here are Jupyter notebooks with the code examples I mentioned in the talk:

Here’s my presentation at SciPy 2015, where I talked more about Python as a way of teaching and learning DSP:
Finally, here’s the notebook “Using Counters”, which uses Python’s Counter object to implement a PMF (probability mass function) and perform Bayesian updates.
In [13]:
from __future__ import print_function, division

from collections import Counter
import numpy as np
A counter is a map from values to their frequencies. If you initialize a counter with a string, you get a map from each letter to the number of times it appears. If two words are anagrams, they yield equal Counters, so you can use Counters to test anagrams in linear time.
In [3]:
def is_anagram(word1, word2):
    ""Checks whether the words are anagrams.

    word1: string
    word2: string

    returns: boolean
    ""
    return Counter(word1) == Counter(word2)
In [4]:
is_anagram('tachymetric', 'mccarthyite')
Out[4]:
True
In [5]:
is_anagram('banana', 'peach')
Out[5]:
False
Multisets
A Counter is a natural representation of a multiset, which is a set where the elements can appear more than once. You can extend Counter with set operations like is_subset:
In [6]:
class Multiset(Counter):
    ""A multiset is a set where elements can appear more than once.""

    def is_subset(self, other):
        ""Checks whether self is a subset of other.

        other: Multiset

        returns: boolean
        ""
        for char, count in self.items():
            if other[char] < count:
                return False
        return True
    
    # map the <= operator to is_subset
    __le__ = is_subset
You could use is_subset in a game like Scrabble to see if a given set of tiles can be used to spell a given word.
In [7]:
def can_spell(word, tiles):
    ""Checks whether a set of tiles can spell a word.

    word: string
    tiles: string

    returns: boolean
    ""
    return Multiset(word) <= Multiset(tiles)
In [8]:
can_spell('SYZYGY', 'AGSYYYZ')
Out[8]:
True

Probability Mass Functions

You can also extend Counter to represent a probability mass function (PMF).
normalize computes the total of the frequencies and divides through, yielding probabilities that add to 1.
__add__ enumerates all pairs of value and returns a new Pmf that represents the distribution of the sum.
__hash__ and __id__ make Pmfs hashable; this is not the best way to do it, because they are mutable. So this implementation comes with a warning that if you use a Pmf as a key, you should not modify it. A better alternative would be to define a frozen Pmf.
render returns the values and probabilities in a form ready for plotting

In [9]:
class Pmf(Counter):
    ""A Counter with probabilities.""

    def normalize(self):
        ""Normalizes the PMF so the probabilities add to 1.""
        total = float(sum(self.values()))
        for key in self:
            self[key] /= total

    def __add__(self, other):
        ""Adds two distributions.

        The result is the distribution of sums of values from the
        two distributions.

        other: Pmf

        returns: new Pmf
        ""
        pmf = Pmf()
        for key1, prob1 in self.items():
            for key2, prob2 in other.items():
                pmf[key1 + key2] += prob1 * prob2
        return pmf

    def __hash__(self):
        ""Returns an integer hash value.""
        return id(self)
    
    def __eq__(self, other):
        return self is other

    def render(self):
        ""Returns values and their probabilities, suitable for plotting.""
        return zip(*sorted(self.items()))
As an example, we can make a Pmf object that represents a 6-sided die.
In [10]:
d6 = Pmf([1,2,3,4,5,6])
d6.normalize()
d6.name = 'one die'
print(d6)
Pmf({1: 0.16666666666666666, 2: 0.16666666666666666, 3: 0.16666666666666666, 4: 0.16666666666666666, 5: 0.16666666666666666, 6: 0.16666666666666666})
Using the add operator, we can compute the distribution for the sum of two dice.
In [11]:
d6_twice = d6 + d6
d6_twice.name = 'two dice'

for key, prob in d6_twice.items():
    print(key, prob)
2 0.0277777777778
3 0.0555555555556
4 0.0833333333333
5 0.111111111111
6 0.138888888889
7 0.166666666667
8 0.138888888889
9 0.111111111111
10 0.0833333333333
11 0.0555555555556
12 0.0277777777778
Using numpy.sum, we can compute the distribution for the sum of three dice.
In [14]:
# if we use the built-in sum we have to provide a Pmf additive identity value
# pmf_ident = Pmf([0])
# d6_thrice = sum([d6]*3, pmf_ident)

# with np.sum, we don't need an identity
d6_thrice = np.sum([d6, d6, d6])
d6_thrice.name = 'three dice'
And then plot the results (using Pmf.render)
In [19]:
import matplotlib.pyplot as plt
%matplotlib inline
In [20]:
for die in [d6, d6_twice, d6_thrice]:
    xs, ys = die.render()
    plt.plot(xs, ys, label=die.name, linewidth=3, alpha=0.5)
    
plt.xlabel('Total')
plt.ylabel('Probability')
plt.legend()
plt.show()
Allen Downey

Allen Downey

I am a Professor of Computer Science at Olin College in Needham MA, and the author of Think Python, Think Bayes, Think Stats and several other books related to computer science and data science. Previously I taught at Wellesley College and Colby College, and in 2009 I was a Visiting Scientist at Google, Inc. I have a Ph.D. from U.C. Berkeley and B.S. and M.S. degrees from MIT. Here is my CV. I write a blog about Bayesian statistics and related topics called Probably Overthinking It. Several of my books are published by O’Reilly Media and all are available under free licenses from Green Tea Press.

1