Allen Downey, Professor of Computer Science at Olin College of Engineering, has built much of his career around Bayesian methods and the Python programming language. This skill set has been attracting increasing attention in data science, both for business use and as a job qualification. Allen recently took the time to answer some questions via email about the importance of Bayesian methods, why they are becoming more popular for business use, and why PyMC is a go-to tool.
Q: Why did you choose to go with Bayesian methods as your specialty? What sets Bayesian methods apart from other approaches in data science?
Bayesian methods are powerful and applicable to a wide range of problems. But they are also beautiful in a way other methods are not. I mean, other machine learning methods are also powerful and versatile, but they often seem ad hoc. And classical statistical methods, like p-values and confidence intervals, are just a hopeless muddle.
The power of Bayesian methods is that they express uncertainty in the form of probabilities, which makes them especially suitable for decision analysis. If you have to make life and death decisions under uncertainty, Bayesian methods are the way to go.
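As an editorial illustration (not part of Allen's answer), here is a minimal sketch of how a probability-valued answer can feed a decision. It assumes a uniform Beta(1, 1) prior on a success rate, invented data of 18 successes and 2 failures, and made-up costs for each action; none of these numbers come from the interview.

```python
import random


def posterior_samples(successes, failures, n=10_000, seed=0):
    """Draw from the posterior of a success rate.

    With a uniform Beta(1, 1) prior and binomial data, the posterior
    is Beta(1 + successes, 1 + failures).
    """
    rng = random.Random(seed)
    return [rng.betavariate(1 + successes, 1 + failures) for _ in range(n)]


# Hypothetical data: 18 successes and 2 failures.
samples = posterior_samples(successes=18, failures=2)

# Uncertainty expressed as a probability: how likely is it that the
# true rate exceeds 0.8?
prob_above = sum(s > 0.8 for s in samples) / len(samples)

# Hypothetical costs: acting when the rate is really below 0.8 costs 10,
# and waiting when the rate is really above 0.8 costs 3.
loss_act = 10 * (1 - prob_above)
loss_wait = 3 * prob_above
decision = "act" if loss_act < loss_wait else "wait"

print(f"P(rate > 0.8) = {prob_above:.2f}, decision: {decision}")
```

Because the posterior is a full distribution rather than a single estimate, changing the costs changes the decision without refitting anything.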
Q: What makes PyMC special as a library for Bayesian inference?
PyMC provides two things you need for Bayesian inference: a language for specifying a statistical model and algorithms for sampling the posterior distribution. And it does both things well; specifying the model is about as simple as it could be, and the samplers are good implementations of our best current algorithms.
Even so, MCMC samplers can be finicky, so you need good tools for diagnosing and fixing problems. PyMC generates helpful warnings and suggestions, and the companion library ArviZ provides tools for visualizing and checking results.
Q: How can knowing Bayesian methods help an aspiring data scientist get a job? Would you consider this an in-demand skill to have?
There’s a lot of demand for people who can solve problems, but often a company that needs Bayesian methods doesn’t know it. They know what problem they are trying to solve, not how to solve it. So I think it’s better to be a general problem solver, with Bayesian methods as one of the tools on your belt, rather than someone with just one hammer, looking for nails.
Q: What interesting use cases have you seen for Bayesian methods?
One of the most compelling examples is using Thompson sampling for medical trials. It’s an important application, often literally life and death, and there are clear shortcomings in conventional approaches. Thompson sampling is also remarkably simple and elegant. To demonstrate the point, I designed a game that implements a Bayesian clinical trial using dice. If you want to try it out, the rules are at https://allendowney.github.io/TheShakes/.
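For readers who have not seen Thompson sampling, the following is a minimal editorial sketch of the idea for a trial with binary outcomes. It is not the dice game linked above; the function name `thompson_trial`, the true success rates, and the number of patients are all hypothetical. Each treatment keeps a Beta posterior over its success rate, and each new patient is assigned to the treatment whose randomly drawn rate is highest.

```python
import random


def thompson_trial(true_rates, n_patients, seed=0):
    """Simulate a trial that assigns patients by Thompson sampling.

    Each arm starts with a uniform Beta(1, 1) prior, so its posterior
    after s successes and f failures is Beta(1 + s, 1 + f).
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)

    for _ in range(n_patients):
        # Draw one plausible success rate per arm from its posterior.
        draws = [
            rng.betavariate(1 + s, 1 + f)
            for s, f in zip(successes, failures)
        ]
        # Treat the patient with the arm whose draw is highest.
        arm = max(range(len(draws)), key=draws.__getitem__)

        # Simulate the outcome and update that arm's counts.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1

    return successes, failures


# Hypothetical trial: treatment 1 is truly better than treatment 0.
successes, failures = thompson_trial([0.3, 0.6], n_patients=500)
allocations = [s + f for s, f in zip(successes, failures)]
print("patients per treatment:", allocations)
```

As evidence accumulates, the better treatment wins the draw more often, so more patients receive it while the trial is still running.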
Learn more about Bayesian methods with Ai+ Training
On August 17th, Allen will teach a session on Bayesian methods with PyMC on our Ai+ Training platform. His session, Bayesian Inference with PyMC, uses PyMC to estimate proportions and rates across a range of use cases, and uses those estimates to generate predictions. These methods have applications in business, science, and engineering.
Allen Downey is a Professor of Computer Science at Olin College of Engineering in Needham, MA. He is the author of several books related to computer science and data science, including Think Python, Think Stats, Think Bayes, and Think Complexity. Prof. Downey has taught at Colby College and Wellesley College, and in 2009 he was a Visiting Scientist at Google. He received his Ph.D. in Computer Science from U.C. Berkeley, and his M.S. and B.S. degrees from MIT.