Bayesian models offer a method for making probabilistic predictions about the state of the world. Key advantages over a frequentist framework include the ability to incorporate prior information into the analysis, estimate missing values along with parameter values, and make statements about the probability of a certain hypothesis. The root of Bayesian magic is found in Bayes’ Theorem, describing the conditional probability of an event. This article is not a theoretical explanation of Bayesian statistics, but rather a step-by-step guide to building your first Bayesian model in R. If you are not familiar with the Bayesian framework, it is probably best to do some introductory reading before working through this article. To read more about Bayesian statistics, Mat Leonard offers some great insight here.
We’re going to model the theoretical price of a piece of hardware for a cell phone (something that is getting exponentially cheaper over time). I simulated some data using the exponential decay equation:
Pt is price at time t, $3,000 plus b0 is the starting price when t=0, Z is the rate of decline, and b0 is the asymptotic lowest price. I added noise to the relationship to make it more realistic; you can download the code and data for this exercise here.
Let’s start modeling. First, we’ll need the following packages. This model will be built using “rjags”, an R interface to JAGS (Just Another Gibbs Sampler) that supports Bayesian modeling. We’ll also use functions from R2OpenBugs and coda packages to support the model compiling and MCMC processes. MCMCvis will help us summarizing MCMC output.
We’re going to estimate b0 (the asymptote), and Z (the rate of decline), by setting up a model called “mod”. Because it’s a Bayesian model, we’ll have to come up with some priors that are independent of the data. In this instance, we don’t have any prior knowledge so we’ll use vague priors. We’ll define the vague priors using a normal distribution with a very large standard deviation. In R2OpenBugs (the package this model is written with), normal distributions are defined with “dnorm” given the arguments of mu and precision (1/variance). We made precision very small to emphasize the lack of prior information. We’ll also be estimating a measure of variation (precision) for the sampling error distribution “tau”. Because “tau” represents precision, it is modeled as 1/variance.
Next we’ll build the likelihood function. As a refresher, the likelihood is the probability of getting the data given a certain parameter value. We have three components to the likelihood in this model 1) the deterministic component estimating the parameter mu from our independent variable Time given the exponential decay equation with parameters Z and b0, 2) the stochastic component linking the response variable Price to mu given normally distributed sampling error, and 3) a component to track the price predicted by the model.
Next, we’ll write the model to OpenBugs and set hyperparameters for the MCMC process. We will be estimating the random variables tau, Z, b0, and Price_pred. You can adjust the hyperparameters (number of iterations, burn-in interval, thinning interval, etc.) to ensure model convergence. We won’t go into hyperparameter tuning here, but you can learn more here.
First, we’ll compile the model as a “jags.model”, then use “update” to iterate through the burn-in interval. Lastly, we’ll use “coda.samples” to get samples from the posterior distributions of our parameters using MCMC.
That concludes the MCMC process, we’ll now assess convergence and look at the results.
In the above code, we first produce a trace plot for the two parameters we’re most interested in, b0 and Z.
Posterior distributions look relatively smooth and the trace plot shows sufficient mixing among chains, all good signs for convergence! We’ll also use the Gelman & Rubin’s diagnostic to evaluate the degree of chain mixing; we want the multivariate potential scale reduction factor (PSRF) to be around 1.
From the “MCMCsummary” call, we can see the mean and credible interval bounds for the each of the posterior distributions we’re interested in.
With the information above, we are able to make statements like “there is a 95% probability that the parameter value for b0 is between $490 and $510”. The mean of the posterior distributions are right on the real values for b0 and Z (500 and .003 respectively). Great, we can then plot the model predictions and see how they relate to the observed values.
Not surprisingly, the model predicts price very well with no apparent pattern to the residuals.
This is a simple and limited introduction to Bayesian modeling. You can find the code and data for this exercise here.
Interesting in learning more about machine learning beyond building a Bayesian Model in R? Check out these Ai+ training sessions:
This first installment in the Machine Learning Foundations series the topic at the heart of most machine learning approaches. Through the combination of theory and interactive examples, you’ll develop an understanding of how linear algebra is used to solve for unknown values in high-dimensional spaces, thereby enabling machines to recognize patterns and make predictions.