Hierarchical approaches to statistical modeling are integral to a data scientist’s skill set because hierarchical data is incredibly common. In this article, we’ll cover the advantages of hierarchical Bayesian models and work through an exercise building one in R. If you’re unfamiliar with Bayesian modeling, I recommend Brandon Rohrer’s (Principal Data Scientist at iRobot) explanation here, and an introduction to building Bayesian models in R here.
[Related Article: Why Do Tree Ensembles Work?]
We commonly encounter data that is organized in a hierarchical manner. A simple toy example would be SAT scores for students in different classes, belonging to various schools. Academic rigor and teaching prowess likely differ by class and school, so you could imagine that the mean exam score will differ by class and by school.
Perhaps we’re interested in how healthy eating and exercise impact SAT scores, and we model those variables with respect to the students’ scores. If we aggregate all the scores and fit a single linear model, we treat each student as an independent sample. But because the students are grouped together, students in the same class share similar experiences; they are not independent. This causes a few problems: 1) model errors will be correlated within groups, 2) error variance will differ between groups, and 3) the impact of the independent variables on exam score will differ by group. We could fit a separate model to each class and satisfy the requirement for independent samples, but we would be throwing away useful information by isolating each model from the others. A better alternative is to apply a hierarchical model structure to the hierarchical data. The added complexity supports greater statistical power than isolated models by allowing us to use the entire dataset.
We’re going to walk through building and fitting a hierarchical model by extending the example used previously in “Building Your First Bayesian Model in R”, an article you can find here. If you’re unfamiliar with building Bayesian models using RJAGS, it may be useful to review that article. We’re going to model the imaginary declining price of solar panels over time (please note, the purpose is not to do any forecasting, but simply to learn to build a model and estimate parameters). Follow along with the code available on GitHub here. The dataset is simple; we have the selling price of a solar panel at a given time, in a certain country.
Like that of many technologies, the price declines over time in a way that can be modeled using the exponential decay function:
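Reconstructing the decay function from the parameter descriptions that follow (the original equation appears as an image), it takes the form:

```latex
P_t = 3000 \, e^{-Z t} + b_0
```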
where P_t is the price at time t, $3,000 plus b0 is the starting price, Z is the rate of decline, and b0 is the asymptotic lowest price. In this example, we’ll assume that the rate of decline in the price of solar panels differs by country due to regulations, availability of parts, shipping, etc. So, within the plot above, there are multiple groups of samples that are interdependent given country. Instead of using the above equation, we’ll add complexity by allowing Z, the rate of decline, to vary with respect to country j.
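With the decay rate allowed to vary by country, the model becomes (again a reconstruction from the surrounding description; the original equation appears as an image):

```latex
P_{t,j} = 3000 \, e^{-Z_j t} + b_0
```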
When we write our model in RJAGS, we’ll follow the usual process of defining our priors, but when we write our likelihood, we’ll model Z, the rate of decline, as a random variable with respect to country.
We surmise that the rate of price decline is normally distributed, so we’ll model Z for each country with normally distributed errors. Note that the for loop associated with modeling Z iterates from 1 to 8, for the 8 countries. The distribution of Z values is characterized by a mean of “mu.Z” and a precision of “tau.g” (JAGS parameterizes the normal distribution with precision, the inverse of the variance, rather than standard deviation).
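The model described above might look like the following sketch (the structure matches the article’s description, but the prior choices and exact variable names are illustrative, not the article’s code):

```r
# Hierarchical exponential-decay model: each country j gets its own
# decay rate Z[j], drawn from a shared normal distribution.
model_string <- "model{
  # Likelihood: price decays exponentially toward the floor b0
  for (i in 1:n) {
    mu[i] <- 3000 * exp(-Z[country[i]] * time[i]) + b0
    price[i] ~ dnorm(mu[i], tau)
  }

  # Hierarchical layer: country-level decay rates share mu.Z and tau.g
  for (j in 1:8) {
    Z[j] ~ dnorm(mu.Z, tau.g)
  }

  # Weakly informative priors (illustrative choices)
  mu.Z  ~ dnorm(0, 0.01)
  tau.g ~ dgamma(0.01, 0.01)
  b0    ~ dnorm(0, 0.0001)
  tau   ~ dgamma(0.01, 0.01)
}"
```

Note that in JAGS, `dnorm` takes a precision (1/variance) as its second argument, which is why the priors on `tau.g` and `tau` are gamma-distributed.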
The following lines compile and fit the model using MCMC. For a more detailed explanation of what’s going on, refer to the first Bayesian article here.
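A sketch of those lines, assuming the observations live in vectors `price`, `time`, and `country` (an integer index from 1 to 8), and that the model string is stored in `model_string`:

```r
library(rjags)

# Compile the model with three MCMC chains
jags_model <- jags.model(textConnection(model_string),
                         data = list(price   = price,
                                     time    = time,
                                     country = country,
                                     n       = length(price)),
                         n.chains = 3)

# Burn-in, then draw posterior samples for the parameters of interest
update(jags_model, n.iter = 5000)
samples <- coda.samples(jags_model,
                        variable.names = c("Z", "mu.Z", "tau.g", "b0"),
                        n.iter = 10000)
```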
The trace plots below show the three chains mixing well and smooth posterior distributions, both good signs of convergence. Feel free to use the Gelman–Rubin diagnostic or another diagnostic to confirm convergence.
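Both checks are available through the coda package (assuming `samples` is the mcmc.list returned by `coda.samples` above):

```r
# Trace plots and posterior densities for each monitored parameter
plot(samples)

# Gelman-Rubin diagnostic: potential scale reduction factors
# near 1 suggest the chains have converged
gelman.diag(samples, multivariate = FALSE)
```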
When I simulated the data, the true Z values for each country had a mean of .0026 and a standard deviation of .0007. The model’s estimates were very close!
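One way to make that comparison, again assuming the `samples` object from earlier (names are illustrative):

```r
# Posterior means of the country-level decay rates
post    <- summary(samples)$statistics
Z_means <- post[grep("^Z\\[", rownames(post)), "Mean"]

mean(Z_means)  # compare to the true mean of .0026
sd(Z_means)    # compare to the true spread of .0007
```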
Now let’s see how well the prices predicted by the model match with those observed.
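A sketch of that comparison, plugging the posterior means back into the decay function (names assumed from the earlier code):

```r
post   <- summary(samples)$statistics
Z_hat  <- post[grep("^Z\\[", rownames(post)), "Mean"]
b0_hat <- post["b0", "Mean"]

# Posterior-mean price prediction for each observation
pred <- 3000 * exp(-Z_hat[country] * time) + b0_hat

# Observed vs. predicted; points near the 45-degree line indicate a good fit
plot(price, pred, xlab = "Observed price", ylab = "Predicted price")
abline(0, 1, col = "red")
```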
[Related Article: The Empirical Derivation of the Bayesian Formula]
The model did pretty well! We covered a simple example of modeling hierarchical data with a hierarchical Bayesian model. Hopefully, this example can serve as a useful template for further models. The code for this exercise is available on GitHub here.