# What’s the healthiest city in the US? (and what does that even mean?)

Posted by Marcelo Rinesi, February 22, 2018

(Spoiler alert: it’s not Detroit, and it’d be a relatively simple question if it weren’t for cancer.)

The human body has multiple ways of not working well; at its broadest, the CDC’s 500 Cities project data set lists fourteen different health outcomes, from arthritis to stroke to the wonderfully phrased “mental health not good”. But just as, in an individual, having one condition raises the probability of having others (e.g., high blood pressure and stroke), the prevalences of different conditions are also highly correlated across cities.

For example, and unsurprisingly, cities where high blood pressure is more frequent tend to see more strokes:

But we can do more than plot specific pairs of conditions; we can use this data to build a “family tree” of diseases, with closely related diseases tending to show up together in the same city:
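The post doesn’t say how this tree was built. One standard way to get such a tree is agglomerative hierarchical clustering of the conditions, treating two conditions as close when their prevalences rise and fall together across cities. The sketch below assumes average linkage and a distance of 1 - r, where r is the Pearson correlation across cities; both choices, the helper names, and all the numbers are illustrative assumptions, not the author’s method or the CDC data.

```python
from itertools import combinations
from statistics import fmean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    r = cov / (pstdev(x) * pstdev(y))
    return max(-1.0, min(1.0, r))  # clamp tiny floating-point overshoot


def cluster(prevalence):
    """Average-linkage agglomerative clustering of conditions.

    `prevalence` maps each condition to its per-city prevalence values.
    The distance between two conditions is 1 - r, so conditions that rise
    and fall together across cities end up close in the tree.
    Returns the merges in order as (left, right, distance) tuples.
    """
    names = list(prevalence)
    dist = {
        frozenset((a, b)): 1 - pearson(prevalence[a], prevalence[b])
        for a, b in combinations(names, 2)
    }

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs.
        return fmean(dist[frozenset((a, b))] for a in c1 for b in c2)

    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


# Hypothetical prevalence (%) for six cities; NOT real 500 Cities values.
bp = [28.0, 31.5, 35.2, 24.9, 40.1, 33.0]
diabetes = [9.0, 11.2, 13.5, 7.8, 15.9, 10.4]
prevalence = {
    "high_blood_pressure": bp,
    "stroke": [0.1 * v for v in bp],  # proportional to bp on purpose
    "diabetes": diabetes,
    "kidney_disease": [0.25 * v + 0.5 for v in diabetes],  # linear in diabetes
    "cancer": [6.1, 5.4, 6.0, 5.5, 5.8, 6.3],
}

for left, right, d in cluster(prevalence):
    print(sorted(left), "+", sorted(right), f"at distance {d:.3f}")
```

With these made-up numbers, the two deliberately correlated pairs merge first and cancer joins the tree last, which is the shape the text describes. A library routine such as SciPy’s hierarchical clustering would be the practical choice on real data.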

Just as high blood pressure and strokes are closely related, so are diabetes and chronic kidney disease, or coronary heart disease and chronic obstructive pulmonary disease. Interestingly, nearly every condition is correlated with every other one, except cancer. Setting the prevalence of cancer aside for a minute, the other thirteen conditions in the CDC data set are correlated closely enough that a straightforward principal component analysis (PCA) yields a first component that explains almost 80% of their variance.
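To make the “share of variance explained” concrete: if PCA is run on standardized data, i.e. on the correlation matrix (an assumption here, since the post doesn’t say whether the variables were standardized), the share explained by the first component is its eigenvalue divided by the number of variables. The stdlib-only sketch below estimates that eigenvalue by power iteration; the helper names and the six-city numbers are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(xs):
    """Rescale a sequence to zero mean and unit (population) variance."""
    m, s = fmean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def correlation_matrix(columns):
    """Correlation matrix of per-city columns (one list per condition)."""
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def top_eigenpair(matrix, iterations=1000):
    """Dominant eigenvalue and unit eigenvector via power iteration.

    Starting from the all-ones direction is safe when all correlations are
    positive: the leading eigenvector then has all-positive entries, so the
    start vector is not orthogonal to it.
    """
    p = len(matrix)
    v = [1.0 / p ** 0.5] * p
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(p)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(row[j] * v[j] for j in range(p)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(v, mv))  # Rayleigh quotient
    return eigenvalue, v


def first_component_share(columns):
    """Share of total standardized variance explained by the first component."""
    corr = correlation_matrix(columns)
    eigenvalue, _ = top_eigenpair(corr)
    # The eigenvalues of a p x p correlation matrix sum to its trace, p.
    return eigenvalue / len(corr)


# Hypothetical prevalence (%) for six cities; NOT real CDC values.
conditions = [
    [28.0, 31.5, 35.2, 24.9, 40.1, 33.0],  # high blood pressure
    [2.7, 3.3, 3.4, 2.5, 4.2, 3.1],        # stroke
    [9.0, 11.2, 13.5, 7.8, 15.9, 10.4],    # diabetes
]
print(f"First component explains {first_component_share(conditions):.1%} of the variance")
```

On the real data this would run over thirteen columns (one per condition except cancer) and 500 rows; a linear algebra library’s eigendecomposition would replace the hand-rolled power iteration.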

In simpler terms: we can build a single “general health index” number that predicts reasonably well how prevalent different kinds of health outcomes, from strokes to asthma, are in a given city.

Calculating this number using PCA and rescaling it to have zero mean and unit variance, we get a pleasantly well-distributed index:
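A minimal sketch of the scoring step, assuming the first component’s loadings are already known (the loadings, city names, and values below are made up): each condition is standardized across cities, each city’s score is the loading-weighted sum of its standardized prevalences, and the result is rescaled to zero mean and unit variance. Because the sign of a principal component is arbitrary, the sketch flips it so that a higher index means lower disease prevalence; that orientation is a choice, not something the post specifies.

```python
from statistics import fmean, pstdev


def zscores(xs):
    """Rescale to zero mean and unit (population) variance."""
    m, s = fmean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def health_index(prevalence_by_city, loadings):
    """Score cities on a first-principal-component health index.

    `prevalence_by_city` maps a city to its prevalence for each condition,
    in the same order as `loadings` (the first component's weights).
    """
    cities = list(prevalence_by_city)
    # Standardize each condition across cities, as PCA on a correlation matrix does.
    columns = [zscores(col) for col in zip(*prevalence_by_city.values())]
    raw = [
        sum(w * col[i] for w, col in zip(loadings, columns))
        for i in range(len(cities))
    ]
    # A component's sign is arbitrary; orient the index so that higher
    # prevalence of positively loaded conditions means a lower score.
    sign = -1.0 if sum(loadings) > 0 else 1.0
    index = zscores([sign * r for r in raw])
    return dict(zip(cities, index))


# Hypothetical cities, prevalence values, and loadings, for illustration only.
prevalence_by_city = {
    "City A": [20.0, 2.0, 8.0],  # high blood pressure, stroke, diabetes (%)
    "City B": [30.0, 3.0, 12.0],
    "City C": [40.0, 4.0, 16.0],
    "City D": [25.0, 2.4, 11.0],
}
loadings = [0.577, 0.577, 0.577]

index = health_index(prevalence_by_city, loadings)
ranking = sorted(index, key=index.get, reverse=True)  # healthiest first
for rank, city in enumerate(ranking, start=1):
    print(f"{rank}. {city} ({index[city]:+.2f})")
```

Sorting the resulting scores in descending order yields a healthiest-first ranking; with these made-up values the order is City A, City D, City B, City C.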

This allows us to sort cities in order of “general healthiness”:

1. San Ramon (CA)
2. Sunnyvale (CA)
3. Mountain View (CA)

…

498. Gary (IN)
499. Flint (MI)
500. Detroit (MI)

No big surprises there. Twelve of the top fifteen healthiest cities in the US, by this definition, are in California: habits, demography, income, everything helps, and the “California health nut” stereotype has, unlike most stereotypes, data to back it up. At the other end of the scale are cities like Detroit and Flint. One corollary of the large body of public health knowledge we now have is that, unlike in other historical periods, differences in population-level health outcomes are a function of economics and politics (in their broadest senses) rather than, or at least more than, of biology.

However, recall that to build our elegantly simple “general health index” we had to set aside the not-so-small matter of cancer. It turns out that cancer plays by very different rules, as becomes obvious when we plot its (similarly normalized) prevalence against our general health index:

Cities like San Ramon (CA) and Mountain View (CA) have an average prevalence of cancer relative to the rest of the country, but a higher one than much less healthy (in the everything-*but*-cancer sense) cities like Laredo (TX) and Brownsville (TX). Here youth seems to beat experience, income, and technology: Laredo and Brownsville have median ages of 28.2 and 27.7 years respectively, while San Ramon’s median age is 37.6 years, almost matching the overall US figure of 37.8 years.
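One way to put a number on this pattern is the Pearson correlation, across cities, between the (similarly normalized) cancer prevalence and the general health index; a value near zero would indicate no linear relationship. The sketch assumes population z-scores, and the six-city values are hypothetical, not taken from the CDC data.

```python
from statistics import fmean, pstdev


def zscores(xs):
    """Rescale to zero mean and unit (population) variance."""
    m, s = fmean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def pearson(x, y):
    """Pearson r computed as the mean product of population z-scores."""
    zx, zy = zscores(x), zscores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


# Hypothetical values for six cities; NOT the CDC data.
health_index = [1.6, 1.1, 0.4, -0.2, -1.3, -1.6]    # higher = healthier
cancer_prevalence = [6.4, 6.9, 5.1, 6.2, 5.0, 6.6]  # percent

cancer_z = zscores(cancer_prevalence)  # normalized like the index
r = pearson(health_index, cancer_z)
print(f"r = {r:+.2f}, r^2 = {r * r:.2f}")
```

Pearson’s r is unchanged by z-scoring either variable, so the normalization matters for plotting both quantities on a common scale rather than for the correlation itself.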

The lack of correlation between the prevalence of cancer and that of almost everything else reflects profound differences not only in their physiological mechanisms but also in the state of our medical technology. We know quite a bit about how to prevent conditions like diabetes, stroke, and high blood pressure; by and large, they are different expressions of the same set of underlying physiological issues, which is part of the reason they are so closely correlated across cities. Cancer is a different matter. It’s less a single disease than a bewildering array of cellular insurrections, one on which we’ve made astounding strides in recent years but which remains comparatively poorly understood.

This is a stark example of the difference between technological possibility and political outcomes: given the state of our technology, Flint and Mountain View should be equally healthy, as they differ on the conditions we know how to improve and are equal on the one we are most powerless against.

The state of technology, of course, isn’t static. Inexcusably late but at last, medical researchers are beginning to approach aging as a root disease, and practical ways of reversing some forms of basic physiological damage (the common mechanisms behind the “everything-but-cancer syndrome”) will lead to improvements in how we treat and prevent most conditions. And if cancer is the condition we know the least about, it’s also the one where our improving computational capabilities might help the most. But there’s little difference between not having a technology and choosing not to use it: we have reasons to hope for significant improvements in technological possibilities over the next few decades, but only a hazier plan for their public health impact.


## Marcelo Rinesi

Applied researcher focused on data analysis and inference, emerging technologies and their applications. Experience in the software industry, finance, online games and e-commerce, and the non-profit sector. Specialties: data analysis and modeling, writing, programming. He occasionally writes and gives talks about the ethical and social aspects of AI. Check him out here: https://rinesi.com/
