Harnessing the Power of GIS and Python for Property Value Analysis at Scale

Editor’s note: Mike Dezube is a speaker for ODSC East this April 23-25. Be sure to check out his talk, “Unlocking Insights in Home Values: A Multimillion-Row Journey with Polars,” there!

Tremendous amounts (petabytes) of public information are available from the Census Bureau and local municipalities, and we at Charles River Data love exploring this data and the stories it tells, both for our clients and for the public good! In this tutorial we’re going to cover how to acquire this data, process it, and understand it at scale, starting with all 2.5M property records in Massachusetts.

We’ll be doing a deep dive on Massachusetts home property values at the ODSC conference in April, exploring how they have changed over time due to factors such as the advent of COVID, proximity to Boston, and population density. In the final section we’ll show how to extend this analysis to any state.

In-Person and Virtual Conference

April 23rd to 25th, 2024

Join us for a deep dive into the latest data science and AI trends, tools, and techniques, from LLMs to data analytics and from machine learning to responsible AI.

 

But as a preview for this talk, let’s get our feet wet with some explorations to understand the data at hand.

Average home value by town

Average home value by town in Massachusetts, a visual we’ll build together at ODSC this year.

Loading all property values in Massachusetts

Per mass.gov, there are about 2.5M properties in Massachusetts. In the code below we’ll load a 2024 export from mass.gov and explore the 180,312 properties in Boston. This data is available for every state, and as a full national set as well; contact us for details.

Let’s get started.

# See https://github.com/mdezube/property-assessments/blob/main/README.md for
# conda install directions
import geopandas as gpd

# Load in the geometric boundaries of all properties in eastern MA. We could readily load
# western too and merge but for this quick tutorial we'll start with Eastern.
# Download from https://www.mass.gov/forms/massgis-request-statewide-parcel-data
EASTERN_MA_SHP_FILE = "<location on your machine>/L3_TAXPAR_POLY_ASSESS_EAST.shp"

eastern_ma_parcels = gpd.read_file(
    EASTERN_MA_SHP_FILE,
    engine="pyogrio",
    use_arrow=True,
)

eastern_ma_parcels["OWNER1"] = eastern_ma_parcels["OWNER1"].str.replace(
    r"COMMONWLTH\b|COMMWLTH\b", "COMMONWEALTH", regex=True
)
eastern_ma_parcels["OWNER1"] = eastern_ma_parcels["OWNER1"].str.replace(
    r"MASS\b", "MASSACHUSETTS", regex=True
)

boston_parcels = eastern_ma_parcels[eastern_ma_parcels["CITY"].str.upper() == "BOSTON"]
print(
    f"{eastern_ma_parcels.shape[0]:,} properties – {boston_parcels.shape[0]:,} in Boston."
)

1,879,297 properties – 180,312 in Boston.
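
The two `str.replace` calls in the loading code expand abbreviated owner names, presumably so that later grouping by OWNER1 treats spelling variants as one owner. As a minimal sketch of what those patterns do, here they are applied with Python’s `re` module to a few made-up owner strings. The helper names are hypothetical, and the stricter variant is only a suggestion, not part of the original code.

```python
import re

# The same two patterns the tutorial passes to pandas' str.replace(regex=True).
COMMONWEALTH_PATTERN = r"COMMONWLTH\b|COMMWLTH\b"
MASS_PATTERN = r"MASS\b"


def normalize_owner(name: str) -> str:
    """Expand the abbreviations in one owner string (hypothetical helper)."""
    name = re.sub(COMMONWEALTH_PATTERN, "COMMONWEALTH", name)
    return re.sub(MASS_PATTERN, "MASSACHUSETTS", name)


def normalize_owner_strict(name: str) -> str:
    """Variant anchored on both sides, so only standalone words are expanded."""
    name = re.sub(r"\b(?:COMMONWLTH|COMMWLTH)\b", "COMMONWEALTH", name)
    return re.sub(r"\bMASS\b", "MASSACHUSETTS", name)


# Made-up owner strings, for illustration only.
for owner in ["COMMWLTH OF MASS", "MASSACHUSETTS PORT AUTHORITY", "UMASS BOSTON"]:
    print(f"{owner!r} -> {normalize_owner(owner)!r} / {normalize_owner_strict(owner)!r}")
```

The trailing `\b` keeps names that already spell out MASSACHUSETTS untouched. Without a leading `\b`, though, a name ending in MASS (such as the made-up UMASS BOSTON) is expanded too, which may or may not be what you want for your own grouping.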

Let’s view a sample of 10 at random to get a feel for the data. There are more than 40 columns describing each property, but we’ll focus on just a handful in this tutorial. The data tell us quite a bit: the exact address, the value of the building and of the land, the style, the use type (see full details here), and even who owns it (a column we’ll explore later).

# .sample(10) draws 10 rows at random; .head(10) would return the first 10 instead
boston_parcels[[
    "CITY", "ZIP", "SITE_ADDR", "TOTAL_VAL", "BLDG_VAL", "LAND_VAL", "RES_AREA", "STYLE",
    "USE_CODE"
]].sample(10)


10 Boston properties at random

What are the most expensive properties in Boston?

Now that we have the properties, we can start interrogating them a bit. Let’s start with the simplest of questions: what are the top 10 most expensive?

 

boston_parcels[[
    "SITE_ADDR", "TOTAL_VAL", "BLDG_VAL", "LAND_VAL", "LOT_SIZE", "OWNER1", "STYLE"
]].set_index("SITE_ADDR").sort_values(by="TOTAL_VAL", ascending=False).head(10)

Top 10 most expensive properties in Boston

Not terribly surprisingly, the most expensive properties are massive entities well known in Boston, such as the Hancock Building (#1) and Brigham & Women’s Hospital (#2). Two entries buck the trend, however, one by being massive in land area and the other by lacking land information. The first is #4, UMass Boston, which owns a stunning 170 acres in Boston. The second is #6, part of Harvard University and the largest building Harvard has ever built, whose land value, per the tax records, hasn’t been assessed yet. This may seem odd, but nonprofits don’t always owe taxes, so the tax-assessed land value isn’t as critical. Or perhaps the land underneath the building is owned by another entity (a potentially fun exploration for the reader).

Perhaps more interesting, we can take the same data and filter to a residential USE_CODE to see the most expensive homes in Boston and who owns them or, as you’ll see for #3 and #4 below, the ones that have yet to sell. Note that we omitted LOT_SIZE, which is less interesting for apartments (it’s 0), and added RES_AREA, the square footage of the unit.

# str.match anchors at the start, so only codes beginning with "10" are kept
boston_parcels[boston_parcels["USE_CODE"].str.match("10", na=False)]


Top 10 most expensive residential properties in Boston
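
The residential filter and ranking described above can be sketched on plain Python rows, so the logic runs without the shapefile. This is a minimal illustration, not the code behind the table: the rows are made up, the helper names are hypothetical, and it assumes residential use codes are the ones that begin with "10", which is what the pattern in the filter suggests.

```python
import re

# Made-up rows for illustration only; real parcels come from the shapefile.
parcels = [
    {"SITE_ADDR": "1 EXAMPLE ST", "USE_CODE": "102", "TOTAL_VAL": 900_000, "RES_AREA": 1_200},
    {"SITE_ADDR": "2 EXAMPLE ST", "USE_CODE": "101", "TOTAL_VAL": 2_500_000, "RES_AREA": 3_400},
    {"SITE_ADDR": "3 EXAMPLE ST", "USE_CODE": "910", "TOTAL_VAL": 40_000_000, "RES_AREA": 0},
    {"SITE_ADDR": "4 EXAMPLE ST", "USE_CODE": None, "TOTAL_VAL": 5_000_000, "RES_AREA": 0},
]


def is_residential(use_code):
    """True when the code starts with "10" (assumed residential prefix)."""
    return use_code is not None and re.match(r"10", use_code) is not None


def top_residential(rows, n=10):
    """Return the n highest-valued residential rows, most expensive first."""
    residential = [row for row in rows if is_residential(row["USE_CODE"])]
    return sorted(residential, key=lambda row: row["TOTAL_VAL"], reverse=True)[:n]


for row in top_residential(parcels):
    print(row["SITE_ADDR"], row["TOTAL_VAL"], row["RES_AREA"])

# An unanchored search would also accept "910", because "10" appears inside it.
print(re.search(r"10.*", "910") is not None)
```

In pandas, `str.contains` behaves like `re.search` and `str.match` like `re.match`, so the choice between them decides whether a code such as the made-up "910" slips into the residential set.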

Although quite large, these values are accurate (as a potential follow-up, you can look these properties up on Zillow to get a sense of what a $34 million apartment looks like).

Who owns most of Boston?

Circling back to non-residential, how much of Boston do these large entities own?

Below we see that the city of Boston itself owns the largest share of property, which isn’t terribly surprising. What comes next is more of a shock: Boston University owns 1.5% of the total property value in Boston, almost as much as the state of Massachusetts (2.0%) and more than the federal government (0.6%). Harvard is at 1.2%, but this jumps dramatically once we include the properties it owns in Cambridge.

import seaborn as sns

df = boston_parcels.groupby("OWNER1")["TOTAL_VAL"].sum().to_frame()
df["PERCENT_OF_TOTAL"] = df["TOTAL_VAL"] / df["TOTAL_VAL"].sum() * 100
df = df.sort_values(by="PERCENT_OF_TOTAL", ascending=False)

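# Plot the 10 largest owners; hue gives each bar its own color and its own container
ax = sns.barplot(
    data=df[:10], x="PERCENT_OF_TOTAL", y="OWNER1", palette="flare", hue="PERCENT_OF_TOTAL"
)
ax.set(xlabel="Percent of Boston Owned", ylabel="OWNER1")
# Label every bar with its percentage
for container in ax.containers:
    ax.bar_label(container, fontsize=11, fmt=" %0.1f%%")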

Boston ownership by property value

If we ask the question another way (who owns the most land in Boston?), the numbers change drastically, with the city owning 13.0% thanks to its large ownership of parks.

Boston ownership by land area
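
Both ownership views come down to the same arithmetic: sum a column per owner, then divide by the grand total. The sketch below works through it on made-up rows in plain Python. The helper name is hypothetical, and it assumes LOT_SIZE is recorded in the same unit on every row, which is worth checking on the real data before summing.

```python
from collections import defaultdict

# Made-up rows for illustration only.
parcels = [
    {"OWNER1": "CITY OF BOSTON", "TOTAL_VAL": 300.0, "LOT_SIZE": 60.0},
    {"OWNER1": "CITY OF BOSTON", "TOTAL_VAL": 100.0, "LOT_SIZE": 20.0},
    {"OWNER1": "EXAMPLE UNIVERSITY", "TOTAL_VAL": 500.0, "LOT_SIZE": 15.0},
    {"OWNER1": "EXAMPLE TRUST", "TOTAL_VAL": 100.0, "LOT_SIZE": 5.0},
]


def share_by_owner(rows, field):
    """Percent of the summed `field` held by each owner, largest share first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["OWNER1"]] += row[field]
    grand_total = sum(totals.values())
    shares = {owner: total * 100 / grand_total for owner, total in totals.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


print(share_by_owner(parcels, "TOTAL_VAL"))  # ranking by assessed value
print(share_by_owner(parcels, "LOT_SIZE"))   # ranking by land area
```

With these made-up numbers the ranking flips between the two views, which is the same effect the charts show: an owner of many low-value parcels, such as parkland, looks much larger by area than by value.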

 

Wrap up

This exploration scratches the surface of what’s possible with GIS, and it can be accelerated quite a bit with Polars (https://pola.rs/), an important step that drops the analysis calls from tens of seconds to milliseconds. Attend the talk, “Unlocking Insights in Home Values: A Multimillion-Row Journey with Polars,” to dive deeper across the full state, look at home parcel definitions, town trends, and the impacts of COVID, and learn how to keep exploring and expanding on your own using code we make readily available. We included a visual at the start that we’ll create together; here are a few more questions to think about, all answerable from this dataset:

  • Most expensive homes on each street / in each zip
  • Most expensive styles of homes, and which held their value the best
  • Homes that changed owners the most / least recently
  • Areas where commercial is high %, vs. mixed residential, vs. only residential
  • How to extend to other states and other nations
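
As a starting point for the first question, here is a minimal sketch of “most expensive home in each zip” on made-up rows. The helper name and the sample values are hypothetical; on the real data the same idea would be a group-by on ZIP.

```python
# Made-up rows for illustration only.
homes = [
    {"ZIP": "02108", "SITE_ADDR": "1 EXAMPLE ST", "TOTAL_VAL": 1_000_000},
    {"ZIP": "02108", "SITE_ADDR": "2 EXAMPLE ST", "TOTAL_VAL": 3_000_000},
    {"ZIP": "02116", "SITE_ADDR": "3 EXAMPLE ST", "TOTAL_VAL": 2_000_000},
]


def most_expensive_by_zip(rows):
    """Map each ZIP to the row with the highest TOTAL_VAL seen for that ZIP."""
    best = {}
    for row in rows:
        current = best.get(row["ZIP"])
        if current is None or row["TOTAL_VAL"] > current["TOTAL_VAL"]:
            best[row["ZIP"]] = row
    return best


for zip_code, row in most_expensive_by_zip(homes).items():
    print(zip_code, row["SITE_ADDR"], row["TOTAL_VAL"])
```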

Questions?  Want to apply these techniques to your own data?  Reach out at https://www.charlesriverdata.com/get-started


About the Author:

Mike Dezube is the Founder & CEO of Charles River Data, a Boston-based data science consulting firm. Charles River Data helps its clients solve complex problems through the use of advanced data science and machine learning. More than just engineering talent, Charles River Data has recruited from Google, Amazon, Meta, BCG, Jefferies, and JP Morgan, bringing experience across retail, operations, GIS, defense, healthcare tech, hospitals, insurance (health and other perils), digital marketing, finance, banking, and private equity.

Dezube has leveraged his 7+ years of experience at Google to form this all-star team, seed it with a wealth of industry experience, and attract seed funding to position Charles River Data for growth.

Mike also performs academic research at Mass General Brigham in cancer, opioid reduction, and improving post-surgical recovery, along with acting as a GIS consultant for Blue Hills, the largest conservation land area in the Greater Boston Area.
