

Creating A Data-Driven Retail Expansion Framework
Data Visualization, Modeling | Posted by Jordan Bean, January 21, 2021

You’ve opened a business and it’s grown. You opened one or two more locations in places that you thought would be a good fit; maybe you’re Starbucks and have opened thousands more. One of the most important questions a retail entrepreneur or business faces is where to open the next location. Some larger businesses have location scouts whose sole responsibility is to grow the business; many don’t.
A previous article I wrote on predicting Starbucks’ future locations prompted an entrepreneur facing exactly this challenge to reach out: where should they open the next location for their premium ice cream products within Greater Pittsburgh? How could they use data to improve their odds of picking a successful location?
Click here to view the dashboard that will be presented in the rest of the article.
Imagine you’re the entrepreneur who reached out and had opened a couple of successful locations for your premium ice cream brand. Out of the world of options, which variables are important to you? A few things that came to mind for me after talking to them about this problem were:
– Population: Foot traffic from both local residents and tourists is critical. The more people there are in the area to serve, the more potential customers you have. It’s also worth distinguishing a population made up of families, since a family’s collective purchase is worth more than an individual’s.
– Restaurant Density: One important time that triggers a trip to an ice cream shop is after dinner. People have already committed to being out, they’ve already paid for a meal, and ice cream is a comforting way to finish off a night out.
– Location, location, location: If you’re one block away from the main block, foot traffic decreases. The store needs to be high visibility, convenient, and in a walkable area.
– Competitors: Is this area already crowded with competitors? If someone beats you to the location and the area can’t support multiple businesses, then it doesn’t matter how strong the other fundamentals look.
How can we use data to help solve this problem using these variables and more?
The project had three key steps:
– Gather: Source relevant data on population, income, restaurant density, competitors, and location. Manipulate the data to match the desired format.
– Analyze: Use the aggregated data to objectively score and prioritize locations. Balance the data with manual scouting to reduce overreliance on the scoring algorithm.
– Communicate: Build both a self-serve Tableau dashboard and a presentation that effectively communicates the results.
Gather
We won’t spend too much time on the actual data-gathering process (feel free to reach out if you’d like to chat about it more), but the gist is that US government, local government, and commercial data were acquired and aggregated at the zip-code level to produce a master file listing each local zip code and its associated metrics.
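To make the aggregation step concrete, here is a minimal sketch of rolling separate sources up to one record per zip code. The source tables, field names, and numbers are all invented for illustration; the actual project data and tooling are not shown in this article.

```python
from collections import defaultdict

# Hypothetical inputs: every field name and value here is a placeholder,
# not the data used in the project.
census = [
    {"zip": "15212", "population": 27000, "k12_enrollment": 3500},
    {"zip": "15222", "population": 5000, "k12_enrollment": 300},
]
restaurants = [
    {"zip": "15212", "name": "Diner A"},
    {"zip": "15212", "name": "Cafe B"},
    {"zip": "15222", "name": "Bistro C"},
]


def build_master(census_rows, restaurant_rows):
    """Return a dict with one combined record per zip code."""
    # Count restaurant rows per zip code.
    restaurant_counts = defaultdict(int)
    for row in restaurant_rows:
        restaurant_counts[row["zip"]] += 1

    # Attach the count to each census record, defaulting to 0.
    master = {}
    for row in census_rows:
        record = dict(row)
        record["restaurant_count"] = restaurant_counts.get(row["zip"], 0)
        master[row["zip"]] = record
    return master


master = build_master(census, restaurants)
```

Each additional source (competitors, walk scores, spending estimates) could be folded in the same way, as long as it can be keyed by zip code.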
Analyze
We started by looking at how some of the data is distributed by zip code throughout the region. Where does the population concentrate? Which zip codes have the highest restaurant density? Which zip codes tend to spend more on ice cream?
Map of Population, Restaurant Density, and Avg. Ice Cream Expenditures by Zip code. Darker green indicates higher values.
The above maps start to give us an understanding of the trade-off we might face. Total population (left-most map) concentrates in about 5–6 zip codes outside the city center. However, the largest cluster of restaurants and the highest restaurant density are downtown (middle map), while average ice cream spend tends to be fairly uniform, but higher, outside of downtown. Which do we prioritize? How should we weight the importance of each variable?
This is where the experience of a business owner comes into play. It’s important to be able to recognize your own limitations. One thing I do well is collect, analyze, and present data. One thing I lack is domain expertise in each subject.
To compensate for this, I built a scoring algorithm with self-serve inputs. Do you think that Restaurant Density and Total Population are most important? Assign them the most weight by entering a higher score in the boxes. Do you want to give more weight to the walk score or the presence of families (K-12 Enrollment)? Increase those scores and decrease the others. The variables available to toggle are:
– Walk Score: Is the neighborhood accessible by foot? Is it an area that’s likely to have high foot traffic as a result?
– Restaurant Density: How many restaurants are nearby? Is it a hub for activity that will draw potential customers?
– Competitive Presence: Has someone else already “cornered” this location? Is there room for more shops?
– Total Population: Are there enough people to give a large potential customer base?
– K-12 Enrollment: A proxy for families vs. individuals — Is there a large family population?
After inputting your scores on the left, the map and table update. Each zip code is scored on a 0–1 scale (0 is worst; 1 is best).
A hypothetical 100 points allocated to our most important variables. Darker green indicates a higher zip code score.
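For readers who want to see what a weighted score like this can look like outside of Tableau, here is a rough sketch. It assumes min-max scaling of each variable to 0–1 and treats a higher competitor count as worse; both are my assumptions for illustration, not a description of the dashboard’s exact calculation. The zip code labels and metric values are made up.

```python
def min_max(values):
    """Rescale numbers to the 0-1 range (all 0.0 if every value is equal)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def score_zip_codes(rows, weights, lower_is_better=("competitors",)):
    """Return a 0-1 weighted score per zip code.

    rows: {zip_code: {metric: value}}; weights: {metric: points}.
    Weights are normalized, so they need not sum to 100.
    """
    zip_codes = list(rows)
    total_weight = sum(weights.values())
    scores = {z: 0.0 for z in zip_codes}
    for metric, weight in weights.items():
        scaled = min_max([rows[z][metric] for z in zip_codes])
        for z, s in zip(zip_codes, scaled):
            if metric in lower_is_better:
                s = 1.0 - s  # assumption: fewer competitors scores higher
            scores[z] += (weight / total_weight) * s
    return scores


# Made-up metrics for three placeholder zip codes.
rows = {
    "A": {"walk_score": 80, "restaurant_density": 50, "competitors": 1,
          "population": 20000, "k12_enrollment": 3000},
    "B": {"walk_score": 60, "restaurant_density": 10, "competitors": 0,
          "population": 30000, "k12_enrollment": 5000},
    "C": {"walk_score": 40, "restaurant_density": 30, "competitors": 4,
          "population": 10000, "k12_enrollment": 1000},
}
# A hypothetical allocation of 100 points across the five variables.
weights = {"walk_score": 30, "restaurant_density": 30, "competitors": 10,
           "population": 20, "k12_enrollment": 10}

scores = score_zip_codes(rows, weights)
best = max(scores, key=scores.get)
```

Changing the point allocation and re-running is the code equivalent of editing the input boxes on the dashboard: the ranking responds to whatever the business owner believes matters most.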
We’ve now defined our important variables and have a way to prioritize which zip codes have the most attractive demographics given that information. Despite the business having most of their locations today in the core downtown area, we can see that there are several highly ranked zip codes that extend into the suburbs that we could further consider based on this initial prioritization.

Now, let’s say that you’ve decided you want to take the zip code that has the highest score — above, 15212. What do you do with this information? A zip code can be large and/or diverse, so there needs to be a way to get more specific than just a zip code.
For me, something I’d be interested in knowing is where the majority of restaurants are located in the zip code. We would have the highest likelihood of success locating ourselves in or near the areas that already drive the most foot traffic.
In the dashboard, when you click on a zip code a map below it auto-updates with all of the restaurant locations in that zip code. For zip code 15212, the results are shown below. The first image represents all restaurants in the zip code that was clicked and the second image has additional detail in the table. The table identifies which streets in that zip code have the most restaurant locations and upon hovering on that street, the restaurant locations are highlighted on the map.
All restaurant locations in zip code 15212
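The street-level table boils down to a filter and a count. A minimal sketch, assuming each restaurant record carries a zip code and a street name (the records below are placeholders, not real restaurants):

```python
from collections import Counter

# Placeholder records; names and streets are invented for illustration.
restaurants = [
    {"zip": "15212", "name": "Restaurant 1", "street": "Example Ave"},
    {"zip": "15212", "name": "Restaurant 2", "street": "Example Ave"},
    {"zip": "15212", "name": "Restaurant 3", "street": "Sample St"},
    {"zip": "15222", "name": "Restaurant 4", "street": "Other Blvd"},
]


def top_streets(rows, zip_code, n=3):
    """Return up to n (street, restaurant_count) pairs for one zip code."""
    counts = Counter(r["street"] for r in rows if r["zip"] == zip_code)
    return counts.most_common(n)


ranked = top_streets(restaurants, "15212")
```

The streets at the top of that list are the ones I would look at first when scouting on the ground.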

To recap, the process so far lets us:
– Identify how our important variables are distributed in the data
– Rank and prioritize zip codes using our subject matter expertise while maintaining flexibility to change variable importance weights
– Select the zip code that we want to view and see where the “activity” in that zip code is
– Find the specific street(s) that merit on-the-ground research
Recommendations
Ultimately, I chose to create a three-pronged framework for expansion options:
– Status Quo: Find a location that has similar characteristics to current locations and stays close to the downtown area
– Geographic Expansion: Find a location that resembles current demographics but outside of the core downtown area
– Opportunistic: Balance data with intuition to find a location that might not be ranked highest but represents a promising opportunity to consider
The merits and considerations for each can be seen below.
For each of these expansion options, the Tableau tool — paired with “on-the-ground” research in Google Maps — led to the following three recommended locations.
My recommendation was that the short-term approach should be the Status Quo, while in the medium to longer term the Geographic Expansion option should be explored further.
I felt that the Status Quo option risked little cannibalization (all current stores are south of the river in the view), had the highest natural foot traffic potential (proximity to the football & baseball stadiums plus a Children’s Museum), and had the least uncertainty given the current state of retail.
In the medium to longer term, I felt the Geographic Expansion option was best as it reached a wholly new population in high-income zip codes. I also felt that it was prudent to wait on the Geographic Expansion to see whether the “urban exodus” is a permanent or temporary movement of the population.
Alternative Considerations: Competitive Analysis & COVID Impact
Another approach I’ll touch on briefly is that we can use the tool for a competitive analysis. If we’re interested in the downtown area, we can highlight those zip codes and filter the bottom map for only Competitors. We can see that north of the river there are very few ice cream shops. From here, we can decide if that represents an opportunity (an attractive option that no competitor has moved on yet) or if there’s a justified reason for it (zoning, pricing of commercial real estate, or another reason outside of the available data makes it an unattractive option).
An example of using the dashboard for competitive analysis
From a COVID perspective, it’s clear that any new location represents uncertainty and may not be viable in the short-term. A low capital expenditure alternative could be distribution through retail partners — for example, convenience and grocery stores.
The company has current retail partnerships, and the logical extension of that is to try to grow with their current partners, as it’s an easier process than finding and setting up new partners. From their website, we can aggregate all potential locations of current partners and plot them on a map.
Yellow dots represent all potential partners. Stars are current locations. Map shading is darker green for a higher number of households (higher population) and darker red for a lower number of households.
It also makes sense that any expansion of distribution should occur away from current retail locations (where someone already has the product readily accessible to them) in order to maximize customer reach. The map on the left highlights all locations of their current partners (regardless of whether the product is currently there; that information isn’t currently accessible) along with their current storefront locations. The red boxes represent the “clusters” of stores that would be best to approach for distribution expansion.
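I picked the clusters visually on the map, but the same idea can be roughed out in code: keep only partner stores whose nearest storefront is farther away than some cutoff. The sketch below uses the standard haversine great-circle distance; the coordinates, store names, and the 3 km cutoff are hypothetical values chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two latitude/longitude points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def partners_outside_reach(partners, storefronts, min_km=3.0):
    """Names of partners whose nearest storefront is at least min_km away."""
    result = []
    for p in partners:
        nearest = min(
            haversine_km(p["lat"], p["lon"], s["lat"], s["lon"])
            for s in storefronts
        )
        if nearest >= min_km:
            result.append(p["name"])
    return result


# Hypothetical coordinates, not actual store locations.
storefronts = [{"name": "Storefront", "lat": 40.44, "lon": -80.00}]
partners = [
    {"name": "Partner near", "lat": 40.445, "lon": -80.005},
    {"name": "Partner far", "lat": 40.54, "lon": -80.00},
]

candidates = partners_outside_reach(partners, storefronts)
```

Straight-line distance ignores rivers, bridges, and drive times, so a list like this is only a first pass before looking at the map.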
The other benefit to this approach is that it’s a testing ground for suburban expansion. The box south of the downtown location represents a convenient midpoint between current storefronts and the Geographic Expansion option that was identified earlier. If the product sells well in these stores, we can have more confidence that the location might have long-term success. If sales are underwhelming, it’s a low-cost test where we can then reconsider if the market can sustain our product with a retail footprint.
Interested in talking location strategy and analytics? Feel free to connect with me on LinkedIn or reach out to me at jordan@jordanbean.com.
Original post here. Reposted with permission.