BlogData VisualizationData Visualizationposted by Susie Lu July 27, 2017 Susie Lu
A series of data visualizations on Farmers’ Market data from data.gov.
- Location (lat, long, city, state, address)
- Hours of operation
- Different goods categories and a yes/no if they’re available at that market
UNDERSTANDING THE DATASET
I made the following graphic to understand the distribution of the 29 goods for each farmers market such as availability of honey, flowers, fruits, wine, etc. After filtering out some results that didn’t have attributes I was interested in, there were roughly 5.5k farmers markets left.
I was surprised to see how correlated location was with maple availability and that there are more farmers markets with baked goods, honey, and jam, than there are with fruits. Also fascinating to see that 18% of markets sell pet food.
A fun abstract example where each farmers market is represented by a series of circles depending on if it had juice (cyan), cheese (yellow), baked goods (magenta), or an empty circle if it didn’t have any of those goods.
Animating jus cuz.
The next version was to look at two different dimensions for the farmers markets and understand how the dimensions overlapped. I explored this with a bump chart that encoded the overlap as well as changed rank.
In this case you can see how seafood dominates the farmers markets on the west coast but then as you get into the midwest maple starts to take over and then dominates for most of the eastern side of the US. The interactive version allows you to pick which goods you’re comparing.
See the interactive version on bl.ocks.
This iteration was spurred by Elijah suggesting k-means clustering on the dataset. Thanks @emilbays for the kMeans Library.
The first pass was to understand when clustered, how did each cluster’s centroid fall from 0 (No) to 1 (Yes) on the scale for each dimension.
I decided it would be more interesting to show those distributions connected in a series. The x-axis is organized from most common to least common goods.
The biggest change in the final version was the addition of areas around the lines representing each cluster’s centroid values. The areas exaggerate that cluster when it deviates from the rest of the clusters. This attempts to visually answer the question “Which features in each cluster differentiate it from the rest?”
Creating the legend for this made me realize how much even a simple annotations framework would make it so much easier to create callouts and annotations for your visualizations. Sounds like my next project 🙂
See the interactive version on bl.locks and the original post location here.