While verbal communication is a necessary skill for any data scientist, visual communication is becoming more important for maximizing impact. Viz is hard: it requires a different skillset and a recognition that your design choices can manipulate the story. As data scientists drift into the forbidden realms of artistic expression, I’m engrossed by the question “How can we represent data objectively?”
To demonstrate the phenomenon of visual manipulation, I’ve chosen to create five time series plots representing polling data. Time series plots commonly accompany news stories, putting them in a unique position to influence public perception. We’ve been seeing a lot of time series plots associated with the presidential election recently, and each viewer can reach a different conclusion about the polls based on how the data is presented.
A note: my goal in utilizing polling data to tell this story is not to convince you one way or another about the presidential election, but rather to demonstrate the different stories we can extract from a popular data set.
[graph1] This is the most classic representation of polling data. This tells a pretty clear story of the dance between our two major candidates, which is likely why it’s used most frequently.
The difference here is a line for “other,” which isn’t always shown on these graphs. Adding this line reveals how often support shifts between a major candidate and anyone but that candidate. Since news stories often focus on who’s supporting which of the two major candidates, we miss this sizable story.
[graph2] Here is another common way of looking at share over time. The disparate bars make it difficult to read, but being able to see each share as a volume over time adds a sense of magnitude that feels missing from graph1.
[graph3] Interestingly, when the data is reordered, a story starts to shake out. With “other” in the center, Clinton’s and Trump’s shares in relation to “other” become much more apparent. This is worth noting because not only is this the same data as before, it’s the same style of graph as before. We simply changed the order of our data and we see something else entirely.
[graph4] This graph is a marriage of the two styles above. The weakness of graph1 is that it’s difficult to connect the volume of support with the singular line, while the weakness of graph3 is that the disparate bars lead to a shaky view of election share over time. Graph4 makes the severe dip of “other” on 7/22 a story in itself. Don’t you want to know what happened on 7/22 now? It was the day after Trump’s Republican National Convention speech and would have been the first time Americans were polled after seeing the speech.
Graph4 does a good job of overcoming both weaknesses of graph1 and graph3, but introduces a new weakness. It’s easy to mislead the viewer visually through whichever candidate appears on top. Trump’s share is actually decreasing on 8/8, but because his band sits on top of the stack, the decrease looks like an upward trend. It’s possible that this distortion would diminish with more data, though.
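To see why the candidate on top is the easiest to misread, it helps to look at how a stacked chart is built. The sketch below is only an illustration: the function name `stack_bands` and the numbers are made up for this example and are not the Real Clear Politics figures behind the graphs. It computes the lower and upper edge of each band for a given stacking order.

```python
def stack_bands(shares, order):
    """Return {name: [(lower, upper), ...]} for series stacked in `order`.

    Each band starts where the band below it ends, so only the bottom
    series is drawn from a flat baseline of zero.
    """
    n_points = len(next(iter(shares.values())))
    baseline = [0.0] * n_points
    bands = {}
    for name in order:
        upper = [b + v for b, v in zip(baseline, shares[name])]
        bands[name] = list(zip(baseline, upper))
        baseline = upper  # the next series sits on top of this one
    return bands


# Hypothetical shares for two poll dates (illustrative values only).
shares = {
    "Clinton": [42.0, 46.0],
    "Other": [16.0, 15.0],
    "Trump": [40.0, 38.0],
}

trump_on_top = stack_bands(shares, ["Clinton", "Other", "Trump"])
trump_on_bottom = stack_bands(shares, ["Trump", "Other", "Clinton"])

print(trump_on_top["Trump"])     # edges of Trump's band when drawn last
print(trump_on_bottom["Trump"])  # edges of Trump's band when drawn first
```

With these made-up values, Trump’s share falls from 40 to 38 in both orderings. Drawn on top, both edges of his band move upward because the bands beneath it grew. Drawn on the bottom, the upper edge of his band simply drops. Same data, opposite visual impression.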
[graph5] The final graph worth seeing shows the data as a percent change from one poll date to the next. It’s somewhat of a hilarious mess, but it’s much more effective at demonstrating the chaos at the beginning of the series. While we see the lines jump around in the first graph, there’s something about these lines sitting on top of each other that makes the pattern of chaos followed by leveling off elicit a much stronger emotional response.
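For anyone who wants to reproduce this kind of view, here is a minimal sketch of the transformation. It assumes “percent change” means the relative change from the previous poll; if the chart instead plotted the difference in percentage points, the calculation would just be `curr - prev`. The function name and the sample values are hypothetical, not the actual polling numbers.

```python
def percent_change(values):
    """Relative change of each value versus the one before it, in percent.

    The result has one fewer element than the input, since the first
    poll has nothing to be compared against.
    """
    return [
        (curr - prev) * 100.0 / prev
        for prev, curr in zip(values, values[1:])
    ]


# Hypothetical shares for one candidate across three polls.
sample_shares = [40.0, 44.0, 33.0]
changes = percent_change(sample_shares)
print(changes)
```

Because every series is re-expressed relative to its own previous value, all the lines end up hovering around zero, which is why they sit on top of each other in this view.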
Responsible data communication is so important given how easy it is to manipulate a viewer’s perception of a topic through visualization. Each of these graphs tells a subtly different story of the election, which shows how strongly the choice of visualization can shape the story at hand.
All data was collected from Real Clear Politics presidential polls.
Feature image modified from original