Visualization Throughout the Data Science Workflow with Lindsay Brin, PhD, Data Scientist at T4G Limited at ODSC East 2018

Summary: In the eyes of Dr. Lindsay Brin, data scientist at T4G Limited, there are three steps in the data science workflow: process, insight, and communication. During her talk “Visualization throughout the data science workflow: Why It’s Useful and How Not to Lie” at ODSC East 2018, Brin expanded on the essentials of the data science process. She offered examples of how to approach data from a project’s outset by investigating patterns rather than just looking at numbers. Brin then dug into the value of data integrity and spoke about how organizations and individuals alike can produce visualizations that accurately represent the data at hand.

“There’s no point in doing good science unless you’re able to communicate your data and your findings and your story to other people,” states Brin. Similarly, there’s no point in communicating your data if the story it tells is rooted in misrepresentation.

Slide Copyright Lindsay Brin, ODSC East 2018

Embrace the process

While it may be tempting to launch immediately into interpreting the data, Brin stressed that deciding which statistical tools to use, and how to use them, is a vital part of the data analysis pipeline. Rather than blindly applying analyses, we should first look for patterns in the data by generating visualizations. Whether the data show a linear or a polynomial relationship determines which regression model is appropriate, and that choice can make all the difference for both the statistics and the business application. Even once a visualization has been studied, the process doesn’t stop there.
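
The linear-versus-polynomial check can be sketched in a few lines. This is a minimal illustration and not an example from Brin's talk: the data are hypothetical and noise-free, `fit_line` and `plot_fit` are names invented here, and the optional plot assumes matplotlib is installed. It fits a straight line to curved data and inspects the residuals, which is the numeric counterpart of looking at the scatter plot.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical, noise-free data that is curved rather than straight.
xs = list(range(11))
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# A suitable straight-line fit leaves residuals scattered around zero.
# Here they are positive at both ends and negative in the middle (a U shape),
# which suggests that a polynomial term is missing.
for x, r in zip(xs, residuals):
    print(f"x={x:2d}  residual={r:7.2f}")


def plot_fit(xs, ys, slope, intercept):
    """Optional scatter plot with the fitted line; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys, label="data")
    ax.plot(xs, [slope * x + intercept for x in xs], label="straight-line fit")
    ax.legend()
    return fig
```

Calling `plot_fit(xs, ys, slope, intercept)` draws the same comparison; the curve of the points away from the line is usually obvious at a glance, before any model statistics are computed.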

Should I expand the range of values for my grid search? Should I look at changes in variance? These are the kinds of follow-up questions we should be asking along the way, for instance when different random seeds lead to different optimal parameters. Visualization doesn’t have to be complicated, but it should lead you to take note of patterns and ask insightful questions.
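
One way to explore the seed question is to repeat a small grid search under several random splits and compare the winners. The sketch below is an assumption-laden toy, not the setup from the talk: the data are a synthetic noisy sine curve, the model is a hand-written k-nearest-neighbors average, and `GRID`, `make_data`, and `best_k_for_seed` are hypothetical names chosen for illustration.

```python
import math
import random

GRID = (1, 3, 5, 9, 15)  # candidate values of k; an illustrative choice


def make_data(n=120):
    """Noisy sine curve, generated once so only the split changes per seed."""
    rng = random.Random(0)
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def validation_mse(train, valid, k):
    """Mean squared error of the k-NN predictions on the held-out points."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)


def best_k_for_seed(data, seed):
    """Shuffle with the given seed, hold out a third, and grid-search k."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    train, valid = shuffled[:80], shuffled[80:]
    scores = {k: validation_mse(train, valid, k) for k in GRID}
    return min(scores, key=scores.get), scores


data = make_data()
best_by_seed = {seed: best_k_for_seed(data, seed)[0] for seed in range(5)}

for seed, k in best_by_seed.items():
    # A winner on the edge of the grid hints that the range may be too narrow.
    on_edge = k in (GRID[0], GRID[-1])
    print(f"seed={seed}  best k={k}  at edge of grid: {on_edge}")
```

If the best `k` changes from seed to seed, plotting the validation error against `k` for each seed shows how flat or noisy the optimum is. If it repeatedly lands on the smallest or largest value in `GRID`, that is a cue to widen the search range.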

Slide Copyright Lindsay Brin, ODSC East 2018

Strive for integrity

Numbers can be correct, but the choices behind their visualization can lead to the wrong story being told. Brin encouraged us to think about how the parameters of a visualization drive its interpretation. Something as simple as the scale of a plot’s axes affects how we see patterns or trends in data. In another case, the original data values may appear random, yet the log-transformed values may reveal a striking correlation. During exploratory data analysis, we should examine a range of possibilities, bearing in mind that plot parameters can change what a visual appears to say.
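
The effect of a log transform can be illustrated with synthetic data. The following is a rough sketch under stated assumptions: the values are generated here with multiplicative noise and are not from the talk, `pearson_r` and `plot_both_scales` are hypothetical helper names, and the optional plot assumes matplotlib is installed. It compares the Pearson correlation of the raw values with that of their logarithms.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic data spanning several orders of magnitude, with multiplicative noise.
rng = random.Random(42)
xs = [math.exp(rng.uniform(0, 10)) for _ in range(200)]
ys = [x * math.exp(rng.gauss(0, 1)) for x in xs]

r_raw = pearson_r(xs, ys)
r_log = pearson_r([math.log(x) for x in xs], [math.log(y) for y in ys])
print(f"correlation of raw values: {r_raw:.2f}")
print(f"correlation of log values: {r_log:.2f}")


def plot_both_scales(xs, ys):
    """Optional side-by-side scatter plots; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    fig, (ax_raw, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
    ax_raw.scatter(xs, ys, s=8)
    ax_raw.set_title("linear axes")
    ax_log.scatter(xs, ys, s=8)
    ax_log.set_xscale("log")
    ax_log.set_yscale("log")
    ax_log.set_title("log axes")
    return fig
```

On linear axes, a handful of large values dominate the picture and the remaining points crowd into a corner, while on log axes the same points spread out along a band. Neither view is wrong, which is why it helps to look at both before settling on the one that gets presented.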

Brin additionally mentioned the power of the hue, value, and intensity of color in data visualization. Color scheme can determine the way that a person’s eyes move through a composition. Moreover, color has the ability to indicate categories, show relationships, and represent numerical value. Exploiting color to move a viewer’s eyes through your data is smart and effective, but again, it remains important to use color in a strategic effort to convey truths about your data to the viewer. As data scientists, we must be vigilant about not implying relationships in data where they do not actually exist.

All in all, Brin urged the audience to consider their data with a critical eye throughout the entire data science workflow. From contemplating the appropriate statistical methods to thinking about the data’s impact on an audience, Brin showed that the job of a data scientist is one of constant interrogation. Painstaking as it may be, this step-by-step process is what yields the kind of data that is both honest and compelling.


Kaylen Sanders, ODSC

I currently study Computational Linguistics as an M.S. candidate at Brandeis University. I received my Bachelor's degree from the University of Pittsburgh where I explored linguistics, computer science, and nonfiction writing. I'm interested in the crossroads where language and technology meet.
