The Data of Disease: Using Pharmaceutical Data Science to Improve Vaccine R&D
Vaccines traditionally take decades to develop. Duccio Medini and his colleagues at GlaxoSmithKline are looking for ways to change that. Duccio Medini addressed attendees at ODSC London 2018 in a half-hour lecture on how data science is integrating with vaccination research, speeding up the development process and enabling researchers to handle larger, more complex projects in pharmaceutical data science.

Medini is head of data science and clinical systems at GlaxoSmithKline, one of the largest pharmaceutical companies in the world. He discussed his company’s approach to modernizing vaccine research and development, a field that has historically lagged behind other industries in adopting data-driven approaches.

“Pharma is known to be one of the verticals that is less advanced… in the adoption of modern technologies, and that is due first and foremost [to] the highly-regulated environment,” he said.

Although regulation can place checks on research that many in the industry consider onerous, he added, “we’re dealing with products that get into human bodies that interact with our health, so regulation is slowing down innovation for a good reason, I believe.”

Regulations notwithstanding, vaccination research is an arduous process. It typically requires decades of effort, moving from discovery to technical R&D, then to clinical R&D, and finally to delivery. “This takes approximately 10 to 15 years when things go well, and things don’t go well more than 10 percent of the time,” Medini said.

Vaccine development is an extraordinarily complex task that can become mired in domain-specific problems at any stage of the research process. To begin with, the goal of a vaccine is to create an agent that will modulate the interaction between the immune system and infectious agents, each of which is a self-contained complex system. Additionally, vaccine development requires cooperation between experts across a wide range of specialties, including chemistry, epidemiology, statistics, and engineering.

Finding a common parlance between these disparate working groups can be difficult. “The only common language among all these verticals is data,” Medini said. Competition and an updated data strategy by the National Institutes of Health also create pressure for pharmaceutical companies to update their business models to be data-driven.

“The secret to achieving the end goal is a proper integration of the data, but this is much more easily said than done,” Medini added.

Medini identified four ‘pillars’ that GlaxoSmithKline relies on for its data strategy. The first is data management and governance: the processes a company uses to label and maintain its data so that it stays usable. Second, GlaxoSmithKline embraces a data ecosystem based on semantic data analysis and pluggable APIs, allowing datasets to be recombined in a meaningful and consistent way. Third, the company continues to expand its data analytics capabilities, which are critical to evaluating the performance of its products in silico. Lastly, GlaxoSmithKline takes a broad-based approach to data literacy, working to advance its practitioners’ understanding of semantic data properties and, more generally, to train its workforce to at least understand how its datasets work.
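To make the second pillar more concrete, here is a minimal Python sketch of how a shared vocabulary might let two datasets be recombined consistently. It is an illustration only, not GlaxoSmithKline's system: the source names, field names, the VOCABULARY mapping, the helper functions, and the sample values are all hypothetical and do not come from the talk.

```python
# Hypothetical shared vocabulary: maps each source's local field names
# to common terms. None of these names come from the talk.
VOCABULARY = {
    "assay_lab": {"subj": "subject_id", "ab_titer": "antibody_titer"},
    "clinical": {"patient_no": "subject_id", "dose_mg": "dose_mg"},
}


def harmonize(source, records):
    """Rename each record's fields to the shared terms for its source."""
    mapping = VOCABULARY[source]
    return [
        {mapping[field]: value for field, value in rec.items() if field in mapping}
        for rec in records
    ]


def join_on(key, left, right):
    """Inner-join two lists of harmonized records on a shared key."""
    index = {rec[key]: rec for rec in right}
    return [{**rec, **index[rec[key]]} for rec in left if rec[key] in index]


# Made-up sample records from two systems that name the same subject differently.
assay = [{"subj": "S1", "ab_titer": 320}, {"subj": "S2", "ab_titer": 80}]
clinical = [{"patient_no": "S1", "dose_mg": 0.5}, {"patient_no": "S3", "dose_mg": 0.5}]

combined = join_on(
    "subject_id",
    harmonize("assay_lab", assay),
    harmonize("clinical", clinical),
)
print(combined)
```

The point of the sketch is the design choice rather than the code itself: once every source is translated into the same terms, the join logic no longer needs to know where a record came from.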

Medini offered examples of problems this data framework allowed the company to tackle, such as its historically fractured information management systems. Because of the massive complexity associated with the research and the disparate data required, experimental frameworks are typically spread across a large range of processes that can require workflows specially designed by an industrial engineer.

Most of the data systems “are not built on open APIs, so you can’t even build a decent data interface, or at least not a full one,” typically due to regulation, Medini said. A unified data management system is an antidote to these kinds of problems.

This drive toward data management systems has highlighted other issues inherent to crossing domain science with data science. For a long time, it was difficult to find data scientists who were also proficient in the hard sciences, but that is beginning to change.

“The duel between the domain field expert and the data scientist is coming to a point where we are seeing a new generation of data scientists developing on the job or getting out of academia, which are already specialized in the domain field,” Medini said. “It’s a very special profile that we need if we want the right talent.”

Corporate vaccination research and development is accelerating as experts slowly introduce data science, with scientists like Medini at the helm. To hear Medini in his own words, listen to his full talk at ODSC London 2018 on YouTube.

Spencer Norris, ODSC

Spencer Norris is a data scientist and freelance journalist. He currently works as a contractor and publishes on his blog on Medium: https://medium.com/@spencernorris