
Researchers across NVIDIA, MGH and BWH Center for Clinical Data Science, and the Mayo Clinic devised a method for generating synthetic abnormal MRI images to combat a lack of sufficient training data.

A machine learning system can only be as good as the data it is trained on. This poses a problem for the medical community, where normal data is far more readily available than pathologic data. Improving data diversity increases the likelihood that a deep learning model will be able to classify a wider range of diagnostic possibilities. Generative adversarial networks (GANs) have recently grown in popularity in medical imaging, but this research represents the first known attempt at “generation of synthetic medical images as form of anonymization and data augmentation for tumor segmentation tasks.”

To synthesize the MRI images, the researchers drew on two publicly available brain MRI data sets: the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set and the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) data set. Using an image-to-image translation GAN, they produced synthetic abnormal multi-parametric brain tumor images from the corresponding segmentation masks. Adjusting the segmentation labels also allowed the researchers to introduce greater variability into the generated images, for example by altering a tumor's size or its location within the image. Training with the generated images improved segmentation performance across a variety of algorithms. In fact, the difference between models trained on real subject data and models trained on synthetic data appeared to be negligible. In addition, the generative models allow training data to be anonymized, which makes it possible to share that data.
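To make the idea of "adjusting labels" more concrete, here is a minimal sketch of how a 2D segmentation mask might be edited before it is passed to an image-to-image generator. This is not the researchers' code: the three-value label scheme (`BACKGROUND`, `BRAIN`, `TUMOR`) and the helper names `move_tumor` and `grow_tumor` are hypothetical, and real masks would typically be 3D with several tumor sub-labels. The generator itself is not shown; the edited mask is simply what one would feed into a trained mask-to-image GAN.

```python
import numpy as np

# Hypothetical label scheme for illustration only.
BACKGROUND, BRAIN, TUMOR = 0, 1, 2


def move_tumor(label_map, dy, dx):
    """Translate tumor pixels by (dy, dx) in a hypothetical 2D label map.

    Vacated tumor pixels revert to brain tissue; shifted tumor pixels that
    fall outside the image or onto background are dropped.
    """
    out = label_map.copy()
    tumor = label_map == TUMOR
    out[tumor] = BRAIN  # clear the original tumor location

    h, w = tumor.shape
    ys, xs = np.nonzero(tumor)
    ys2, xs2 = ys + dy, xs + dx
    in_bounds = (ys2 >= 0) & (ys2 < h) & (xs2 >= 0) & (xs2 < w)

    shifted = np.zeros_like(tumor)
    shifted[ys2[in_bounds], xs2[in_bounds]] = True
    # Only place tumor where the original map had brain tissue.
    out[shifted & (label_map != BACKGROUND)] = TUMOR
    return out


def grow_tumor(label_map, steps=1):
    """Enlarge the tumor by 4-connected binary dilation, staying inside the brain."""
    out = label_map.copy()
    for _ in range(steps):
        tumor = out == TUMOR
        grown = tumor.copy()
        grown[1:, :] |= tumor[:-1, :]   # spread down
        grown[:-1, :] |= tumor[1:, :]   # spread up
        grown[:, 1:] |= tumor[:, :-1]   # spread right
        grown[:, :-1] |= tumor[:, 1:]   # spread left
        out[grown & (out == BRAIN)] = TUMOR
    return out


if __name__ == "__main__":
    mask = np.zeros((7, 7), dtype=int)
    mask[1:6, 1:6] = BRAIN
    mask[3, 3] = TUMOR
    print(move_tumor(mask, 1, 1))
    print(grow_tumor(mask, 2))
```

Each edited mask is a new, plausible label layout, so one real annotation could in principle yield many training images once paired with a generator. Restricting tumor pixels to brain tissue is a simple design choice meant to keep the edited labels anatomically sensible; a real pipeline would likely need stricter constraints.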

With the ability to create synthetic images, even smaller organizations without access to massive troves of data could build deep learning models on a diverse selection of training material. Looking ahead, synthetic data generation may play a vital role in democratizing the big data landscape.


Kaylen Sanders, ODSC

I currently study Computational Linguistics as an M.S. candidate at Brandeis University. I received my Bachelor's degree from the University of Pittsburgh where I explored linguistics, computer science, and nonfiction writing. I'm interested in the crossroads where language and technology meet.