The year since ChatGPT’s debut has seen marked increases in news coverage and industry adoption of generative artificial intelligence. Most coverage focuses on large language models (LLMs), deep learning models trained on massive text corpora to generate believable text output. However, generative artificial intelligence (AI) can produce a wide variety of data types: images, voices, videos, and even protein structures.
Generative AI involves training large neural networks on input data so that new data can be generated on request. Typically, a user’s request is encoded into a learned representation that captures patterns from the training data, and the generative portion of the model then decodes that representation into new data related to the request. With large enough models and training samples, many generative AI technologies are now very good at producing believable outputs.
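The learn-patterns-then-generate loop can be illustrated with a deliberately tiny sketch: a word-level Markov chain that learns transition patterns from a training corpus and samples new text from them. This is a toy stand-in for the far larger neural models described above, not how LLMs work internally; the corpus and seed word are illustrative.

```python
import random
from collections import defaultdict

def train(corpus):
    """Learn word-to-word transition patterns from the training text."""
    model = defaultdict(list)
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, seed, length=8, rng=None):
    """Generate new text by sampling learned transitions from a seed word."""
    rng = rng or random.Random(0)
    out = [seed]
    for _ in range(length - 1):
        options = model.get(out[-1])
        if not options:
            break
        out.append(rng.choice(options))
    return " ".join(out)

corpus = ("wear a mask to protect your community "
          "wash your hands to protect your family")
model = train(corpus)
print(generate(model, "wear"))
```

Every generated word pair occurs somewhere in the training text, which is exactly why these toy models feel derivative while large neural models, learning far more abstract patterns, can produce genuinely novel outputs.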
One of the less-discussed applications of generative AI is the acceleration of drug development. The last decade’s outbreaks of Ebola and COVID-19 highlighted the need for quick turnarounds on effective treatments and vaccines to save lives. Generative chemistry algorithms typically process molecules or proteins with known properties from databases, representing them as graphs for training. The algorithm then learns molecular property patterns common in the training dataset. Once trained, the generative algorithm suggests new molecules or proteins based on the learned patterns, which can be screened by other algorithms before the most promising compounds are synthesized in the lab.
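The generate-then-screen pattern can be sketched in miniature. Here, hypothetical candidate molecules (written as per-element atom counts rather than full graphs) are filtered by a crude molecular-weight rule, standing in for the learned generative models and screening algorithms described above; the candidate names and the 500-dalton cutoff (borrowed from Lipinski’s rule of five) are illustrative.

```python
def molecular_weight(atom_counts, masses):
    """Approximate molecular weight from per-element atom counts."""
    return sum(masses[el] * n for el, n in atom_counts.items())

def screen(candidates, masses, max_weight=500.0):
    """Keep candidates under a weight threshold (a crude Lipinski-style rule)."""
    return [name for name, atoms in candidates.items()
            if molecular_weight(atoms, masses) <= max_weight]

# Standard atomic masses for a few common elements.
MASSES = {"C": 12.011, "H": 1.008, "O": 15.999, "N": 14.007}

# Hypothetical "generated" candidates, written as element -> atom count.
candidates = {
    "aspirin-like": {"C": 9, "H": 8, "O": 4},
    "large-peptide-like": {"C": 60, "H": 90, "N": 18, "O": 20},
}

print(screen(candidates, MASSES))  # the heavy candidate is filtered out
```

Real pipelines screen on much richer properties (solubility, toxicity, binding affinity) using dedicated predictive models, but the shape is the same: generate many candidates cheaply, then filter aggressively before any lab work.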
Another field that has benefited from generative AI is public health. When I worked on an HIV education campaign in 2007, our process involved brainstorming ideas, developing ideas into images and messaging, beta testing these samples in communities, hiring a translator to translate messages into local languages, and then disseminating the information through fliers or billboards. With generative image and text algorithms, dozens of images and campaign slogans can be generated within minutes, disseminated to beta groups through crowd-sourced platforms, and put into use hours or days later rather than weeks or months later. For instance, a generated image encouraging people to wear a mask during the COVID-19 outbreak could be beta-tested with relevant demographic groups almost immediately.
Many generative AI algorithms produce high-quality data very quickly. While training still involves large datasets and substantial computing resources, once base models are trained, fine-tuning with relevant datasets allows a general base model to be repurposed for specific use cases, such as public health campaign image generation or messaging. The flexibility of these models, the ability to fine-tune them, and the speed with which they generate data make generative AI increasingly accessible to data scientists and even lay audiences.
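The economics of fine-tuning can be shown with a toy example: a “base” one-parameter linear model is trained at length on a broad task, then adapted to a related task with only a few extra gradient steps instead of training from scratch. The data and learning rate here are synthetic placeholders; real fine-tuning adjusts millions or billions of neural network weights, but the principle of reusing pretrained parameters is the same.

```python
def sgd(w, data, lr=0.1, epochs=100):
    """One-feature linear regression trained by stochastic gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * 2 * (w * x - y) * x  # gradient of squared error
    return w

base_data = [(1.0, 2.0), (2.0, 4.0)]     # broad "pretraining" task: y = 2x
domain_data = [(1.0, 2.2), (2.0, 4.4)]   # related domain task: y = 2.2x

w_base = sgd(0.0, base_data)                    # long pretraining run
w_tuned = sgd(w_base, domain_data, epochs=10)   # brief fine-tune from base

print(round(w_base, 2), round(w_tuned, 2))
```

Because the fine-tune starts near a good solution, ten epochs suffice where pretraining needed a hundred; this is why a single expensive base model can be cheaply specialized for many downstream applications.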
Like most technologies, generative AI can be used ethically or unethically. Many ethical concerns surround generative AI, including the potential for deepfakes: images or videos depicting real people doing or saying things they never did. Deepfakes can be used to propagate fake news about politicians to influence an election or to intimidate an intimate partner who is trying to leave a relationship.
However, intentional misuse is only one potential issue with generative AI. Hallucinations, in which a generative model perceives false patterns when combining an input request with its training data, can cause algorithms to go rogue, fabricating sources of information or returning wildly inaccurate outputs for a given request. These algorithms can also learn human biases embedded in their training datasets; some of the larger scraped datasets lack quality assurance steps to remove instances of bias. The combination of hallucinations and unvetted input data has led to chatbots insulting users or suggesting dangerous behaviors. Vetting data sources and testing applications for bias can reduce these risks.
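The data-vetting step can be as simple as quarantining training examples that match a blocklist before they ever reach a model. The sketch below shows that pattern; the flagged terms and example texts are illustrative placeholders, and production systems layer far more sophisticated checks (classifiers, human review) on top of keyword rules like this.

```python
import re

# Placeholder blocklist; a real deployment would maintain a curated,
# regularly reviewed list and combine it with learned content filters.
FLAGGED = re.compile(r"\b(insult_term|slur_term)\b", re.IGNORECASE)

def vet(examples):
    """Split raw examples into kept and quarantined-for-review lists."""
    kept, flagged = [], []
    for text in examples:
        (flagged if FLAGGED.search(text) else kept).append(text)
    return kept, flagged

raw = ["wash your hands often",
       "this contains an Insult_Term in it",
       "wear a mask indoors"]
kept, flagged = vet(raw)
print(len(kept), "kept,", len(flagged), "quarantined")
```

Keyword filters are crude and miss context, which is why the quarantined list goes to human review rather than being silently deleted; the goal is to keep questionable examples out of training until someone has looked at them.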
With proper training practices and ethical awareness of potential misuse, generative AI offers many options for low-cost, high-efficacy social good initiatives. Open-source algorithms and training on their usage provide an avenue to AI equity in the developing world and in communities underrepresented in the technology sector; they also allow researchers, scientists, and technology professionals to develop software that solves immediate community needs. In my ODSC East 2024 talk on generative AI for social good, I’ll be highlighting specific tools and my experiences with generative AI social good applications.
Open-Source Generative AI Resource Links:
https://huggingface.co/models (LLMs and other algorithms)
https://www.craiyon.com/ (open-source image generator)
https://pubs.acs.org/doi/10.1021/acs.jcim.3c00562 (generative chemistry article)
https://www.sciencedirect.com/science/article/pii/S2590098623000258 (generative chemistry article)
https://dl.acm.org/doi/pdf/10.1145/3579592 (generative public health article)
Colleen M. Farrelly is a mathematician at Post Urban Ventures, a venture firm that focuses on deep technology for social good. Her research interests include generative AI, network science, and the application of geometry and topology to machine learning. She is the author of The Shape of Data and a forthcoming network science book.