A new AI tool predicts the functions of enzymes from their amino acid sequences, faster than current leading tools. According to a report by Phys.org, the AI can make these predictions even when the enzymes are unstudied or poorly understood. Called CLEAN, the tool outperforms leading tools in accuracy, reliability, and sensitivity, and could help usher in a new wave of research in chemistry, industrial materials, medicine, and more.
Other computational tools attempt to predict enzyme function by assigning an Enzyme Commission (EC) number, an ID code that indicates what kind of reaction an enzyme catalyzes. They do this by comparing the query sequence with a pre-existing catalog of known enzymes and finding similar sequences. The problem with this approach is that these tools perform poorly on less-studied or uncharacterized enzymes and on enzymes that play multiple roles. This is exactly where CLEAN excels.
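The catalog-lookup approach described above can be sketched in a few lines of Python. This is a toy illustration only, not how any particular tool is implemented: the sequences are invented, the catalog is hypothetical, and `difflib.SequenceMatcher` stands in for a real sequence-alignment method. The `min_similarity` cutoff is likewise an assumed parameter.

```python
from difflib import SequenceMatcher

# Hypothetical toy catalog: invented sequences mapped to EC numbers.
# Real tools search large curated databases with proper alignment methods.
CATALOG = {
    "MKTAYIAKQRQISFVKSHFSRQ": "1.1.1.1",
    "GSHMLEDPVAGGTQWLLKRAAE": "3.4.21.4",
}


def similarity(a, b):
    """Crude string similarity in [0, 1]; a stand-in for sequence alignment."""
    return SequenceMatcher(None, a, b).ratio()


def predict_ec(query, catalog=CATALOG, min_similarity=0.5):
    """Return (EC number, score) of the most similar cataloged sequence.

    Returns (None, score) when nothing in the catalog is similar enough.
    """
    best_seq = max(catalog, key=lambda seq: similarity(query, seq))
    score = similarity(query, best_seq)
    if score < min_similarity:
        return None, score
    return catalog[best_seq], score


if __name__ == "__main__":
    # A near-copy of a cataloged sequence inherits that entry's EC number.
    print(predict_ec("MKTAYIAKQRQISFVKSHFSRA"))
    # A sequence unlike anything in the catalog gets no prediction.
    print(predict_ec("WWWWWWWWWW"))
```

The second call shows the weakness the article describes: when a query has no close relative in the catalog, a lookup of this kind has nothing to return.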
Comparing the AI tool to ChatGPT, study leader Huimin Zhao, a University of Illinois Urbana-Champaign professor of chemical and biomolecular engineering, said, “Just like ChatGPT uses data from written language to create predictive text, we are leveraging the language of proteins to predict their activity… Almost every researcher, when working with a new protein sequence, wants to know right away what the protein does. In addition, when making chemicals for any application (biology, medicine, industry), this tool will help researchers quickly identify the proper enzymes needed for the synthesis of chemicals and materials.”
Though not the first tool to use AI to predict Enzyme Commission numbers, CLEAN is the first to use a deep learning algorithm called contrastive learning to predict enzyme function. Of the new algorithm, Zhao said, “We cannot guarantee everyone’s product will be correctly predicted, but we can get higher accuracy than the other two or other three methods.” In their paper, the team verified this claim and also found that the algorithm was able to correct mislabeled enzymes.
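For readers unfamiliar with contrastive learning, the general idea is to train a model so that examples with the same label end up close together in an embedding space and examples with different labels end up far apart. The sketch below shows a triplet margin loss, one common contrastive objective, in plain Python. It is a generic illustration and not necessarily the objective CLEAN uses; the two-dimensional vectors and the `margin` value are assumptions chosen for readability.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss for one (anchor, positive, negative) triple.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


if __name__ == "__main__":
    anchor = (0.0, 0.0)    # embedding of an enzyme
    positive = (0.0, 1.0)  # embedding of an enzyme with the same function
    # Negative already far away: nothing left to correct.
    print(triplet_loss(anchor, positive, (3.0, 4.0)))  # 0.0
    # Negative as close as the positive: the loss penalizes this.
    print(triplet_loss(anchor, positive, (1.0, 0.0)))  # 1.0
```

During training, a model would adjust its embeddings to drive this loss toward zero across many such triples, so that distance in the embedding space comes to reflect similarity of function.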
The group is currently making CLEAN accessible online for other researchers seeking to characterize an enzyme or determine whether an enzyme could catalyze a desired reaction. “We hope that this tool will be used widely by the broad research community,” Zhao said. “With the web interface, researchers can just enter the sequence in a search box, like a search engine, and see the results.”
Finally, addressing the team’s broader hopes for CLEAN, Zhao said, “We want to predict the functions of all proteins so that we can know all the proteins a cell has and better study or engineer the whole cell for biotechnology or biomedical applications.”