Is Machine Learning Necessary to Solve Problems in Biology?

Editor’s note: Joshy George is a speaker for ODSC East this May 9th-11th. Be sure to check out his talk, “Is Machine Learning Necessary to Solve Problems in Biology,” there!

The French mathematician Pierre-Simon Laplace suggested that we could accurately predict the universe’s future if we knew the precise position and velocity of every particle in it. This idea significantly influenced the development of classical mechanics and the scientific worldview that underpins much of modern science. Extended to biological systems, it suggests a world in which we should be able to understand the cause and course of a disease with high precision. In reality, this is not the case: we cannot predict the trajectory of even the simplest organism with high accuracy. I will explore the reasons for this discrepancy between theory and practice and suggest that recent developments in machine learning will be necessary to improve the accuracy of predictions in biological systems.

One approach to predicting the behavior of a living organism is to study the underlying physical and chemical processes that govern the behavior of individual cells, organs, and organ systems, and to integrate information across these levels of organization. At the cellular level, researchers have identified many of the biochemical reactions within a cell and how they are regulated. At the organ level, we know the structure and function of specific organs such as the heart, lungs, or brain and how they work together to maintain homeostasis. By understanding how these different levels of organization interact, we can develop models that predict the behavior of the entire organism. In short, if we can model the behavior of individual cells, then in principle we can predict the whole organism’s behavior, as Laplace suggested.

However, the cell is a nonlinear system. A nonlinear system is one in which the output is not directly proportional to the input. In the case of the cell, the input is the set of signals and stimuli it receives from its environment, and the output is the set of cellular processes and behaviors that the cell exhibits in response. Most gene regulatory relationships are nonlinear and, in addition, include feedback loops. These dynamics give rise to emergent behaviors and properties that are not easily predictable from the behavior of individual components, making the eukaryotic cell a fascinating and complex system to study.
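To make "not directly proportional" concrete, here is a minimal sketch using a Hill function, a common textbook model of cooperative gene activation. The function name `hill_activation` and the parameter values `k` and `n` are illustrative assumptions, not values taken from any particular gene or from the talk.

```python
def hill_activation(signal, k=1.0, n=4):
    """Fraction of maximal expression for a given activator level.

    k is the signal level that gives half-maximal output and n controls
    how switch-like the response is (both values are illustrative).
    """
    return signal**n / (k**n + signal**n)


# Doubling the input does not double the output.
for signal in (0.5, 1.0, 2.0):
    print(f"signal={signal:.1f} -> expression={hill_activation(signal):.3f}")
```

Going from a signal of 0.5 to 1.0 raises the output by far more than a factor of two, while going from 1.0 to 2.0 raises it by less than a factor of two because the response saturates.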

Modeling and predicting nonlinear systems requires sophisticated mathematical tools, such as chaos theory, bifurcation theory, and nonlinear dynamics more broadly, which can help to identify patterns and predict the system’s behavior under certain conditions. Even with these tools, prediction is challenging and often subject to high uncertainty. Because the output is not proportional to the input, small changes in the input can cause large changes in the output. In addition, nonlinear systems can have multiple equilibrium points or attractors, and some exhibit chaotic behavior in which tiny measurement errors grow rapidly over time. Thus, even if we know the equations governing the dynamics, we may be unable to predict the system’s long-term future state, because the initial conditions can never be measured with perfect precision.
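The sensitivity to initial conditions described above can be illustrated with the logistic map, a standard toy example of chaos. This is a sketch under stated assumptions: the logistic map is not a model of a cell, and the parameter `r = 4.0`, the starting value, and the size of the perturbation are chosen purely for illustration.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) starting from x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two starting points that differ by one part in ten billion.
a = logistic_trajectory(0.2, 100)
b = logistic_trajectory(0.2 + 1e-10, 100)
gaps = [abs(x - y) for x, y in zip(a, b)]

print(f"gap after 5 steps: {gaps[5]:.2e}")
print(f"largest gap within 100 steps: {max(gaps):.2f}")
```

The two trajectories follow the same known equation and are indistinguishable at first, yet they end up far apart, which is why exact knowledge of the equations does not guarantee long-range prediction.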

Predicting the behavior of nonlinear systems with machine learning is challenging, but it is possible with appropriate techniques. One common approach is to use time-series data from the system to train a machine learning model, which can then be used to predict future behavior. The accuracy of these predictions depends on the quality of the training data and on the complexity of the nonlinear system. It is therefore essential to validate the predictions, for example on data the model has not seen, to ensure they are accurate and reliable.
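As a minimal sketch of this workflow, the example below trains a k-nearest-neighbor predictor on a simulated time series and validates it on held-out data against a naive baseline. Everything here is an assumption made for illustration: the data come from the logistic map rather than a biological measurement, and the helper names `simulate` and `knn_predict`, the split sizes, and `k=3` are hypothetical choices, not the method used in the talk.

```python
def simulate(x0, steps, r=3.9):
    """Generate a toy nonlinear time series (logistic map)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def knn_predict(train_pairs, x, k=3):
    """Predict the next value as the mean successor of the k most similar past states."""
    nearest = sorted(train_pairs, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(nxt for _, nxt in nearest) / k


series = simulate(0.3, 600)
pairs = list(zip(series[:-1], series[1:]))  # (current value, next value)
train, test = pairs[:500], pairs[500:]      # hold out the last 100 steps

# Validate one-step-ahead predictions on data the model never saw.
model_mae = sum(abs(knn_predict(train, x) - y) for x, y in test) / len(test)

# Naive baseline: always predict the mean of the training targets.
train_mean = sum(y for _, y in train) / len(train)
baseline_mae = sum(abs(train_mean - y) for _, y in test) / len(test)

print(f"k-NN mean absolute error:     {model_mae:.4f}")
print(f"baseline mean absolute error: {baseline_mae:.4f}")
```

Comparing against a baseline on held-out data is the simplest form of the validation step mentioned above. Note that this only checks one-step-ahead accuracy; errors would compound if the model's own outputs were fed back in to forecast many steps ahead.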

The combination of molecular cell biology, nonlinear dynamics, and machine learning provides a promising approach to understanding and predicting biological systems’ behavior. By improving our ability to predict how living organisms will behave, we can develop more effective therapies for diseases and make more informed decisions about managing conditions. I will also discuss examples of how machine learning has been applied to predict biological systems’ behavior in my talk at ODSC East 2023 in Boston.

About the author/ODSC East 2023 speaker on machine learning and biology:

Joshy George is a bioinformatics researcher with a Ph.D. in Bioinformatics from the University of Melbourne, Australia, and a Master’s in Computer Science from the Indian Institute of Science. With his background in data science and machine learning, Dr. George has co-authored over 100 peer-reviewed scientific articles, showcasing expertise in developing principled methods to solve complex biological problems. In his current role, he leads a team that is focused on building predictive models for cancer precision medicine and understanding the molecular mechanisms leading to diseases.
