Editor’s Note: Yair Weiss, PhD is a speaker for ODSC Europe 2022 this June 15th-16th. Be sure to check out his talk, “Scientific Discovery and Unsupervised Disentanglement,” there!
Johannes Kepler (1571-1630) was in some sense a tremendously successful data scientist. While many astronomers worked hard over the years to gather accurate experimental data regarding the motion of the planets, Kepler’s genius lay in the way he analyzed data collected by others. As many of us learned in our first introduction to the study of the solar system, “When Tycho Brahe suddenly died in 1601, all of his data was given to Johannes Kepler and it became his responsibility to finish Tycho Brahe’s work. For the next 11 years, Kepler investigated mathematical patterns in the data, making and testing hypotheses until he developed an even better understanding of the arrangement and movement of our solar system than anything that had gone before.”
In modern data science terminology, Kepler’s “investigation of mathematical patterns in the data” was a form of nonlinear dimensionality reduction. He realized that high-dimensional observations (namely the motions of all the planets in our solar system) form a one-dimensional manifold: they are actually a function of a single latent variable, the distance of the planet from the sun. The astronomical models that came before Kepler can also be seen as nonlinear dimensionality reduction. Ptolemy’s model had a latent dimension of two and was accurate enough to be used in navigation for many years, but Kepler’s model was the first to recover the true mapping from the observations to the latent code. This allowed him to accurately calculate the distance of each planet from the sun, just by observing its motion.
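The single-latent-variable claim can be checked numerically with Kepler’s third law, which ties a planet’s orbital period T (in years) to its mean distance a from the sun (in astronomical units) via T² = a³. The sketch below uses standard textbook values for the six planets known in Kepler’s time and verifies that the ratio T²/a³ is essentially 1 for all of them:

```python
# Kepler's third law: orbital period T (years) and mean distance a (AU)
# satisfy T^2 = a^3, so each planet's motion is determined by the single
# latent variable a. Values below are standard (a in AU, T in years).
planets = {
    "Mercury": (0.387, 0.241),
    "Venus":   (0.723, 0.615),
    "Earth":   (1.000, 1.000),
    "Mars":    (1.524, 1.881),
    "Jupiter": (5.203, 11.862),
    "Saturn":  (9.537, 29.457),
}

ratios = {}
for name, (a, T) in planets.items():
    ratios[name] = T ** 2 / a ** 3  # should be ~1 for every planet
    print(f"{name:8s} T^2/a^3 = {ratios[name]:.3f}")
```

All six ratios come out within about one percent of 1.0: knowing a alone is enough to reconstruct the period, which is exactly the sense in which the observations lie on a one-dimensional manifold.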
Given the amazing progress in Deep Learning in recent years, it is tempting to ask whether machines can automatically perform nonlinear dimensionality reduction and thus discover the laws of nature. This is particularly promising in scientific settings where the amount of available data is huge. While Kepler performed his dimensionality reduction on a small number of high-dimensional measurements (a handful of planets), in modern settings we can easily obtain hundreds of thousands of high-dimensional vectors that are assumed to lie on a low-dimensional manifold. Notable examples include the firing rates of neurons in a behaving animal’s brain and the expression of different genes in a large population of cells.
A naive approach to nonlinear dimensionality reduction is to train an autoencoder: a deep neural network that takes as input the high-dimensional measurements (e.g., planetary motions), maps them to low-dimensional latent variables, and then reconstructs the measurements from the low-dimensional representation. In recent years, the latent representations learned by such deep autoencoders on experimental data are increasingly being used for scientific discovery.
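As a minimal sketch of this idea (the architecture, data, and hyperparameters here are illustrative choices, not anything from the talk), the NumPy code below trains a tiny 3 → 1 → 3 autoencoder by gradient descent on synthetic measurements that actually lie on a one-dimensional curve, compressing each 3-D observation through a single latent unit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "measurements": 3-D points lying on a 1-D manifold (a helix),
# parameterized by a single latent variable t -- a toy stand-in for
# planetary motions governed by one underlying quantity.
t = rng.uniform(0, 2 * np.pi, size=(500, 1))
X = np.hstack([np.cos(t), np.sin(t), 0.3 * t])

# A 3 -> 1 -> 3 autoencoder: tanh encoder to a 1-D bottleneck,
# linear decoder, trained on mean squared reconstruction error.
d, k = 3, 1
W1 = rng.normal(scale=0.5, size=(d, k)); b1 = np.zeros(k)
W2 = rng.normal(scale=0.5, size=(k, d)); b2 = np.zeros(d)

lr = 0.05
for step in range(2000):
    Z = np.tanh(X @ W1 + b1)          # encoder: 1-D latent code
    Xhat = Z @ W2 + b2                # decoder: reconstruction
    err = Xhat - X
    # Backpropagate the squared error through both layers.
    gW2 = Z.T @ err / len(X); gb2 = err.mean(0)
    dZ = (err @ W2.T) * (1 - Z ** 2)  # tanh derivative
    gW1 = X.T @ dZ / len(X); gb1 = dZ.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

mse = float((err ** 2).mean())
print(f"reconstruction MSE: {mse:.4f}")
```

Training drives the reconstruction error well below the variance of the raw data, so the single latent unit captures much of the structure. Whether that learned latent dimension corresponds to the *true* underlying variable (here t, or in Kepler’s case the distance from the sun) is precisely the question at issue.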
In my talk at ODSC Europe 2022, I will describe why the recent optimism about using deep autoencoders for scientific discovery and the laws of nature is largely unwarranted. There are both experimental and theoretical reasons to be skeptical of the discovered latent representations. I will relate this issue to what is known as unsupervised disentanglement in the deep learning literature. I will also show that other forms of machine learning, very different from autoencoders, may enable us to make progress towards the ambitious goal of automatically discovering the laws of nature from high-dimensional measurements.
About the author/ODSC Europe 2022 Speaker:
Yair Weiss is a Professor of Computer Science at the Hebrew University and the former Dean of the School of Computer Science and Engineering. His research interests include Machine Learning, Computer Vision, and Neural Computation. He served as the program chair of the Neural Information Processing Systems conference (2004) and the European Conference on Computer Vision (2018). From 2004 to 2019 he was a Senior Fellow of the Canadian Institute for Advanced Research, and he is currently a Fellow of the European Laboratory for Learning and Intelligent Systems. With his students and colleagues, he has received best paper awards at UAI, NIPS, CVPR, and ECCV.