Supervised learning is among the most powerful tools in data science, but it requires a training dataset in which the classes of the input features are known a priori. For example, a classification algorithm learns to identify animals by training on a dataset of images labeled with the species of each animal. Unsupervised learning is applied when data is unlabeled, the classes are unknown, or one seeks to discover new groups or features that best characterize the data.
In this video, Aedin Culhane, PhD, provides an overview of unsupervised learning algorithms, including dimension reduction and matrix factorization approaches that learn low-dimensional mathematical representations from high-dimensional data. There are numerous computational techniques within the class of matrix factorization, each of which provides a unique interpretation of the processes in high-dimensional data.
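As a rough illustration of what "learning a low-dimensional representation by matrix factorization" can mean, the sketch below computes principal component analysis through a singular value decomposition in NumPy. It is not taken from the video: the function name `pca_via_svd`, the synthetic data, and the choice of two components are all assumptions made for the example.

```python
import numpy as np


def pca_via_svd(X, n_components):
    """PCA as a matrix factorization (hypothetical helper, for illustration).

    The column-centered data matrix is factored as U * diag(s) * Vt, and
    only the leading n_components factors are kept.
    """
    X_centered = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # samples in the reduced space
    loadings = Vt[:n_components]                      # feature weights per component
    explained = (s ** 2) / np.sum(s ** 2)             # fraction of variance per component
    return scores, loadings, explained[:n_components]


# Synthetic data (an assumption): 100 samples and 50 features that are
# driven by 2 hidden factors plus a small amount of noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(100, 50))

scores, loadings, explained = pca_via_svd(X, n_components=2)
print(scores.shape, loadings.shape)
```

Here `scores @ loadings` approximates the centered data, so the 50 measured features are summarized by two coordinates per sample. Other factorizations mentioned in this overview impose different constraints on the factors (for instance, non-negativity), which is what gives each method its own interpretation.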
She aims to demystify matrix factorization approaches, including principal component analysis, correspondence analysis, and non-negative matrix factorization, as well as newer approaches such as t-SNE and autoencoders. Extensions of these approaches can be applied to simultaneously learn the structure and features of multiple datasets. Methods such as canonical correlation analysis and multiple factor analysis extract the linear relationships that best explain the correlated structure across datasets. Lastly, Aedin describes how we apply these approaches to tens of thousands of tumors to advance precision medicine in oncology.