Existing computational pipelines for quantitative analysis of high‐content microscopy data rely on traditional machine learning approaches that fail to accurately classify more than a single dataset without substantial tuning and training, requiring extensive analysis. Here, we demonstrate that the application of deep learning to biological image data can overcome the pitfalls associated with conventional machine learning classifiers. Using a deep convolutional neural network (DeepLoc) to analyze yeast cell images, we show improved performance over traditional approaches in the automated classification of protein subcellular localization. We also demonstrate the ability of DeepLoc to classify highly divergent image sets, including images of pheromone‐arrested cells with abnormal cellular morphology, as well as images generated in different genetic backgrounds and in different laboratories. We offer an open‐source implementation that enables updating DeepLoc on new microscopy datasets. This study highlights deep learning as an important tool for the expedited analysis of high‐content microscopy data.
A deep convolutional neural network (DeepLoc) is trained to classify protein subcellular localization in GFP‐tagged yeast cells using over 21,000 labeled single cells.
DeepLoc outperformed previous SVM‐based classifiers on the same dataset.
DeepLoc was used to assess a genome‐wide screen of GFP‐tagged yeast cells exposed to mating pheromone and identified ˜300 proteins with significant localization changes.
DeepLoc can be effectively applied to other image sets with minimal additional training.
Advances in automated image acquisition and analysis, coupled with the availability of reagents for genome‐scale perturbation, have enabled systematic analyses of cellular and subcellular phenotypes (Mattiazzi Usaj et al, 2016). One powerful application of microscopy‐based assays involves assessment of changes in the subcellular localization or abundance of fluorescently labeled proteins in response to various genetic lesions or environmental insults (Laufer et al, 2013; Ljosa et al, 2013; Chong et al, 2015). Proteins localize to regions of the cell where they are required to carry out specific functions, and a change in protein localization following a genetic or environmental perturbation often reflects a critical role of the protein in a biological response of interest. High‐throughput (HTP) microscopy enables analysis of proteome‐wide changes in protein localization in different conditions, providing data with the spatiotemporal resolution that is needed to understand the dynamics of biological systems.
The budding yeast, Saccharomyces cerevisiae, remains a premiere model system for the development of experimental and computational pipelines for HTP phenotypic analysis. A key high‐quality resource for yeast imaging experiments is the open reading frame (ORF)‐GFP fusion collection (Huh et al, 2003) which consists of 4,156 strains, each expressing a unique ORF‐GFP fusion gene, whose expression is driven by the endogenous ORF promoter. The GFP‐tagged yeast collection has been used to assign 75% of the budding yeast proteome to 22 distinct localizations under standard growth conditions, using manual image inspection. Several studies have since used the collection to quantify protein abundance changes and to map protein re‐localization in response to various stress conditions, again using manual assessment of protein localization (Tkach et al, 2012; Breker et al, 2013).
More recently, efforts have been made to develop computational methods for systematic and quantitative analysis of proteome dynamics in yeast and other cells (Breker & Schuldiner, 2014; Grys et al, 2017). For example, our group classified images of single yeast cells from screens of the ORF‐GFP collection into one or more of 15 unique subcellular localizations using an ensemble of 60 binary support vector machine (SVM) classifiers. Each SVM classifier was trained on manually annotated sample images of single cells, with a training set containing > 70,000 cells in total. Overall, this classifier ensemble (ensLOC) performed with > 70% precision and recall, providing a quantitative localization output not achievable using manual assessment (Koh et al, 2015). The ensLOC approach also outperformed earlier automated methods also based on SVMs for classifying the ORF‐GFP fusion collection (Chen et al, 2007; Huh et al, 2009).
Attempts to apply the ensLOC classifiers to new microscopy datasets involved a significant amount of re‐engineering and supplemental training. This problem reflects limitations associated with the image features used to train the classifiers. Typically, single cells are segmented from the images and hundreds of measurements representing pixel intensity statistics and patterns are computed for each cell (Chen et al, 2007; Dénervaud et al, 2013; Loo et al, 2014; Chong et al, 2015; Lu & Moses, 2016). The high dimensional feature space is then reduced by selecting relevant features for the classification task or using dimensionality reduction techniques prior to training a classifier (Liberali et al, 2014; Kraus & Frey, 2016). These segmentation and feature reduction techniques are typically not transferable across datasets, thereby requiring researchers to tune and re‐train analysis pipelines for each new dataset.
Deep learning methods have the potential to overcome the limitations associated with extracted feature sets by jointly learning optimal feature representations and the classification task directly from pixel level data (LeCun et al, 2015). Convolutional neural networks in particular have exceeded human‐level accuracy at the classification of modern object recognition benchmarks (He et al, 2015) and their use is being adopted by the biological imaging field. Recently, deep learning has been applied to the classification of protein localization in yeast (Kraus et al, 2016; Pärnamaa & Parts, 2016), imaging flow cytometry (Eulenberg et al, 2016), as well as the classification of aberrant morphology in MFC‐7 breast cancer cells (Dürr & Sick, 2016; Kraus et al, 2016). In addition, recent publications report that feature representations learned by training convolutional networks on a large dataset can be used to extract useful features for other image recognition tasks (Razavian et al, 2014; Pawlowski et al, 2016), and that previously trained networks can be updated to classify new datasets with limited training data, a method referred to as “transfer learning” (Yosinski et al, 2014).
Here, we demonstrate that the application of deep neural networks to biological image data overcomes the pitfalls associated with conventional machine learning classifiers with respect to performance and transferability to multiple datasets. We offer an open‐source implementation capable of updating our pre‐trained deep model on new microscopy datasets within hours or less. This model is deployable to entire microscopy screens with GPU or CPU cluster‐based acceleration to overcome the significant computational bottleneck in high‐content image analysis.
Training and validating a deep neural network (DeepLoc) for classifying protein subcellular localization in budding yeast
Toward our goal of building a transferable platform for automated analysis of high‐content microscopy data, we constructed a deep convolutional neural network (DeepLoc) to re‐analyze the yeast protein localization data generated by Chong et al (2015). We provide a brief overview of convolutional neural networks in Fig EV1 and refer readers to LeCun et al (2015) and Goodfellow et al (2016) for a more thorough introduction. To make a direct comparison of DeepLoc and ensLOC performance, we decided to train our network to identify and distinguish the same 15 subcellular compartments identified using the SVM classifiers (Fig 1A). We implemented and trained a deep convolutional network in TensorFlow (Abadi et al, 2015), Google’s recently released open‐source software for machine learning (Rampasek & Goldenberg, 2016). In DeepLoc, input images are processed through convolutional blocks in which trainable sets of filters are applied at different spatial locations, thereby having local connections between layers, and enabling discovery of invariant patterns associated with a particular class (e.g., nucleus or bud neck). Fully connected layers are then used for classification, in which elements in each layer are connected to all elements in the previous layer. Our network arranges 11 layers into eight convolutional blocks and three fully connected layers, consisting of over 10,000,000 trainable parameters in total (more detail in Materials and Methods, network architecture shown in Fig 1B). To ensure the validity of our comparative analysis, we trained DeepLoc on a subset of the exact same manually labeled cells used to train ensLOC (Chong et al, 2015), totaling ~22,000 images of single cells. However, instead of training a classifier on feature sets extracted from segmented cells, we trained DeepLoc directly on a defined region of the original microscopy image centered on a single cell, but often containing whole, or partial cells in the periphery of the bounding box. The use of these “bounding boxes” removes the sensitivity of the image analysis to the accuracy of segmentation that is typical of other machine learning classifiers. Despite using a substantially smaller training set than was used to train ensLOC (Chong et al, 2015) (~70% fewer cells), we found that training a single deep neural network using a multi‐class classification setting substantially outperformed the binary SVM ensemble when assigning single cells to subcellular compartment classes (71.4% improvement in mean average precision, Fig 1C).
The ensLOC method relied on aggregating across cell populations to achieve > 70% precision and recall in comparison with manually assigned protein localizations (Huh et al, 2003). To assess the performance of DeepLoc in a similar way, we aggregated cell populations by computing the mean for each localization category across single cells containing the same GFP fusion protein. Again, DeepLoc outperformed the binary classifier ensemble across all localization categories (Fig 1D), achieving a mean average precision score (area under precision recall curve) of 84%, improving on the classification accuracy of ensLOC by almost 15% with substantially less training input.
Visualizing network features
Having demonstrated the improved performance of DeepLoc over the analysis standard, we next investigated which components of our network were contributing to its success. One of the hallmark differences between deep networks and traditional machine learning is that the network’s learned representations are better at distinguishing between output classes than extracted feature representations used by other classifiers. To address whether this difference was relevant in our experiments, we visualized the activations of the final convolutional layer in 2D using t‐distributed stochastic neighbor embedding (t‐SNE) (Maaten & Hinton, 2008) for a single cell test set (Fig 2A). t‐SNE is a popular non‐linear dimensionality reduction algorithm often used to visualize the structure within high dimensional data in 2D or 3D space. Similarly, we visualized the CellProfiler (Carpenter et al, 2006)‐based features used to train the ensLOC SVM ensemble (Chong et al, 2015) on the exact same test set of single cell images (Fig 2B). We observed that using the DeepLoc representations, cells appeared to be better arranged in accordance with their localization classes, suggesting that DeepLoc’s convolutional layers learn to extract features that are meaningful in the distinction of protein subcellular localization. These results suggest that an important component of the improved performance of DeepLoc reflects the network’s ability to learn feature representations optimized directly on pixel values for a specific classification task as opposed to training classifiers on static feature sets.
2D t‐SNE (Maaten & Hinton, 2008) visualization of activations in the last convolutional layer of DeepLoc for 2,103 single cells in the test set. We computed the maximum activation across the spatial coordinates for each of the 256 features prior to fitting t‐SNE.
t‐SNE visualization of CellProfiler features extracted for the same cells. We normalized the 313 CellProfiler features to be in the range [0,1]. In these plots, each circle represents a single cell; circles are colored by their localization as determined by manual annotation (Huh et al, 2003) (color code to the right).
Filters and activations in the last convolutional layer of DeepLoc for sample input images containing GFP fusion proteins that localize to the bud neck (top), Golgi (middle), or nuclear periphery (bottom). The convolutional filter visualizations were generated by activation maximization (Yosinski et al, 2015). The maximally activated filter for each input is highlighted with a red box (bud neck at the top, Golgi in the middle, and nuclear periphery at the bottom). For the bud neck sample, the input patch, filter, and activation are presented together to visualize how features are activated in DeepLoc. Other input patches that also maximally activate the selected feature are displayed.
Regularized activation maximization (Yosinski et al, 2015) of output layers based on inputs initialized to leftmost column (Initialization). Different localization classes (compartment labels at the top of the images) are grouped by their morphological similarity (labels at bottom of images).
Next, we wanted to display these features to assess how they differ between compartment classes. To do this, we visualized activations and patterns extracted in the last convolutional layer of the network (layer 8) for specific input examples (Golgi, bud neck, nuclear periphery, Fig 2C, Materials and Methods). Different input patterns activated specific features in deeper convolutional layers (convolutional activations, Fig 2C), with representations being combined in the fully connected layers from the convolutional feature maps, ultimately producing unique signals for different input patterns. These signals differ by localization class in a biologically interpretable way. For example, images containing punctate subcellular structures like the Golgi (top panels, Fig 2C) activated similarly patchy, dispersed features, while images containing discrete compartments like the bud neck (middle panels, Fig 2C) activated features that appear localized and linear.
We extended our analysis by applying activation maximization (Yosinski et al, 2015) to visualize input patterns that maximally activate each output class (Fig 2D, see Materials and Methods). This technique works by keeping the parameters of the network constant while updating input pixel values to maximize the activation of specific features. In our implementation, the network iteratively updates an input with a randomly initialized green channel to produce an example “input” that resembles a cell with a GFP fusion protein that localizes to the maximally activated output class. The visualizations produced by the network for different output categories were convincing in their similarity to real compartment architecture. For example, visualizations for compartments such as the actin cytoskeleton, peroxisomes, and the spindle pole body were all punctate and dispersed (Fig 2D). Although these gener