Watch: Effective Transfer Learning for NLP
Posted by ODSC Team, July 24, 2019
Transfer learning, the practice of applying knowledge gained on one machine learning task to aid the solution of a second task, has seen historic success in the field of computer vision. The output representations of generic image classification models trained on ImageNet have been leveraged to build models that detect the presence of custom objects in natural images. Image classification tasks that would typically require hundreds of thousands of images can be tackled with mere dozens of training examples per class thanks to these pretrained representations. The field of natural language processing, however, has seen more limited gains from transfer learning, with most approaches restricted to the use of pretrained word representations. Other approaches take the mean, max-pool, or last output of the sequence representations produced by RNN models as a document representation, and learn lightweight models on top of these features in order to leverage the knowledge of previously trained NLP models. Unfortunately, in distilling sequence information down to a single fixed-length vector per document via pooling, these methods sacrifice potentially useful information contained in the sequence representations.
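To make the pooling step concrete, here is a minimal sketch of the three reductions mentioned above, written in plain Python. The function names and the toy "RNN outputs" are assumptions made for illustration; they do not come from the talk or from any particular library.

```python
from typing import List

Vector = List[float]
Sequence = List[Vector]  # one vector per token: (seq_len, hidden_dim)


def mean_pool(seq: Sequence) -> Vector:
    """Average each hidden dimension over all time steps."""
    return [sum(column) / len(seq) for column in zip(*seq)]


def max_pool(seq: Sequence) -> Vector:
    """Take the maximum of each hidden dimension over all time steps."""
    return [max(column) for column in zip(*seq)]


def last_output(seq: Sequence) -> Vector:
    """Keep only the representation of the final time step."""
    return list(seq[-1])


# Hypothetical per-token outputs for a 3-token document, hidden size 2.
outputs = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]

print(mean_pool(outputs))    # [2.0, 2.0]
print(max_pool(outputs))     # [3.0, 4.0]
print(last_output(outputs))  # [2.0, 2.0]

# Reversing the token order leaves the mean and max vectors unchanged,
# which is one way pooling discards sequence information.
reversed_outputs = outputs[::-1]
print(mean_pool(reversed_outputs) == mean_pool(outputs))  # True
print(max_pool(reversed_outputs) == max_pool(outputs))    # True
```

Each reduction maps a variable-length sequence to a single vector of size `hidden_dim`, so a lightweight downstream model can consume it, but token order and per-token detail are no longer recoverable from the result.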
In this talk, given by Madison May at ODSC East 2018, we explore parameter- and data-efficient mechanisms for transfer learning that use sequence representations, rather than fixed-length document vectors, as the medium of communication between models, and show practical improvements on real-world tasks. In addition, we demo Enso, a newly open-sourced library designed to simplify the benchmarking of transfer learning methods on a wide variety of target tasks. Enso provides tools for the fair comparison of varied feature representations and target task models as the amount of training data made available to the target model is incrementally increased.
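The kind of comparison described above can be illustrated with a small, self-contained sketch. This is not Enso's API: every name below (`learning_curve`, `make_toy_data`, the nearest-centroid classifier, the synthetic data) is a hypothetical stand-in chosen so the example runs with the Python standard library alone. The point it tries to show is the protocol: each feature representation sees the same training subsets and the same test set while the training size grows.

```python
import random
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Document = List[Vector]                  # sequence of per-token vectors
Featurizer = Callable[[Document], Vector]
Dataset = List[Tuple[Document, int]]


def mean_pool(doc: Document) -> Vector:
    return [sum(col) / len(doc) for col in zip(*doc)]


def max_pool(doc: Document) -> Vector:
    return [max(col) for col in zip(*doc)]


def fit_centroids(features: List[Vector], labels: List[int]) -> Dict[int, Vector]:
    """Lightweight target model: one mean feature vector per class."""
    grouped: Dict[int, List[Vector]] = {}
    for x, y in zip(features, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in grouped.items()}


def predict(centroids: Dict[int, Vector], x: Vector) -> int:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)))


def learning_curve(featurize: Featurizer, train: Dataset, test: Dataset,
                   sizes: List[int]) -> Dict[int, float]:
    """Test accuracy of the target model at each training-set size."""
    test_x = [featurize(doc) for doc, _ in test]
    test_y = [label for _, label in test]
    curve = {}
    for n in sizes:
        subset = train[:n]  # identical subset for every featurizer
        model = fit_centroids([featurize(d) for d, _ in subset], [y for _, y in subset])
        correct = sum(predict(model, x) == y for x, y in zip(test_x, test_y))
        curve[n] = correct / len(test_y)
    return curve


def make_toy_data(n: int, rng: random.Random) -> Dataset:
    """Synthetic documents: 5 tokens, 2 dims, token values centered on the label."""
    return [([[rng.gauss(i % 2, 1.0) for _ in range(2)] for _ in range(5)], i % 2)
            for i in range(n)]


rng = random.Random(0)
train, test = make_toy_data(40, rng), make_toy_data(40, rng)
sizes = [2, 4, 8, 16, 32]

curves = {name: learning_curve(fn, train, test, sizes)
          for name, fn in [("mean", mean_pool), ("max", max_pool)]}
for name, curve in curves.items():
    print(name, curve)
```

Holding the data splits fixed across representations is what makes the resulting curves comparable; in a real benchmark one would also average over several random splits before drawing conclusions.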