Any data scientist will tell you that one of the most challenging parts of natural language processing projects is the lack (or shortage) of training data. With deep learning, this has been semi-solved, but now the problem can be too much data—up to millions or even billions of training points. For the most part, the solution has been to pre-train models and then fine-tune them to specific tasks. However, with Google's new BERT program, the gap between these two problems has been greatly reduced. BERT is a new state-of-the-art pre-trained model that makes fine-tuning far easier.
What data scientists should know
Most pre-trained models are either contextual or context-free, and either unidirectional or bidirectional. The most important thing for data scientists to know about Google's BERT is its use of deep, bidirectional, contextual training. Context-free models generate a single word embedding for each word in the vocabulary. BERT, by contrast, uses the context on both sides of a given word, and it does so starting from the very bottom of the neural network.
Bidirectionality really means BERT can learn more of the intricacies of human language—a challenge NLP models have faced in the past—including words that have double meanings, predicting whether or not sentences go together, and answering questions. BERT is also open-sourced on GitHub and can be used through Colab. The ideas behind BERT aren't entirely new, but it's the first model in its class to perform so well.
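To make the contextual-versus-context-free distinction concrete, here is a toy sketch in plain Python. This is not BERT—the vectors and the mixing rule are invented for illustration—but it shows why a context-free model gives the word "bank" the same vector in every sentence, while any contextual model (BERT among them) produces a different vector depending on the words around it.

```python
# Toy illustration only: vectors and the "mixing" rule below are made up
# to show the idea, not how BERT actually computes representations.

# A context-free model stores exactly one vector per word.
STATIC_EMBEDDINGS = {
    "bank":  (1.0, 0.0),
    "river": (0.0, 1.0),
    "money": (0.5, 0.5),
    "the":   (0.1, 0.1),
}

def context_free(word, sentence):
    """Same vector for a word no matter what sentence it appears in."""
    return STATIC_EMBEDDINGS[word]

def contextual(word, sentence):
    """Crude stand-in for a bidirectional model: blend the word's vector
    with the vectors of every other word in the sentence (left AND right)."""
    neighbors = [STATIC_EMBEDDINGS[w] for w in sentence if w != word]
    avg = tuple(sum(dim) / len(neighbors) for dim in zip(*neighbors))
    return tuple(0.5 * b + 0.5 * a
                 for b, a in zip(STATIC_EMBEDDINGS[word], avg))

s1 = ["the", "river", "bank"]
s2 = ["the", "money", "bank"]

# Context-free: "bank" looks identical in both sentences.
print(context_free("bank", s1) == context_free("bank", s2))  # True

# Contextual: "bank" gets a different vector in each sentence.
print(contextual("bank", s1) == contextual("bank", s2))      # False
```

In a real contextual model the blending is learned by a deep network rather than hard-coded, but the payoff is the same: representations that can tell a river bank from a financial one.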
Finally, this technology is exciting for data scientists because of how fast and easy it is to fine-tune for specific NLP tasks (if you even need to). BERT has been compared against other state-of-the-art models (and against humans) and scored better, with next to no task-specific training. This streamlines your work, reduces the number of hours you spend training individual models, and means you get to your results and next steps faster.
What decision makers should know
For decision makers, the case for Google's BERT is simple. First, it's an open-source project, which means applying it to your specific problems and tasks comes at no extra cost to you. Second, it's the latest and greatest technology, which, when you're working on NLP problems, can be the difference between your success and your competitor's. Third, it streamlines processes your data scientists are currently doing slowly, and often by hand.
This means your data scientists have more time to actually run the models and get results, faster. Quicker and better results on one problem mean you can move on to implementing those results, and your company can start on its next problem in the same, more efficient way. NLP models are notoriously tedious and difficult to collect data for and train, so any software that saves time and money by speeding up the process is worth looking into.
For more information on Google’s BERT, read their paper here.