The Promise of Retrofitting: Building Better Models for Natural Language Processing

Editor’s note: Catherine is a speaker for the upcoming ODSC East 2019 this April 30-May 3! Be sure to check out her talk, “Adding Context and Cognition to Modern NLP Techniques.”

OpenAI’s Andrej Karpathy famously said, “I don’t have to actually experience crashing my car into a wall a few hundred times before I slowly start avoiding to do so.” Yet for all its potential, that’s not at all how machine learning tends to work these days. A growing group of researchers and practitioners, however, is working to change that.

Many natural language tools are not very good at learning about the world. Distributional semantics systems like Word2Vec, fastText, and GloVe lack basic information about the way the world works – often referred to as common sense. Words that appear in similar sentences but have very different meanings – like ‘wide’ and ‘narrow’, or ‘Red Sox’ and ‘Yankees’ – confuse these systems.
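To see why this happens, recall that word vectors measure relatedness by cosine similarity, and words used in interchangeable contexts end up with similar vectors. The sketch below uses small made-up vectors (real embeddings would come from a trained Word2Vec or GloVe model and have hundreds of dimensions), but it illustrates the failure mode: the antonyms score as near-identical while an unrelated word scores low.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: the standard closeness measure for word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d vectors standing in for real embeddings. In practice
# you would look these up in a trained distributional model.
wide   = np.array([0.8, 0.5, 0.1])
narrow = np.array([0.7, 0.6, 0.1])
piano  = np.array([-0.2, 0.1, 0.9])

cosine(wide, narrow)  # high: the antonyms look alike to the model
cosine(wide, piano)   # low: genuinely unrelated words are far apart
```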

[Related article: An Idiot’s Guide to Word2vec Natural Language Processing]

So how can these systems embrace the types of reasoning we humans use to make snap judgments on new problems?

If I asked you whether a giraffe would be happy and comfortable in the room or subway car you’re currently reading this article in, you’d immediately have an answer, despite never having thought about this problem before. To do this, you’re reaching into your model of the world to understand what a giraffe is, including how tall it is, and then inferring what that means for your specific space. This classic NLP problem is known as “common sense reasoning,” a term from a paper by John McCarthy (who also coined the term “artificial intelligence”) in the late 1950s.

The idea behind common sense reasoning is that if we give computers a good backbone of knowledge about the world, they’ll be better equipped to handle novel tasks and learn faster. I’ve been working in this field since the late 1990s, when I started the Open Mind Common Sense project, now called ConceptNet. Common sense was a niche research field 10 years ago, but with the advent of deep learning and other distributional semantics techniques, it has quickly gained urgency.

Shortly before he passed away, Paul Allen’s AI institute, AI2, started an initiative to address this issue. Mr. Allen said, “To make real progress in AI, we have to overcome the big challenges in the area of common sense.” Shortly afterward, DARPA’s David Gunning started the Machine Common Sense (MCS) project to fund additional research. Machine learning techniques that incorporate common sense – helping models integrate knowledge with what they learn from large quantities of data – could be the key to making ML techniques more flexible.

To use common sense with deep learning, one must connect curated, organized information about the world – like ConceptNet – with previously unseen, domain-specific data, such as a set of documents to analyze. The best way to do that is a family of algorithms called ‘retrofitting,’ first published by Manaal Faruqui and his colleagues in 2015. The goal of retrofitting is to combine structured information, such as a knowledge graph (ConceptNet or WordNet, for example), with an embedding of word vectors, such as one produced by Word2Vec. By modifying the embedding so that concepts related in the knowledge graph are related in similar ways in the embedding, we apply knowledge-based constraints after the distributional word vectors have been trained. The intuition is that terms connected in the knowledge graph should have vectors that are closer together in the embedding itself.
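The core of the original retrofitting algorithm can be sketched in a few lines. This is a minimal illustration in the spirit of Faruqui et al. (2015), not their released code: each word’s vector is repeatedly replaced by a weighted average of its original distributional vector and the current vectors of its knowledge-graph neighbors. The function name, the uniform edge weights, and the `alpha` parameter here are simplifying assumptions.

```python
import numpy as np

def retrofit(embeddings, graph, iterations=10, alpha=1.0):
    """Nudge each word's vector toward the average of its knowledge-graph
    neighbours while staying anchored to its original distributional vector.

    embeddings: dict word -> np.ndarray (original vectors; left unmodified)
    graph:      dict word -> list of neighbour words (knowledge-graph edges)
    """
    new_vecs = {w: v.copy() for w, v in embeddings.items()}
    for _ in range(iterations):
        for word, neighbours in graph.items():
            nbrs = [n for n in neighbours if n in new_vecs]
            if word not in new_vecs or not nbrs:
                continue
            # Each neighbour edge gets weight 1 and the original vector
            # gets weight alpha, so the update is a weighted average.
            total = alpha * embeddings[word] + sum(new_vecs[n] for n in nbrs)
            new_vecs[word] = total / (alpha + len(nbrs))
    return new_vecs
```

After a handful of iterations the updates converge: words linked in the graph drift toward each other, but `alpha` keeps each vector from straying too far from what the distributional model learned.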

[Related article: An Introduction to Natural Language Processing]

Since retrofitting was first introduced, many modifications of the algorithm have been published – some even apply the constraints during training. However, retrofitting is more effective when done after training rather than during it – hence the name. Other modifications look at how different types of relations should affect the embedding – for example, words that are antonyms should be pushed farther apart (Mrkšić et al., 2016).
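An antonym constraint of that kind can be sketched as a repulsion step. The code below is an illustration in the spirit of counter-fitting (Mrkšić et al., 2016), not the published algorithm: the function name, the `margin`, and the `step` size are assumptions chosen for clarity. If two antonyms are more similar than the margin allows, each is nudged a small step away from the other.

```python
import numpy as np

def push_antonyms_apart(vecs, antonym_pairs, margin=1.0, step=0.1):
    """If two antonyms are closer than `margin` (in cosine terms),
    nudge their vectors apart. Modifies and returns `vecs`."""
    for a, b in antonym_pairs:
        if a not in vecs or b not in vecs:
            continue
        va, vb = vecs[a], vecs[b]
        # Unit-normalise so the similarity check is pure cosine.
        ua = va / np.linalg.norm(va)
        ub = vb / np.linalg.norm(vb)
        if ua @ ub > 1.0 - margin:
            # Move each vector a small step away from the other.
            vecs[a] = va + step * (ua - ub)
            vecs[b] = vb + step * (ub - ua)
    return vecs
```

In a full system this repulsion term would be balanced against the synonym-attraction and original-vector terms, so that antonyms separate without destroying the rest of the embedding’s structure.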

Because of their successes, retrofitting techniques are spreading. At SemEval, an annual NLP benchmarking competition, researchers submit systems to be blind-tested in semantic tasks. Retrofitting-powered systems won one task in 2017, and six of the twelve tasks in 2018. Other machine learning areas, such as robotics and vision, are also exploring retrofitting techniques.

The ConceptNet team at Luminoso has created ConceptNet Numberbatch, an open set of pre-built, production-quality word embeddings made by combining ConceptNet with an ensemble of distributional word vectors. We hope it’s easy to use; we’ve provided a download link below, along with some reference talks to learn more about the algorithm. Retrofitting is a very active area of research. Expect it to evolve as the community finds new ways to make deep learning on language less unwieldy.



ConceptNet Numberbatch download:

Original retrofitting code download:

Allen Institute talk on Retrofitting:

Allen Institute talk on ConceptNet and Numberbatch:

Catherine Havasi

Dr. Catherine Havasi is Chief Strategy Officer and co-founder of Luminoso, an AI-based natural language understanding company in Cambridge, MA. Prior to Luminoso, she directed the Digital Intuition group at MIT’s Media Lab, working on word embeddings, transfer learning, and language understanding. In the late ’90s, she co-founded the Common Sense Computing Initiative, or ConceptNet, the first crowd-sourced project for artificial intelligence. ConceptNet has played a role in thousands of AI projects and will be turning 20 next year.