Catherine Havasi, PhD, Helps Computers Uncover the Quirks of Language

Dr. Catherine Havasi has no typical day. She bounces between technology strategy and product research and development, and she constantly talks to customers to figure out which path to take next. Now, she’s working to help computers uncover the quirks of language.

As co-founder and chief strategist of Luminoso, an AI-driven company that derives business solutions and insights from natural language understanding, Havasi continuously pushes the envelope of language technology. Luminoso digs into messy textual data, resurfacing with newfound wisdom. A marketing agency might adopt Luminoso’s software to investigate the roots of polarized sentiment. A healthcare company might use it to tease out popular concern about vaccinations. Beyond her role at Luminoso, Havasi also conducts ongoing research at the MIT Media Lab, where she focuses on “giving computers common sense and coming up with algorithms that make use of that common sense.”

In between Luminoso’s initiatives, research projects, product development and more, Havasi’s calendar brims with educational work on AI and a revolving door of speaking engagements. I recently had a chance to grab some time from Havasi’s schedule and talk with her about her past, present, and future in the NLP industry.

It’s all connected

In her personal and professional circles, Havasi’s interest in Marvin Minsky’s 1986 book The Society of Mind is no secret. She read the book in high school and has considered Minsky one of her greatest inspirations ever since. During her undergraduate years at MIT, Havasi had the honor of working with Minsky on the Open Mind Common Sense project, a crowdsourced knowledge base fueled by thousands of online contributors who inform computers about the ins and outs of ordinary life. Minsky’s book is more about concepts than concrete science, but it altered the way many people, including Havasi, thought of human intelligence, and it helped position Minsky among the so-called “founding fathers” of artificial intelligence.

For Havasi, The Society of Mind specifically upended the perception that it’s necessary to compartmentalize certain types of knowledge and functionality. “A lot of what was being done in machine learning was looking at modeling individual processes. I saw The Society of Mind as more holistic. Everything works together, everything interacts. It was very different from a lot of the stuff we saw in the 90s,” Havasi explained.

Her commitment to multifaceted, cross-disciplinary computer science only strengthened during her doctoral years studying under renowned computational linguist James Pustejovsky at Brandeis University. Havasi grew passionate about finding ways to nourish connections between seemingly disparate components and weave them into a cooperative, intelligent “society” of moving parts. Out of this passion bloomed Luminoso, which fuses the rote power of AI with a deeper comprehension of meaning and relationships among words.


Luminoso logo (Image source: Luminoso)

Building knowledge

With nearly any NLP task comes a balancing act between computer science and linguistics. Luminoso is first and foremost an AI company, but Havasi makes it clear that linguistics is far from an afterthought. “We very much bring linguistics into it. Maybe not what you would consider rule-based stuff, but we have a linguistics team here. We’re doing real NLP, not just statistics,” Havasi said.

Caring about things like, say, proper nouns in Russian may seem tedious in today’s fast-moving AI space, but paying attention to detail can result in major gains in accuracy. At the end of the day, Havasi describes herself as a “lexical resource person” — someone who draws upon richly textured databases of language — and that’s where she finds herself most rooted in linguistics.

Luminoso uses ConceptNet, a semantic-focused spinoff of the Open Mind Common Sense project, as its main lexical resource. The company builds upon this multilingual knowledge graph to find clarity within an immense trove of qualitative data that runs the gamut from structured surveys to social media. Lexical resources can be defined as “collections of lexical items, typically together with linguistic information and/or classification of these items.” In Havasi’s eyes, lexical resources unlock the potential to parse knowledge about how the world works and use such knowledge to create efficient machine learning systems.

When considering up-and-coming tech that inspires her, Havasi waxed poetic about a notable approach from a recent North American Chapter of the Association for Computational Linguistics (NAACL) benchmark competition. This particular system didn’t simply entail dumping the relevant data into a neural net and going from there. Instead, a knowledge base helped shape the system’s networks. The result wasn’t a blank slate that slowly constructed a web of understanding. The system came to the table with its own repository of knowledge, which was then further enriched through machine learning. Havasi insists that, until now, hardly anyone outside of her sphere was combining learning with stored knowledge in this way. The NAACL demo was a bright point for her in a highly saturated field.

“It’s important that having knowledge matters. Just putting data in a bucket and hitting a button isn’t enough,” Havasi said point-blank.


ConceptNet visualization (Image source: guile3d.com)

Remember the customer

Fears of a so-called “AI winter” have begun to ripple throughout the artificial intelligence community. Havasi, however, remains confident that technology grounded in real-world problem-solving will persevere. “Five years from now, we’ll still be solving voice of the customer. AI companies must be focused on individual practical value. Things that aren’t making real impact are going to have problems. As long as AI has real impact and it’s helping business, then it won’t get lost in the hype,” said Havasi.

How does this affect KPIs? What about the bottom line? What bugs can be fixed that will save the most money? These are just a sample of the questions that Luminoso’s intelligence systems interrogate on behalf of their customers — the kind of questions that can always benefit from more answers.

As for where she sees Luminoso in five years, “We’ll be bigger!” Havasi declared. “We’re always trying to automate or optimize some system, get to conclusions at scale that weren’t there before.”

With more and more textual data being amassed, companies like Luminoso keep breaking new ground, with the potential to amplify voices that may have previously gone unheard. By riding the cutting edge in AI while staying true to the data’s linguistic core, Havasi generates software and research that remind us that statistics are just one part of the NLP story. We can expect that the Luminoso of the future will be not only bigger, but even better.


Kaylen Sanders, ODSC

I currently study Computational Linguistics as an M.S. candidate at Brandeis University. I received my Bachelor's degree from the University of Pittsburgh where I explored linguistics, computer science, and nonfiction writing. I'm interested in the crossroads where language and technology meet.