Everything that Siri, Alexa, and other voice assistants do is impressive – it’s hard to argue otherwise. There are industries, though, where generalist AIs aren’t enough; Siri can’t answer the domain-specific questions that someone in the transportation industry may have. This is why Wluper is developing powerful conversational AI. While Siri and Alexa can answer a little bit of everything, their reach is still largely one-sided, locked into a question-answer structure. Wluper’s AI is built to support more in-depth conversations in specific industries: domain-specific AI.
Read on for an interview with the team at Wluper on what conversational AI is, what makes it unique, and what goes into designing it.
What exactly is conversational AI?
Conversational AI (Dialogue Systems) is about building a computer system that can understand a user through natural language, reason about the concepts that were said, and plan a course of action based on that reasoning. In practice, this means that a conversational AI system is loosely separated into three components.
Firstly, the “perception” part, which combines Speech Recognition (ASR), Natural Language Understanding (NLU), and Dialogue State Tracking (DST). Secondly, the “reasoning” part, which usually encompasses Dialogue Management (DM) and Knowledge Management (KM). And finally, the system output component, which can vary and usually includes a text-to-speech (TTS) component.
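The three parts above can be sketched as a minimal pipeline. All class and function names here are illustrative stubs, not Wluper’s actual API; each stage is reduced to a placeholder so the flow of data from perception through reasoning to output is visible.

```python
from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Tracks what the system believes the user wants so far."""
    utterance: str = ""
    frame: dict = field(default_factory=dict)

def perceive(user_input: str, state: DialogueState) -> DialogueState:
    """Perception: ASR + NLU + Dialogue State Tracking (stubbed)."""
    state.utterance = user_input                # ASR would transcribe audio here
    state.frame["last_input"] = user_input      # NLU/DST would fill a richer frame
    return state

def reason(state: DialogueState) -> str:
    """Reasoning: Dialogue Management + Knowledge Management (stubbed)."""
    return f"echo: {state.frame['last_input']}" # a real DM would plan actions

def respond(action: str) -> str:
    """Output: natural-language generation and, in a voice system, TTS."""
    return action

state = DialogueState()
reply = respond(reason(perceive("find a cozy restaurant", state)))
```

In a real system each stub is a substantial model or service; the point of the sketch is only the division of labour between the three components.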
What sets your knowledge-based conversational AI apart from Alexa, Siri, etc.?
The difference comes in at various stages of a Dialogue System’s pipeline; our key differentiators are in the “perception” and “reasoning” parts.
Let us start with the “perception” part. Currently, the Siris and Alexas of this world, including the various platforms for building such systems, provide two main ingredients for NLU: Intents and Entities. This leads to a very shallow understanding of what the user wants to say. Intents need to be predefined, so the system can only ever understand a finite set of things – this is not how conversation, or language more broadly, works. So we are building a different NLU pipeline that leads to a different Dialogue State Representation, one that goes beyond Intents and Entities and gives the system much more flexibility and deeper understanding.
Now the “reasoning” part. The reasoning part of existing systems is limited by the depth of understanding in the “perception” part, and therefore only allows for the decision-tree-style dialogue managers we see in the real world today. A deeper understanding makes the reasoning part much harder, of course, yet it also allows the system to perform a much wider variety of tasks and answer much more precise questions from the user.
E.g. suppose one has a database of restaurants, where each entry has the restaurant’s name, location, opening times, rating, dietary restrictions, etc. At the moment, a dialogue designer for Siri would have to specify a different intent to access each field of a restaurant entry – this does not scale. Our system’s purpose is to make all the available data, APIs, and actions naturally accessible to the user.
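The scalability point can be made concrete with a toy sketch (all names and data here are hypothetical): with flat intents, every database field needs its own predefined handler, whereas a schema-aware system can reach any field in the data from one generic query handler.

```python
# Toy restaurant database (illustrative data only).
RESTAURANTS = {
    "Luigi's": {"location": "Soho", "opens": "12:00", "rating": 4.5},
}

# Intent-per-field approach: one handler per attribute, which does not scale
# as fields are added to the schema.
def get_opening_time(name):
    return RESTAURANTS[name]["opens"]

def get_rating(name):
    return RESTAURANTS[name]["rating"]

# Schema-aware approach: any attribute present in the data is reachable
# through a single generic handler, with no new intents to define.
def answer(name, attribute):
    return RESTAURANTS[name][attribute]
```

Adding a new field (say, dietary restrictions) requires a new function and a new predefined intent in the first approach, but nothing at all in the second.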
What is different about your NLP engine, and what was the process behind creating the knowledge-based AI?
To dive into the more nitty-gritty part of our NLP engine we need to go back to these Intents and Entities.
One of the key differences of our NLP engine is how we represent a Dialogue State. As mentioned above, Intents and Entities are how current NLP engines represent a dialogue; our Dialogue Representation has a much more connected structure. What this means is that the role of each entity is also described (e.g. Location is a very typical type of entity, and existing systems just recognize the location, whereas we are also interested in whether it’s the starting location or the end location, or why this location is being mentioned at all). Furthermore, our system also has an “understanding” of what a “Location” or “Restaurant” is, i.e. it understands some of the typical properties of such a concept. E.g. a restaurant serves food, has opening times, and can be cozy or posh. These two things give our understanding much more depth.
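A hedged sketch of that difference: a flat Intent+Entity frame on one side, and on the other a representation that also records each entity’s role and draws on a small type ontology. The structure and field names below are illustrative guesses, not Wluper’s actual format.

```python
# Flat representation: two Locations are recognized, but nothing says
# which is the origin and which is the destination.
flat_state = {
    "intent": "plan_route",
    "entities": [("Location", "King's Cross"), ("Location", "Soho")],
}

# A tiny type ontology: typical properties of the concepts mentioned,
# so the system "knows" what a Restaurant or Location is like.
ONTOLOGY = {
    "Restaurant": {"serves": "food", "has": ["opening_time", "rating", "ambience"]},
    "Location": {"has": ["coordinates", "name"]},
}

# Connected representation: each entity carries a role (origin/destination),
# which disambiguates the two locations for the reasoning component.
connected_state = {
    "act": "plan_route",
    "arguments": {
        "origin": {"type": "Location", "value": "King's Cross"},
        "destination": {"type": "Location", "value": "Soho"},
    },
}

origin = connected_state["arguments"]["origin"]
```

With the flat frame, a dialogue manager must guess which location is which; with roles and an ontology, that signal is already in the state.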
The knock-on effect for the reasoning part is that it has much more signal and information to make the right choice for the user – that, of course, is also part of the magic sauce and part of what makes our task hard.
Share a bit about the behind-the-scenes activity of the Recurrent Neural Networks and Convolutional Neural Networks you use in your deep learning work.
Deep learning is, of course, a powerhouse behind many of the recent advances in machine learning – ours included. The key, however, is to know why you use deep learning over other methods, and when to use it and when not to. That is exactly what guides our thinking and practical decisions: we are very thorough in evaluating methods and in understanding, analytically and qualitatively, whether to use them. As it turns out, some neural networks work incredibly well for some of our use-cases.
In particular (here we are letting you in on a secret), computing similarity measures between strings, sentences, etc. using convolutional neural networks seems to do a really good job! The intuition behind this is something along these lines: we humans are pretty good at reading misspelled words such as “tomtato” or “infromation”, and convolutional networks are good at the same thing – understanding the information regardless of its location.
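A toy, dependency-free illustration of that intuition: character n-grams act like tiny fixed convolution filters that detect local patterns wherever they occur in the string, so a transposition such as “infromation” still shares most of its patterns with the correct word. This is only a stand-in for the idea; a real system would learn the filters with a convolutional network rather than use hand-picked n-grams.

```python
def char_ngrams(s, n=3):
    """Set of character n-grams, with '#' padding so word boundaries count."""
    s = f"#{s}#"
    return {s[i:i + n] for i in range(len(s) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap of character trigrams: position-independent,
    like the local pattern detection a convolutional filter performs."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return len(ga & gb) / len(ga | gb)

# A misspelling stays close to its source word and far from unrelated words.
typo = similarity("infromation", "information")
other = similarity("infromation", "restaurant")
```

The transposed letters break only the trigrams around the swap; everything else still matches, which is why `typo` comes out well above `other`.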
What can you share about creating knowledge-based NLP, for others who want to get started?
To get started, it’s important to have a strong evaluation framework for understanding why your use-cases are not being solved, and to build solid expertise in semantic representations from there. The next hurdle is then making it applicable in real-world scenarios and with real-world data.
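A minimal sketch of the kind of evaluation framework suggested above: run a set of held-out utterances through your NLU and report exactly which ones fail, so you can see why a use-case breaks rather than just that it does. Every name here, including the toy NLU, is illustrative.

```python
def evaluate(nlu, test_cases):
    """test_cases: list of (utterance, expected_frame) pairs.
    Returns accuracy plus the concrete failures for inspection."""
    failures = []
    for utterance, expected in test_cases:
        predicted = nlu(utterance)
        if predicted != expected:
            failures.append((utterance, expected, predicted))
    accuracy = 1 - len(failures) / len(test_cases)
    return accuracy, failures

# Toy NLU that only knows one pattern, to show the shape of the report.
def toy_nlu(utterance):
    return {"act": "find"} if "find" in utterance else {"act": "unknown"}

acc, fails = evaluate(toy_nlu, [
    ("find a restaurant", {"act": "find"}),
    ("book a table", {"act": "book"}),
])
```

The failure list, not the headline accuracy number, is what tells you which semantic representations your system is missing.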