Today’s voice assistants are getting more advanced, but they still usually require you to speak commands in certain ways and get confused as questions or orders get more in-depth. You can tell a smart speaker to set an alarm for 6:30 PM or ask it what year Abraham Lincoln gave the Gettysburg Address.
However, a smart speaker will likely fail to understand a request that involves more detail and context. In other words, you can’t yet talk to these assistants as you would to a human. Saying something like, “Can you help me plan my best friend’s bachelorette party for next summer while sticking to a modest budget?” wouldn’t get you anywhere.
That could change thanks to Meta AI’s announcement of Project CAIRaoke. Meta is the parent company of Facebook, Instagram, and WhatsApp. As you might guess from the name, its AI arm works to improve artificial intelligence.
What Is Project CAIRaoke?
Project CAIRaoke seeks to improve how people interact with virtual assistants, such as voice-enabled smart speakers. Those gadgets currently require linking four components before the AI can function.
Natural language understanding (NLU) deals with the processing of sentences. Then, dialog state tracking (DST) allows a machine to interpret a person’s intentions based on the previous things spoken. There’s also a dialog policy (DP), which dictates how the machine responds based on the current conditions. Finally, natural language generation (NLG) centers on the programming that allows the machine to form its responses to users’ inputs.
Together, those four elements make up the conversational flow that happens when a person speaks with an AI assistant. However, Project CAIRaoke is different because it does not impose such a fixed sequence of steps on how a conversation must proceed.
The previous training method meant creating and teaching the NLU aspect before moving on to DST, for example. Moreover, changing one module could break all the others, slowing down workflows.
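To make the four-stage flow concrete, here is a minimal, rule-based sketch of a modular assistant for setting alarms. It is an illustration only, not Meta’s implementation: every name in it (DialogState, nlu, dst, policy, nlg, respond) is hypothetical, and real systems use trained models rather than regular expressions and templates. It does show how each stage depends on the exact output format of the one before it, which is why changing one module can break the others.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogState:
    """Hypothetical container for what the assistant knows so far."""
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)


def nlu(utterance: str) -> dict:
    """NLU: extract an intent and slot values from one utterance."""
    text = utterance.lower()
    parsed = {"intent": "set_alarm" if "alarm" in text else None, "slots": {}}
    time_match = re.search(r"\b\d{1,2}:\d{2}\b", text)
    if time_match:
        parsed["slots"]["time"] = time_match.group(0)
    period_match = re.search(r"\b(am|pm)\b", text)
    if period_match:
        parsed["slots"]["period"] = period_match.group(1).upper()
    return parsed


def dst(state: DialogState, parsed: dict) -> DialogState:
    """DST: merge the new turn into what was said in earlier turns."""
    if parsed["intent"]:
        state.intent = parsed["intent"]
    state.slots.update(parsed["slots"])
    return state


def policy(state: DialogState) -> str:
    """DP: choose the next action from the current state."""
    if state.intent != "set_alarm":
        return "ask_intent"
    if "time" not in state.slots:
        return "ask_time"
    if "period" not in state.slots:
        return "ask_period"
    return "confirm_alarm"


def nlg(action: str, state: DialogState) -> str:
    """NLG: turn the chosen action into a sentence for the user."""
    templates = {
        "ask_intent": "Sorry, what would you like to do?",
        "ask_time": "What time should I set the alarm for?",
        "ask_period": "Do you mean AM or PM?",
        "confirm_alarm": "Alarm set for {time} {period}.",
    }
    return templates[action].format(**state.slots)


def respond(state: DialogState, utterance: str) -> str:
    """Run one turn through the four stages in order."""
    return nlg(policy(dst(state, nlu(utterance))), state)


state = DialogState()
print(respond(state, "Set an alarm for 6:30"))  # Do you mean AM or PM?
print(respond(state, "PM"))                     # Alarm set for 6:30 PM.
```

Note that policy only works if nlu and dst keep producing the keys "time" and "period". Renaming a slot in one stage silently breaks the stages after it, which mirrors the interdependence described above.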
Project CAIRaoke uses a neural network and only one set of training data for the AI. Moreover, it breaks the interdependence between the components mentioned above. As a result, the associated models can perceive everything in context, including recognizing different ways to say the same thing. The people working with the models can also prepare them faster and with less data.
What’s Next for Project CAIRaoke?
So far, the Meta AI team has tested the model on Portal, the company’s video-calling and assistant device. Experiments have indicated that the CAIRaoke neural network model lets people clarify their requests without repeating an entire phrase. If a person says, “Set an alarm for 5:30,” the AI will follow up by asking whether they mean AM or PM. Early results showed that the new model performed better in such tasks than standard ones.
A blog post about the project clarified, “While our short-term implementation of the Project CAIRaoke model is for reminders on Portal, we hope to soon be utilizing it on much larger domains which will help personalize people’s shopping experiences, enable assistants to maintain context over numerous chats, and let people drive the flow of conversation.”
There was also a mention of how Project CAIRaoke could lead to augmented reality (AR) glasses that give fashion advice. A person might look at something in their closet and say, “What kind of pants go with this shirt?” Then, the glasses might show them a possible garment to buy online.
The blog also explained how developments associated with Project CAIRaoke could eventually change how people interact with their gadgets. That could bring improved navigation for VR games or support gesture-based commands.
How Could Improvements in Conversational AI Help Society?
Artificial intelligence has already helped lower long-standing barriers. Consider the increased opportunities when businesses use AI translations in their operations. Research indicates that doing so boosts trade revenues by an amount equivalent to reducing the physical distance between nations by more than 35%.
The Project CAIRaoke blog post acknowledged the need to make the model work with many languages. If developers meet that goal, conversational AI that recognizes numerous languages could further streamline multinational trade.
It could also accelerate successful rollouts of conversational AI at fast-food drive-through windows. In one recent example, Checkers and Rally’s restaurants completed the largest single implementation of such technology in that industry so far. The AI was added to 267 restaurants. A pilot program indicated that 98% of orders placed that way needed no human intervention.
However, it’s easy to imagine the variations that could arise when someone tries to order food, even if they want the same item. One person might say, “May I have a taco combo with a Coca-Cola for the drink, please?” The next consumer in line could state, “I’d like the tacos, and could you make it a combo with Coke?”
An AI voice assistant must successfully interpret those different ways of ordering the same food to be truly useful. Otherwise, consumers will get fed up. Plus, restaurant team members will get pulled away from their other duties if the AI can’t understand people’s orders.
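As a rough illustration of the problem, the sketch below maps both taco orders from the example above to the same structured order using a hand-written synonym table. This is an assumption-laden toy, not how any production drive-through system or Project CAIRaoke works: the SYNONYMS table and the parse_order function are hypothetical, and the approach only handles phrasings someone thought to list in advance. That brittleness is exactly what models trained on varied conversations aim to avoid.

```python
import re

# Hypothetical synonym table: surface word -> canonical menu item.
SYNONYMS = {
    "taco": "taco",
    "tacos": "taco",
    "coke": "coke",
    "coca-cola": "coke",
}


def parse_order(utterance: str) -> dict:
    """Reduce a spoken order to a small structured record."""
    # Split into lowercase words, keeping hyphenated words such as "coca-cola".
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", utterance.lower())
    items = {SYNONYMS[t] for t in tokens if t in SYNONYMS}
    return {
        "main": "taco" if "taco" in items else None,
        "drink": "coke" if "coke" in items else None,
        "combo": "combo" in tokens,
    }


first = parse_order("May I have a taco combo with a Coca-Cola for the drink, please?")
second = parse_order("I'd like the tacos, and could you make it a combo with Coke?")
print(first == second)  # True
```

A customer who asks for “a cola” or “the number three” would fall outside this table, so every new phrasing means another manual rule.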
Making AI More Appealing
Many people like the idea of saying phrases to smart speakers and getting help with everything from booking a taxi to finding a recipe. However, those person-to-machine interactions can quickly become annoying when users must remember the exact ways to phrase their questions.
This ongoing work with Project CAIRaoke could encourage more people to try AI assistants for the first time because the resulting interactions would be more natural and enjoyable. Similarly, company leaders could realize that AI offers more potential use cases than they first envisioned, making conversational AI more applicable to various business environments and processes.
It’s too early to say whether Meta AI’s work in this area will have large and lasting impacts, especially since this project is in the early stages. However, the future looks promising.