Though neural networks are inspired by the way human brains operate, they aren’t quite the same. But if Google’s new RT-2 model works as promised, it might be a major step in the direction of human-like AI. Introduced by Google’s DeepMind, the model promises to learn from both web and robotics data and translate that acquired knowledge into generalized instructions for robotic control.
In short, the goal is to bridge the communication gap between humans and robots, but that’s not all. The model would teach by putting words into action. So what exactly is RT-2? According to the team, it’s a vision-language-action, or VLA, model, developed using transformer-based techniques and trained on both text and image data scraped from the web.
In their post, they said of the training: “RT-2 builds upon VLMs that take one or more images as input, and produces a sequence of tokens that, conventionally, represent natural language text… we adapt Pathways Language and Image model (PaLI-X) and Pathways Language model Embodied (PaLM-E) to act as the backbones of RT-2.”
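The token-based output the team describes can be sketched concretely. The idea is that a continuous robot action can be discretized into integer bins, so the same vocabulary a language model uses for text can also emit control commands. The bin count, value range, and seven-dimensional action layout below are illustrative assumptions, not Google’s published configuration:

```python
# Minimal sketch (assumptions noted above): map a continuous action
# vector, e.g. end-effector deltas plus a gripper command, to integer
# tokens a language-model vocabulary could emit alongside text.

def discretize_action(action, low=-1.0, high=1.0, bins=256):
    """Map each action dimension to an integer token in [0, bins - 1]."""
    tokens = []
    for value in action:
        # Clamp to the expected range, then scale into the bin index.
        clipped = min(max(value, low), high)
        tokens.append(int((clipped - low) / (high - low) * (bins - 1)))
    return tokens

# Hypothetical 7-dim action: position deltas, rotation deltas, gripper.
action = [0.1, -0.25, 0.0, 0.5, -1.0, 1.0, 0.9]
print(discretize_action(action))  # → [140, 95, 127, 191, 0, 255, 242]
```

Under this scheme, “produce an action” and “produce a sentence” become the same operation for the model: generating a token sequence.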
So far, not all that interesting. But RT-2’s innovation lies in its ability to train robots. The model takes web data, concepts, and general ideas, then applies that knowledge to inform robotic behavior. But how was RT-2 trained in its tasks?
Well, according to Google’s post: “Each task required understanding visual-semantic concepts and the ability to perform robotic control to operate on these concepts. Commands such as ‘pick up the bag about to fall off the table’ or ‘move banana to the sum of two plus one’ – where the robot is asked to perform a manipulation task on objects or scenarios never seen in the robotic data – required knowledge translated from web-based data to operate.”
RT-2, in essence, teaches robots to understand and speak the language of their human operators. That has long been a difficult task: complex instructions and robots have had a troubled relationship for some time, largely because of the physical variables robots must contend with that their chatbot counterparts never face.
This requires them to gain a grounding in abstract concepts and ideas, something many popular AI programs don’t need to concern themselves with. As mentioned above, with the aid of models that provide a better understanding of their environment, we’re witnessing robotics benefit from advancements in AI.
All of this could replace traditional methods of robotic training, which required billions of data points about the robot’s surroundings and were both time-consuming and resource-intensive. So with RT-2’s ability to transfer knowledge and concepts to robotic devices, we’ll likely see a greater push for adaptable robotic technology.
And with advancements in visual modeling, one could expect to see robotic technology continue to make rapid advancements thanks to AI.