Earlier this week, Google DeepMind introduced RoboCat to the world: a new self-improving robotic agent that can learn to operate different robotic arms and solve tasks from self-generated data. According to the paper accompanying the announcement, RoboCat is able to solve complex tasks from as few as 100 demonstrations.
In the same paper, they highlight how RoboCat can learn various tasks across a series of different arms, then self-generate new training data to create a cycle of improvement in how it operates. This builds on earlier research into developing robots that can learn to multitask at scale.
The goal, of course, is bridging the gap between language models and real-world capabilities. This advancement comes at a time when more families are bringing robotic technology into their lives. Think of the famous Roomba robotic vacuum cleaner.
What RoboCat shows is that, as advances in AI continue, they can be scaled with emerging robotic technology, giving users greater efficiency and, of course, bringing us another step closer to a general-purpose robot. But when you look at RoboCat, you may wonder how it got its name. Well, it all starts with the multimodal model that powers the arm: Gato.
Spanish for “cat”, Gato is a multimodal model that can process language, images, and actions in both simulated and physical environments. That architecture gave the team an excellent starting point: they could train on a large dataset of image-and-action sequences from various robot arms solving hundreds of different tasks.
That foundation is what enables RoboCat to improve itself. During the first round of training, Google DeepMind placed RoboCat into what they call a “self-improvement” training cycle with a series of unseen tasks. Each new task required five specific steps.
According to Google DeepMind’s post, they are as follows:
- Collect 100-1000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
- Fine-tune RoboCat on this new task/arm, creating a specialized spin-off agent.
- The spin-off agent practices on this new task/arm an average of 10,000 times, generating more training data.
- Incorporate the demonstration data and self-generated data into RoboCat’s existing training dataset.
- Train a new version of RoboCat on the new training dataset.
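The five steps above can be sketched as a simple loop. This is only an illustrative toy in Python, not DeepMind’s actual training code; all function and variable names here are hypothetical, and “training” is stubbed out with placeholder data:

```python
def collect_demonstrations(task, n=100):
    """Step 1: a human teleoperates the arm to record demos (simulated here)."""
    return [(task, f"demo_{i}") for i in range(n)]

def fine_tune(agent, demos):
    """Step 2: fine-tune a specialized spin-off agent on the new demos."""
    return {"base_version": agent["version"], "specialized_on": demos[0][0]}

def practice(spin_off, task, n=10_000):
    """Step 3: the spin-off practices the task, self-generating more data."""
    return [(task, f"self_generated_{i}") for i in range(n)]

def self_improvement_cycle(agent, dataset, new_tasks):
    for task in new_tasks:
        demos = collect_demonstrations(task)
        spin_off = fine_tune(agent, demos)
        self_data = practice(spin_off, task)
        # Step 4: fold demo and self-generated data into the training set.
        dataset.extend(demos + self_data)
        # Step 5: train a new version of the generalist on the grown dataset.
        agent = {"version": agent["version"] + 1}
    return agent, dataset

agent, data = self_improvement_cycle({"version": 0}, [], ["lift_block"])
print(agent["version"], len(data))  # → 1 10100
```

The key design point the loop captures is that the generalist agent and its specialized spin-offs are separate: the spin-off generates data, but only the shared dataset and the retrained generalist persist into the next cycle.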
So as RoboCat learns from tasks, it gets better at handling the new tasks introduced to it. According to the paper, the first version of RoboCat succeeded on previously unseen tasks only 36% of the time, but the current iteration has since doubled that success rate.
But it’s not just simple robotic arms that RoboCat is operating. Its training is diverse, with the AI learning to operate different robotic arms within a few hours: RoboCat progressed from two-pronged grippers to a three-fingered gripper with twice as many controllable inputs.
Overall, RoboCat is another example of how advancements in AI, particularly AI that can learn from real-world environments, can help in the advancement of robotics. So even though we’re not yet close to having general-purpose robots, or something along the lines of C-3PO, the foundations are being laid.
If you’re interested in seeing RoboCat in action, Google shared a video that you can watch below: