Google is training robots the way it trains AI chatbots


RT-2 is the new version of what the company calls its vision-language-action (VLA) model. The model teaches robots to better recognize visual and language patterns so they can interpret instructions and infer which objects best fit a request.
Researchers tested RT-2 with a robotic arm in an office kitchen setting, asking the arm to decide what would make a good improvised hammer (it picked a rock) and to choose a drink for an exhausted person (a Red Bull). They also told the robot to move a Coke can to a picture of Taylor Swift. The robot is a Swiftie, and that is good news for humanity.
The new model was trained on web and robotics data, leveraging research advances in large language models like Google’s own Bard and combining them with robotic data (like which joints to move), the company said in a paper. It also understands directions in languages other than English.
For years, researchers have tried to give robots better inference so they can figure out how to operate in a real-life environment. As The Verge’s James Vincent has pointed out, real life is uncompromisingly messy: robots need far more instruction than people do to handle tasks that are simple for humans, such as cleaning up a spilled drink. Humans instinctively know what to do: pick up the glass, grab something to sop up the mess, throw that out, and be careful next time.
Previously, teaching a robot took a long time, because researchers had to program directions individually. But with the power of VLA models like RT-2, robots can draw on a much larger body of information to infer what to do next.
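Conceptually, a VLA model treats robot control like text generation: it takes a camera image and a natural-language instruction and emits discrete “action tokens” that are decoded into joint movements. The sketch below is purely illustrative and uses hypothetical names throughout; it is not Google’s RT-2 code or API, just a minimal picture of the idea.

```python
# Illustrative sketch only: framing robot control as token prediction,
# the way vision-language-action (VLA) models do. All names here are
# hypothetical stand-ins, not Google's actual RT-2 implementation.

from dataclasses import dataclass
from typing import List

ACTION_BINS = 256   # each action dimension is discretized into this many bins
ARM_DIMS = 6        # e.g. six arm-joint deltas; the last token drives the gripper


@dataclass
class RobotAction:
    joint_deltas: List[float]  # continuous joint movements decoded from tokens
    gripper: float             # 0.0 = open, 1.0 = closed


def fake_vla_model(image_pixels: bytes, instruction: str) -> List[int]:
    """Stand-in for a trained VLA model: returns one token per action dimension.

    A real model would condition on the camera image and the instruction;
    this stub just returns a fixed token sequence for demonstration.
    """
    return [128, 130, 120, 128, 128, 140, 255]


def detokenize(tokens: List[int]) -> RobotAction:
    """Map discrete action tokens back to continuous commands."""
    joints = [(t / (ACTION_BINS - 1)) * 2.0 - 1.0 for t in tokens[:ARM_DIMS]]
    gripper = tokens[ARM_DIMS] / (ACTION_BINS - 1)
    return RobotAction(joint_deltas=joints, gripper=gripper)


if __name__ == "__main__":
    camera_frame = b""  # a real system would pass the latest camera image here
    action = detokenize(fake_vla_model(camera_frame, "pick up the rock"))
    print(action)
```

The appeal of this framing is that the same web-scale training that teaches a language model what a “hammer” or an “energy drink” is can be reused to decide which tokens, and therefore which movements, to output.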
Google’s first foray into smarter robots started last year, when it announced it would use its LLM PaLM in robotics, creating the awkwardly named PaLM-SayCan system to integrate LLMs with physical robots.
Google’s new robot isn’t perfect. The New York Times got to see a live demo and reported that the robot incorrectly identified soda flavors and misidentified fruit as the color white.
Depending on the type of person you are, this news is either welcome or reminds you of the scary robot dogs from Black Mirror (influenced by Boston Dynamics robots). Either way, we should expect an even smarter robot next year. It might even clean up a spill with minimal instructions.