Robotics and artificial intelligence are interconnected. Building humanoid robots that can lift heavy loads and carry state-of-the-art sensors would be pointless without an intelligent system that lets them understand their surroundings and act accordingly. Without AI, a modern robot would be little more than a collection of sophisticated but useless hardware. Advanced algorithms turn that raw capability into machines that can learn, optimize their performance, and respond autonomously to challenges.
From ASIMO, Honda’s iconic robot of the 2000s, to Sophia, Tesla’s Optimus, and Figure’s humanoids, humanoid robotics has gradually integrated AI. However, the tech sector is still far from building machines that truly match the versatility of the human body. Despite their progress, these robots still struggle in uncontrolled environments, and manipulating everyday objects remains a significant challenge.
Gemini Robotics: Google’s Effort to Integrate AI Into the Physical World
Meanwhile, in the digital realm, AI is progressing at a very different pace. AI models can now hold conversations that closely resemble human ones, pass exams with impressive scores, and solve complex problems at remarkable speed. The contrast shows that, while AI itself is advancing rapidly, its integration with robotics still has a long way to go.
These challenges have given rise to a new generation of AI models designed specifically for robotics. Determined not to be left behind, Google is already working on solutions to advance humanoid robots. Its efforts build on Gemini 2.0, from which it has derived two new models designed to improve how these machines interact with people and how they are controlled.
On one hand, Gemini Robotics is a vision-language-action (VLA) model: it adds physical actions as an output, so it can directly control robots and respond more quickly in dynamic environments. On the other hand, Gemini Robotics-ER (short for "embodied reasoning") is aimed at robotics experts, giving them a model with advanced spatial understanding that they can use to build and run their own programs.

Google has identified three essential qualities that robots must possess to be truly useful to people:
- Generality. A good robot should not only perform predefined tasks but also adapt to unexpected situations and solve problems on the spot. It must navigate new environments, handle unfamiliar objects, and interpret instructions it was not specifically trained on. According to Google’s internal tests, Gemini Robotics more than doubles the performance of other state-of-the-art vision-language-action models on unforeseen tasks.
- Interactivity. In a constantly changing world, robots must communicate naturally and respond to instructions in real time. Gemini Robotics understands commands phrased in everyday, conversational language and in multiple languages, and it adjusts its behavior based on the conversation or its environment. It continuously monitors its surroundings and adapts its actions when it receives new commands or when conditions change.
- Dexterity. Many tasks that people handle effortlessly require fine motor skills that most robots have not yet mastered. Gemini Robotics, however, can perform complex multi-step tasks that demand precise manipulation, such as folding origami or packing a snack into a Ziploc bag.
Overall, Gemini Robotics stands out at handling unforeseen tasks, generalizing significantly better than other vision-language-action models. According to Google’s white paper, it can adapt to scenarios it has never encountered and make decisions without task-specific training, bringing robots closer to true autonomy.
Google designed Gemini Robotics to work with several types of robots. Although the model was trained primarily on ALOHA 2, a bi-arm platform, it has also demonstrated that it can control systems built on the Franka arms, which are commonly used in laboratories, as well as advanced humanoids such as Apptronik’s Apollo. This flexibility means it could be applied in a wide range of areas, from industry to assistive tasks.
There’s currently no scheduled date for the widespread deployment of Gemini Robotics and Gemini Robotics-ER. The technology remains under development, and access to these tools is limited to a small group of companies.
Google DeepMind is collaborating with Apptronik to build the next generation of humanoid robots. The companies are focusing on how to integrate these AI models into more advanced systems. Additionally, several trusted testers, including Agile Robots, Agility Robotics, Boston Dynamics, and Enchanted Tools, are already evaluating Gemini Robotics-ER. However, it’s unclear whether access will be expanded in the future.
Meanwhile, Google DeepMind is also developing new safety frameworks and benchmarks to assess the potential risks of AI operating in physical environments. This shows that, although the project is progressing, there is still a long way to go before the technology reaches the general public.
Images | Google DeepMind