Google DeepMind has released Gemini Robotics-ER 1.6, an upgrade to its reasoning-first model for physical agents. The announcement describes the model as a significant improvement over both Gemini Robotics-ER 1.5 and Gemini 3.0 Flash in spatial and physical reasoning capabilities — specifically pointing, counting, and success detection. It also introduces a new capability, instrument reading, developed through collaboration with Boston Dynamics.
The model is available to developers today via the Gemini API and Google AI Studio, with a developer Colab providing examples of configuration and prompting for embodied reasoning tasks.
Pointing as a foundation for spatial reasoning
The announcement gives considerable weight to pointing — the model’s ability to identify specific locations or objects in a scene — as a foundational capability for robotics. Pointing in this context is not simply object detection; the post describes it as supporting multiple reasoning modes including spatial reasoning (precision detection and counting), relational logic (comparing objects, defining from-to relationships), motion reasoning (mapping trajectories and identifying grasp points), and constraint compliance (identifying objects meeting specified conditions, like “every object small enough to fit inside the blue cup”).
According to the post, Gemini Robotics-ER 1.6 can use points as intermediate reasoning steps toward more complex tasks — for example, using points to count items in an image or to identify salient positions that inform mathematical operations for metric estimation. The framing treats pointing not as an end capability but as a primitive that other reasoning chains can call.
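To make the "points as a primitive" idea concrete, here is a minimal sketch of consuming such output, assuming the new model keeps the point convention documented for Gemini Robotics-ER 1.5: a JSON list of `{"point": [y, x], "label": ...}` entries with coordinates normalized to a 0–1000 range. The sample response string is illustrative, not actual model output.

```python
import json

def parse_points(response_text: str, width: int, height: int) -> list[dict]:
    """Convert normalized [y, x] points (0-1000 range, per the ER 1.5
    docs) into pixel coordinates for a frame of the given size."""
    return [
        {
            "label": p.get("label", ""),
            "x": p["point"][1] / 1000 * width,
            "y": p["point"][0] / 1000 * height,
        }
        for p in json.loads(response_text)
    ]

def count_items(response_text: str, label: str) -> int:
    """Counting via pointing: one point per instance, filtered by label."""
    return sum(1 for p in json.loads(response_text) if p.get("label") == label)

# Illustrative response for a prompt like "Point to each screw":
sample = ('[{"point": [400, 250], "label": "screw"},'
          ' {"point": [620, 710], "label": "screw"}]')
points = parse_points(sample, width=1280, height=720)
screws = count_items(sample, "screw")
```

The counting helper is the simplest case of the chaining the post describes: the points are not the answer themselves, but inputs to a downstream computation.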
Success detection and multi-view understanding
The post identifies success detection — determining when a task is finished — as “a cornerstone of autonomy.” For a robot to operate without constant human supervision, it needs to know whether an action succeeded, whether to retry, or whether to move to the next step in a sequence. The announcement says this is genuinely hard: it requires sophisticated perception combined with broad world knowledge to handle factors like occlusions, poor lighting, and ambiguous instructions.
Multi-view reasoning compounds the difficulty. Most production robotics setups use multiple cameras; the post mentions overhead and wrist-mounted feeds as a typical configuration. A system must fuse the different viewpoints into a coherent picture, not just at a single moment but across time. According to the post, Gemini Robotics-ER 1.6 is better at reasoning over multiple camera streams and the relationships between them, including in dynamic or occluded environments.
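The role success detection plays in autonomy can be sketched as a simple control loop: execute a step, ask a success detector whether it worked, and retry or escalate accordingly. In the sketch below, `check_success` is a hypothetical stub standing in for a model query over the current camera frames; the retry policy is illustrative, not something the post specifies.

```python
from typing import Callable

def run_step(execute: Callable[[], None],
             check_success: Callable[[], bool],
             max_retries: int = 2) -> bool:
    """Execute one step of a task, then consult a success detector to
    decide whether to proceed, retry, or give up. `check_success` is a
    stand-in for asking the model "did this action succeed?" given the
    latest camera views (hypothetical interface)."""
    for _ in range(max_retries + 1):
        execute()
        if check_success():
            return True   # proceed to the next step in the sequence
    return False          # out of retries: escalate or re-plan

# Usage with stubs: the action "succeeds" on the second attempt.
attempts: list[int] = []
ok = run_step(lambda: attempts.append(1), lambda: len(attempts) >= 2)
```

The loop makes explicit why the post calls success detection "a cornerstone of autonomy": without a reliable answer to "is this done?", a robot cannot decide between retrying and moving on without a human in the loop.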
Instrument reading: a capability developed with Boston Dynamics
The new instrument reading capability is described as emerging from a specific real-world need surfaced by Boston Dynamics. Industrial facilities require constant monitoring of gauges, thermometers, pressure instruments, and chemical sight glasses. The Spot robot can visit instruments throughout a facility and capture images of them; the question is whether the vision-language model can accurately interpret what it sees.
The task is harder than it sounds. Gauges often have text describing units, multiple needles corresponding to different decimal places, and ambiguous tick marks. Sight glasses require estimating liquid fill levels while accounting for camera perspective distortion. Reading a gauge accurately requires precisely perceiving multiple inputs — needles, liquid levels, container boundaries, tick marks — and understanding how they relate.
The post describes how the model approaches this using what it calls “agentic vision”: a combination of visual reasoning and code execution. The model zooms into an image to read small details, uses pointing and code execution to estimate proportions and intervals, and then applies world knowledge to interpret the result. This chaining of visual, spatial, and symbolic reasoning is presented as the mechanism behind the model’s accurate gauge readings.
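The "estimate proportions and intervals" step reduces, in the simplest case, to linear interpolation: once the model has located the needle and the minimum and maximum tick marks, the reading is the needle's fractional position along the scale. The sketch below assumes those angles have already been extracted from the image; all calibration numbers are illustrative.

```python
def read_gauge(needle_angle: float,
               min_angle: float, max_angle: float,
               min_value: float, max_value: float) -> float:
    """Interpolate a gauge reading from a needle angle, given the angles
    of the minimum and maximum tick marks (degrees). Assumes a linear
    scale; real gauges may need per-interval calibration."""
    fraction = (needle_angle - min_angle) / (max_angle - min_angle)
    return min_value + fraction * (max_value - min_value)

# A 0-10 bar pressure gauge whose scale spans -135 to +135 degrees,
# with the needle pointing straight up (0 degrees):
pressure = read_gauge(needle_angle=0.0,
                      min_angle=-135.0, max_angle=135.0,
                      min_value=0.0, max_value=10.0)
```

This is only the symbolic tail end of the chain the post describes; the hard part is the visual stage that produces the angles despite perspective distortion, multiple needles, and ambiguous tick marks.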
Safety improvements
The announcement states that Gemini Robotics-ER 1.6 is the lab’s safest robotics model to date. It demonstrates superior compliance with Gemini safety policies on adversarial spatial reasoning tasks relative to all previous generations. The post also notes “substantially improved capacity to adhere to physical safety constraints” — specifically, better decisions about which objects can be safely manipulated under physical or material constraints such as “don’t handle liquids” or “don’t pick up objects heavier than 20 kg.”
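Constraint checks like the ones quoted above amount to filtering candidate objects against estimated physical properties. A minimal sketch, assuming the model has already produced per-object estimates; the object records and property names are hypothetical.

```python
def manipulable(objects: list[dict],
                max_weight_kg: float = 20.0,
                allow_liquids: bool = False) -> list[str]:
    """Filter objects against physical safety constraints like those in
    the post: a weight ceiling and a no-liquids rule. The per-object
    weight and liquid estimates are assumed to come from upstream
    perception (hypothetical schema)."""
    return [
        o["name"] for o in objects
        if o["weight_kg"] <= max_weight_kg
        and (allow_liquids or not o["is_liquid"])
    ]

scene = [
    {"name": "toolbox", "weight_kg": 12.0, "is_liquid": False},
    {"name": "solvent drum", "weight_kg": 8.0, "is_liquid": True},
    {"name": "engine block", "weight_kg": 95.0, "is_liquid": False},
]
safe = manipulable(scene)
```

The filter itself is trivial; the capability the post claims is the upstream part, reasoning reliably about which real-world objects actually violate such constraints.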
DeepMind also tested the model against real-life injury reports, using text and video scenarios. The post reports that Gemini Robotics-ER models improve over baseline Gemini 3.0 Flash performance by 6% on text injury-risk perception and 10% on video injury-risk perception.
The combination of more capable spatial reasoning and improved safety compliance is presented as complementary rather than in tension — the model is described as more capable at reasoning about what is physically possible and more reliable at recognizing when acting on that capability would be unsafe.