Google DeepMind just dropped Gemini Robotics-ER 1.6, an embodied reasoning model that does something most AI can't: understand the physical world well enough to direct actual robots. The model handles spatial reasoning, counts objects in scenes, reads pressure gauges and sight glasses, and checks whether a task succeeded. It sits above a robot's existing systems, calling tools like Google Search and vision-language-action models to orchestrate actions. According to DeepMind researchers Laura Graesser and Peng Xu, it beats both its predecessor and Gemini 3.0 Flash in pointing accuracy, counting, and success detection.
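The pointing capability typically returns coordinates as structured text rather than pixels. A minimal sketch of handling that output, assuming a response schema of JSON entries like `{"point": [y, x], "label": ...}` with coordinates normalized to a 0-1000 grid (the convention Gemini's embodied-reasoning docs describe; the exact schema here is an assumption, not from the article):

```python
import json

def parse_points(model_output: str, width: int, height: int):
    """Convert a pointing response into pixel coordinates.

    Assumes (hypothetically) a JSON list of {"point": [y, x], "label": str}
    entries, with y and x normalized to a 0-1000 grid.
    """
    results = []
    for entry in json.loads(model_output):
        y_norm, x_norm = entry["point"]
        results.append({
            "label": entry["label"],
            "x": int(x_norm / 1000 * width),   # scale to image width
            "y": int(y_norm / 1000 * height),  # scale to image height
        })
    return results

# Example: two detected objects in a 640x480 frame.
raw = '[{"point": [500, 250], "label": "valve"}, {"point": [100, 900], "label": "gauge"}]'
print(parse_points(raw, width=640, height=480))
# → [{'label': 'valve', 'x': 160, 'y': 240}, {'label': 'gauge', 'x': 576, 'y': 48}]
```

Downstream code (a grasp planner, a navigation stack) would consume these pixel coordinates, which is what lets the model sit above a robot's existing systems rather than replace them.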

The instrument reading capability came from working directly with Boston Dynamics. Hyundai, which owns Boston Dynamics, is already pushing this into factory environments where robots need to interpret complex physical setups. That partnership matters: real hardware running in real facilities, with DeepMind's reasoning layer telling Spot and similar robots what they're looking at and what to do next.

But inference latency remains a bottleneck. The model can synthesize Python scripts for computer vision tasks on the fly. Running them fast enough for fluid, responsive robot behavior is still hard. Faster inference could eventually let robots simulate potential futures before acting, generating images of what might happen and choosing the best path. The latency problem keeps that theoretical for now.

Developers can access Gemini Robotics-ER 1.6 now through the Gemini API and Google AI Studio, with a Colab notebook showing example configurations for embodied reasoning tasks.
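For a feel of the developer workflow, here is a hedged sketch of querying the model through the Gemini API's Python SDK. The prompt wording and the model identifier are assumptions (check Google AI Studio for the current ID), and the network call itself is left commented out so the snippet runs without credentials:

```python
# Hedged sketch: building an embodied-reasoning pointing query.
# Prompt phrasing and response format are assumptions for illustration.

def build_pointing_prompt(objects: list[str]) -> str:
    """Ask the model to point at each named object, returning
    normalized [y, x] coordinates as JSON (an assumed format)."""
    names = ", ".join(objects)
    return (
        f"Point to the following objects in the image: {names}. "
        'Answer as a JSON list of {"point": [y, x], "label": <name>} '
        "entries with coordinates normalized to 0-1000."
    )

prompt = build_pointing_prompt(["pressure gauge", "sight glass"])

# Sending the request would look roughly like this (requires an API key):
# from google import genai
# client = genai.Client()  # reads GEMINI_API_KEY from the environment
# response = client.models.generate_content(
#     model="gemini-robotics-er-1.6",  # assumed model ID
#     contents=[image_part, prompt],   # image_part: the robot's camera frame
# )
# print(response.text)
```

The Colab notebook DeepMind provides covers the actual request configuration; the sketch above only illustrates the general shape of a pointing query.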