Ant Group's robotics team Robbyant just released LingBot-Map, an open-source system for streaming 3D reconstruction. It uses a geometric context transformer to build 3D maps in real time, targeting robotics, AR, and autonomous navigation. The model runs at roughly 20 frames per second when processing 518x378 resolution video.

That's decent speed. But there's a hole. The team hasn't disclosed what hardware produces those numbers. HN commenters caught this immediately. Without specs, nobody can say whether hitting 20 fps takes a datacenter-class GPU or the kind of modest embedded hardware you'd find in a budget robot vacuum. For actual deployment, that matters a lot.
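LingBot-Map's public materials don't describe a benchmarking procedure, so until the team publishes hardware specs, the practical move is to measure throughput yourself. A minimal sketch of a frames-per-second harness, where `process_frame` is a hypothetical stand-in for whatever per-frame inference call the library actually exposes:

```python
import time

def measure_fps(process_frame, frames, warmup=5):
    """Time a per-frame function over a frame sequence; return frames/sec.

    A few warmup iterations are excluded so one-time costs (cache fills,
    kernel compilation, model loading side effects) don't skew the number.
    """
    for f in frames[:warmup]:
        process_frame(f)
    start = time.perf_counter()
    for f in frames[warmup:]:
        process_frame(f)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed

# Usage with a dummy CPU workload standing in for real inference:
frames = list(range(105))
fps = measure_fps(lambda f: sum(range(10_000)), frames)
print(f"{fps:.1f} fps")
```

Run the same harness on each candidate device and you get an apples-to-apples answer to the question the release notes left open.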

Robbyant sits under Ant Group's inclusionAI initiative, their AGI push focused on embodied AI. They've been busy. LingBot-Map joins a string of open-source projects under the LingBot brand: lingbot-world for world models, lingbot-va for robot control, lingbot-depth for spatial perception, and lingbot-vla as a vision-language-action foundation model.

518x378 is low resolution. For navigation-level robot decisions, it may be enough. For fine-grained detail like manipulation or inspection, it isn't. Is this a toy or a tool? We can't tell without hardware specs, and that's frustrating.
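To put that resolution in context, a quick pixel-count comparison against common camera formats (the arithmetic, not anything from LingBot-Map's docs):

```python
# Pixel counts: LingBot-Map's 518x378 input vs common camera resolutions.
lingbot = 518 * 378     # 195,804 pixels
hd_720  = 1280 * 720    # 921,600 pixels
full_hd = 1920 * 1080   # 2,073,600 pixels

print(lingbot)                       # 195804
print(round(hd_720 / lingbot, 1))    # 4.7  -> ~4.7x fewer pixels than 720p
print(round(full_hd / lingbot, 1))   # 10.6 -> ~10.6x fewer pixels than 1080p
```

Roughly a tenth of the pixels of a 1080p stream: plenty of spatial context for obstacle avoidance, thin on the detail needed to resolve small objects.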

But the open-source commitment is real. If Robbyant keeps this pace, researchers have a full embodied AI stack to hack on. That's rare from a company this size.