Researchers from Google DeepMind and UC Berkeley have published LoGeR (Long-Context Geometric Reconstruction), a feedforward 3D reconstruction system capable of processing video sequences up to 19,000 frames — a scale that exposes fundamental failures in all prior approaches. The work, led by Charles Herrmann and Junhwa Hur with direction from Deqing Sun, attacks two distinct barriers that have blocked long-video 3D reconstruction: a "context wall" created by quadratic memory costs in full-attention models, and a "data wall" where models trained only on short sequences fail to generalize to large-scale environments.

A chunk-based hybrid memory architecture sits at the heart of the system, combining two complementary mechanisms. Sliding Window Attention (SWA) provides lossless local memory, maintaining high-precision geometric alignment between adjacent video chunks without the information loss that degrades competing linear-memory systems like CUT3R and TTT3R. Test-Time Training (TTT) serves as a compressed global memory, continuously updating fast weights to prevent cumulative scale drift over long trajectories — the failure mode that causes prior streaming methods to lose coherence over kilometer-scale sequences. The result is linear (and therefore sub-quadratic) scaling in sequence length that, according to the authors, bypasses rather than accepts the quality-versus-scalability trade-off that has constrained the field.
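The division of labor between the two mechanisms can be sketched in a toy form. The snippet below is an illustrative NumPy sketch, not LoGeR's actual architecture: a chunk attends exactly over a small sliding window of recent chunks (lossless local memory), while a fast-weight matrix `W`, updated by a gradient step at test time, compresses everything older (global memory). All names and the reconstruction-loss update rule are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class HybridMemory:
    """Toy hybrid memory: sliding-window attention over recent chunks
    (exact, local) plus a fast-weight matrix updated at test time
    (compressed, global). Illustrative sketch only."""

    def __init__(self, dim, window_chunks=2, lr=0.1):
        self.dim = dim
        self.window = []                # recent chunks kept verbatim
        self.window_chunks = window_chunks
        self.W = np.zeros((dim, dim))   # fast weights: compressed history
        self.lr = lr

    def process_chunk(self, tokens):
        # 1) Local memory: full attention against the sliding window.
        context = np.concatenate(self.window + [tokens], axis=0)
        attn = softmax(tokens @ context.T / np.sqrt(self.dim))
        local_out = attn @ context
        # 2) Global memory: read the compressed history via fast weights.
        global_out = tokens @ self.W
        out = local_out + global_out
        # 3) TTT-style update: one gradient step on a reconstruction
        #    loss ||tokens @ W - tokens||^2, folding this chunk into W.
        grad = tokens.T @ (tokens @ self.W - tokens) / len(tokens)
        self.W -= self.lr * grad
        # 4) Slide the window: evict the oldest chunk beyond the limit.
        self.window.append(tokens)
        if len(self.window) > self.window_chunks:
            self.window.pop(0)
        return out
```

Because each chunk only ever attends over a bounded window and the fast-weight state has fixed size, per-chunk cost is constant, so total cost grows linearly with the number of chunks — the scaling property the paragraph describes.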

On the 19,000-frame VBR dataset — designed to stress-test systems at unprecedented scale — LoGeR delivers a 30.8% relative improvement over prior feedforward approaches, with the gap widening as sequence length increases from 1,000 to 19,000 frames. On the standard KITTI autonomous driving benchmark, LoGeR posts the best average Absolute Trajectory Error (ATE) of 18.65 among feedforward methods. On shorter indoor benchmarks, LoGeR cuts error by 90.3% against TTT3R at 1,000 frames on 7-Scenes and by 80.0% against the same baseline on ScanNet. Those are not close races.
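For readers unfamiliar with the ATE metric cited above: Absolute Trajectory Error is conventionally the RMSE of the pointwise distances between estimated and ground-truth camera positions after the two trajectories have been aligned. A minimal sketch, assuming the trajectories are already aligned (in practice a similarity transform is fitted first):

```python
import numpy as np

def ate_rmse(est, gt):
    """Absolute Trajectory Error (RMSE) between aligned estimated and
    ground-truth camera positions, each an (N, 3) array of xyz points.
    Assumes alignment (e.g. Umeyama similarity fit) is already done."""
    diff = est - gt                                   # per-frame error vectors
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

A perfectly recovered trajectory scores 0; a trajectory uniformly offset by a 3-4-0 translation scores exactly 5, since every per-frame error has norm 5.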

One immediate caveat for practitioners: training code and models have not been released, pending internal approval at Google. A "reimplementation" is listed on the project page but not yet available. That limits near-term reproducibility and keeps LoGeR in research-milestone territory for now. Autonomous driving dataset construction, large-scale AR scene understanding, and kilometer-scale mapping are the obvious targets — and Google's own Street View operation gives DeepMind a direct commercial runway for exactly this capability. The paper is available as arXiv preprint 2603.03269.