How LoGeR Extends 3D Reconstruction to Thousands of Frames with Hybrid Memory
LoGeR, a new long‑context geometric reconstruction framework from DeepMind and UC Berkeley, uses a hybrid memory module combining test‑time‑training (TTT) and sliding‑window attention (SWA) to enable feed‑forward 3D reconstruction over sequences of up to tens of thousands of frames, achieving state‑of‑the‑art accuracy on KITTI, VBR, 7‑Scenes, ScanNetV2 and TUM‑Dynamics benchmarks.
Problem Statement
Existing feed‑forward dense 3D reconstruction networks operate on short context windows (tens to a few hundred frames) and are trained on limited short‑term data. This creates two bottlenecks: (1) the quadratic cost of bidirectional attention restricts the usable context length, and (2) there is a severe scarcity of long‑range training data, preventing reliable reconstruction of city‑scale or minute‑level video sequences.
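To make the cost argument concrete, here is a back‑of‑the‑envelope sketch; the frame and token counts are illustrative assumptions, not numbers from the paper:

```python
# Rough cost model: full bidirectional attention over all frames is quadratic in
# the total token count, while block-wise processing with a fixed block size is
# linear in the number of blocks. All numbers below are illustrative only.
def full_attention_cost(num_frames, tokens_per_frame):
    n = num_frames * tokens_per_frame
    return n * n                                        # pairwise token interactions

def blockwise_cost(num_frames, tokens_per_frame, block_size):
    tokens_per_block = block_size * tokens_per_frame
    num_blocks = -(-num_frames // block_size)           # ceil division
    return num_blocks * tokens_per_block ** 2           # dense attention inside each block only

frames, tpf, block = 10_000, 256, 128
print(full_attention_cost(frames, tpf) / blockwise_cost(frames, tpf, block))
# ~77x fewer interactions at 10k frames, and the gap keeps growing with length
```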
Method Overview
LoGeR processes video streams in overlapping blocks (e.g., 128‑frame segments) so that computation grows linearly with sequence length while preserving local geometric fidelity. The core is a hybrid memory module that combines:
Test‑Time‑Training (TTT) memory: a parametric fast‑weight mechanism that compresses global geometric cues (coarse shape, scene scale) across blocks, anchoring a global coordinate frame and mitigating scale drift.
Sliding‑Window Attention (SWA): a non‑parametric, lossless attention window that propagates high‑resolution features from the previous block to the current one, ensuring fine‑grained geometric alignment (a minimal code sketch of how the two mechanisms combine follows this list).
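A minimal PyTorch‑style sketch of how such a hybrid memory could be wired, assuming a simplified per‑block token interface; the class names, update rule, and hyperparameters are illustrative stand‑ins, not the LoGeR implementation:

```python
import torch
import torch.nn as nn

class TTTMemory(nn.Module):
    """Parametric fast-weight memory: a small matrix updated at test time from
    each processed block, standing in for the paper's TTT update rule."""
    def __init__(self, dim, lr=0.1):
        super().__init__()
        self.lr = lr
        self.register_buffer("fast_weight", torch.zeros(dim, dim))

    @torch.no_grad()
    def update(self, block_tokens):                    # block_tokens: (B, T, dim)
        # Compress the block into a summary vector and fold it into the weights.
        summary = block_tokens.mean(dim=(0, 1))        # (dim,)
        self.fast_weight.mul_(1 - self.lr).add_(self.lr * torch.outer(summary, summary))

    def forward(self, tokens):
        # Read out the stored global cues (coarse shape, scene scale).
        return tokens + tokens @ self.fast_weight

class SlidingWindowAttention(nn.Module):
    """Non-parametric carry-over: current-block queries attend to the previous
    block's tokens concatenated with the current block's tokens."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, cur_tokens, prev_tokens=None):
        ctx = cur_tokens if prev_tokens is None else torch.cat([prev_tokens, cur_tokens], dim=1)
        out, _ = self.attn(cur_tokens, ctx, ctx)
        return cur_tokens + out
```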
Block‑wise Processing
Each block undergoes dense bidirectional attention for high‑quality local inference. After a block is processed, the TTT layer updates its fast weights with compressed geometry and applies them to the next block. SWA layers attend to tokens from the current and previous blocks (C^{m‑1} ∪ C^{m}) at only four network depths, keeping memory and compute overhead low.
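Continuing the toy modules above, a hypothetical per‑block driver might look like the following; the loop structure and names are assumptions, and in the paper SWA sits at only four network depths rather than once per block:

```python
def process_stream(frame_tokens, local_encoder, ttt, swa, block_size=128):
    """frame_tokens: (B, T_total, dim); local_encoder: dense bidirectional
    attention applied within a single block (any callable)."""
    predictions, prev_block = [], None
    for start in range(0, frame_tokens.shape[1], block_size):
        block = frame_tokens[:, start:start + block_size]
        block = local_encoder(block)        # high-quality local inference within the block
        block = ttt(block)                  # read global geometry from the fast weights
        block = swa(block, prev_block)      # align against the previous block's tokens
        ttt.update(block)                   # fold this block's geometry into the fast weights
        predictions.append(block)
        prev_block = block.detach()         # only the most recent block is carried forward
    return predictions
```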
Alignment and Global Consistency
A pure feed‑forward alignment step re‑projects predictions into a globally consistent coordinate system after each block, eliminating drift without requiring loop‑closure detection.
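As a rough illustration of what such a re‑projection can look like, the sketch below estimates a closed‑form similarity transform (the standard Umeyama estimator) on points from frames shared between consecutive blocks and applies it to the whole block; this is a generic stand‑in, not LoGeR's actual alignment module:

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Closed-form similarity transform (s, R, t) mapping src points onto dst;
    a generic stand-in for the paper's alignment step."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                                   # guard against reflections
    R = U @ S @ Vt
    s = np.trace(np.diag(D) @ S) / xs.var(0).sum()     # scale factor
    t = mu_d - s * R @ mu_s
    return s, R, t

def reproject_block(block_points, overlap_local, overlap_global):
    # Estimate the transform on points seen by both blocks, then re-project the
    # whole block into the global coordinate system.
    s, R, t = umeyama_sim3(overlap_local, overlap_global)
    return s * block_points @ R.T + t
```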
Training Curriculum
To cope with limited long‑context data, a progressive curriculum is used:
Start with 48‑frame sequences split into 4 blocks.
Increase the block density to 12 blocks while keeping the sequence length fixed.
Scale up to a 128‑frame context (20 blocks), training on H200 GPUs.
During this schedule the model gradually shifts its reliance from SWA to TTT, learning to store and retrieve global geometry; a schematic of the schedule is sketched below.
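Only the sequence lengths and block counts below are quoted from the text; the staging structure, step budgets, and function names are illustrative assumptions:

```python
# Curriculum stages as described above; stage boundaries and step budgets are assumed.
CURRICULUM = [
    {"stage": 1, "seq_len": 48,  "num_blocks": 4},    # short sequences, coarse blocking
    {"stage": 2, "seq_len": 48,  "num_blocks": 12},   # same length, denser blocking
    {"stage": 3, "seq_len": 128, "num_blocks": 20},   # long context, trained on H200 GPUs
]

def stage_for_step(step, steps_per_stage=50_000):
    """Return the active curriculum stage for a given training step."""
    return CURRICULUM[min(step // steps_per_stage, len(CURRICULUM) - 1)]

# A training loop would call stage_for_step(step) each iteration and re-chunk
# its input sequences into the requested number of blocks before the forward pass.
```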
Experimental Results
KITTI: Absolute Trajectory Error (ATE) reduced by more than 74% compared with prior feed‑forward methods; LoGeR also outperforms the strongest optimization‑based baseline (VGGT‑Long) by 32.5%.
VBR (up to 19,000 frames): Maintains a consistent global scale, whereas baselines exhibit severe drift.
7‑Scenes (50–500 frames): Beats state‑of‑the‑art low‑complexity methods (Point3R, CUT3R, TTT3R, StreamVGGT, VGGT, π³) in reconstruction quality and pose accuracy.
ScanNetV2 and TUM‑Dynamics: Achieves lower camera pose error than all compared methods.
Qualitative visualizations show stable reconstruction over 20,000‑frame sequences, preserving global structure while baseline methods drift.
Paper and Resources
Paper title: LoGeR: Long‑Context Geometric Reconstruction with Hybrid Memory
arXiv PDF: https://arxiv.org/pdf/2603.03269
Project page: https://loger-project.github.io/