How DROID-W Achieves Stable SLAM in Complex Outdoor Dynamic Scenes

DROID-W is a dense, differentiable bundle‑adjustment SLAM framework that models per‑pixel dynamic uncertainty from multi‑view consistency. It runs at about 30 FPS on an RTX 5090 and substantially reduces trajectory error on challenging outdoor datasets, outperforming prior dynamic SLAM methods.

Machine Heart

Problem

Traditional SLAM assumes a static scene. Moving pedestrians and vehicles, as well as shifting shadows and reflections, violate this assumption and cause trajectory drift.

Limitations of prior dynamic SLAM

Previous dynamic SLAM methods rely on semantic segmentation or predefined object categories to mask moving objects, so they cannot handle motion that falls outside those categories. WildGS‑SLAM introduces uncertainty‑aware mapping but still struggles with noisy outdoor data.

DROID‑W approach

DROID‑W does not predefine which objects may move. Instead, it exploits multi‑view observations to identify unreliable regions and automatically down‑weights them during optimization, enabling monocular SLAM on hand‑held footage of dynamic scenes.

System architecture

The pipeline proceeds as follows:

1. Select keyframes from the image stream.
2. Extract DINO visual features and DROID depth estimates.
3. Feed DROID features into a ConvGRU to predict dense pixel correspondences.
4. Run dense differentiable bundle adjustment (BA) to jointly refine camera poses and depth.
5. Use the optimized poses, depth, and DINO features to estimate per‑pixel dynamic uncertainty.
6. Alternate between these two optimizations (BA and uncertainty estimation), which keeps large‑scale Gauss‑Newton optimization tractable while maintaining online performance (about 30 FPS on an RTX 5090).
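The BA step in this pipeline comes down to a weighted Gauss-Newton update: build the normal equations (J^T W J) Δ = -J^T W r and solve for the increment Δ. The Python sketch below is a toy illustration under heavy assumptions, not DROID-W's implementation. Instead of dense per-pixel poses and depths, it estimates a single 2D rigid transform (rotation angle plus translation) from point correspondences, and the per-point weights stand in for the inverse uncertainties DROID-W would supply. The helper names (`solve3`, `transform`, `weighted_gauss_newton`) and the synthetic scene are hypothetical.

```python
import math


def solve3(A, b):
    """Solve the 3x3 system A x = b by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x


def transform(pose, p):
    """Apply a 2D rigid pose (theta, tx, ty) to a 2D point."""
    theta, tx, ty = pose
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)


def weighted_gauss_newton(src, dst, weights, iters=10):
    """Minimise sum_i w_i * ||transform(pose, src_i) - dst_i||^2 over pose = (theta, tx, ty)."""
    pose = [0.0, 0.0, 0.0]
    for _ in range(iters):
        H = [[0.0] * 3 for _ in range(3)]  # approximate Hessian J^T W J
        g = [0.0] * 3                      # gradient J^T W r
        c, s = math.cos(pose[0]), math.sin(pose[0])
        for p, q, w in zip(src, dst, weights):
            px, py = transform(pose, p)
            r = (px - q[0], py - q[1])
            # Jacobian rows of the residual with respect to (theta, tx, ty).
            jx = (-s * p[0] - c * p[1], 1.0, 0.0)
            jy = (c * p[0] - s * p[1], 0.0, 1.0)
            for a in range(3):
                g[a] += w * (jx[a] * r[0] + jy[a] * r[1])
                for b in range(3):
                    H[a][b] += w * (jx[a] * jx[b] + jy[a] * jy[b])
        step = solve3(H, [-v for v in g])  # normal equations: H * step = -g
        pose = [pv + dv for pv, dv in zip(pose, step)]
    return tuple(pose)


# Synthetic scene: 16 static points follow the camera pose, 3 points belong to a moving object.
true_pose = (0.1, 0.5, -0.2)
static = [(float(i), float(j)) for i in range(4) for j in range(4)]
moving = [(1.0, 3.0), (2.0, 0.5), (3.5, 2.5)]
src = static + moving
dst = [transform(true_pose, p) for p in static] + [(p[0] + 2.0, p[1] - 1.5) for p in moving]

# Uniform weights vs. weights that suppress the dynamic points.
est_uniform = weighted_gauss_newton(src, dst, [1.0] * len(src))
est_aware = weighted_gauss_newton(src, dst, [1.0] * len(static) + [1e-4] * len(moving))
print("uniform:", est_uniform)
print("aware:  ", est_aware)
```

With uniform weights the three moving points drag the estimate away from the true pose; suppressing them recovers it almost exactly, which is the effect uncertainty-aware BA aims for at dense scale.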

Dynamic uncertainty modeling

Dynamic uncertainty is measured by the similarity of DINO features across frames. A local affine mapping followed by a Softplus activation converts this similarity into a continuous uncertainty value, so pixels whose features disagree across views (likely dynamic) receive high uncertainty. Those pixels get reduced weight in the BA residuals, which prevents dynamic regions from dominating the optimization.
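A minimal sketch of the similarity-to-uncertainty mapping, assuming cosine similarity between a pixel's DINO descriptor in two views, a single global affine map (the source says "local", so DROID-W's parameters may vary spatially), and an inverse-variance weight 1/u² on the BA residual. The values `scale=-10` and `shift=8`, the `eps` floor, the weighting rule, and all function names are hypothetical choices for illustration.

```python
import math


def cosine_similarity(f1, f2):
    """Cosine similarity between two feature vectors (e.g. DINO descriptors)."""
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    return dot / (n1 * n2 + 1e-12)


def softplus(x):
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dynamic_uncertainty(feat_ref, feat_other, scale=-10.0, shift=8.0, eps=1e-3):
    """Map cross-view feature similarity to a strictly positive uncertainty.

    scale and shift are hypothetical affine parameters (scale < 0, so lower
    similarity gives higher uncertainty); eps keeps the result away from zero.
    """
    s = cosine_similarity(feat_ref, feat_other)
    return softplus(scale * s + shift) + eps


def ba_weight(uncertainty):
    """Hypothetical inverse-variance weight applied to a pixel's BA residual."""
    return 1.0 / uncertainty ** 2


# Toy descriptors: a static pixel looks alike in both views, a dynamic one does not.
ref = [1.0, 0.2, 0.0]
static_other = [0.98, 0.25, 0.01]
dynamic_other = [0.0, 0.1, 1.0]

u_static = dynamic_uncertainty(ref, static_other)
u_dynamic = dynamic_uncertainty(ref, dynamic_other)
print(f"static:  u={u_static:.3f}, weight={ba_weight(u_static):.2f}")
print(f"dynamic: u={u_dynamic:.3f}, weight={ba_weight(u_dynamic):.4f}")
```

Softplus keeps the uncertainty positive and smooth, so the derived weight stays finite and never changes sign, and the mapping remains differentiable if its parameters are learned.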

Alternating optimization

Joint optimization of pose, depth, and uncertainty would be computationally prohibitive. DROID‑W therefore alternates: one stage optimizes pose and depth with the current uncertainty map; the next stage updates the uncertainty map based on multi‑view feature consistency. The loop repeats throughout the sequence.
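The alternation can be illustrated with a toy two-stage loop. This is a sketch under stated assumptions, not the authors' code: the "pose" is a 2D translation solved in closed form as a weighted mean offset, and the uncertainty update uses each point's residual under the current pose as a stand-in for DROID-W's multi-view DINO feature consistency. With that substitution the loop is an iteratively reweighted least-squares (IRLS) scheme. The function names and the `sigma0` floor are hypothetical.

```python
import math


def estimate_translation(src, dst, weights):
    """Pose stage: weighted least-squares 2D translation (closed form: weighted mean offset)."""
    wsum = sum(weights)
    tx = sum(w * (q[0] - p[0]) for p, q, w in zip(src, dst, weights)) / wsum
    ty = sum(w * (q[1] - p[1]) for p, q, w in zip(src, dst, weights)) / wsum
    return tx, ty


def update_uncertainty(src, dst, t, sigma0=0.1):
    """Uncertainty stage (stand-in): grows with each point's residual under the current pose."""
    return [
        math.sqrt(sigma0 ** 2 + (p[0] + t[0] - q[0]) ** 2 + (p[1] + t[1] - q[1]) ** 2)
        for p, q in zip(src, dst)
    ]


def alternate(src, dst, rounds=10):
    """Alternate the two stages, each with the other's variables held fixed."""
    weights = [1.0] * len(src)  # start with every point trusted equally
    history = []
    for _ in range(rounds):
        t = estimate_translation(src, dst, weights)  # stage 1: fix uncertainty, solve pose
        u = update_uncertainty(src, dst, t)          # stage 2: fix pose, refresh uncertainty
        weights = [1.0 / ui ** 2 for ui in u]
        history.append(t)
    return history


true_t = (0.5, -0.2)
static = [(float(i), float(j)) for i in range(4) for j in range(4)]
moving = [(1.0, 3.0), (2.0, 0.5), (3.5, 2.5)]
src = static + moving
dst = [(p[0] + true_t[0], p[1] + true_t[1]) for p in static] + [
    (p[0] + 2.5, p[1] - 1.7) for p in moving  # points on an object that moved independently
]
history = alternate(src, dst)
print("first round:", history[0])
print("last round: ", history[-1])
```

Each stage is cheap because the other set of variables is held fixed, which is the trade-off the alternation buys at dense scale.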

Dataset

The authors built a new DROID‑W dataset comprising seven outdoor downtown sequences with ground‑truth trajectories from RTK positioning. The sequences contain highly dynamic content, over‑exposure, specular reflections, and sun glare. Additional YouTube videos were used to evaluate in‑the‑wild generalization.

Results

On multiple dynamic benchmarks (Bonn, TUM, DyCheck), DROID‑W achieves the lowest trajectory error. On the newly introduced DROID‑W dataset, its average trajectory error is 23 cm, compared with 1.46 m for the original DROID‑SLAM, demonstrating the benefit of uncertainty‑aware BA. Qualitative comparisons show accurate dynamic uncertainty maps in cases where baseline methods fail.
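To make the reported numbers concrete, the sketch below computes the relative change from 1.46 m to 23 cm (roughly 84% lower error, or about 6.3 times smaller) plus a minimal absolute trajectory error (ATE) RMSE function. It assumes the metric is ATE RMSE over positions of trajectories already aligned to ground truth; monocular SLAM evaluation usually needs a Sim(3) alignment first, which is omitted here, and the source does not state the exact protocol. `ate_rmse` is a hypothetical helper.

```python
import math


def ate_rmse(est, gt):
    """RMSE of position errors between already-aligned estimated and ground-truth trajectories."""
    if len(est) != len(gt) or not est:
        raise ValueError("trajectories must be non-empty and of equal length")
    sq = [sum((a - b) ** 2 for a, b in zip(pe, pg)) for pe, pg in zip(est, gt)]
    return math.sqrt(sum(sq) / len(sq))


# Reported average errors on the DROID-W dataset, in metres.
droid_slam, droid_w = 1.46, 0.23
reduction = 1.0 - droid_w / droid_slam  # fraction of the error removed
ratio = droid_slam / droid_w            # how many times smaller the error is
print(f"{reduction:.0%} lower error, about {ratio:.1f}x smaller")
```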

Paper: https://arxiv.org/pdf/2603.19076

Project page: https://moyangli00.github.io/droid-w

Code: https://github.com/MoyangLi00/DROID-W

Dataset: https://cvg-data.inf.ethz.ch/DROID-W

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
