How DROID-W Achieves Stable SLAM in Complex Outdoor Dynamic Scenes
DROID-W is a dense differentiable bundle‑adjustment SLAM framework that models per‑pixel dynamic uncertainty from multi‑view consistency. It runs at about 30 FPS on an RTX 5090 and substantially reduces trajectory error on challenging outdoor datasets, outperforming prior dynamic SLAM methods.
Problem
Traditional SLAM assumes static scenes; moving pedestrians, vehicles, shadows, and reflections break this assumption and cause drift.
Limitations of prior dynamic SLAM
Previous dynamic SLAM methods rely on semantic segmentation or predefined object categories to mask moving objects, which limits them to scenes whose moving content falls within those categories. WildGS‑SLAM introduces uncertainty‑aware mapping but still struggles with noisy outdoor data.
DROID‑W approach
DROID‑W does not predefine what may move. It exploits multi‑view observations to identify unreliable regions and automatically down‑weights them during optimization, enabling monocular SLAM on hand‑held dynamic footage.
System architecture
The pipeline proceeds as follows:
Select keyframes from the image stream.
Extract DINO visual features and DROID depth estimates.
Feed DROID features into a ConvGRU to predict dense pixel correspondences.
Perform dense differentiable bundle adjustment that jointly refines camera poses and depth.
Use the optimized pose, depth, and DINO features to estimate per‑pixel dynamic uncertainty.
Alternate between bundle adjustment and uncertainty estimation, allowing large‑scale Gauss‑Newton updates while keeping online performance (~30 FPS on an RTX 5090).
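To illustrate how per‑residual weights enter a Gauss‑Newton update, here is a minimal NumPy sketch of one weighted step. It is not DROID‑W's solver: the function name, the damping term, and the toy line‑fitting problem are all assumptions chosen for illustration. Setting a weight to zero removes that residual's influence entirely, which is the limiting case of down‑weighting an unreliable pixel.

```python
import numpy as np

def weighted_gauss_newton_step(J, r, w, damping=1e-6):
    """One Gauss-Newton step for min_x sum_i w_i * r_i(x)^2.

    J: (N, P) Jacobian of the residuals, r: (N,) residuals, w: (N,) weights.
    The name and the small damping term are illustrative assumptions.
    """
    JW = J * w[:, None]                       # rows of J scaled by their weights (W J)
    H = J.T @ JW + damping * np.eye(J.shape[1])  # approximate Hessian J^T W J
    g = JW.T @ r                              # gradient term J^T W r
    return -np.linalg.solve(H, g)

# Toy problem: fit y = a*x + b where the first 5 samples are corrupted.
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0
y[:5] += 10.0                                 # stand-in for "dynamic" observations

J = np.stack([x, np.ones_like(x)], axis=1)    # residual r = J @ p - y is linear in p
p0 = np.zeros(2)
r0 = J @ p0 - y

w_uniform = np.ones_like(x)
w_robust = np.ones_like(x)
w_robust[:5] = 0.0                            # corrupted samples fully down-weighted

p_uniform = p0 + weighted_gauss_newton_step(J, r0, w_uniform)
p_robust = p0 + weighted_gauss_newton_step(J, r0, w_robust)
```

Because the toy problem is linear, a single step from zero reaches the weighted least‑squares solution; `p_robust` recovers the clean line while `p_uniform` is pulled toward the corrupted samples.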
Dynamic uncertainty modeling
Dynamic uncertainty is derived from the similarity of DINO features across frames: pixels whose features disagree between views are treated as unreliable. A local affine mapping followed by a Softplus activation converts this similarity into a continuous uncertainty value. Pixels with high uncertainty receive reduced weight in the BA residuals, preventing dynamic regions from dominating the optimization.
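The mapping from feature similarity to uncertainty can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the affine parameters `a` and `b` are global scalars here (the text describes a local mapping), random arrays stand in for DINO features, and the weight formula `1 / (1 + u^2)` is a hypothetical choice.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def dynamic_uncertainty(feat_ref, feat_warped, a=4.0, b=-1.0):
    """Per-pixel uncertainty from feature dissimilarity.

    feat_ref, feat_warped: (H, W, C) feature maps, where feat_warped holds
    another view's features resampled into the reference frame.
    a, b: assumed global affine parameters (hypothetical values).
    """
    num = np.sum(feat_ref * feat_warped, axis=-1)
    den = (np.linalg.norm(feat_ref, axis=-1)
           * np.linalg.norm(feat_warped, axis=-1) + 1e-8)
    cos_sim = num / den
    dissim = 1.0 - cos_sim            # 0 for identical features, up to 2
    return softplus(a * dissim + b)   # affine map, then Softplus

def ba_weight(uncertainty):
    # Hypothetical weighting: high uncertainty -> weight close to 0.
    return 1.0 / (1.0 + uncertainty ** 2)

rng = np.random.default_rng(0)
feat_ref = rng.normal(size=(4, 4, 8))
feat_warped = feat_ref.copy()
feat_warped[0, 0] = -feat_ref[0, 0]   # one pixel whose features disagree

u = dynamic_uncertainty(feat_ref, feat_warped)
w = ba_weight(u)
```

In this example the inconsistent pixel at `(0, 0)` gets a much larger uncertainty, and therefore a much smaller weight, than the pixels whose features match across views.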
Alternating optimization
Joint optimization of pose, depth, and uncertainty would be computationally prohibitive. DROID‑W therefore alternates: one stage optimizes pose and depth with the current uncertainty map; the next stage updates the uncertainty map based on multi‑view feature consistency. The loop repeats throughout the sequence.
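The alternating scheme can be illustrated with a toy problem. In the sketch below, a 2D translation stands in for pose and depth, and the residual under the current estimate stands in for the multi‑view feature consistency signal; both substitutions, along with the function names, the `scale` parameter, and the weight formula, are assumptions made to keep the example small.

```python
import numpy as np

def solve_translation(src, dst, weights):
    # Stage 1: weighted least squares for t minimizing sum_i w_i ||src_i + t - dst_i||^2.
    w = weights[:, None]
    return np.sum(w * (dst - src), axis=0) / np.sum(weights)

def update_uncertainty(src, dst, t, scale=0.1):
    # Stage 2: uncertainty grows with inconsistency under the current estimate.
    # The residual norm is a stand-in for feature inconsistency (assumption).
    return np.linalg.norm(src + t - dst, axis=1) / scale

def alternate(src, dst, n_iters=10):
    u = np.zeros(len(src))             # start with every point fully trusted
    t = np.zeros(2)
    for _ in range(n_iters):
        w = 1.0 / (1.0 + u ** 2)       # hypothetical uncertainty-to-weight rule
        t = solve_translation(src, dst, w)
        u = update_uncertainty(src, dst, t)
    return t, u

rng = np.random.default_rng(0)
src = rng.uniform(-1.0, 1.0, size=(100, 2))
true_t = np.array([0.5, -0.2])
dst = src + true_t
dst[:20] += np.array([1.0, 1.0])       # 20 points move independently of the camera

t_naive = solve_translation(src, dst, np.ones(len(src)))
t_est, u = alternate(src, dst)
```

The uniformly weighted estimate `t_naive` is biased by the moving points, while the alternating loop progressively isolates them and `t_est` ends up close to `true_t`.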
Dataset
The authors built a new DROID‑W dataset comprising seven outdoor downtown sequences with RTK ground‑truth trajectories. The sequences contain highly dynamic content, over‑exposure, specular reflections, and sun glare. Additional YouTube videos were used to evaluate in‑the‑wild generalisation.
Results
On multiple dynamic benchmarks (Bonn, TUM, DyCheck), DROID‑W achieves the lowest trajectory error. On the newly introduced DROID‑W dataset, the average trajectory error is 0.23 m, compared with 1.46 m for the original DROID‑SLAM, demonstrating the benefit of uncertainty‑aware BA. Qualitative comparisons show accurate dynamic uncertainty maps where baseline methods fail.
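For a sense of scale, the two averages reported above on the DROID‑W dataset can be compared directly. The snippet only does arithmetic on those two figures; the variable names are illustrative.

```python
droid_w_error_m = 0.23      # DROID-W average trajectory error (23 cm)
droid_slam_error_m = 1.46   # original DROID-SLAM average trajectory error

ratio = droid_slam_error_m / droid_w_error_m
reduction = 1.0 - droid_w_error_m / droid_slam_error_m

print(f"{ratio:.1f}x lower error, {reduction:.0%} reduction")
# prints "6.3x lower error, 84% reduction"
```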
Paper: https://arxiv.org/pdf/2603.19076
Project page: https://moyangli00.github.io/droid-w
Code: https://github.com/MoyangLi00/DROID-W
Dataset: https://cvg-data.inf.ethz.ch/DROID-W
