
Technical Overview of DiDi's AR Indoor Navigation System

DiDi's AR indoor navigation system addresses GPS unreliability in large indoor venues through SfM-based 3-D reconstruction, visual localization made robust by magnetometer and GNSS priors, and sensor fusion that combines pedestrian dead-reckoning with deep-learning heading estimation. Deployed in more than 24 airports, malls and stations, it has cut the time passengers need to reach the pick-up point by up to 25%.

Didi Tech

In large indoor venues such as airports, malls and train stations, GPS signals are unstable, the areas are vast and routes are complex, so passengers who have placed a ride-hailing order often struggle to find the pick-up point. To address this, DiDi developed an AR-based live-view navigation product that combines 3-D reconstruction, visual positioning and augmented-reality techniques.

Application Background

User research showed that passengers often spend extra time finding the boarding point in indoor environments where GPS is inaccurate. DiDi first introduced an “image‑and‑text” guide, then explored a more intuitive AR solution, resulting in the DiDi AR navigation product.

Problem Analysis

The challenges include (1) building a map of the indoor scene, (2) determining the user’s position, and (3) providing an intuitive guidance method. Indoor GPS is unreliable; Wi‑Fi and cellular positioning suffer from large errors; the environment is large, repetitive and dynamic, making traditional outdoor navigation pipelines unsuitable.

Technical Challenges

Three core technical challenges were identified:

3‑D reconstruction for large indoor spaces.

Robust visual localization in repetitive, dynamic environments.

Accurate sensor‑based position estimation despite inertial drift.

Key Solutions

1. Vision‑based 3‑D Reconstruction

We adopt Structure-from-Motion (SfM) to recover scene geometry from multiple images or video streams. The pipeline consists of data acquisition, feature extraction, data association and bundle-adjustment optimization. For large venues, we propose a block-wise reconstruction method: it builds an association graph over the images, partitions the graph into blocks with a graph-cut model, reconstructs each block, and merges the blocks with pose-graph optimization. This yields a 70% efficiency gain in reconstruction and has produced one of the largest indoor 3-D models, such as the model of Zhengzhou Airport.
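The partitioning step can be illustrated with a small Python sketch. It is not DiDi's graph-cut model: it swaps in a simpler greedy, size-constrained agglomerative partition (union-find over edges sorted by match count), which likewise tends to leave the weakest associations between blocks. The function names, the `(i, j, weight)` edge format and `max_block_size` are hypothetical. The sketch also omits the overlap between neighbouring blocks that a merge step needs; the cut edges it reports show where such overlap would come from.

```python
from collections import defaultdict


def partition_association_graph(num_images, edges, max_block_size):
    """Split images 0..num_images-1 into blocks of at most max_block_size.

    edges: (i, j, weight) tuples; weight is, for example, the number of
    verified feature matches between images i and j (hypothetical format).
    """
    parent = list(range(num_images))
    size = [1] * num_images

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    # Merge along the strongest associations first, so the edges left
    # between blocks (the "cut") tend to be the weakest ones.
    for i, j, _w in sorted(edges, key=lambda e: e[2], reverse=True):
        ri, rj = find(i), find(j)
        if ri == rj or size[ri] + size[rj] > max_block_size:
            continue
        if size[ri] < size[rj]:
            ri, rj = rj, ri
        parent[rj] = ri
        size[ri] += size[rj]

    blocks = defaultdict(list)
    for img in range(num_images):
        blocks[find(img)].append(img)
    return sorted(blocks.values())


def cut_edges(blocks, edges):
    """Edges whose endpoints lie in different blocks; these link blocks when merging."""
    block_of = {img: b for b, members in enumerate(blocks) for img in members}
    return [e for e in edges if block_of[e[0]] != block_of[e[1]]]


# Two densely matched areas joined by a single weak association (2, 3).
edges = [(0, 1, 100), (1, 2, 90), (0, 2, 80),
         (3, 4, 95), (4, 5, 85), (3, 5, 70),
         (2, 3, 5)]
blocks = partition_association_graph(6, edges, max_block_size=3)
print(blocks)                    # [[0, 1, 2], [3, 4, 5]]
print(cut_edges(blocks, edges))  # [(2, 3, 5)]
```

Merging strongest edges first is a cheap local heuristic; a graph-cut formulation optimizes the total cut weight globally, which matters more as venues and block counts grow.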

2. Visual Localization

We rely on camera-based visual positioning rather than GNSS, Wi-Fi or Bluetooth. The pipeline extracts features from the query image, retrieves the top-N most similar mapping images associated with the 3-D model, matches 2-D query features to 3-D map points, and solves the camera pose with PnP inside a RANSAC loop. Because repeated signage produces look-alike retrieval results and hence mismatches, we incorporate coarse priors from the magnetometer and GNSS, cluster the candidates, and re-rank them with weighted scores.
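The prior-based re-ranking step can be sketched as follows, assuming each retrieved candidate carries its retrieval similarity plus the position and yaw of the mapping camera that captured it, and treating the GNSS fix as a coarse position prior and the magnetometer as a coarse heading prior. The Gaussian weighting, the greedy centroid clustering, and all names and default parameters (`Candidate`, `sigma_pos_m`, `sigma_heading_deg`, `radius_m`) are illustrative assumptions, not the article's scoring scheme. PnP + RANSAC on the surviving candidates is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    image_id: str
    similarity: float   # retrieval score from image search, higher is better
    x: float            # mapping-camera position in the venue frame (metres)
    y: float
    heading_deg: float  # mapping-camera yaw in the venue frame (degrees)


def heading_error_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, 180]."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def prior_weight(c, prior_xy, prior_heading_deg, sigma_pos_m, sigma_heading_deg):
    """Gaussian agreement between a candidate and the coarse position/heading prior."""
    d2 = (c.x - prior_xy[0]) ** 2 + (c.y - prior_xy[1]) ** 2
    dh = heading_error_deg(c.heading_deg, prior_heading_deg)
    return (math.exp(-d2 / (2 * sigma_pos_m ** 2))
            * math.exp(-dh ** 2 / (2 * sigma_heading_deg ** 2)))


def cluster_by_position(cands, radius_m):
    """Greedy clustering: join the first cluster whose centroid is within radius_m."""
    clusters = []
    for c in cands:
        for cl in clusters:
            cx = sum(m.x for m in cl) / len(cl)
            cy = sum(m.y for m in cl) / len(cl)
            if math.hypot(c.x - cx, c.y - cy) <= radius_m:
                cl.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def rerank(cands, prior_xy, prior_heading_deg,
           sigma_pos_m=20.0, sigma_heading_deg=45.0, radius_m=10.0):
    """Keep the best-supported location cluster, ordered by weighted score."""
    def score(c):
        return c.similarity * prior_weight(c, prior_xy, prior_heading_deg,
                                           sigma_pos_m, sigma_heading_deg)

    clusters = cluster_by_position(cands, radius_m)
    best = max(clusters, key=lambda cl: sum(score(c) for c in cl))
    return sorted(best, key=score, reverse=True)


# Two look-alike signs: Gate A near (0, 0) and Gate B near (100, 0).
cands = [
    Candidate("B1", 0.95, 100.0, 0.0, 90.0),  # top retrieval hit, wrong gate
    Candidate("A1", 0.90, 0.0, 0.0, 90.0),
    Candidate("A2", 0.85, 3.0, 1.0, 80.0),
    Candidate("B2", 0.80, 102.0, 2.0, 270.0),
]
ranked = rerank(cands, prior_xy=(5.0, 0.0), prior_heading_deg=90.0)
print([c.image_id for c in ranked])  # ['A1', 'A2']
```

Scoring whole clusters rather than single images keeps one strong but wrong match (`B1` above) from winning on its own: the prior suppresses every image at the wrong gate at once.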

3. Sensor‑based Position Estimation

Once a visual pose has been obtained, we continue tracking the user by fusing inertial sensor data (accelerometer, gyroscope, magnetometer) in a pedestrian dead-reckoning (PDR) framework. We improve step detection with gait-intensity thresholds, estimate stride length from statistical features, and refine heading with a deep-learning model (LSTM + ResNet) that also outputs a confidence for its heading estimate. A gradient-boosted decision-tree classifier distinguishes walking, idle and device-shaking states so that the PDR parameters can adapt to each.
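A minimal PDR loop shows how step detection, stride length and heading combine into a track. Everything here is a hypothetical stand-in: steps are peaks of the acceleration magnitude above a fixed threshold with a minimum spacing; stride length uses the Weinberg model (a fourth root of the acceleration range within each step) in place of DiDi's statistical-feature model, with an uncalibrated constant `k`; and heading is supplied per sample, which is where a refined heading estimate such as the deep-learning output would plug in. The motion-state classifier is not modelled.

```python
import math


def detect_steps(acc_norm, fs_hz, threshold=10.8, min_interval_s=0.3):
    """Indices of local maxima in |acc| (m/s^2) above a gait-intensity threshold."""
    min_gap = int(min_interval_s * fs_hz)
    steps = []
    for i in range(1, len(acc_norm) - 1):
        a = acc_norm[i]
        is_peak = a > threshold and a >= acc_norm[i - 1] and a > acc_norm[i + 1]
        if is_peak and (not steps or i - steps[-1] >= min_gap):
            steps.append(i)
    return steps


def weinberg_stride(window, k=0.5):
    """Weinberg stride model: L = k * (a_max - a_min) ** 0.25 over one step."""
    return k * (max(window) - min(window)) ** 0.25


def pdr_track(acc_norm, heading_rad, fs_hz, start=(0.0, 0.0)):
    """Dead-reckon positions; heading is clockwise from north, x east, y north."""
    x, y = start
    track = [(x, y)]
    prev = 0
    for s in detect_steps(acc_norm, fs_hz):
        stride = weinberg_stride(acc_norm[prev:s + 1])
        x += stride * math.sin(heading_rad[s])
        y += stride * math.cos(heading_rad[s])
        track.append((x, y))
        prev = s
    return track


# Synthetic 5 s walk at 2 steps/s, sampled at 50 Hz, heading due north.
fs = 50
acc = [9.81 + 2.0 * math.sin(2 * math.pi * 2.0 * i / fs) for i in range(5 * fs)]
heading = [0.0] * len(acc)
track = pdr_track(acc, heading, fs)
print(len(track) - 1, "steps, final position", track[-1])
```

`threshold`, `min_interval_s` and `k` are the kind of parameters a motion-state classifier could switch between walking, idle and shaking modes, for example by suppressing step detection entirely while the device is being shaken.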

Summary

The AR navigation system has been deployed in more than 24 airports, malls and stations (e.g., Zhengzhou, Shenzhen, Tokyo). Field tests show that it reduces the time needed to reach the pick-up point by up to 25%, demonstrating the effectiveness of combining vision-based mapping and localization, sensor fusion and AR rendering for indoor ride-hailing scenarios.
