Technical Overview of DiDi's AR Indoor Navigation System
DiDi's AR indoor navigation system addresses GPS unreliability in large indoor venues by combining SfM-based 3‑D reconstruction, robust visual localization seeded with magnetometer and GNSS priors, and sensor fusion that pairs pedestrian dead‑reckoning with deep‑learning heading estimation. Deployed across dozens of airports and malls, it cuts passenger pick‑up time by up to 25 %.
In large indoor venues such as airports, malls and train stations, GPS signals are unstable, the areas are vast, and the routes are complex, making it difficult for passengers to locate the pick‑up point after placing a ride‑hailing order. To address this, DiDi developed an AR‑based real‑world navigation product that combines 3‑D reconstruction, visual positioning and augmented‑reality techniques.
Application Background
User research showed that passengers often spend extra time finding the boarding point in indoor environments where GPS is inaccurate. DiDi first introduced an “image‑and‑text” guide, then explored a more intuitive AR solution, resulting in the DiDi AR navigation product.
Problem Analysis
The challenges include (1) building a map of the indoor scene, (2) determining the user’s position, and (3) providing an intuitive guidance method. Indoor GPS is unreliable; Wi‑Fi and cellular positioning suffer from large errors; the environment is large, repetitive and dynamic, making traditional outdoor navigation pipelines unsuitable.
Technical Challenges
Three core technical challenges were identified:
3‑D reconstruction for large indoor spaces.
Robust visual localization in repetitive, dynamic environments.
Accurate sensor‑based position estimation despite inertial drift.
Key Solutions
1. Vision‑based 3‑D Reconstruction
We adopt Structure‑from‑Motion (SfM) to recover scene geometry from multiple images or video streams. The pipeline includes data acquisition, feature extraction, data association, and bundle‑adjustment optimization. For large venues, we propose a block‑wise reconstruction method that builds an association graph, partitions it via a graph‑cut model, and merges blocks with pose‑graph optimization, achieving a 70 % efficiency gain and producing one of the largest indoor 3‑D models (e.g., Zhengzhou Airport).
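The partition step above can be illustrated with a minimal pure‑Python sketch. The real pipeline solves a graph‑cut model over the image association graph and merges blocks with pose‑graph optimization; here a greedy BFS split stands in for the graph cut, and the function name and parameters are illustrative, not DiDi's actual API.

```python
from collections import defaultdict, deque

def partition_association_graph(edges, num_blocks):
    """Split an image co-visibility graph into roughly equal blocks.

    `edges` is a list of (image_a, image_b) co-visibility pairs.
    A greedy BFS grows each block to a target size, which is a
    simplified stand-in for the graph-cut partition described above.
    """
    graph = defaultdict(set)
    nodes = set()
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
        nodes.update((a, b))

    target = max(1, len(nodes) // num_blocks)
    blocks, assigned = [], set()
    for seed in sorted(nodes):
        if seed in assigned:
            continue
        block, queue = [], deque([seed])
        while queue and len(block) < target:
            n = queue.popleft()
            if n in assigned:
                continue
            assigned.add(n)
            block.append(n)
            # expand along co-visibility edges to keep blocks connected
            queue.extend(graph[n] - assigned)
        blocks.append(block)
    return blocks
```

Each resulting block would then be reconstructed independently and stitched back together by optimizing relative poses between overlapping blocks.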
2. Visual Localization
We rely on camera‑based visual positioning instead of GNSS, Wi‑Fi or Bluetooth. The pipeline extracts image features, retrieves top‑N candidate images from the 3‑D model, matches 2‑D to 3‑D points, and solves pose with RANSAC + PnP. To reduce mismatches caused by repetitive signs, we incorporate coarse priors from magnetometer and GNSS, perform clustering, and re‑rank candidates with weighted scores.
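The re-ranking idea can be sketched as a weighted blend of the retrieval similarity and agreement with the coarse heading prior. This is a minimal illustration, assuming a single magnetometer/GNSS-derived heading per candidate; the weight value and signature are hypothetical.

```python
def rerank_candidates(candidates, prior_heading_deg, prior_weight=0.3):
    """Re-rank retrieved database images for visual localization.

    `candidates`: list of (image_id, similarity in [0, 1], heading_deg).
    Blends visual similarity with how well each candidate's capture
    heading agrees with the coarse magnetometer/GNSS prior, which
    helps demote look-alike candidates (e.g. repeated signage) that
    face the wrong direction.
    """
    def heading_agreement(h):
        # fold the angular difference into [0, 180], map to [0, 1]
        diff = abs((h - prior_heading_deg + 180.0) % 360.0 - 180.0)
        return 1.0 - diff / 180.0

    scored = [
        (img, (1 - prior_weight) * sim + prior_weight * heading_agreement(h))
        for img, sim, h in candidates
    ]
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

With a prior of 0°, a slightly less similar candidate facing the right way can outrank a near-duplicate facing the opposite direction, which is exactly the repetitive-sign failure mode the priors are meant to suppress.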
3. Sensor‑based Position Estimation
After visual pose is obtained, we fuse inertial sensor data (accelerometer, gyroscope, magnetometer) with a pedestrian dead‑reckoning (PDR) framework. We improve step detection using gait‑intensity thresholds, estimate stride length with statistical features, and refine heading with a deep learning model (LSTM + ResNet) that outputs a heading‑confidence estimate. A gradient‑boosted decision‑tree classifier distinguishes walking, idle and device‑shaking states to adapt the PDR parameters.
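The step-detection and stride-length pieces of the PDR framework can be sketched as follows. This assumes a preprocessed acceleration-magnitude signal and uses a simple peak detector plus a Weinberg-style amplitude model; the threshold, gap, and constant `k` are illustrative placeholders, not DiDi's tuned parameters, and the deep-learning heading model is omitted.

```python
def detect_steps(accel_mag, threshold=1.5, min_gap=10):
    """Detect steps as local peaks of acceleration magnitude (in g)
    that exceed a gait-intensity threshold and are at least
    `min_gap` samples apart (debouncing double-counted peaks)."""
    steps, last = [], -min_gap
    for i in range(1, len(accel_mag) - 1):
        is_peak = (accel_mag[i] > accel_mag[i - 1]
                   and accel_mag[i] >= accel_mag[i + 1])
        if is_peak and accel_mag[i] > threshold and i - last >= min_gap:
            steps.append(i)
            last = i
    return steps

def stride_length(peak_to_valley, k=0.35):
    """Weinberg-style stride estimate: length proportional to the
    fourth root of the peak-to-valley acceleration amplitude.
    The constant k is user-specific and hypothetical here."""
    return k * peak_to_valley ** 0.25
```

Each detected step advances the PDR position by `stride_length` along the current heading estimate; the activity classifier mentioned above would gate this update so that idle or device-shaking windows do not accumulate spurious displacement.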
Summary
The AR navigation system has been deployed in more than 24 airports, malls and stations (e.g., Zhengzhou, Shenzhen, Tokyo). Field tests show a reduction of up to 25 % in the time needed to reach the pick‑up point, demonstrating the effectiveness of combining AI‑driven visual SLAM, sensor fusion and AR rendering for indoor ride‑hailing scenarios.
Didi Tech
Official Didi technology account