
Technical Overview of Didi’s MJO 3D Panoramic Navigation, Main/Sub‑Road Yaw Detection, and Deep‑Learning‑Based Navigation Engine

Didi’s navigation system combines a novel MJO 3D panoramic map with aggressive data compression and octree-based rendering, precise main/sub-road yaw detection built on LSTM models trained on GPS and image data, and a lightweight deep-learning engine optimized for mobile CPUs/GPUs, delivering accurate, real-time guidance for ride-hailing and autonomous driving.


Didi Navigation, a map product built on massive traffic data and advanced algorithms, serves both ride‑hailing and self‑driving scenarios. The product continuously collects driver feedback and optimizes the navigation experience. This article introduces three core technical topics: the industry‑unique MJO 3D panoramic navigation, main/sub‑road yaw detection, and the application of deep learning on the navigation engine.

1. MJO 3D Panoramic Navigation

Traditional 2D maps cannot accurately represent complex bridge structures, leading to lane-level mis-recognition. MJO navigation introduces a fine-grained scene model with fidelity approaching real-world imagery, dramatically reducing the effort of reading the map. The main challenges are high model complexity and large data volume, which demand significant CPU/GPU resources. To run on a wide range of devices, the team applied extensive data-compression techniques, including texture compression, shared-resource extraction, model compression, binary format conversion, and secondary-model filtering.

Rendering performance is further improved by adopting next-generation graphics APIs (Metal on iOS, Vulkan on Android). These APIs provide closer-to-hardware control, higher draw-call throughput, and better multithreading, enabling smooth rendering of thousands of bridge models. Octree-based scene management reduces rendering complexity from O(N) to O(log N), and material merging cuts draw calls from thousands to dozens.
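The article does not include code, but the octree idea is easy to illustrate: partitioning the scene into nested cubes lets the renderer discard whole subtrees that lie outside the visible region instead of testing every model. A minimal sketch in Python (illustrative only, not Didi's engine):

```python
class Octree:
    """Minimal point octree for range queries; whole off-screen subtrees are pruned."""

    def __init__(self, center, half, capacity=4):
        self.center, self.half, self.capacity = center, half, capacity
        self.points, self.children = [], None

    def insert(self, p):
        if any(abs(p[i] - self.center[i]) > self.half for i in range(3)):
            return False  # point lies outside this node's cube
        if self.children is None and len(self.points) < self.capacity:
            self.points.append(p)
            return True
        if self.children is None:
            self._subdivide()
        return any(c.insert(p) for c in self.children)

    def _subdivide(self):
        h = self.half / 2
        self.children = [
            Octree(tuple(self.center[i] + h * s[i] for i in range(3)), h, self.capacity)
            for s in [(sx, sy, sz) for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)]
        ]
        for p in self.points:                 # push stored points down to children
            for c in self.children:
                if c.insert(p):
                    break
        self.points = []

    def query(self, lo, hi, out=None):
        """Collect points inside the axis-aligned box [lo, hi]."""
        out = [] if out is None else out
        # Prune: if this node's cube is disjoint from the box, skip the subtree.
        for i in range(3):
            if self.center[i] + self.half < lo[i] or self.center[i] - self.half > hi[i]:
                return out
        for p in self.points:
            if all(lo[i] <= p[i] <= hi[i] for i in range(3)):
                out.append(p)
        if self.children:
            for c in self.children:
                c.query(lo, hi, out)
        return out
```

For a scene with N objects the query visits only the O(log N) nodes along the relevant branches, which is the complexity reduction the article cites.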

The navigation line is generated by mapping 2D link sequences to MJO bridge sections, stitching the resulting MJO link sequence, and feeding it into the rendering engine, where Bézier interpolation smooths the animation.
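One common way to apply Bézier smoothing to a stitched polyline is to round each corner with a quadratic curve whose endpoints are the adjacent segment midpoints; the sketch below assumes that approach, which may differ from Didi's actual implementation:

```python
def quadratic_bezier(p0, p1, p2, t):
    """Point at parameter t on the quadratic Bézier with control points p0, p1, p2."""
    u = 1.0 - t
    return (u * u * p0[0] + 2 * u * t * p1[0] + t * t * p2[0],
            u * u * p0[1] + 2 * u * t * p1[1] + t * t * p2[1])

def smooth_polyline(points, samples=8):
    """Round each interior corner of a 2D polyline with a quadratic Bézier.

    Segment midpoints become curve endpoints and the original vertex the
    control point, so the output is tangent-continuous at every join.
    """
    if len(points) < 3:
        return list(points)
    mid = lambda a, b: ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)
    out = [points[0], mid(points[0], points[1])]
    for i in range(1, len(points) - 1):
        m0, m1 = out[-1], mid(points[i], points[i + 1])
        for s in range(1, samples + 1):
            out.append(quadratic_bezier(m0, points[i], m1, s / samples))
    out.append(points[-1])
    return out
```

Sampling the curve densely enough (here `samples` per corner) is what makes the rendered navigation line animate smoothly through sharp link junctions.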

2. Main/Sub‑Road Yaw Detection

Yaw detection determines whether a vehicle deviates from the planned route. In parallel main/sub‑road scenarios, the problem is more challenging because the roads are adjacent. The team built a labeling pipeline that combines machine‑generated labels (based on trajectory‑road‑network connectivity) and image‑based labels (using driver‑uploaded images). Machine labeling provides high‑precision samples for clearly connected road networks, while image labeling covers ambiguous cases.
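The machine-labeling rule described above can be sketched as a connectivity check over the road graph; the function, graph shape, and label names below are illustrative assumptions, not Didi's pipeline:

```python
def auto_label(matched_links, adjacency):
    """Label a trajectory by road-network connectivity (illustrative rule).

    Returns 'connected' when every consecutive pair of map-matched links is
    adjacent in the road graph -- a high-precision positive sample -- and
    'ambiguous' otherwise, in which case the trajectory would be routed to
    image-based labeling instead.
    """
    for a, b in zip(matched_links, matched_links[1:]):
        if b not in adjacency.get(a, ()):  # gap or jump between links
            return "ambiguous"
    return "connected"
```

The split matters for training quality: the connectivity rule yields cheap, precise labels at scale, while the expensive image-based labels are spent only on the cases the rule cannot decide.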

Two model families were explored: traditional rule‑based classification and supervised learning. Supervised models include XGBoost, DNN, CNN, and LSTM. LSTM achieved the highest accuracy (~97%) on trajectory quality classification, thanks to its ability to capture temporal dependencies.

Feature engineering combined raw GPS attributes (speed, heading, accuracy) with derived features (distance to road, angular deviation, integrated offset). Low‑importance features such as raw heading were removed to shrink model size for mobile deployment.
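A minimal sketch of how the derived features named above could be computed from raw fixes, assuming a planar approximation and a road represented as a single segment (field names and the exact formulas are assumptions):

```python
import math

def point_segment_distance(p, a, b):
    """Perpendicular distance from point p to segment a-b (planar approximation)."""
    ax, ay = b[0] - a[0], b[1] - a[1]
    px, py = p[0] - a[0], p[1] - a[1]
    seg_len2 = ax * ax + ay * ay
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, (px * ax + py * ay) / seg_len2))
    dx, dy = p[0] - (a[0] + t * ax), p[1] - (a[1] + t * ay)
    return math.hypot(dx, dy)

def derive_features(fixes, road_a, road_b, road_heading):
    """Derived features from raw GPS fixes for a candidate road.

    fixes: list of dicts with 'x', 'y', 'heading' (degrees).
    Returns distance-to-road per fix, angular deviation per fix (wrapped to
    [0, 180]), and the integrated lateral offset over the whole window.
    """
    dists = [point_segment_distance((f["x"], f["y"]), road_a, road_b) for f in fixes]
    ang = [abs((f["heading"] - road_heading + 180) % 360 - 180) for f in fixes]
    return {"dist": dists, "ang_dev": ang, "integrated_offset": sum(dists)}
```

Note that the integrated offset accumulates small per-fix deviations over the window, which is what lets the model separate a genuine main/sub-road switch from momentary GPS noise.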

3. Deep‑Learning‑Based Navigation Engine

Traditional yaw detection relies on handcrafted rules and map‑matching, which are hard to maintain and often trade off accuracy for sensitivity. By introducing deep‑learning models on the client side, the system can automatically learn complex patterns from GPS and image data, improving both accuracy and responsiveness.

Mobile constraints (CPU, GPU, memory, binary size) were addressed by:

Running inference only when GPS deviation exceeds a threshold.

Choosing lightweight LSTM models (≈200 KB) that still achieve >95% accuracy.

Optimizing model libraries for ARM processors and pruning unnecessary operators.
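The first optimization above, gating inference on GPS deviation, can be sketched as follows; the threshold value, return shape, and model interface are assumptions for illustration:

```python
def maybe_run_yaw_model(deviation_m, window, model, threshold_m=15.0):
    """Invoke the on-device yaw model only when map-matched deviation is large.

    Below the threshold the vehicle is confidently on-route, so inference
    (and its CPU cost) is skipped entirely; `model` is any callable that
    scores a window of GPS fixes, e.g. a lightweight LSTM classifier.
    """
    if deviation_m <= threshold_m:
        return {"yaw": False, "inferred": False}
    score = model(window)
    return {"yaw": score > 0.5, "inferred": True, "score": score}
```

Because most driving time is spent squarely on-route, this gate keeps the average per-fix cost near zero while the full model runs only in the ambiguous moments that need it.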

Finally, the article outlines a roadmap toward a deep‑learning‑driven expert system that partitions yaw scenarios and trains specialized models, aiming for better maintainability and performance on mobile devices.

Tags: deep learning, mobile optimization, 3D rendering, navigation, GPS trajectory, map matching
Written by Didi Tech, the official Didi technology account.