
Evolution and Challenges of Perception in L4 Autonomous Driving

This article traces the evolution of L4 autonomous-driving perception from early rule-based point-cloud methods, through data-driven deep-learning models, to emerging self-learning, multi-task systems. It highlights four key hurdles: model generalization and explainability, robust multi-sensor fusion, real-time compute limits, and principled uncertainty handling. Addressing them, the article argues, requires integrated AI, engineering, and data solutions.

Didi Tech

DiDi’s perception system relies heavily on machine learning and deep learning, yet solving the perception problem for L4 autonomous driving is not achieved simply by adopting the latest deep‑learning models.

The article outlines three progressive stages of perception development:

Stage 1 – Rule‑based point‑cloud segmentation and object tracking: Before deep learning, static obstacles and common traffic participants (vehicles, pedestrians, cyclists) were detected using handcrafted rules and point‑cloud segmentation, followed by tracking. This approach required expert‑designed rules and struggled with many real‑world scenarios.
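As a minimal sketch of how such a rule-based pipeline works, the toy function below drops presumed-ground points with a simple height threshold and groups the remainder by naive Euclidean clustering. All names, thresholds, and the flat-road assumption are illustrative, not DiDi's actual rules:

```python
import math

def segment_obstacles(points, ground_z=0.2, cluster_dist=1.0):
    """Rule-based point-cloud segmentation (illustrative sketch).

    points: list of (x, y, z) tuples in meters.
    ground_z: points at or below this height are treated as ground,
              assuming a roughly flat road surface.
    cluster_dist: max distance between points in the same cluster.
    Returns a list of clusters, each a list of points.
    """
    # Rule 1: drop ground points with a height threshold.
    obstacles = [p for p in points if p[2] > ground_z]

    # Rule 2: naive Euclidean clustering (O(n^2), fine for a sketch).
    clusters = []
    for p in obstacles:
        placed = False
        for c in clusters:
            if any(math.dist(p, q) <= cluster_dist for q in c):
                c.append(p)
                placed = True
                break
        if not placed:
            clusters.append([p])
    return clusters
```

Brittleness is visible even here: a sloped road breaks the ground threshold, and a sparse distant object may split into several clusters, which is exactly why handcrafted rules struggled at scale.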

Stage 2 – Large‑scale data labeling and deep learning: The introduction of deep learning dramatically improved perception performance. Large annotated datasets enable data‑driven model design, but deep‑learning models still face limits in accuracy, generalization, and interpretability, especially for safety‑critical autonomous‑driving applications.

Stage 3 – Scalable, self‑learning systems for long‑tail data: To handle fine‑grained recognition (e.g., unusual vehicles, plastic bags, pedestrian intent) and long‑tail cases, systems need strong scalability and self‑learning capabilities. The “90‑10 rule” is cited: the remaining 10 % of problems consume 90 % of effort. Emerging techniques such as multi‑task learning and AutoML are explored, but practical solutions are still evolving.
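To make the multi-task idea concrete, the toy example below trains one shared weight (a stand-in for a shared backbone) with two task-specific heads by gradient descent on 1-D regression targets. Everything here is a hypothetical sketch, not a description of DiDi's models:

```python
def train_multitask(xs, ya, yb, steps=500, lr=0.02):
    """Toy multi-task learner: one shared trunk weight, two task heads.

    xs: inputs; ya, yb: targets for task A and task B.
    Returns the learned weights and the final combined loss.
    """
    ws, wa, wb = 0.5, 0.5, 0.5   # shared trunk + two heads
    n = len(xs)
    for _ in range(steps):
        g_ws = g_wa = g_wb = 0.0
        for x, ta, tb in zip(xs, ya, yb):
            f = ws * x                 # shared feature
            ea = wa * f - ta           # task-A error
            eb = wb * f - tb           # task-B error
            g_wa += 2 * ea * f / n
            g_wb += 2 * eb * f / n
            g_ws += 2 * (ea * wa + eb * wb) * x / n  # gradients from both tasks
        ws -= lr * g_ws
        wa -= lr * g_wa
        wb -= lr * g_wb
    loss = sum((wa * ws * x - ta) ** 2 + (wb * ws * x - tb) ** 2
               for x, ta, tb in zip(xs, ya, yb)) / n
    return ws, wa, wb, loss
```

The point of the sketch is the shared gradient: both tasks push on the same trunk weight, so representation learned for one task transfers to the other, which is the appeal of multi-task perception heads.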

The article then discusses four major challenges that impede fully autonomous perception:

1. Defects of deep‑learning models: State‑of‑the‑art models lack sufficient generalization and explainability. They can be fooled by imperceptible adversarial noise and struggle with rare or out‑of‑distribution samples. Model forgetting and limited memory further hinder continuous improvement.
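The adversarial-noise point can be illustrated with the Fast Gradient Sign Method (FGSM) on a toy logistic classifier; the weights and inputs below are purely hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fgsm_attack(w, x, eps):
    """FGSM on a toy logistic classifier (illustrative sketch).

    For true label y=1, loss = -log sigmoid(w.x), so the gradient of the
    loss with respect to the input is (p - 1) * w.  FGSM perturbs each
    input coordinate by eps in the direction of the gradient's sign.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    grad = [(p - 1.0) * wi for wi in w]  # dL/dx for label 1
    return [xi + eps * (1 if g > 0 else -1 if g < 0 else 0)
            for xi, g in zip(x, grad)]
```

Each coordinate moves by at most eps, a perturbation that can be imperceptible in image space, yet the classifier's confidence in the true label drops.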

2. Multi‑sensor fusion: Different sensors (LiDAR, camera, millimeter‑wave radar) have complementary strengths and weaknesses. Effective perception requires careful fusion, precise calibration, and scalable architectures to combine these modalities.
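A basic ingredient of LiDAR-camera fusion is projecting calibrated 3-D points into the image plane so LiDAR returns can be associated with pixels. The sketch below assumes a standard pinhole model; the parameter names and values are illustrative, not any vehicle's real calibration:

```python
def project_to_image(point, R, t, fx, fy, cx, cy):
    """Project a LiDAR point into the image plane (pinhole sketch).

    point: (x, y, z) in the LiDAR frame.
    R (3x3, row-major lists) and t (3-vector): assumed LiDAR-to-camera
    extrinsics.  fx, fy, cx, cy: camera intrinsics in pixels.
    Returns (u, v) pixel coordinates, or None if behind the camera.
    """
    # Transform into the camera frame: p_cam = R @ p + t
    pc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    if pc[2] <= 0:
        return None              # behind the camera
    u = fx * pc[0] / pc[2] + cx  # pinhole projection
    v = fy * pc[1] / pc[2] + cy
    return u, v
```

Even in this toy form, the fragility is visible: a small error in R or t shifts every projected point, which is why the article stresses precise calibration as a precondition for fusion.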

3. Latency vs. limited compute: Autonomous vehicles operate under strict real‑time constraints and limited on‑board compute resources. Deploying complex models can increase latency, so techniques such as model compression, neural architecture search, and code optimization are essential.
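As one concrete compression technique, the sketch below applies symmetric post-training int8 quantization to a list of weights, trading a bounded accuracy loss for a 4x smaller memory footprint versus float32. It illustrates the general idea only, not a deployment recipe:

```python
def quantize_int8(weights):
    """Symmetric post-training int8 quantization (illustrative sketch).

    Maps float weights onto integer codes in [-127, 127] with a single
    per-tensor scale, then dequantizes to show the reconstruction error.
    Assumes at least one nonzero weight.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]    # int8 codes
    deq = [qi * scale for qi in q]             # reconstructed floats
    return q, deq, scale
```

The rounding error is bounded by half the scale per weight; real pipelines add per-channel scales, calibration data, and quantization-aware fine-tuning to keep accuracy loss acceptable.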

4. Uncertainty representation: Perception outputs inherently contain uncertainty (e.g., fewer LiDAR points for distant objects). Current pipelines often ignore this uncertainty, which can jeopardize downstream planning and safety.
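One lightweight way to expose perception uncertainty downstream is to attach the Shannon entropy of the detector's class scores to each output, so the planner can treat an ambiguous detection differently from a confident one. The helper below is an assumed example, not part of any production stack:

```python
import math

def detection_entropy(scores):
    """Shannon entropy of a detection's class scores (sketch).

    scores: raw logits for each class.  Applies a numerically stable
    softmax, then returns the entropy in nats: high for ambiguous
    detections, near zero for confident ones.
    """
    m = max(scores)                          # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Entropy captures only classification ambiguity; localization uncertainty (for example, the wide position spread of a distant object seen by few LiDAR points) needs separate treatment, such as predicting a variance per box coordinate.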

Finally, the article emphasizes that perception for autonomous driving is a system‑level problem that intertwines algorithms, engineering, and data. Progress requires integrating advanced AI methods, robust engineering practices, and large‑scale data pipelines. The piece concludes with a call for continued research to overcome the remaining challenges.

References to seminal works on autonomous driving, probabilistic segmentation, multi‑task learning, AutoML, and adversarial examples are provided at the end of the article.

Tags: computer vision, AI, deep learning, autonomous driving, perception, sensor fusion
Written by Didi Tech, the official DiDi technology account.