Model Quality Assurance Practices at DiDi: Challenges, Solutions, and Evaluation
DiDi’s shift to machine‑learning‑driven ride‑hailing services exposed major QA challenges in data and feature quality, model verification, and API stability. In response, the quality team built a four‑pillar QA framework and is consolidating tooling on a unified “Strategy‑Center 1.0” platform to systematically monitor and evaluate model effectiveness, identify bias‑critical paths, and discover new features.
In recent years, machine learning (ML) models have been increasingly deployed in industrial settings, including DiDi’s ride‑hailing services. As many online strategies shift from rule‑based algorithms to ML models, establishing a robust quality‑assurance (QA) system for these models has become a critical need for the quality team.
Background – DiDi has migrated several core services (car‑pool queue estimation, driver dispatch bad‑case detection, cancellation‑rate prediction, etc.) to ML models. Unlike traditional software, ML models are trained on large datasets and behave as black boxes, making testing difficult. The main challenges identified are:
Sample acquisition (e.g., for safety‑allocation models, where samples are sparse)
Data quality at massive scale
Feature quality, i.e., effectiveness and cross‑feature correlation (a sketch of basic checks follows this list)
Model‑effectiveness verification, which often relies on coarse business metrics
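To make the data‑ and feature‑quality challenges concrete, here is a minimal sketch of the kind of automated checks such a pipeline might run. The file path, column names, and thresholds are illustrative assumptions, not DiDi’s actual implementation.

```python
import pandas as pd

# Hypothetical feature table; path and column names are illustrative only.
df = pd.read_parquet("carpool_features.parquet")

def data_quality_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column data-quality metrics: null rate and constant-column detection."""
    report = pd.DataFrame({
        "null_rate": frame.isna().mean(),
        "n_unique": frame.nunique(),
    })
    report["is_constant"] = report["n_unique"] <= 1
    return report

def feature_effectiveness(frame: pd.DataFrame, label: str) -> pd.Series:
    """Rank numeric features by absolute correlation with the label,
    a crude proxy for feature effectiveness."""
    numeric = frame.select_dtypes("number")
    return numeric.corrwith(numeric[label]).drop(label).abs().sort_values(ascending=False)

def redundant_pairs(frame: pd.DataFrame, threshold: float = 0.95) -> list:
    """Flag highly correlated feature pairs (potential redundancy)."""
    corr = frame.select_dtypes("number").corr().abs()
    cols, pairs = corr.columns, []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > threshold:
                pairs.append((a, b, corr.loc[a, b]))
    return pairs

quality = data_quality_report(df)
effectiveness = feature_effectiveness(df, label="actual_detour_seconds")
redundant = redundant_pairs(df)
```

In practice such checks would run on sampled partitions of the full dataset; correlation with the label is only one of several effectiveness signals (mutual information and ablation tests are common alternatives).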
Model Quality‑Assurance Scheme – The proposed QA framework focuses on four pillars: data quality, feature quality, model/algorithm quality, and model‑effectiveness evaluation. At the interface level, performance and stability of model APIs are also considered, together with security concerns for unsupervised deep‑neural‑network models.
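At the interface level, a stability check can be as simple as probing the serving endpoint and comparing latency percentiles and error rate against service‑level thresholds. The endpoint URL, payload, and SLO values below are assumptions for illustration only.

```python
import time
import statistics
import requests

# Hypothetical serving endpoint; URL and payload are illustrative assumptions.
ENDPOINT = "http://model-serving.internal/etd/predict"
PAYLOAD = {"order_id": "demo", "features": [0.1, 0.5, 3.0]}

def probe(n_requests: int = 100, latency_slo_ms: float = 50.0,
          max_error_rate: float = 0.01) -> dict:
    """Send n_requests probes; check p99 latency and error rate against SLOs."""
    latencies, errors = [], 0
    for _ in range(n_requests):
        start = time.perf_counter()
        try:
            resp = requests.post(ENDPOINT, json=PAYLOAD, timeout=1.0)
            if resp.status_code != 200:
                errors += 1
        except requests.RequestException:
            errors += 1
        latencies.append((time.perf_counter() - start) * 1000)
    p99 = statistics.quantiles(latencies, n=100)[98]  # 99th percentile cut point
    error_rate = errors / n_requests
    return {
        "p99_ms": p99,
        "error_rate": error_rate,
        "latency_ok": p99 <= latency_slo_ms,
        "errors_ok": error_rate <= max_error_rate,
    }
```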
Current Status at DiDi – The company’s QA efforts currently cover data quality, interface quality, model monitoring, and effect evaluation. Model monitoring is widely used, but feature‑quality metrics and user‑view evaluation still need improvement. A fragmented service landscape hinders a unified QA platform, but the upcoming “Strategy‑Center 1.0” platform will consolidate training and deployment pipelines, allowing the QA team to focus on systematic QA and feature‑quality assessment.
Model Effectiveness Evaluation Practice – Using the car‑pool ETD model as a case study, the team performed online‑to‑offline evaluation, multi‑dimensional metric collection, bias‑path modeling, and root‑cause analysis of bad cases (a sliced‑evaluation sketch follows the list below). The workflow uncovered new influential features and highlighted four value points of the evaluation framework:
Scenario‑specific effectiveness measurement
Identification of bias‑critical paths
Discovery of potential new features
Quantification of negative impact factors
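A minimal sketch of what multi‑dimensional, online‑to‑offline evaluation could look like for an ETD‑style regression model: join online predictions with realized outcomes, then slice bias and error by scenario. The column names, scenario dimensions, and bias threshold are assumptions, not the team’s actual setup.

```python
import pandas as pd

# Hypothetical joined log of online predictions and realized outcomes.
logs = pd.read_parquet("etd_online_offline_join.parquet")
logs["error"] = logs["predicted_etd_min"] - logs["actual_etd_min"]

def sliced_metrics(frame: pd.DataFrame, dims: list) -> pd.DataFrame:
    """Per-scenario error metrics: bias (mean signed error), MAE, sample count."""
    grouped = frame.groupby(dims)["error"]
    out = grouped.agg(bias="mean", mae=lambda e: e.abs().mean(), n="count")
    return out.sort_values("bias")

# Scenario dimensions are illustrative (city, hour bucket, pickup distance band).
by_scenario = sliced_metrics(logs, ["city", "hour_bucket", "distance_band"])

# Slices with large systematic bias point at bias-critical paths and supply
# candidate bad cases for root-cause analysis; the 2-minute cutoff is an assumption.
bias_paths = by_scenario[by_scenario["bias"].abs() > 2.0]
```

Slices with systematic over‑ or under‑prediction are exactly where bias‑path modeling and bad‑case root‑cause analysis start; features that separate the biased slices from the rest are candidates for the “potential new features” value point above.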
Conclusion – Model QA is still in an exploratory stage across the industry. Compared with traditional QA, it demands deeper technical expertise and must cope with diverse application contexts. DiDi’s accumulated experience in online strategy testing and bad‑case mining gives it an advantage, but a systematic, platform‑based QA capability—especially for feature‑quality and user‑view evaluation—remains a future focus.
Didi Tech (official DiDi technology account)