How Didi Implements Full‑Chain Data Tiered Protection for Reliable Operations

Facing growing pressure on data assurance, Didi designed a full‑chain data tiered protection framework that defines classification standards, carries data levels through the entire pipeline, and applies concrete safeguards and tooling to improve resource allocation, backup automation, and overall data reliability.

Didi Tech

Dec 26, 2018

Background

Didi, a highly data‑driven company, faced increasing pressure on data assurance as its business expanded rapidly. Common challenges included coping with insufficient resources, prioritizing system alerts, optimizing data lifecycle costs, preventing data loss during frequent upgrades, and balancing data development standards against business efficiency.

The solution was to implement a full‑chain data tiered protection system.

Tiering

1. Define Data Tiering Standards – The first step was to decide how to classify data. Didi evaluated data importance from the perspective of the end users who interact with the data. An initial draft of tiering standards was created and piloted on a single business line.

Analysts initially labeled metrics based on experience, but the resulting distribution across T1, T2, and T3 was unexpected: it formed an inverted triangle, with most metrics in the high‑priority tiers, because users assumed most metrics needed high protection. Didi then used actual access statistics to rank data by business impact. As a result, T1 and T2 together covered 30% of metrics while satisfying 80% of access demand, which made tiered protection feasible.
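The ranking step can be illustrated with a minimal sketch. It assumes tiers are assigned by cumulative access share; the function name `assign_tiers`, the thresholds, and the sample metrics are all hypothetical, since the article does not publish Didi's actual cut‑offs or data.

```python
from typing import Dict


def assign_tiers(access_counts: Dict[str, int],
                 t1_share: float = 0.5,
                 t2_share: float = 0.8) -> Dict[str, str]:
    """Rank metrics by access count and tier them by cumulative access share.

    The thresholds are illustrative assumptions, not Didi's real cut-offs.
    """
    total = sum(access_counts.values())
    if total == 0:
        # No access data at all: nothing justifies a high tier.
        return {name: "T3" for name in access_counts}

    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    tiers: Dict[str, str] = {}
    covered = 0  # accesses already covered by higher-ranked metrics
    for name, count in ranked:
        share_before = covered / total
        if share_before < t1_share:
            tiers[name] = "T1"
        elif share_before < t2_share:
            tiers[name] = "T2"
        else:
            tiers[name] = "T3"
        covered += count
    return tiers


# Hypothetical access statistics for six metrics.
sample = {
    "orders": 500, "gmv": 300, "active_drivers": 100,
    "coupon_use": 50, "page_views": 30, "misc": 20,
}
for metric, tier in assign_tiers(sample).items():
    print(metric, tier)
```

In this made‑up sample, two of the six metrics land in T1 or T2 while accounting for 80% of accesses, which is the kind of pyramid‑shaped distribution the pilot was looking for.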

After the pilot succeeded, a practical, executable tiering standard was solidified.

2. Full‑Chain Data Level Integration – Data flows through demand, production, channels, and processing, involving analysts, developers, DBAs, operations, architects, and warehouse teams. Each role sees a different data entity.

Once the tiering standards were set, Didi needed to propagate the data level across all pipeline stages. A virtual joint architecture group, formed during Didi's 2017 data‑governance work, discussed and finalized a technical solution for level propagation that covers Hive tables, API queries, and inter‑system calls.
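The article does not describe how the level is actually propagated. One plausible approach, sketched here purely as an assumption, is to walk the data lineage so that every upstream asset inherits the most important tier among its downstream consumers. The asset names and the `propagate_tiers` helper are hypothetical.

```python
from collections import deque
from typing import Dict, List

TIER_RANK = {"T1": 1, "T2": 2, "T3": 3}  # lower rank = more important


def propagate_tiers(upstreams: Dict[str, List[str]],
                    labeled: Dict[str, str]) -> Dict[str, str]:
    """Push each labeled asset's tier to everything it depends on.

    `upstreams` maps an asset to the assets it reads from. An upstream
    asset ends up with the most important tier among its consumers.
    """
    tiers = dict(labeled)
    queue = deque(labeled)
    while queue:
        asset = queue.popleft()
        for parent in upstreams.get(asset, []):
            current = tiers.get(parent)
            if current is None or TIER_RANK[tiers[asset]] < TIER_RANK[current]:
                tiers[parent] = tiers[asset]
                queue.append(parent)  # revisit so the change keeps flowing upstream
    return tiers


# Hypothetical lineage: two metrics built on Hive tables.
lineage = {
    "metric:gmv": ["hive:dm_orders"],
    "metric:page_views": ["hive:dm_traffic"],
    "hive:dm_orders": ["hive:ods_orders"],
    "hive:dm_traffic": ["hive:ods_orders", "hive:ods_clicks"],
}
levels = propagate_tiers(lineage, {"metric:gmv": "T1", "metric:page_views": "T3"})
for asset in sorted(levels):
    print(asset, levels[asset])
```

Here `hive:ods_orders` feeds both a T1 and a T3 metric, so it is treated as T1, while `hive:ods_clicks` only feeds the T3 metric and stays at T3.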

Tiered Protection

1. Protection Cases

A. Cluster Resource Allocation

Background: Rapid growth of product line A outpaced cluster scaling, leading to competition for resources. The solution involved:

Teams confirm data levels using the unified tiering standard.

The scheduler prioritizes tasks based on data level, allowing T1/T2 tasks to submit first.

Separate queues for different levels ensure T1/T2 tasks receive priority compute resources.

Alarm levels are set according to data level, triggering phone alerts for T1/T2 anomalies.
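The four measures above can be sketched together as a small planning function. This is an illustration under stated assumptions, not Didi's scheduler: the queue names, the non‑phone alert channel, and the task names are hypothetical. Only the T1/T2 priority and the T1/T2 phone alerts come from the article.

```python
import heapq
from typing import List, Tuple

TIER_RANK = {"T1": 1, "T2": 2, "T3": 3}  # lower rank = submitted earlier

# Hypothetical queue names: T1/T2 share a protected queue.
QUEUE_BY_TIER = {"T1": "core", "T2": "core", "T3": "default"}

# Phone alerts for T1/T2 follow the article; "message" is an assumed fallback.
ALERT_BY_TIER = {"T1": "phone", "T2": "phone", "T3": "message"}


def submission_plan(tasks: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (task, queue, alert channel) in the order tasks should be submitted.

    Tasks are ordered by tier first; arrival order breaks ties within a tier.
    """
    heap: List[Tuple[int, int, str, str]] = []
    for seq, (name, tier) in enumerate(tasks):
        heapq.heappush(heap, (TIER_RANK[tier], seq, name, tier))

    plan = []
    while heap:
        _, _, name, tier = heapq.heappop(heap)
        plan.append((name, QUEUE_BY_TIER[tier], ALERT_BY_TIER[tier]))
    return plan


# Hypothetical tasks in arrival order.
pending = [
    ("traffic_report", "T3"),
    ("order_settlement", "T1"),
    ("driver_income", "T2"),
    ("gmv_daily", "T1"),
]
for entry in submission_plan(pending):
    print(entry)
```

Keeping the tier lookups in plain dictionaries means the scheduler, the queue router, and the alerting system all read the same level, which is the point of carrying it through the whole chain.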

B. Core Data Backup

Background: Manually listing important data for multi‑level backup was time‑consuming and error‑prone. Solution: Because each data item's level is stored as structured metadata, backup strategies can be derived from it automatically and applied system‑wide.
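A minimal sketch of deriving backup strategies from stored levels follows. The `BackupPolicy` fields, the policy values, and the catalog entries are all hypothetical; the article only states that the level is available as structured data, not what each tier's backup policy is.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class BackupPolicy:
    copies: int             # number of backup copies to keep
    cross_datacenter: bool  # whether a copy lives in another data center
    retention_days: int     # how long backups are retained


# Illustrative values only; real policies would come from the tiering standard.
POLICY_BY_TIER: Dict[str, BackupPolicy] = {
    "T1": BackupPolicy(copies=3, cross_datacenter=True, retention_days=365),
    "T2": BackupPolicy(copies=2, cross_datacenter=True, retention_days=180),
    "T3": BackupPolicy(copies=1, cross_datacenter=False, retention_days=30),
}


def plan_backups(catalog: Dict[str, str]) -> Dict[str, BackupPolicy]:
    """Map every cataloged asset to the backup policy of its tier."""
    return {asset: POLICY_BY_TIER[tier] for asset, tier in catalog.items()}


# Hypothetical catalog of asset -> tier, as read from the metadata store.
catalog = {"hive:ods_orders": "T1", "hive:dm_traffic": "T3"}
for asset, policy in plan_backups(catalog).items():
    print(asset, policy)
```

Because the plan is computed from the catalog, a re‑tiered asset picks up its new policy on the next run without anyone maintaining a separate list of important data.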

2. Check List

Because the data pipeline is long, Didi created a comprehensive check‑list covering all stages, enabling consistent execution of tiered protection across teams.

3. Tool Construction

To reduce manual effort, Didi built product‑side tooling that automates most tiered‑protection tasks, integrating the data level information into operational tools.

4. Protection Capability Measurement

Previously, Didi assessed data‑protection capability only by counting incidents. With tiered standards and tool integration, the capability can now be quantified before, during, and after incidents, providing reliable guidance for planning.

5. Overall Framework

The final framework has two sides: on the management side, tiering processes and stability support; on the technical side, tool integration and continuous optimization.

Conclusion

The goal of data tiered protection is to open a window for solving data‑management problems, linking people and processes across the data chain, and enabling machines to replace manual work effectively.
