
Data Governance at Didi: Interview with Liu Chao on Big Data Asset Management

In this interview, Didi data governance lead Liu Chao discusses his career path, the technical architecture of Didi's big-data governance system, cost-driven pricing models, metadata management, lineage extraction, and automation practices, and offers practical advice for enterprises seeking effective data governance.

DataFunSummit

In the era of a booming digital economy, extracting value from data assets has become a core competitive advantage for enterprises. Liu Chao, a master’s graduate of Nanjing University of Information Science and Technology and now working at Didi, shares his career path and the evolution of China’s big‑data industry.

The interview was conducted ahead of the DA Digital Intelligence Conference in Shanghai (April 25‑26, 2025), where Liu served as the producer of the session “Data Governance in the Era of Data Elements”.

Q: Could you share your career development path and the key experiences that led you to focus on big‑data governance?

Liu Chao: After graduation, I started in data-search development and then moved to data-collection work. For the past five years I have focused on big-data governance. I have stayed in this field because I firmly believe in the enormous value data can create today.

01 – The Uniqueness and Technical Implementation of Didi’s Data Governance System

Q: What are the unique aspects of Didi’s data governance system?

Liu Chao: Didi’s governance goes beyond offline data; it now covers all big‑data engines and major data products, and we have priced every engine and product. By using resource‑usage fees as a lever, we encourage users to actively engage in governance. Our asset‑management platform provides comprehensive visibility and tooling to make governance easy for users.

Q: Can you explain the pricing model that drives users to govern data?

Liu Chao: Each engine and data product is treated as an independent “commodity”. Costs consist of server, network, middleware, and labor. With a target profit margin, revenue = cost × (1 + margin). Pricing items are derived from engine characteristics, similar to cloud‑product pricing. Resource consumption is measured from runtime metadata.
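As a rough illustration of the formula above, the sketch below derives a revenue target from the four cost components and spreads it over one pricing item. The class and function names, the cost figures, and the choice of "TB stored per month" as the pricing item are hypothetical, not Didi's actual model or numbers.

```python
from dataclasses import dataclass


@dataclass
class EngineCost:
    """Monthly cost components for one engine or data product (hypothetical figures)."""
    server: float
    network: float
    middleware: float
    labor: float

    def total(self) -> float:
        return self.server + self.network + self.middleware + self.labor


def revenue_target(cost: EngineCost, margin: float) -> float:
    """revenue = cost * (1 + margin), as stated in the interview."""
    return cost.total() * (1 + margin)


def unit_price(cost: EngineCost, margin: float, billable_units: float) -> float:
    """Spread the revenue target over the expected billable units of one pricing item."""
    if billable_units <= 0:
        raise ValueError("billable_units must be positive")
    return revenue_target(cost, margin) / billable_units


# Assumed example: a storage engine billed per TB stored per month.
storage_cost = EngineCost(server=70_000, network=10_000, middleware=5_000, labor=15_000)
price_per_tb = unit_price(storage_cost, margin=0.25, billable_units=5_000)
print(f"price per TB-month: {price_per_tb:.2f}")
```

An engine with several pricing items (storage, compute, requests) would need the revenue target split across them first; the sketch assumes a single item to keep the arithmetic visible.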

Q: What are the core modules of the asset‑management platform (metadata collection, topology mapping, etc.)?

Liu Chao: All capabilities start with metadata collection. We aggregate metadata from all engines into a unified metadata warehouse and model it by cost, security, and quality domains. In the cost domain, usage is aggregated by person, account, and organization to produce dashboards and bills. Governance tools classify resources (e.g., empty tables, unreasonable lifecycles, heavy scans) and push them to owners for action.
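The cost-domain flow described here (aggregate usage, classify resources, push findings to owners) can be sketched roughly as follows. The record fields, thresholds, and tag names are assumptions made for illustration; the interview does not describe Didi's actual schema or rules.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TableMeta:
    """One row of a hypothetical unified metadata warehouse."""
    name: str
    owner: str
    organization: str
    row_count: int
    lifecycle_days: int      # retention configured for the table
    days_since_access: int   # days since the table was last read
    monthly_cost: float


def classify(table: TableMeta) -> List[str]:
    """Return governance tags for a table; the thresholds are illustrative."""
    tags = []
    if table.row_count == 0:
        tags.append("empty_table")
    # Assumed rule: retention above a year on a table nobody has read for 90 days.
    if table.lifecycle_days > 365 and table.days_since_access > 90:
        tags.append("unreasonable_lifecycle")
    return tags


def cost_by(tables: List[TableMeta], key: str) -> Dict[str, float]:
    """Aggregate monthly cost by an attribute such as 'owner' or 'organization'."""
    totals: Dict[str, float] = defaultdict(float)
    for t in tables:
        totals[getattr(t, key)] += t.monthly_cost
    return dict(totals)


def todo_by_owner(tables: List[TableMeta]) -> Dict[str, List[Tuple[str, str]]]:
    """Group flagged tables by owner so each owner receives one action list."""
    todo: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for t in tables:
        for tag in classify(t):
            todo[t.owner].append((t.name, tag))
    return dict(todo)


tables = [
    TableMeta("ods.orders_tmp", "alice", "pricing", 0, 30, 5, 12.0),
    TableMeta("dwd.trip_detail", "alice", "pricing", 10_000, 3650, 200, 480.0),
    TableMeta("ads.daily_report", "bob", "growth", 500, 90, 1, 8.0),
]
print(cost_by(tables, "organization"))
print(todo_by_owner(tables))
```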

Large language models (LLMs) are also being explored for data development, analysis, and asset discovery, to help non-technical users extract value from data.

02 – Technical Innovation and Differentiated Governance Strategies

Q: How do you balance performance, cost, and security when choosing technologies like Hadoop, Flink, and OLAP?

Liu Chao: The underlying approach is the same for every engine: collect engine metadata, analyze it against the governance goals, and expose the results as productized capabilities. The specific strategies differ. For Hadoop we focus on storage issues (empty tables, lifecycle) and compute issues (heavy scans, data skew), while for Flink we target streaming-side problems such as idle tasks and low-load jobs. In every case the choice of strategy is driven by metadata.
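One way to picture "same pipeline, different strategies" is a shared analysis loop with a per-engine rule registry, sketched below. The metadata keys, thresholds, and rule names are hypothetical and only stand in for whatever each engine's runtime metadata really exposes.

```python
from typing import Callable, Dict, List, Optional

Metadata = Dict[str, float]
Rule = Callable[[Metadata], Optional[str]]


# Each rule inspects one metadata record and returns a tag, or None if it does not apply.
# A missing key falls back to a default that keeps the rule from firing.
def empty_table(m: Metadata) -> Optional[str]:
    return "empty_table" if m.get("row_count", 1) == 0 else None


def heavy_scan(m: Metadata) -> Optional[str]:
    return "heavy_scan" if m.get("scanned_tb_per_day", 0) > 10 else None


def idle_task(m: Metadata) -> Optional[str]:
    return "idle_task" if m.get("records_in_7d", 1) == 0 else None


def low_load_job(m: Metadata) -> Optional[str]:
    return "low_load_job" if m.get("avg_cpu", 1.0) < 0.10 else None


# Engine-specific strategies plug into the same loop.
RULES: Dict[str, List[Rule]] = {
    "hadoop": [empty_table, heavy_scan],
    "flink": [idle_task, low_load_job],
}


def analyze(engine: str, metadata: Metadata) -> List[str]:
    """Run every rule registered for the engine and collect the tags that fire."""
    findings = []
    for rule in RULES.get(engine, []):
        tag = rule(metadata)
        if tag is not None:
            findings.append(tag)
    return findings


print(analyze("hadoop", {"row_count": 0, "scanned_tb_per_day": 0.5}))
print(analyze("flink", {"records_in_7d": 0, "avg_cpu": 0.02}))
```

Adding a new engine under this design means registering another list of rules; the collection and reporting code stays unchanged.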

Data lineage is a critical foundation. By parsing the logical plan of engines, Didi has greatly improved table‑level and field‑level lineage accuracy. The “Sync Center” links production and consumption flows to build end‑to‑end lineage.

Q: What challenges arise when parsing logical plans, especially with complex UDFs or JSON fields?

Liu Chao: We output the logical plan to logs and decouple parsing to avoid impacting production. Challenges included Scala reflection issues and handling complex UDFs that could not be serialized to JSON. We solved this by simplifying UDF nodes to retain only lineage‑relevant information and adding exception handling and timeout detection.
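The pattern described in this answer (simplify the plan before serializing it, then parse it in a separate process with error handling and a time budget) might look roughly like the sketch below. It is a Python stand-in under assumed conventions: the dictionary-based plan format, the `op`/`table`/`children` keys, and the `udf_impl` field are invented for illustration, and it covers table-level lineage only.

```python
import json
import time
from typing import Dict, List, Optional, Set


class LineageTimeout(Exception):
    """Raised when traversing one plan exceeds its time budget."""


def simplify(node: dict) -> dict:
    """Engine side: keep only lineage-relevant fields so the plan can be serialized."""
    kept = {"op": node["op"], "children": [simplify(c) for c in node.get("children", [])]}
    if "table" in node:
        kept["table"] = node["table"]
    return kept


def source_tables(node: dict, deadline: float) -> Set[str]:
    """Collect every scanned table under a node, checking the deadline at each step."""
    if time.monotonic() > deadline:
        raise LineageTimeout("plan traversal exceeded its time budget")
    found: Set[str] = set()
    if node.get("op") == "scan" and "table" in node:
        found.add(node["table"])
    for child in node.get("children", []):
        found |= source_tables(child, deadline)
    return found


def parse_logged_plan(line: str, timeout_s: float = 1.0) -> Optional[Dict[str, object]]:
    """Parser side: return None on any failure so one bad plan cannot stall the pipeline."""
    try:
        plan = json.loads(line)
        if plan.get("op") != "insert" or "table" not in plan:
            return None
        sources = source_tables(plan, time.monotonic() + timeout_s)
        return {"target": plan["table"], "sources": sorted(sources)}
    except (json.JSONDecodeError, LineageTimeout, TypeError, AttributeError):
        return None


# The UDF node carries a function object that json.dumps cannot serialize.
raw_plan = {
    "op": "insert", "table": "dw.trip_summary",
    "children": [{
        "op": "udf", "udf_impl": lambda x: x,
        "children": [{"op": "scan", "table": "ods.orders"},
                     {"op": "scan", "table": "ods.drivers"}],
    }],
}
log_line = json.dumps(simplify(raw_plan))  # succeeds because udf_impl was dropped
print(parse_logged_plan(log_line))
```

Because the parser only ever sees the logged text, a malformed or oversized plan costs one skipped record rather than a failed production job.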

03 – Governance Path Recommendations for SMEs

Q: What advice would you give small‑to‑mid‑size enterprises starting their data‑governance journey?

Liu Chao: Treat governance as a business‑value initiative, not just a technical project. Define concrete benefits (e.g., 30% faster decisions, 20% lower development cost). Start with clear data standards, which are easier to establish early and provide a solid foundation as data volume grows.

04 – Governance Closed‑Loop and Automation Practices

Q: Has Didi achieved closed-loop governance (monitor -> analyze -> execute -> feedback), and which automation practices have been validated?

Liu Chao: In cost governance we have a closed loop: budgets are broken down to project level, tracked monthly, and alerts are sent when spending exceeds limits. The “Governance Workbench” helps owners quickly address flagged resources and track outcomes. Automation includes offline table auto‑deletion based on access logs and lineage, removing tables that are unused and have no downstream dependencies.
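The auto-deletion rule (unused and no downstream dependencies) can be expressed as a simple filter over access logs and lineage. The sketch below is a minimal version under assumed inputs: the 180-day idle threshold, the dictionary shapes, and the function name are hypothetical.

```python
from typing import Dict, Iterable, Set


def deletion_candidates(
    tables: Iterable[str],
    days_since_access: Dict[str, int],
    downstream: Dict[str, Set[str]],
    idle_days: int = 180,
) -> Set[str]:
    """Return tables that are both unused and free of downstream dependencies."""
    candidates = set()
    for table in tables:
        # Unknown access history defaults to 0 days, so such a table is kept.
        unused = days_since_access.get(table, 0) >= idle_days
        has_downstream = bool(downstream.get(table))
        if unused and not has_downstream:
            candidates.add(table)
    return candidates


tables = ["tmp.a", "dwd.b", "ads.c", "tmp.no_history"]
days_since_access = {"tmp.a": 400, "dwd.b": 400, "ads.c": 3}
downstream = {"dwd.b": {"ads.c"}}  # lineage: ads.c reads from dwd.b
print(deletion_candidates(tables, days_since_access, downstream))
```

Here `dwd.b` is idle but survives because `ads.c` depends on it, which is why the lineage check matters alongside the access logs.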

The overall methodology demonstrates how a resource‑cost‑driven model and multi‑engine technical architecture can turn data assets from a cost burden into a value engine.

The article concludes with promotional information for the DA Digital Intelligence Conference, offering ticket discounts and contact details for registration and sponsorship.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
