Douyin Group's Data Management Strategies: Enhancing Metric Stability and Reusability
This article outlines Douyin Group's approach to handling exabyte-scale data, addressing metric inconsistencies, and improving data product agility through a four-layer Volcano Engine platform, a systematic production-management-consumption cycle for metrics, organizational design, automation, and future plans for large-model-driven metric decomposition.
The presentation introduces Douyin Group's data management challenges, emphasizing the need to improve metric quality and efficiency across its massive data platform, which now stores data at the exabyte scale, including 600 PB of metric assets.
Key challenges include inconsistent metric definitions, duplicated metric names, and fragmented consumption, which hinder unified data governance as the business matures.
Douyin leverages the Volcano Engine Intelligent Data Platform, organized into four layers: data engine, data construction management, data analysis applications, and solution/consulting services. At its base is an agile, real-time data engine capable of processing billions of records with sub-second latency.
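To make the layering concrete, here is a minimal Python sketch of the four layers; the layer names come from the article, while the responsibility strings are illustrative assumptions.

```python
# Layer names are from the article; responsibilities are paraphrased guesses.
from dataclasses import dataclass

@dataclass(frozen=True)
class PlatformLayer:
    name: str
    responsibility: str

VOLCANO_ENGINE_LAYERS = [
    PlatformLayer("data engine", "storage and sub-second query over billions of records"),
    PlatformLayer("data construction management", "modeling, pipelines, and metric definitions"),
    PlatformLayer("data analysis applications", "BI portals, self-service analysis, A/B testing"),
    PlatformLayer("solution/consulting services", "packaged solutions and advisory support"),
]

for layer in VOLCANO_ENGINE_LAYERS:
    print(f"{layer.name}: {layer.responsibility}")
```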
The platform emphasizes two core product traits: agility, which accelerates data collection, processing, and analysis; and ease of use, with low-threshold, no-code tools that let non-technical users build data portals and run A/B tests.
To resolve the metric issues, a three-layer technical solution is proposed (sketched in code after this list):
Metric production: model design, data quality, and stability safeguards.
Metric management: improving efficiency and ensuring consistency.
Metric consumption: service‑oriented delivery of metrics to downstream applications.
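A minimal sketch of this three-layer split; the stage names mirror the list above, while the enum model and the lifecycle helper are hypothetical illustrations, not the platform's actual interface.

```python
# Stage names follow the article's three-layer solution; the walk-through
# helper is an illustrative assumption.
from enum import Enum

class MetricStage(Enum):
    PRODUCTION = "model design, data quality, stability safeguards"
    MANAGEMENT = "efficiency and consistency checks"
    CONSUMPTION = "service-oriented delivery to downstream apps"

def lifecycle(metric_name: str) -> None:
    """Walk a metric through the three stages in order."""
    for stage in MetricStage:
        print(f"[{stage.name.lower()}] {metric_name}: {stage.value}")

lifecycle("daily_active_users")
```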
The end‑to‑end workflow includes demand registration, metric reuse checks, decomposition into atomic metrics and modifiers, model creation, binding, delivery, and full‑link tracing to ensure traceability from model to consumption.
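A hedged sketch of that workflow as a pipeline of stub functions; the step order follows the article, but the function names, data shapes, and sample catalog are assumptions for illustration.

```python
# Step names follow the article's workflow; everything else is a stub.
def register_demand(request: dict) -> dict:
    # Demand registration: record who asked for what, and why.
    return {"demand_id": 1, **request}

def check_reuse(demand: dict, catalog: list[str]) -> bool:
    # Reuse check: prefer an existing metric over minting a duplicate.
    return demand["metric"] in catalog

def decompose(demand: dict) -> dict:
    # Decomposition: split into an atomic metric plus modifiers
    # (time window, dimension filters, and so on).
    return {"atomic": "play_count", "modifiers": ["last_7d", "region=cn"]}

def run_workflow(request: dict, catalog: list[str]) -> None:
    demand = register_demand(request)
    if check_reuse(demand, catalog):
        print("reused existing metric; no new model built")
        return
    parts = decompose(demand)
    # Model creation, binding, and delivery would follow here, each step
    # emitting trace events so the full link from model to consumption
    # stays auditable.
    print(f"built new metric from {parts['atomic']} + {parts['modifiers']}")

run_workflow({"metric": "7d_play_count_cn"}, catalog=["daily_active_users"])
```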
Organizational design assigns clear responsibilities: business owners define metric intent, data application teams implement models, and public‑layer data teams maintain shared data assets, ensuring accountability and data quality.
Consistency is enforced through strict decomposition standards, unique metric validation, and similarity checks for atomic metrics and modifiers.
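One way such a similarity check might look, using the standard library's difflib as a stand-in for whatever matcher the platform actually uses; the threshold and the metric names are illustrative.

```python
# difflib is a stdlib stand-in for the platform's real similarity check.
from difflib import SequenceMatcher

def find_near_duplicates(candidate: str, existing: list[str],
                         threshold: float = 0.8) -> list[str]:
    """Flag existing atomic metrics whose names are suspiciously close."""
    return [
        name for name in existing
        if SequenceMatcher(None, candidate, name).ratio() >= threshold
    ]

# "video_play_cnt" is flagged as a likely duplicate; "user_count" is not.
print(find_near_duplicates("video_play_count", ["video_play_cnt", "user_count"]))
```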
Efficiency improvements involve a "focus‑core" philosophy, standardized decomposition manuals, metric trees, and batch scripts that automate repetitive tasks, reducing manual effort dramatically.
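A sketch of the kind of batch script the article alludes to: registering many structurally similar metrics from a spec file in one pass instead of one at a time in a UI. The register_metric call is a hypothetical placeholder for the platform's real registration API.

```python
# The CSV spec and register_metric endpoint are illustrative assumptions.
import csv
import io

CSV_SPEC = """name,atomic,modifier
play_count_7d,play_count,last_7d
play_count_30d,play_count,last_30d
like_count_7d,like_count,last_7d
"""

def register_metric(name: str, atomic: str, modifier: str) -> None:
    # Placeholder for a call to the metric platform's registration API.
    print(f"registered {name} = {atomic} x {modifier}")

for row in csv.DictReader(io.StringIO(CSV_SPEC)):
    register_metric(row["name"], row["atomic"], row["modifier"])
```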
Explorations into large‑model‑driven automatic metric decomposition aim to generate atomic metrics and modifiers from table schemas, further streamlining the process.
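A speculative sketch of that idea: prompt a large model with a table schema and ask for candidate atomic metrics and modifiers. Here call_llm is a placeholder rather than a real API, and the schema is invented for illustration.

```python
# Schema, prompt, and call_llm are all assumptions sketching the idea.
SCHEMA = ("video_events(user_id BIGINT, video_id BIGINT, "
          "event STRING, ts TIMESTAMP, region STRING)")

PROMPT = f"""Given the table schema below, propose atomic metrics and
modifiers (time windows, dimension filters) as JSON.

Schema: {SCHEMA}
"""

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call; returns a canned example response.
    return '{"atomic": ["play_count"], "modifiers": ["last_7d", "region"]}'

print(call_llm(PROMPT))
```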
For agile production, the team adopts a "produce‑first, manage‑later" approach for fast‑moving short‑video business needs, deploying ClickHouse tables directly and later retrofitting metrics for consumption.
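A minimal sketch of produce-first, manage-later, assuming the open-source clickhouse-driver Python client; the table, columns, and host are illustrative.

```python
# Assumes the clickhouse-driver package; table and host are examples only.
from clickhouse_driver import Client

client = Client(host="localhost")

# Produce first: land an ad-hoc table so the business gets numbers today.
client.execute("""
    CREATE TABLE IF NOT EXISTS adhoc_play_stats (
        dt Date, video_id UInt64, play_count UInt64
    ) ENGINE = MergeTree ORDER BY (dt, video_id)
""")

# Manage later: the table is retrofitted into the metric catalog afterwards,
# e.g. by registering play_count as an atomic metric bound to this table.
```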
Stability solutions address upstream management complexity and daily governance by establishing data layer standards, optimizing pipelines, and integrating daily monitoring, SLA agreements, and incident response mechanisms.
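As one concrete flavor of daily governance, here is a hedged sketch of a freshness check backing an SLA; the one-day threshold and the failure behavior are assumptions.

```python
# Threshold and alerting behavior are illustrative assumptions.
from datetime import date, timedelta

def check_freshness(latest_partition: date, sla_days: int = 1) -> None:
    lag = (date.today() - latest_partition).days
    if lag > sla_days:
        # In production this would page the on-call owner per the
        # incident-response mechanism described above.
        raise RuntimeError(f"SLA breach: data is {lag} days behind")
    print(f"fresh: {lag} day(s) behind, within SLA")

check_freshness(date.today() - timedelta(days=1))
```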
Metric consumption practices include building unified metric topics that act as virtual tables, offering low‑cost setup, fast metric discovery, cross‑cluster/data‑source analysis, and intelligent routing to optimal models.
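A minimal sketch of what intelligent routing could look like: among the physical models able to serve a request, pick the cheapest one that covers the requested dimensions. The model list and cost figures are invented for illustration.

```python
# Model names, dimension sets, and scan costs are illustrative assumptions.
MODELS = [
    {"name": "agg_daily", "dims": {"dt"}, "scan_cost": 1},
    {"name": "agg_daily_region", "dims": {"dt", "region"}, "scan_cost": 5},
    {"name": "detail_events", "dims": {"dt", "region", "user_id"}, "scan_cost": 100},
]

def route(requested_dims: set[str]) -> str:
    """Return the cheapest model whose dimensions cover the request."""
    candidates = [m for m in MODELS if requested_dims <= m["dims"]]
    return min(candidates, key=lambda m: m["scan_cost"])["name"]

print(route({"dt"}))            # -> agg_daily
print(route({"dt", "region"}))  # -> agg_daily_region
```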
Metric topic management provides hierarchical directories, fine‑grained permission control, and easy import/export, enhancing both efficiency and transparency for product, operations, and analysis teams.
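A hedged sketch of a hierarchical topic directory with inherited view permissions; the directory contents, role names, and the _viewers convention are assumptions, not the platform's actual model.

```python
# Directory shape, roles, and the "_viewers" key are illustrative assumptions.
DIRECTORY = {
    "growth": {
        "_viewers": {"product", "operations"},
        "retention": {"_viewers": {"analysis"}},
    },
}

def can_view(path: list[str], role: str) -> bool:
    """Walk the directory; permissions accumulate down the hierarchy."""
    node, allowed = DIRECTORY, set()
    for part in path:
        node = node[part]
        allowed |= node.get("_viewers", set())
    return role in allowed

print(can_view(["growth", "retention"], "analysis"))  # True
```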
Future work focuses on three pillars: standardized and automated metric production, large‑model‑assisted metric management, and integrated data architecture enabling one‑definition‑multiple‑consumption patterns.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert dedicated to sharing big data technology.