Ant Group's Data Governance Practices: Quality, Storage, and Future Directions
This article presents Ant Group's comprehensive data governance experience, covering data quality management, storage governance, architectural design, operational strategies, case studies, and forward‑looking thoughts on integrated lake‑warehouse governance, data value realization, and AI‑driven automation.
Ant Group shares the lessons learned from its large-scale data governance practice, organized into four main parts: an overview of data governance, data quality governance, compute and storage governance, and future considerations.
Data Governance Overview – The company focuses on five critical aspects: architecture, security, compliance, quality, and value. These dimensions ensure that data supports core business operations while meeting regulatory and privacy requirements.
Data Quality Governance
1. Analysis of Data Quality Issues: Ant's data sources include behavior logs, service logs, databases, message streams, and unstructured data. After ingestion into a unified big-data platform, the data goes through batch and stream processing, and any quality problem (missing, delayed, or incorrect data) can affect downstream services.
2. Challenges: Rapid business change, high-frequency data updates, diverse stakeholder roles (BI, engineering, data, product), and a massive daily workload (thousands of changes, millions of task instances) make systematic quality assurance essential.
3. Top-Level Design: Risks are grouped into three categories: technical engine risk, content risk, and application risk. The governance framework has three layers (capability, system, and business) and covers quality control, testing, release management, monitoring, and emergency response.
4. Architecture (see image):
5. Key Practices: Pre-release quality assurance, change risk control, rapid fault detection and recovery, and a joint data-and-technology "blue team" that validates the defenses.
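The three problem types named above (missing, delayed, or incorrect data) can be sketched as a simple per-partition check. This is only an illustration, not Ant Group's implementation: the `PartitionStats` fields, the `check_partition` function, the landing deadline, and the row-count and null-ratio thresholds are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class PartitionStats:
    """Hypothetical summary of one daily table partition."""
    table: str
    row_count: int
    null_key_rows: int
    landed_at: Optional[datetime]  # None if the partition never arrived


def check_partition(
    stats: PartitionStats,
    deadline: datetime,
    min_rows: int,
    max_null_ratio: float = 0.01,  # assumed tolerance, not from the article
) -> List[str]:
    """Return the quality issues found for one partition (empty list if none)."""
    # Missing: the partition never landed or landed empty.
    if stats.landed_at is None or stats.row_count == 0:
        return ["missing"]

    issues: List[str] = []
    # Delayed: the partition landed after the agreed deadline.
    if stats.landed_at > deadline:
        issues.append("delayed")
    # Incorrect: volume or key completeness is outside the expected range.
    if stats.row_count < min_rows:
        issues.append("incorrect: row count below expected minimum")
    if stats.null_key_rows / stats.row_count > max_null_ratio:
        issues.append("incorrect: too many null keys")
    return issues


if __name__ == "__main__":
    deadline = datetime(2024, 1, 2, 6, 0)
    late = PartitionStats("orders", 1000, 50, datetime(2024, 1, 2, 7, 0))
    print(check_partition(late, deadline, min_rows=500))
```

A check of this shape could run before release (as a gate on changes) and after each scheduled task (as monitoring), which matches the pre-release and fault-detection practices listed above.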
Compute and Storage Governance
1. Challenges: In 2019, storage utilization of the offline cluster exceeded 85%, which led to safety incidents. The data scale had reached exabytes, with millions of tables and thousands of developers.
2. Core Ideas: Governance rests on three pillars: organizational design (a data architecture group), standards (data architecture, development, and governance rules), and platform construction (automated governance tools).
3. Strategy – Expand resource supply and reduce consumption:
Expanding supply: Share idle online database resources with the offline warehouse to boost elasticity.
Reducing consumption: Optimize storage through progressive computation, archival, and compression.
4. Progressive Computation (see image):
5. Storage Archiving (see image):
6. Tiered Storage – Data is classified into hot, warm, archive, and cold‑backup tiers, each with specific hardware and redundancy configurations to balance performance and cost.
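The tiering idea can be illustrated with a small classifier. The article names the four tiers but does not say how data is assigned to them, so the rule below (days since last access, plus a flag for backup copies) and the 30-day and 180-day thresholds are assumptions made purely for this sketch; `Tier` and `classify` are hypothetical names.

```python
from datetime import date
from enum import Enum


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    ARCHIVE = "archive"
    COLD_BACKUP = "cold-backup"


def classify(
    last_access: date,
    today: date,
    is_backup_copy: bool = False,
    warm_after_days: int = 30,      # assumed threshold
    archive_after_days: int = 180,  # assumed threshold
) -> Tier:
    """Pick a storage tier from access recency (an assumed criterion)."""
    # Backup copies go straight to the cheapest tier in this sketch.
    if is_backup_copy:
        return Tier.COLD_BACKUP
    idle_days = (today - last_access).days
    if idle_days < warm_after_days:
        return Tier.HOT
    if idle_days < archive_after_days:
        return Tier.WARM
    return Tier.ARCHIVE


if __name__ == "__main__":
    today = date(2024, 1, 10)
    print(classify(date(2024, 1, 1), today).value)   # recently read table
    print(classify(date(2023, 1, 1), today).value)   # untouched for a year
```

In practice the tier would then determine the hardware and redundancy settings applied to the data, which is where the performance and cost trade-off described above is made.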
Future Thoughts on Data Governance
1. Integration: Move from isolated offline governance to lake-warehouse-centric governance covering online, offline, real-time, and graph computing, leveraging large models and AI.
2. Value-Oriented: Treat data as a tradable commodity, focusing on data rights, privacy protection, and monetization.
3. Intelligence: Incorporate large language models to automate rule generation, anomaly detection, and intelligent decision-making.
Overall, Ant Group emphasizes a holistic, automated, and AI‑enhanced approach to data governance that balances compliance, cost, and business value.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
