Ant Group’s Data Governance Practices: Overview, Data Quality, and Data Storage Governance
This article shares Ant Group's extensive experience in big data governance, detailing the overall data governance framework, data quality management, data storage governance, and future considerations, illustrated with practical cases and strategies for ensuring compliance, reliability, and cost efficiency.
The presentation is organized into four parts: an overview of data governance concepts, data quality governance, compute and storage governance, and forward-looking thoughts on how data governance will evolve.
Data Governance Overview – Ant focuses on five critical dimensions: architecture, security, compliance, quality, and value. The aim is to meet regulatory requirements such as privacy protection and anti-money-laundering while keeping data across the enterprise usable, secure, and valuable.
Data Quality Governance – Data arrives from logs, databases, message queues, and unstructured sources, and keeping it correct is hard because the business changes rapidly, financial data is produced at high frequency, and many stakeholder roles touch it. The talk introduces a three-layer architecture (capability, system, business) and groups risks into three categories (technical engine, content, application). Concrete measures include pre-release testing, change-gate controls, gray (canary) releases, and post-incident audits. Key metrics such as fault count and loss volume drive continuous improvement.
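To make the change-gate idea concrete, here is a minimal sketch of a pre-release quality check. It assumes a job's new output can be summarized by a row count and a count of null primary keys, and compared against a trusted baseline partition. The names `PartitionStats` and `change_gate` and both thresholds are hypothetical illustrations, not Ant Group's actual tooling or values.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PartitionStats:
    """Summary statistics for one output partition (hypothetical schema)."""
    row_count: int
    null_key_count: int


@dataclass
class GateResult:
    passed: bool
    reasons: List[str] = field(default_factory=list)


def change_gate(candidate: PartitionStats,
                baseline: PartitionStats,
                max_row_drift: float = 0.2,
                max_null_ratio: float = 0.01) -> GateResult:
    """Block a release when the candidate output deviates too far from the baseline.

    Both thresholds are illustrative assumptions, not values from the talk.
    """
    reasons: List[str] = []

    # An empty output is treated as a hard failure; otherwise check key completeness.
    if candidate.row_count == 0:
        reasons.append("candidate partition is empty")
    else:
        null_ratio = candidate.null_key_count / candidate.row_count
        if null_ratio > max_null_ratio:
            reasons.append(f"null key ratio {null_ratio:.2%} exceeds {max_null_ratio:.2%}")

    # Compare volume with the baseline only when the baseline has data.
    if baseline.row_count > 0:
        drift = abs(candidate.row_count - baseline.row_count) / baseline.row_count
        if drift > max_row_drift:
            reasons.append(f"row count drift {drift:.1%} exceeds {max_row_drift:.0%}")

    return GateResult(passed=not reasons, reasons=reasons)


baseline = PartitionStats(row_count=1_000_000, null_key_count=0)
print(change_gate(PartitionStats(1_050_000, 100), baseline).passed)
print(change_gate(PartitionStats(500_000, 20_000), baseline).reasons)
```

A check like this could run during pre-release testing or on the gray-release slice, so that a failure blocks the full rollout, and the collected reasons can feed a post-incident audit.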
Data Storage Governance – In 2019, Ant's storage utilization exceeded safe thresholds, prompting a shift to mixed deployment (co-location) of offline data-warehouse workloads with online resources. The strategy includes resource sharing built on open-source components, tiered storage (hot, warm, archive, cold-backup), and techniques such as progressive computation, storage archiving, and data re-partitioning. The reported results are a 50% increase in warehouse elasticity and a 30% reduction in storage consumption.
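Tiered storage is commonly driven by an access-based policy. The sketch below assumes the tier is chosen per partition from the number of days since it was last read. The 7/90/365-day cutoffs and the names `assign_tier` and `plan_migration` are hypothetical, since the talk does not state Ant's actual rules.

```python
from datetime import date
from enum import Enum
from typing import Dict, List, Tuple


class Tier(Enum):
    HOT = "hot"
    WARM = "warm"
    ARCHIVE = "archive"
    COLD_BACKUP = "cold-backup"


# Hypothetical cutoffs: a partition idle for fewer than N days stays in the paired tier.
TIER_THRESHOLDS: List[Tuple[int, Tier]] = [
    (7, Tier.HOT),
    (90, Tier.WARM),
    (365, Tier.ARCHIVE),
]


def assign_tier(last_access: date, today: date) -> Tier:
    """Map a partition to a storage tier by how long ago it was last read."""
    idle_days = (today - last_access).days
    for limit, tier in TIER_THRESHOLDS:
        if idle_days < limit:
            return tier
    # Anything idle for a year or longer goes to cold backup.
    return Tier.COLD_BACKUP


def plan_migration(partitions: Dict[str, date], today: date) -> Dict[Tier, List[str]]:
    """Group partition names (sorted) by their target tier."""
    plan: Dict[Tier, List[str]] = {tier: [] for tier in Tier}
    for name, last_access in sorted(partitions.items()):
        plan[assign_tier(last_access, today)].append(name)
    return plan


today = date(2024, 1, 1)
partitions = {
    "ds=20231231": date(2023, 12, 31),
    "ds=20231115": date(2023, 11, 15),
    "ds=20230301": date(2023, 3, 1),
    "ds=20210101": date(2021, 1, 1),
}
for tier, names in plan_migration(partitions, today).items():
    print(tier.value, names)
```

Keeping the cutoffs in one ordered table makes the policy easy to tune; a production policy would likely also weigh partition size, retention requirements, and restore latency before moving data to a cheaper tier.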
Future Directions – The speaker envisions integrated lake‑warehouse governance powered by large models, turning data from an internal product into a tradable commodity, and leveraging AI to automate risk detection and remediation.
The session concludes with acknowledgments of the speakers and organizers.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
