NetEase Data Platform DataOps Practices for Improving Data Quality
This article summarizes a NetEase presentation at DataFunTalk on the growing data quality challenges in data development. It shows how DataOps principles, including pre‑control and post‑control mechanisms, sandbox environments, and automated governance tools, are applied to systematically reduce defects, optimize resources, and ensure reliable data delivery across the company's diverse business lines.
Data development at NetEase faces increasing quality issues that affect delivery speed, cost, and security, prompting the need for DataOps‑driven process improvements.
The presentation outlines four main topics: the current quality problems, proactive (pre‑control) measures, reactive (post‑control) measures, and the data sandbox architecture.
Pre‑control techniques include SQL Scan for syntax and performance checks, shape inspection to verify table statistics and field integrity, data comparison to ensure consistency after logic changes, downstream impact analysis using data lineage, and controlled release workflows with mandatory approvals.
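To make the downstream impact analysis step concrete, here is a minimal sketch of how lineage can be walked to list every table affected by a change. The table names and the LINEAGE mapping are hypothetical, and the breadth-first traversal is an illustrative choice, not NetEase's actual implementation.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it.
LINEAGE = {
    "ods.orders": ["dwd.orders_clean"],
    "dwd.orders_clean": ["dws.daily_sales", "dws.user_orders"],
    "dws.daily_sales": ["ads.sales_dashboard"],
    "dws.user_orders": [],
    "ads.sales_dashboard": [],
}


def downstream_tables(lineage, changed):
    """Return every table reachable from `changed`, in breadth-first order."""
    seen = {changed}
    order = []
    queue = deque([changed])
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


impacted = downstream_tables(LINEAGE, "dwd.orders_clean")
print(impacted)
```

A release workflow could attach a list like this to the approval request, so reviewers see which downstream tables a logic change may touch before it ships.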
Post‑control features comprise baseline impact analysis to predict task completion times, baseline operations for early alerts, instance run overviews, critical path identification, output impact assessment via lineage, task freeze pools for rapid error containment, accelerators to prioritize critical jobs, and a Data Quality Center that monitors completeness, consistency, correctness, and timeliness of data.
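Baseline impact analysis and critical path identification can be illustrated with a small scheduling sketch. It assumes each task has an expected runtime and a list of upstream tasks, predicts when each task finishes, and checks the result against a baseline deadline. The task names, runtimes, and the 100-minute baseline are all made up for illustration; the source does not describe how the platform computes these values.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: expected runtime in minutes and upstream dependencies.
RUNTIME = {"extract": 30, "clean": 20, "aggregate": 45, "report": 10}
DEPENDS_ON = {
    "extract": [],
    "clean": ["extract"],
    "aggregate": ["clean"],
    "report": ["aggregate", "clean"],
}


def predict_finish(runtime, depends_on, start=0):
    """Predict each task's finish time and remember its slowest parent."""
    finish = {}
    critical_parent = {}
    # static_order() yields upstream tasks before the tasks that need them.
    for task in TopologicalSorter(depends_on).static_order():
        parents = depends_on[task]
        latest = max(parents, key=lambda p: finish[p], default=None)
        begin = finish[latest] if latest is not None else start
        finish[task] = begin + runtime[task]
        critical_parent[task] = latest
    return finish, critical_parent


def critical_path(critical_parent, task):
    """Follow the slowest parents back from `task` to a root task."""
    path = []
    while task is not None:
        path.append(task)
        task = critical_parent[task]
    return path[::-1]


BASELINE_DEADLINE = 100  # minutes after the run starts (assumed value)
finish, parent = predict_finish(RUNTIME, DEPENDS_ON)
path = critical_path(parent, "report")
will_break_baseline = finish["report"] > BASELINE_DEADLINE
print(finish["report"], path, will_break_baseline)
```

In this sketch, an early alert would fire as soon as the predicted finish time exceeds the baseline, and the critical path shows which tasks are candidates for acceleration.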
The data sandbox provides isolated development and testing clusters sharing metadata and scheduling services, preventing test jobs from affecting production and enabling safe variable‑based database references.
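The variable-based database references mentioned above might work along these lines: a query names a variable instead of a physical database, and the environment decides what it resolves to. The ${sales_db} syntax, the DATABASES mapping, and the database names are assumptions made for this sketch; the source does not describe the platform's actual variable format.

```python
from string import Template

# Hypothetical mapping from logical variable to physical database per environment.
DATABASES = {
    "dev": {"sales_db": "sales_dev"},
    "prod": {"sales_db": "sales"},
}


def render_sql(sql, env):
    """Replace ${variable} references with the database for the given environment."""
    # substitute() raises KeyError if the query uses an undefined variable.
    return Template(sql).substitute(DATABASES[env])


QUERY = "SELECT COUNT(*) FROM ${sales_db}.orders"

dev_sql = render_sql(QUERY, "dev")
prod_sql = render_sql(QUERY, "prod")
print(dev_sql)
print(prod_sql)
```

Because the same query text runs in both environments, a job tested in the sandbox does not need to be edited before release, which removes one common source of accidental writes to production tables.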
Overall, NetEase's DataOps implementation combines automated testing, lineage‑based dependency management, and proactive monitoring to improve data reliability and operational efficiency across multiple business domains.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
