
NetEase Data Platform DataOps Practices for Improving Data Quality

This article summarizes a NetEase presentation, published by DataFunTalk, on the growing data quality challenges in data development. It shows how DataOps principles, including pre‑ and post‑control mechanisms, sandbox environments, and automated governance tools, are applied to systematically reduce defects, optimize resources, and ensure reliable data delivery across the company's diverse business lines.

DataFunTalk

Feb 23, 2022


Data development at NetEase faces increasing quality issues that affect delivery speed, cost, and security, prompting the need for DataOps‑driven process improvements.

The presentation outlines four main topics: the current quality problems, proactive (pre‑control) measures, reactive (post‑control) measures, and the data sandbox architecture.

Pre‑control techniques include SQL Scan for syntax and performance checks, shape inspection to verify table statistics and field integrity, data comparison to ensure consistency after logic changes, downstream impact analysis using data lineage, and controlled release workflows with mandatory approvals.
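As a rough illustration of downstream impact analysis, the sketch below walks a lineage graph breadth-first to list every table that a change could reach. The table names, the `LINEAGE` mapping, and the `downstream_tables` helper are hypothetical and chosen only for this example; the source does not describe how NetEase stores or queries lineage.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables that read from it.
LINEAGE = {
    "ods.orders": ["dwd.order_detail"],
    "dwd.order_detail": ["dws.daily_sales", "dws.user_orders"],
    "dws.daily_sales": ["ads.sales_report"],
    "dws.user_orders": [],
    "ads.sales_report": [],
}


def downstream_tables(lineage, changed):
    """Return every table reachable from `changed`, in breadth-first order."""
    seen = {changed}
    queue = deque([changed])
    affected = []
    while queue:
        table = queue.popleft()
        for child in lineage.get(table, []):
            if child not in seen:  # skip tables already reached via another path
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


print(downstream_tables(LINEAGE, "dwd.order_detail"))
# ['dws.daily_sales', 'dws.user_orders', 'ads.sales_report']
```

A release workflow could plausibly use such a list to decide which owners must approve a change before it ships.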

Post‑control features cover monitoring and containment. Baseline impact analysis predicts task completion times and baseline operations raise early alerts, supported by instance run overviews, critical path identification, and lineage‑based output impact assessment. Task freeze pools contain errors quickly, accelerators prioritize critical jobs, and a Data Quality Center monitors the completeness, consistency, correctness, and timeliness of data.
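One way to picture critical path identification and completion-time prediction is as a longest-path calculation over the task dependency graph. The sketch below is an assumption-laden example, not the platform's actual algorithm: the task names, runtimes, and the `critical_path` helper are all hypothetical, and it relies on `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: expected runtime in minutes and upstream dependencies.
DURATION = {"extract": 30, "clean": 20, "join": 45, "report": 10}
UPSTREAM = {
    "extract": [],
    "clean": ["extract"],
    "join": ["extract"],
    "report": ["clean", "join"],
}


def critical_path(duration, upstream, target):
    """Return (expected finish time, slowest dependency chain) for `target`."""
    finish = {}  # earliest finish time of each task, in minutes
    prev = {}    # the upstream task that determines each task's start
    for task in TopologicalSorter(upstream).static_order():
        deps = upstream.get(task, [])
        slowest = max(deps, key=lambda d: finish[d], default=None)
        start = finish[slowest] if slowest is not None else 0
        finish[task] = start + duration[task]
        prev[task] = slowest
    # Walk back from the target along the slowest upstream at each step.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = prev[node]
    return finish[target], path[::-1]


minutes, path = critical_path(DURATION, UPSTREAM, "report")
print(minutes, path)  # 85 ['extract', 'join', 'report']
```

Comparing the predicted finish time with a baseline deadline gives an early alert before the deadline is actually missed, and the returned path shows which tasks are worth accelerating.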

The data sandbox provides isolated development and testing clusters sharing metadata and scheduling services, preventing test jobs from affecting production and enabling safe variable‑based database references.
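Variable‑based database references can be pictured as SQL templates whose database name is filled in per environment. The following is a minimal sketch under that assumption; the `${db}` placeholder, the `ENV_DATABASES` mapping, and the database names are hypothetical, and the source does not specify the syntax the NetEase platform actually uses.

```python
from string import Template

# Hypothetical mapping from environment to the database each variable resolves to.
ENV_DATABASES = {
    "dev": {"db": "sales_dev"},
    "prod": {"db": "sales"},
}

QUERY = (
    "INSERT OVERWRITE TABLE ${db}.daily_sales "
    "SELECT order_date, SUM(amount) FROM ${db}.orders GROUP BY order_date"
)


def render_sql(sql_template, env):
    """Fill database variables for `env`; raises KeyError on an unknown variable."""
    return Template(sql_template).substitute(ENV_DATABASES[env])


print(render_sql(QUERY, "dev"))
print(render_sql(QUERY, "prod"))
```

Because the same task text resolves to different databases in each environment, a test run in the sandbox never writes to a production table by accident.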

Overall, NetEase's DataOps implementation combines automated testing, lineage‑based dependency management, and proactive monitoring to improve data reliability and operational efficiency across multiple business domains.


