
NetEase Data Platform DataOps Practices for Improving Data Quality

This article summarizes NetEase's DataFunTalk presentation on the growing data quality challenges in data development. It shows how DataOps principles, including pre‑ and post‑control mechanisms, sandbox environments, and automated governance tools, are applied to systematically reduce defects, optimize resources, and ensure reliable data delivery across the company's diverse business lines.

DataFunTalk

Data development at NetEase faces increasing quality issues that affect delivery speed, cost, and security, prompting the need for DataOps‑driven process improvements.

The presentation outlines four main topics: the current quality problems, proactive (pre‑control) measures, reactive (post‑control) measures, and the data sandbox architecture.

Pre‑control techniques include SQL Scan for syntax and performance checks, shape inspection to verify table statistics and field integrity, data comparison to ensure consistency after logic changes, downstream impact analysis using data lineage, and controlled release workflows with mandatory approvals.
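Two of the pre‑control checks above, shape inspection and data comparison, can be illustrated with a minimal sketch. It is not NetEase's implementation: the function names, the in‑memory row format, and the sample tables are all assumptions made for illustration, and a real platform would run such checks against warehouse tables rather than Python lists.

```python
def shape_report(rows, columns):
    """Summarize row count and per-column null counts for a table snapshot."""
    nulls = {col: sum(1 for r in rows if r.get(col) is None) for col in columns}
    return {"row_count": len(rows), "null_counts": nulls}


def compare_tables(before, after, key):
    """Compare two snapshots by primary key and list added, removed, and changed keys."""
    old = {r[key]: r for r in before}
    new = {r[key]: r for r in after}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


# Hypothetical output of the same task before and after a logic change.
before = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 25.5},
    {"order_id": 3, "amount": None},
]
after = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": 26.0},
    {"order_id": 4, "amount": 7.0},
]

shape = shape_report(after, ["order_id", "amount"])
diff = compare_tables(before, after, "order_id")
print(shape)
print(diff)
```

A release workflow could block approval when the diff contains unexpected removed or changed keys, or when null counts rise after the change.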

Post‑control features comprise baseline impact analysis to predict task completion times, baseline operations for early alerts, instance run overviews, critical path identification, output impact assessment via lineage, task freeze pools for rapid error containment, accelerators to prioritize critical jobs, and a Data Quality Center that monitors completeness, consistency, correctness, and timeliness of data.
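To make baseline impact analysis, critical path identification, and lineage‑based output impact more concrete, here is a small sketch over a hypothetical task graph. The task names, durations, and the 60‑minute baseline are invented for illustration, and the model assumes every root task starts at minute 0 with unlimited parallelism, which a real scheduler would not guarantee.

```python
from collections import deque

# Hypothetical task graph: task -> (estimated duration in minutes, upstream tasks).
TASKS = {
    "ods_orders": (20, []),
    "ods_users": (10, []),
    "dwd_orders": (30, ["ods_orders", "ods_users"]),
    "dws_sales": (15, ["dwd_orders"]),
    "ads_report": (5, ["dws_sales"]),
}


def downstream(tasks, start):
    """Return every task that directly or indirectly depends on `start`."""
    children = {name: [] for name in tasks}
    for name, (_, upstreams) in tasks.items():
        for up in upstreams:
            children[up].append(name)
    seen, queue = set(), deque([start])
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


def predicted_finish(tasks, target):
    """Return (finish minute, critical path) for `target` under the stated assumptions."""
    memo = {}

    def visit(name):
        if name not in memo:
            duration, upstreams = tasks[name]
            best_time, best_path = 0, []
            for up in upstreams:
                finish_time, path = visit(up)
                if finish_time > best_time:
                    best_time, best_path = finish_time, path
            memo[name] = (best_time + duration, best_path + [name])
        return memo[name]

    return visit(target)


finish, path = predicted_finish(TASKS, "ads_report")
baseline_deadline = 60  # hypothetical baseline, in minutes after the run starts
will_breach = finish > baseline_deadline
impacted = downstream(TASKS, "dwd_orders")
print(finish, path, will_breach, impacted)
```

In this toy graph the predicted finish exceeds the baseline, so an early alert could fire before the report is late, and the tasks on the critical path are the natural candidates for acceleration.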

The data sandbox provides isolated development and testing clusters sharing metadata and scheduling services, preventing test jobs from affecting production and enabling safe variable‑based database references.
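One way to read "variable‑based database references" is that task SQL names a logical database variable, and the platform substitutes the physical database for the environment the task runs in. The sketch below assumes that reading; the `${dw_db}` placeholder syntax, the environment names, and the database names are hypothetical and not taken from the presentation.

```python
from string import Template

# Hypothetical mapping from logical database variables to physical databases.
ENV_DATABASES = {
    "test": {"dw_db": "sandbox_dw"},
    "prod": {"dw_db": "dw"},
}


def render_sql(sql_template, env):
    """Replace ${var} database references with the names configured for `env`."""
    # substitute() raises KeyError on an unknown variable, so a typo fails
    # at render time instead of silently targeting the wrong database.
    return Template(sql_template).substitute(ENV_DATABASES[env])


SQL = "INSERT OVERWRITE TABLE ${dw_db}.dws_sales SELECT * FROM ${dw_db}.dwd_orders"
test_sql = render_sql(SQL, "test")
prod_sql = render_sql(SQL, "prod")
print(test_sql)
print(prod_sql)
```

Because the same SQL text is used in both environments, a task validated in the sandbox can be released without editing its table references.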

Overall, NetEase's DataOps implementation combines automated testing, lineage‑based dependency management, and proactive monitoring to improve data reliability and operational efficiency across multiple business domains.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
