Tagged articles
DataFunSummit
Jul 15, 2023 · Big Data

Intelligent and Automated Data Quality Management in Big Data Systems

This article explores the challenges of data quality in mature big‑data environments and presents intelligent, automated approaches—including assertions, automatic detection, rule recommendation, link checking, and collaborative mechanisms—to embed quality checks throughout the data pipeline, improving efficiency and reliability.

Data Governance · Data Observability · Data Quality
0 likes · 18 min read
AntTech
Jul 6, 2023 · Industry Insights

Unlocking AI Value: Data Quality, Privacy, and Blockchain in the Smart Era

The article examines how high‑quality data, robust privacy protection, and blockchain‑enabled trust infrastructure are essential for unlocking the value of AI models, citing market forecasts, examples from smart‑car and fintech firms, and the growing Chinese big‑data market through 2026.

AI · Big Data · Blockchain
0 likes · 9 min read
Data Thinking Notes
Jul 2, 2023 · Big Data

Mastering Data Governance: A Comprehensive Framework for Enterprise Success

This article outlines a complete data governance framework, detailing the five managerial domains—control, process, governance, technology, and value—along with strategies for data strategy, organizational structure, policies, processes, standards, quality, security, and platform tools, and highlights AI’s pivotal role in enhancing governance efficiency.

Big Data · Data Governance · Data Quality
0 likes · 10 min read
php Courses
Jun 30, 2023 · Artificial Intelligence

Notable Real-World Failures of Data and Machine Learning Algorithms Over the Past Decade

Over the past decade, numerous high‑profile incidents have shown that flawed data and machine‑learning algorithms can cause severe consequences, from legal mishaps with ChatGPT to biased medical diagnoses, inaccurate real‑estate pricing, and discriminatory hiring practices, underscoring the need for rigorous data validation and algorithmic fairness.

AI ethics · Case Studies · Data Quality
0 likes · 3 min read
Tencent Cloud Developer
Jun 29, 2023 · Industry Insights

How WeChat Built a Scalable Security Data Warehouse: Architecture, Evolution, and Data‑Quality Practices

This article examines the origins, architectural evolution, storage choices, unified access layer, multi‑IDC synchronization, operational tooling, and data‑quality mechanisms of WeChat's security data warehouse, illustrating how centralized feature management and rigorous quality checks enable reliable, high‑performance security policy enforcement at massive scale.

Architecture · Data Quality · Feature Management
0 likes · 16 min read
Big Data Technology & Architecture
Jun 21, 2023 · Big Data

Design and Optimization of Bilibili's Real-Time Data Quality Monitoring Platform

This article details the background, architecture, challenges, and iterative improvements of Bilibili's real-time data quality monitoring platform, covering offline and streaming DQC, resource-efficient Flink designs, InfluxDB proxy integration, CQ table handling, operational safeguards, and future engineering plans.

Big Data · Data Quality · Flink
0 likes · 22 min read
Architects Research Society
Jun 6, 2023 · Fundamentals

Information Governance: Roles, Responsibilities, and Key Processes

This article explains information governance as a program that ensures data accuracy, completeness, consistency, accessibility, and security across an enterprise, outlines the three essential business‑oriented roles—Data Governance Committee, Data Steward, and Data Custodian—describes their duties, and details the key procedures, metrics, and relationships with corporate and IT governance.

Data Quality · Enterprise Data Management · data stewardship
0 likes · 11 min read
Data Thinking Notes
May 14, 2023 · Big Data

Why Data Governance Matters: Boosting Data Quality and Business Value

Data governance, the overarching framework for evaluating, guiding, and supervising an organization’s data lifecycle—from collection to utilization—ensures high data quality, compliance, and security, ultimately maximizing data value and supporting AI-driven initiatives, while distinguishing itself from data management and data control through a strategic, top‑down approach.

Big Data · Data Governance · Data Management
0 likes · 8 min read
DataFunSummit
May 13, 2023 · Big Data

Expert Interview on Data Governance: Core Domains, Challenges, and Future Trends

In this interview, three data‑governance experts from Tencent, ByteDance, and Alibaba discuss the fundamental processes, core domains such as metadata, data lineage, metric systems, data quality and security, the main challenges they face, and emerging trends like DataOps, AI‑driven automation, and privacy‑preserving technologies.

Data Governance · Data Lineage · Data Quality
0 likes · 14 min read
DataFunTalk
May 7, 2023 · Big Data

Data Standards and Data Quality: Concepts, Frameworks, Tools, and Case Studies

This article presents a comprehensive overview of data standards and data quality, covering core concepts and frameworks, practical tools and techniques, real‑world case studies, and a detailed Q&A that together illustrate how organizations can govern, measure, and improve the reliability of their data assets.

Data Quality · data quality tools · data standards
0 likes · 22 min read
Data Thinking Notes
Apr 25, 2023 · Operations

Why Data Quality Matters: A Practical Guide to Governance and Seven‑Dimensional Evaluation

This article explains why data quality is critical for businesses, outlines common data quality problems, their root causes, and presents a comprehensive governance framework—including monitoring rules, alerting, full‑link monitoring, and a seven‑dimensional evaluation model—to ensure high‑quality data delivery.

Big Data · Data Governance · Data Quality
0 likes · 12 min read
DataFunSummit
Apr 23, 2023 · Fundamentals

Data Governance Practices and Implementation Path at Dipu Technology

This article presents Dipu Technology's comprehensive data governance methodology, covering construction paths, a typical enterprise digital platform framework, core governance components, practical case studies, and a Q&A session that together illustrate how businesses can design, implement, and sustain effective data governance across the organization.

Data Catalog · Data Governance · Data Management
0 likes · 19 min read
Big Data Technology & Architecture
Apr 17, 2023 · Big Data

Comprehensive Guide to Data Governance and Data Asset Management

This article presents a detailed roadmap for enterprise data governance, covering business digitization goals, data governance construction, typical digital platform architecture, core governance actions, implementation pathways, data asset inventory techniques, and real‑world case studies to illustrate practical execution.

Big Data · Data Asset Management · Data Governance
0 likes · 18 min read
Data Thinking Notes
Apr 16, 2023 · Big Data

Mastering Data Asset Management: From Inventory to Value Realization

This article outlines a complete data asset management lifecycle—starting with data inventory, moving through governance, classification, responsibility, permission, and security, and culminating in value realization via basic services, profiling, and algorithmic models—providing practical guidance for building a robust big‑data platform.

Big Data · Data Governance · Data Quality
0 likes · 10 min read
Data Thinking Notes
Apr 9, 2023 · Big Data

Why Data Quality Is the Hidden Driver of Big Data Success

In the big‑data era, high‑quality data are essential for reliable analytics, and this article explains data‑quality concepts, key dimensions, analysis methods for missing values, outliers, inconsistencies and duplicates, as well as practical management practices to ensure data assets become a competitive advantage.

Big Data · Data Governance · Data Management
0 likes · 15 min read
DataFunTalk
Apr 8, 2023 · Fundamentals

Data Governance Practices and Pathways: Insights from DeepEx Technology

This article outlines DeepEx Technology's comprehensive data governance methodology, covering construction paths, digital platform frameworks, core governance components, implementation steps, case studies, and a Q&A that together illustrate how enterprises can build reliable data assets, models, standards, and quality processes to unlock business value.

Data Assets · Data Quality · data modeling
0 likes · 21 min read
Data Thinking Notes
Apr 5, 2023 · Big Data

Mastering Data Governance: From Challenges to End‑to‑End Solutions

This article explores the key problems data governance aims to solve, outlines a comprehensive governance framework, and details practical implementation steps—including tool integration, metadata management, lake‑in and lake‑out processes, and governance policies—to achieve a closed‑loop, value‑driven data ecosystem.

Big Data · Data Governance · Data Lake
0 likes · 13 min read
Aikesheng Open Source Community
Apr 3, 2023 · Databases

SQL Quality Management with Open-Source SQLE: Insights from the 2023 DAMS China Data Intelligence Management Summit

The 2023 DAMS China Data Intelligence Management Summit in Shanghai featured a technical presentation by Zhang Shenbo on an open‑source SQLE solution for SQL quality control, covering multi‑database auditing, automated review workflows, and practical tips to reduce DBA workload and cross‑department communication.

Data Governance · Data Quality · Database Management
0 likes · 3 min read
Data Thinking Notes
Apr 2, 2023 · Fundamentals

Transforming Bank Data: A Practical Guide to Data Governance and Quality Management

This article explains how modern commercial banks can turn massive operational data into a strategic asset by building a comprehensive data governance framework that addresses data standards, quality management, metadata, master data, and security, while outlining a six‑step methodology for continuous improvement.

Banking · Data Governance · Data Quality
0 likes · 18 min read
DataFunTalk
Mar 31, 2023 · Big Data

Bilibili's Big Data Development Governance Platform: Architecture, Challenges, and Strategies

This article presents an in‑depth overview of Bilibili’s big data development governance platform, detailing its architecture, the pain points of platform construction, data‑driven methodology, product strategy, and practical solutions for data integration, quality, cost, and security governance across large‑scale data operations.

Bilibili · Cost Optimization · Data Quality
0 likes · 35 min read
Data Thinking Notes
Mar 26, 2023 · Big Data

Why Data Governance Is the Key to Unlocking Your Data’s True Value

This article explains how effective data governance transforms raw data into a trusted enterprise asset, outlines common pitfalls such as backward and passive governance, and presents a structured, four‑phase approach—including organizational setup, standards, platform selection, and continuous operations—to successfully implement data governance at scale.

Big Data · Data Governance · Data Management
0 likes · 10 min read
Volcano Engine Developer Services
Mar 22, 2023 · Fundamentals

How ByteDance Scales Data Governance: Challenges, Distributed Solutions, and Best Practices

This article examines ByteDance's data governance journey, outlining business, organizational, and cultural challenges, the six-stage evolution framework, real‑world case studies, and the shift from centralized to distributed autonomous governance to improve quality, security, cost, and team efficiency.

Big Data · Data Governance · Data Quality
0 likes · 18 min read
Data Thinking Notes
Mar 19, 2023 · Big Data

Why Data Quality Is the Key to Successful Big Data Initiatives

The article explains that while big data aims to boost organizational insight and innovation, its true value depends on high data quality, outlines industry standards, identifies technical, business, and management causes of poor quality, and proposes a three‑phase strategy of prevention, monitoring, and post‑improvement to ensure reliable data for decision‑making.

Big Data · Data Governance · Data Quality
0 likes · 21 min read
Data Thinking Notes
Mar 8, 2023 · Fundamentals

How BI Portals Transform Enterprise Data Governance for Scalable Analytics

This whitepaper explains why effective BI governance is essential for modern enterprises, outlines the key capabilities of data‑governance tools—including data quality, certification, usage statistics, classification, lineage, glossary, and lifecycle management—and shows how BI portals and data catalogs together enable scalable, user‑centric analytics.

Analytics · BI governance · BI portal
0 likes · 12 min read
Big Data Technology & Architecture
Mar 3, 2023 · Fundamentals

Understanding Data Management Principles and Governance: Insights from DMBOK

This article explains the core principles, strategies, frameworks, and governance practices of data management based on DAMA's DMBOK, covering data lifecycle, value, leadership responsibilities, strategic planning, governance models, metrics, and implementation guidelines to help organizations derive business value from high‑quality data.

DMBOK · Data Governance · Data Management
0 likes · 17 min read
DataFunSummit
Feb 11, 2023 · Big Data

Intelligent Metadata Governance for Power Data: Background, Solution, Value and Case Studies

This article presents a comprehensive overview of the intelligent metadata‑driven data governance framework implemented by Southern Power Grid Yunnan, detailing its background, challenges, architectural design, key AI‑enabled technologies, practical case studies, and the resulting business value for the power industry.

AI · Data Quality · electric power
0 likes · 14 min read
DataFunSummit
Feb 2, 2023 · Big Data

Data Governance Strategies: Concepts, Practices, and Case Studies

The article explains why data is a critical corporate asset, distinguishes narrow and broad data‑governance approaches, outlines strategic principles such as treating governance as a systematic, prioritized effort, and presents eight real‑world case studies from companies like Tencent, SF Tech, Huolala, and NetEase.

Case Studies · Data Quality · metadata management
0 likes · 7 min read
Data Thinking Notes
Jan 31, 2023 · Fundamentals

Mastering Data Governance: From Metadata to ETL in One Guide

This comprehensive guide walks you through the entire data governance ecosystem, covering metadata fundamentals, classification, maturity models, data standards, modeling, integration, lifecycle management, quality assurance, security, and ETL processes, all illustrated with clear diagrams and practical steps.

Data Governance · Data Integration · Data Quality
0 likes · 13 min read
DataFunTalk
Jan 25, 2023 · Artificial Intelligence

Between Heaven and Earth: Reflections of an Algorithm Engineer

The article argues that algorithm engineers should move beyond a narrow focus on deep‑learning models, emphasizing the importance of system architecture, data quality, and thoughtful problem framing to break through performance plateaus in advertising and recommendation systems.

Advertising · Data Quality · System Architecture
0 likes · 10 min read
DataFunSummit
Jan 24, 2023 · Big Data

Building a Real-Time Data and User Profiling Architecture with Apache Doris at Zhihu

The article details Zhihu's data empowerment team's design and implementation of a low‑cost, high‑response real‑time data platform built on Apache Doris, covering real‑time business metrics, algorithm features, and user profiling, and explains the challenges, architectural choices, tooling, performance gains, and future directions.

Apache Doris · Data Integration · Data Quality
0 likes · 22 min read
Huolala Tech
Jan 16, 2023 · Big Data

How Leading Logistics Companies Master Data Governance for Cost and Stability

At the 2022 DataFun Summit, data governance experts from Huolala, Zhongtong, and SF Express shared comprehensive practices—including governance drivers, quality monitoring, model management, master data processes, platform architecture, cost control, and stability measures—illustrating how large logistics firms implement end‑to‑end data governance to boost efficiency, compliance, and business value.

Big Data · Cost Management · Data Governance
0 likes · 13 min read
DataFunTalk
Jan 13, 2023 · Big Data

Data Governance Strategies and Practices: Insights from Leading Companies

The article explains the importance of data governance for organizations handling big data, distinguishes narrow and broad governance approaches, outlines strategic principles, and presents case studies from companies like Tencent, SF Tech, Huolala, and NetEase to illustrate effective governance practices.

Data Quality · Enterprise Data · case study
0 likes · 8 min read
Data Thinking Notes
Jan 10, 2023 · Big Data

How Bilibili Built a Scalable Data Quality Platform for Billions of Events

This article describes Bilibili’s data quality platform, outlining its background, objectives, theoretical models, workflow stages (recording, checking, alerting), DSL for metrics, root‑cause analysis, scheduling strategies, heterogeneous source integration, rule coverage, intelligent monitoring, and future plans to achieve automated, real‑time, high‑reliability data assurance for massive daily workloads.

Big Data · Data Quality · Root Cause Analysis
0 likes · 21 min read
DataFunTalk
Dec 21, 2022 · Fundamentals

The Closed‑Loop Logic of Data Governance at Kuaikan Manhua

Kuaikan Manhua sustains data governance through a closed loop of business scope management, data asset standards, and feedback mechanisms, so that new data pollution accumulates more slowly than governance can clean it up, enabling systematic, long‑term data quality improvement.

Closed‑Loop · Data Governance · Data Quality
0 likes · 6 min read
Data Thinking Notes
Dec 19, 2022 · Big Data

Data Quality Mastery: From Expectations to Operational Assurance

This article outlines a comprehensive data quality management framework, covering expectations, measurement, assurance, and operational practices, and provides concrete templates, rule designs, and governance processes to help data teams systematically assess, monitor, and improve data reliability throughout the lifecycle.

Big Data · Data Governance · Data Quality
0 likes · 18 min read
Bilibili Tech
Dec 2, 2022 · Big Data

Data Quality Management: Expectations, Measurement, Assurance, and Operation

The article outlines a complete data‑quality‑management framework that first captures business expectations, then translates them into basic and personalized measurement rules, defines four assurance approaches for handling violations, and scales operation with indicators, tooling, and metrics to continuously improve data quality across the lifecycle.

Data Governance · Data Quality · Metrics
0 likes · 19 min read
Data Thinking Notes
Nov 28, 2022 · Big Data

Unlocking Data Value: How Metadata Drives Efficient Data Management and Quality

This comprehensive guide explains how metadata connects source data, warehouses, and applications, outlines its technical and business classifications, demonstrates its value for data management, profiling, portals, and ETL development, and details optimization, storage, lifecycle, and quality practices essential for robust big‑data operations.

Big Data · Data Quality · Operations
0 likes · 35 min read
Data Thinking Notes
Nov 24, 2022 · Fundamentals

How to Build an Enterprise Data Governance System from Scratch

This article explains what data governance is, why enterprises need it, the key components such as data quality, metadata, master data, asset and security management, and provides a step‑by‑step framework, organizational structure, platform features, evaluation methods and common pitfalls.

Data Assets · Data Governance · Data Quality
0 likes · 17 min read
Efficient Ops
Nov 22, 2022 · Operations

Why Data Quality Is the Hidden Cost Killer and How to Master Its Governance

This article explains why data quality is critical for business success, outlines common data quality problems and their root causes, and presents a practical governance framework with monitoring rules, alerts, full‑link monitoring, and a seven‑dimensional evaluation model to continuously improve data reliability.

Data Governance · Data Quality · data monitoring
0 likes · 12 min read
ITPUB
Nov 5, 2022 · Big Data

How Bilibili Builds a Scalable, Automated, and Intelligent Data Quality Platform

This article explains how Bilibili’s data quality team designs a process‑driven, automated, and AI‑enhanced platform that monitors billions of records daily, defines quality metrics such as completeness and consistency, integrates heterogeneous data sources, and provides root‑cause analysis and real‑time alerting to ensure trustworthy data for its massive user base.

Data Quality · Root Cause Analysis · Scheduling
0 likes · 19 min read
Bilibili Tech
Nov 1, 2022 · Big Data

Design and Implementation of a Data Quality Platform for Large-Scale Data Processing

Bilibili built a scalable data‑quality platform that records metrics from heterogeneous sources, checks them with a rich DSL, alerts once with root‑cause analysis, and uses event‑driven and time‑window scheduling, automated workflows, and intelligent monitoring to ensure real‑time, accurate, trustworthy data for petabyte‑scale processing.

Data Quality · Root Cause Analysis · automation
0 likes · 20 min read
Kuaishou Big Data
Oct 25, 2022 · Big Data

How Kuaishou Built a Scalable Big Data Platform with Unified Data Quality and Metric Services

This article details Kuaishou's end‑to‑end big data platform, describing its organizational model, unified data governance framework, comprehensive data‑quality solution, the design of a headless metric platform, key technologies such as automatic modeling and code generation, and future directions toward a decentralized, smart data fabric.

Big Data · Data Governance · Data Quality
0 likes · 21 min read

Solving Real‑World Data Quality Challenges with X‑Select’s DQC Platform

This article explains how X‑Select’s Data Quality Platform (DQC) addresses common data quality problems in large‑scale data development by defining six quality dimensions, leveraging open‑source solutions such as Apache Griffin and Qualitis, and implementing rule definition, execution, alerting, and workflow interruption within a Spark‑based architecture.

Big Data · Data Platform · Data Quality
0 likes · 15 min read
Architecture Digest
Sep 29, 2022 · Big Data

Tagging System Overview, Construction Methodology, and Quality Assessment

This article explains what objects and tags are, distinguishes physical, network, and electronic tags, outlines the structure of tag taxonomy, describes its applications in DMP, CDP, recommendation and user profiling systems, and presents construction principles and quality evaluation criteria for tag systems.

CDP · DMP · Data Quality
0 likes · 13 min read
DataFunSummit
Aug 26, 2022 · Big Data

Data Governance Practice and Logical Closed‑Loop at KuaiKan: A Case Study

This article presents KuaiKan's data governance journey, detailing the rapid business expansion challenges, the three‑step planning framework, the logical closed‑loop architecture, practical implementation experiences, cross‑team collaboration techniques, and the evaluation of governance outcomes and future plans.

Data Quality · data engineering
0 likes · 16 min read
DataFunSummit
Aug 18, 2022 · Artificial Intelligence

Evolution and Technical Practices of Du Xiaoman Risk Control Decision Engine

This article presents a comprehensive overview of Du Xiaoman's risk control system evolution—from early rule‑based engines to AI‑enhanced intelligent decision engines—detailing technical practices such as strategy iteration acceleration, decision latency reduction, parallel workflow design, and future trends in data quality, automated strategy optimization, and real‑time analytics.

Data Quality · decision engine · machine learning
0 likes · 18 min read
DataFunTalk
Aug 13, 2022 · Big Data

Data Governance Practices and Logical Closed‑Loop at KuaiKan

The talk outlines KuaiKan's data governance journey, describing the rapid business growth challenges, the three‑step logical closed‑loop framework, practical experiences in business scope management, data asset governance, collaboration techniques, and future outlook, highlighting evaluation metrics and ongoing improvements.

Big Data · Data Governance · Data Quality
0 likes · 16 min read
Python Crawling & Data Mining
Aug 6, 2022 · Operations

Why Operations Data Quality Is the Key to Successful Digital Transformation

In the era of big data, poor operations data quality undermines analytics, decision‑making and digital transformation, so organizations must adopt a three‑dimensional governance approach—covering organization, processes and technology—to ensure completeness, consistency, accuracy, uniqueness, relevance and timeliness of their operational data.

Analytics · Data Governance · Data Quality
0 likes · 17 min read
Big Data Technology Architecture
Jul 28, 2022 · Big Data

Reflections on Data Governance Challenges and Approaches

The author shares a candid account of transitioning from a non‑data role to confronting data‑centric bottlenecks, describing the current state of data projects, common pitfalls, and practical thoughts on simplifying data governance within limited resources and budget constraints.

Big Data · DAMA · Data Governance
0 likes · 7 min read

Data Indicator Testing Platform and Quality Assurance

The article presents an Indicator Testing Platform that automates metric validation—covering timeliness, completeness, accuracy, and consistency—through model‑level comparison, regression, online monitoring, and TDD‑style testing, dramatically reducing manual effort and enabling rapid detection and correction of data quality issues across thousands of business indicators.

Automated Testing · Data Platform · Data Quality
0 likes · 10 min read
DataFunSummit
Jun 23, 2022 · Artificial Intelligence

Unlocking Data Potential: Automatic Data Augmentation, Denoising, Active Learning, and Data Splitting

The talk explains how to maximize the value of training data by exploring background on model generalization, automatic data augmentation techniques, denoising strategies, active learning for selecting unlabeled samples, and robust data splitting methods, offering practical guidelines for AI practitioners.

AI · Data Quality · active learning
0 likes · 16 min read
Bilibili Tech
Jun 10, 2022 · Big Data

Incremental Data Lake Design and Hudi Core Optimizations with Flink

The article describes how combining Apache Flink with Hudi enables an incremental data lake that delivers near‑real‑time analytics by switching to merge‑on‑read, fixing log handling bugs, improving compaction planning, and refactoring table‑service scheduling, while showcasing use cases such as CDC ingestion, data quality control, and real‑time materialized views, and outlines future enhancements like optimistic concurrency and unified schema evolution.

Apache Hudi · CDC · Compaction Optimization
0 likes · 21 min read
IT Architects Alliance
Jun 5, 2022 · Big Data

Real-Time Data and User Profiling Practices at Zhihu: Architecture, Challenges, and Solutions

This article presents a comprehensive case study of Zhihu's data empowerment team, detailing the design of a real‑time data platform and user profiling system, the challenges faced in scalability, latency, and data quality, and the practical solutions and architectural choices implemented to drive business value.

Data Quality, Lambda architecture, data pipeline
0 likes · 22 min read
DataFunTalk
Jun 2, 2022 · Big Data

Data Governance Practices and Product Strategy at NetEase: Challenges, Solutions, and Future Plans

The article presents NetEase's internal data governance experience, outlining past challenges, current pain points, a comprehensive product strategy covering scope, value quantification, and feature implementation, and shares initial results and future plans to build an automated, end‑to‑end big‑data optimization platform.

Cost Optimization, Data Governance, Data Quality
0 likes · 13 min read
dbaplus Community
May 21, 2022 · Big Data

5 Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time Pipelines, Cloud Market

The article outlines five major 2022 data trends: the rise of analytics engineers, the intensifying lakehouse competition, the growth of real-time streaming pipelines and operational analytics, the expanding cloud marketplaces for data tools, and the push toward unified data-quality terminology. It explains their origins, market impact, and future outlook.

Data Quality, Lakehouse, Real-time Streaming
0 likes · 21 min read
vivo Internet Technology
Apr 20, 2022 · Big Data

Implementing Field Lineage in Spark SQL: A Technical Deep Dive

The article details how to add field‑lineage tracking to Spark SQL by creating a custom SparkSessionExtension that injects a check‑analysis rule and a parser, which capture INSERT statements, analyze the physical plan, and generate a JSON mapping of source‑to‑target fields for data governance.

Data Governance, Data Quality, Field Lineage
0 likes · 9 min read
dbaplus Community
Mar 15, 2022 · Big Data

How to Build a Real‑Time Data Warehouse with Flink SQL: Architecture, Implementation, and Governance

This article explains the challenges of early real‑time data pipelines, introduces a layered real‑time warehouse architecture, provides step‑by‑step Flink SQL code for building a demo warehouse, and covers comprehensive data governance, quality metrics, lifecycle management, and naming conventions for production‑grade big‑data systems.

Data Governance, Data Quality, Flink SQL
0 likes · 60 min read
21CTO
Feb 24, 2022 · Big Data

5 Data Trends for 2022: Analytics Engineers, Lakehouse Wars, Real‑Time

In 2022 the modern data stack will be driven by the rise of analytics engineers, intensified competition between lakehouse and warehouse solutions, growing demand for real‑time analytics, the explosive growth of cloud marketplaces, and the emergence of unified data‑quality terminology, all reshaping data infrastructure and operational practices.

Data Quality, Lakehouse, Real-time analytics
0 likes · 17 min read
Youzan Coder
Jan 26, 2022 · Big Data

How to Build a Robust Data Quality Assurance Strategy for Large-Scale Data Platforms

This article outlines a comprehensive data quality assurance framework for a massive reporting platform, covering the data pipeline architecture, detailed testing methods for timeliness, completeness, and accuracy, as well as application‑level checks, downgrade and backup strategies, and future automation plans.

Data Quality, automation, big data testing
0 likes · 14 min read
DataFunTalk
Jan 24, 2022 · Big Data

MobTech Data Governance and Security Practices: Architecture, Implementation, and Financial Industry Use Cases

This article presents MobTech’s comprehensive data governance and security practices, covering the necessity of governance, its benefits, a full‑chain governance framework, specific challenges in the financial sector, the evolution of their integrated architecture, and detailed implementations of security, model, asset, monitoring, and quality management systems.

Data Governance, Data Quality, financial technology
0 likes · 21 min read
Big Data Technology & Architecture
Jan 18, 2022 · Big Data

Data Warehouse Data Quality Measurement Standards

The article outlines four key dimensions for evaluating data warehouse data quality—correctness, completeness, timeliness, and consistency—explains common consistency issues such as differing metric values across models, cross‑dimensional aggregations, and real‑time versus batch calculations, and proposes organizational and review mechanisms to mitigate these problems.

Big Data, Consistency, Data Governance
0 likes · 9 min read
DataFunSummit
Jan 2, 2022 · Big Data

Data Governance Practices and Product Perspective at Beike Zhaofang

This article shares Beike Zhaofang’s two‑year experience building a data governance center, covering the purpose and scope of governance, how the company tailors the focus to its business and system characteristics, the middle‑platform construction approach, project goal management, product and operation rollout, and the challenges and solutions encountered.

Data Platform, Data Quality, data sharing
0 likes · 14 min read
dbaplus Community
Dec 22, 2021 · Fundamentals

How Xiaomi Built a Scalable Metadata Platform for Data Governance

This article details Xiaomi's end‑to‑end metadata platform, covering its three‑layer architecture, the evolution of full‑domain metadata, real‑time lineage, precise measurement, and how these capabilities enable data map, governance, cost control, and quality improvements for future business empowerment.

Data Governance, Data Quality, Xiaomi
0 likes · 20 min read
Architects Research Society
Dec 20, 2021 · Fundamentals

Common Misconceptions About Master Data Management (MDM)

The article explains common misconceptions about Master Data Management, emphasizing its enterprise-wide scope, the importance of data quality, governance, workflow, real‑time integration, and the need for organizational change management, while warning against treating MDM as a simple project.

Data Governance, Data Quality, MDM
0 likes · 8 min read
Youzan Coder
Dec 8, 2021 · Big Data

How to Build a Real‑Time Data Quality Monitoring System with Flink

This article outlines a comprehensive approach to monitoring and ensuring the accuracy and timeliness of real‑time data streams, detailing background challenges, solution design, implementation steps using Flink and automated testing, alert handling procedures, and future improvement plans.

Alerting, Data Quality, Flink
0 likes · 10 min read
DataFunTalk
Nov 27, 2021 · Big Data

iQIYI Data Middle Platform: Architecture, Data Governance Practices, and Future Plans

The article details iQIYI’s data middle platform architecture and its comprehensive data governance practices, covering platform overview, data flow, unified standards, metadata management, production quality assurance, and future AI‑driven enhancements, illustrating how centralized data services improve reliability, efficiency, and security.

Big Data, Data Governance, Data Quality
0 likes · 27 min read
High Availability Architecture
Oct 25, 2021 · Big Data

iQIYI Data Governance Practices: Event Tracking (Pingback) Governance and Application

The article details iQIYI's comprehensive data governance initiative for event tracking (Pingback), covering definitions, timing, quality requirements, governance challenges, standardized specifications, coordinate management, testing and gray‑release processes, upgrade workflows, and data security measures that together reduced event volume by 40% and cut resource consumption in half.

Analytics, Big Data, Data Governance
0 likes · 16 min read
iQIYI Technical Product Team
Oct 15, 2021 · Industry Insights

How iQIYI Streamlined Event Tracking: A Deep Dive into Data Governance

This article details iQIYI's comprehensive data‑governance practice for event tracking, covering the definition of pingback, the need for governance, the governance framework, coordinate management, gray‑data handling, and the upgrade process that reduced tracking volume by 40% while cutting resource consumption in half.

Analytics, Big Data, Data Governance
0 likes · 17 min read
iQIYI Technical Product Team
Oct 9, 2021 · Big Data

iQIYI Data Quality Monitoring: Exploration and Practice

At iTech Salon, iQIYI's Peng Tao outlined a three-layer data-quality monitoring framework spanning the pingback, middle, and business report layers. He detailed anomaly-detection techniques such as thresholds, statistical methods, correlation analysis, and Prophet forecasting, and announced future plans for intelligent rule generation and automated attribution to pinpoint root causes.

Data Governance, Data Quality, rule engine
0 likes · 11 min read
Airbnb Technology Team
Sep 27, 2021 · Big Data

Midas Certification: Airbnb’s End-to-End Data Quality Framework

Airbnb’s Midas certification establishes a company‑wide, multi‑dimensional golden‑standard for data quality—covering accuracy, consistency, timeliness, cost, and completeness—by requiring collaborative design, automated health checks, and four review stages, ensuring certified data is reliable, well‑documented, and ready for reporting, experimentation, and machine‑learning.

Airbnb, Big Data, Data Quality
0 likes · 12 min read
Architects' Tech Alliance
Sep 11, 2021 · Big Data

Understanding Data Warehouses: Definitions, Differences, Architecture, Modeling, and Best Practices

This article explains what a data warehouse is, contrasts it with traditional databases, outlines how to design and build a warehouse—including model selection, subject‑area definition, bus matrix, layering, and data quality—while also covering related concepts such as data middle platforms, data lakes, metadata, and modeling techniques.

Big Data, Data Quality, ETL
0 likes · 16 min read
DataFunTalk
Aug 30, 2021 · Fundamentals

20 Practical Strategies for Effective Data Governance

Effective data governance hinges on leadership commitment, clear policies, skilled teams, and integration into business processes, and this article outlines twenty actionable strategies—from securing executive support and embedding rules in systems to fostering data quality, visualization, and sustainable operations—to guide organizations toward successful governance.

Data Governance, Data Quality, Leadership
0 likes · 8 min read
HelloTech
Aug 27, 2021 · Artificial Intelligence

Algorithm Testing Practices and Machine Learning Foundations at Hello

The Hello algorithm testing team outlines its workflow—from data collection and cleaning through model training, evaluation, and deployment—while teaching machine‑learning fundamentals, detailing company‑wide use cases, defining key terms, and describing four testing capability dimensions covering data quality, service reliability, model performance, and system engineering.

AI, Data Quality, Model Evaluation
0 likes · 12 min read
Volcano Engine Developer Services
Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data, Data Quality, monitoring
0 likes · 19 min read
Airbnb Technology Team
Jul 29, 2021 · Big Data

Airbnb’s Data Quality Improvement Plan: Organizational, Architectural, and Governance Practices

Airbnb’s 2019 Data Quality Improvement Plan reorganized its data‑engineering workforce, introduced a dedicated data‑engineer role, adopted a decentralized Minerva‑based architecture with Spark pipelines, instituted rigorous testing, governance, and certification processes, and established SLAs and monitoring to ensure timely, trustworthy, well‑documented data across the enterprise.

Airbnb, Big Data, Data Architecture
0 likes · 13 min read
Didi Tech
Jul 1, 2021 · Big Data

Full-Chain Traffic Data Detection in DiDi's Omega Platform

DiDi's Omega platform provides an end-to-end traffic-data pipeline, from SDK collection through real-time and offline ETL to storage and analysis. A detection service measures loss, duplication, and accuracy, and the platform achieves sub-1% SDK loss alongside integrity tagging and comprehensive monitoring dashboards. The article closes with a hiring call for senior data engineers.

Data Quality, Omega Platform, data pipeline
0 likes · 9 min read
Architect
Jul 1, 2021 · Big Data

Data Governance Practices at Meituan Hotel Travel Platform

This article presents a comprehensive case study of Meituan's hotel‑travel data governance, covering the background, challenges, strategic goals, standardized processes, technical systems, cost and security optimizations, measurable outcomes, and future plans for automated governance.

Big Data, Cost Optimization, Data Governance
0 likes · 29 min read
Youzan Coder
Jun 30, 2021 · Big Data

Online Monitoring Practices for Offline and Real-Time Data at Youzan

Youzan Data Report Center monitors offline batch and real-time data pipelines using accuracy and timeliness rules, cross-table checks, upstream-downstream comparisons, and scheduled alerts to detect anomalies early. Since 2021 the monitoring has generated over 25 alerts, and the team plans a unified data-quality dashboard.

Big Data, Data Quality, Flink
0 likes · 12 min read