Tagged articles
255 articles
Page 1 of 3
DataFunSummit
Apr 13, 2026 · Industry Insights

How Kuaishou’s Life Services Data Center Boosted Warehouse Efficiency with AI Agents

In a rapidly growing data‑driven environment, Kuaishou’s Life Services Data Center tackled exploding demand and limited manpower by replacing traditional siloed data‑warehouse practices with AI‑driven intelligent review, DQC, and chatbot solutions, achieving up to 11.34% productivity gains and dramatically improving data quality.

AI · Data Quality · Knowledge Base
0 likes · 16 min read
Big Data Tech Team
Apr 13, 2026 · Industry Insights

How AI Large Models Can Revolutionize Data Warehouses: 3 Use Cases & 5 Pitfalls

This article examines how AI large models can transform data warehouse development by automating modeling, improving data cleansing and quality auditing, and enabling intelligent operations, while also highlighting five common implementation pitfalls and practical best‑practice recommendations for enterprises seeking cost, efficiency, and quality gains.

AI · Data Quality · Operations
0 likes · 10 min read
Huolala Tech
Apr 8, 2026 · Operations

How Real-Time Binlog Monitoring and AI Transform Data Quality Alerting

This article explains the design of a zero‑code, real‑time data quality alert platform that leverages Binlog‑based ingestion, configurable metrics, automated attribution, and LLM‑driven decision making to provide fine‑grained monitoring, rapid response, and measurable operational benefits across marketing workflows.

AI decision · Binlog · Data Quality
0 likes · 12 min read
AI Info Trend
Mar 19, 2026 · Industry Insights

How China’s New AI Training Data Standard Bridges Data Delivery and Model Performance

In February 2026, China introduced a pioneering group standard that defines executable acceptance rules for AI training datasets, linking data delivery, quality assessment, and model training through a three‑layer framework, quantitative metrics, and a pre‑negotiated quality baseline to reduce disputes and costs.

AI · Data Acceptance · Data Quality
0 likes · 7 min read
Yunqi AI+
Mar 14, 2026 · Industry Insights

Why Building an AI Knowledge Base Becomes an All‑Hands Initiative Once AI Goes Deep

The article explains how scaling AI agents reveals fragmented, inconsistent internal documentation, and argues that high‑quality production knowledge bases require a company‑wide, role‑based process, concrete writing rules, continuous inspection, and cross‑department ownership to ensure AI answers remain accurate and user‑focused.

AI deployment · AI knowledge base · Data Quality
0 likes · 14 min read
AIWalker
Mar 8, 2026 · Artificial Intelligence

How VisionPangu’s 1.7B Model Beats Larger LLMs in Detailed Image Captioning

VisionPangu demonstrates that a compact 1.7 B‑parameter multimodal model can generate richly detailed, coherent image descriptions that rival much larger models by leveraging high‑quality dense data, a three‑part architecture, and a two‑stage deep alignment training strategy.

AI research · Data Quality · Image Captioning
0 likes · 13 min read
Fun with Large Models
Mar 8, 2026 · Artificial Intelligence

EasyDataset: End-to-End Guide for Generating QA Datasets for LLM Fine‑Tuning

This article walks through the complete workflow of using EasyDataset to create high‑quality question‑answer pairs for supervised fine‑tuning, covering question generation (single and batch), three generation algorithms, answer generation (including chain‑of‑thought and multi‑turn dialogue), a hybrid quality‑assessment pipeline, and export to Alpaca or ShareGPT formats.

Alpaca format · Data Quality · EasyDataset
0 likes · 18 min read
Alibaba Cloud Developer
Mar 6, 2026 · Big Data

How DataWorks Turns Data Quality Rules into Code with Data Contracts

This article explains how DataWorks integrates data quality specifications directly into the SQL development workflow using Data Contracts, addressing governance lag, versioning gaps, and trust issues while providing a unified, version‑controlled, and automated quality assurance process for offline data pipelines.

Data Quality · DataWorks · YAML
0 likes · 12 min read
Wuming AI
Mar 2, 2026 · Industry Insights

How China’s New AI Training Data Standard Bridges Data Delivery and Model Performance

The article explains how the newly released "AI Training Data Set Delivery and Quality Acceptance Specification" addresses gaps in existing data‑quality standards by defining a three‑layer acceptance framework, quantitative metrics, and a pre‑negotiated quality‑baseline mechanism to make dataset delivery verifiable and directly supportive of model training goals.

AI data standards · Data Governance · Data Quality
0 likes · 7 min read
PaperAgent
Jan 6, 2026 · Artificial Intelligence

How Ontology‑Driven GraphRAG Eliminates Noise in AI Knowledge Graphs

This article examines the shortcomings of naïve GraphRAG implementations on clinical data and explains how an ontology‑driven, zero‑noise GraphRAG architecture can create self‑improving, conflict‑free knowledge graphs for AI applications.

AI · Data Quality · GraphRAG
0 likes · 3 min read
Fighter's World
Dec 19, 2025 · Industry Insights

How Surge AI Works: Decoding the Data Alchemy Behind Modern AI

The article analyzes Surge AI’s $1.2 billion revenue, bootstrapped model, elite 100 k‑labeler network, three‑layer architecture, RLHF, AdvancedIF/RIFL benchmarks, red‑team testing, RL environments, and evaluates its competitive moat and future strategic paths.

AI Alignment · Data Quality · Industry Analysis
0 likes · 21 min read
dbaplus Community
Dec 7, 2025 · Artificial Intelligence

How AI Agents Can Revolutionize Data Governance: A Step‑by‑Step Blueprint

This article explains how AI agents transform traditional data governance by introducing a four‑layer perception‑decision‑execution‑learning architecture, detailing the required technologies, tool integrations, code examples, deployment steps, team roles, security safeguards, and practical rollout strategies for enterprises seeking automated, intelligent data management.

AI Agent · Data Governance · Data Quality
0 likes · 10 min read
Sohu Tech Products
Nov 26, 2025 · Artificial Intelligence

How Cleanlab Cut Data Review by 34×: A Real‑World Text Classification Case Study

This article walks through a real text‑classification project where noisy labels inflated the review workload to over 15,000 samples, and shows how using cleanlab’s confident‑learning framework reduced the manual audit set to 438 items, boosting efficiency by thirty‑four times while improving model performance.

Data Quality · Data‑Centric AI · cleanlab
0 likes · 16 min read
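The confident-learning idea this entry refers to can be illustrated in a few lines. What follows is a minimal sketch, not cleanlab's implementation or API: the function name `flag_suspect_labels` and the toy data are hypothetical, and it assumes out-of-sample predicted probabilities are already available. An example is flagged when the model is confident, relative to a per-class threshold, in a class other than the given label.

```python
def flag_suspect_labels(labels, pred_probs):
    """Return indices of examples whose given label looks wrong.

    labels: list of given (possibly noisy) class ids, 0..K-1.
    pred_probs: list of per-example probability lists, one entry per class.
    Hypothetical helper for illustration only.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average predicted probability of class k
    # over the examples that were labeled k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for p, y in zip(pred_probs, labels) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    suspects = []
    for i, (p, y) in enumerate(zip(pred_probs, labels)):
        # Classes the model is confident about for this example.
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # no confident class, so no evidence either way
        best = max(confident, key=lambda k: p[k])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data (made up): example 2 is labeled 0 but the model strongly prefers 1.
toy_labels = [0, 0, 0, 1, 1, 1]
toy_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
             [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
suspect_indices = flag_suspect_labels(toy_labels, toy_probs)
```

Only the flagged indices would then go to manual review, which is how a review set can shrink from thousands of samples to a few hundred.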
Wu Shixiong's Large Model Academy
Nov 20, 2025 · Artificial Intelligence

How to Build a Quantifiable Data Quality Framework for Dynamic Incremental RAG

This article explains why static RAG metrics don’t apply to dynamic pipelines, introduces five essential dimensions—Parseability, Deduplication, Relevance, Chunk Quality, and Freshness—and shows how to combine them into a weighted score that enables monitoring, alerts, and continuous improvement of dynamic RAG systems.

Data Quality · Dynamic RAG · Metrics
0 likes · 10 min read
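The weighted score described in this entry could be combined roughly as follows. This is a sketch under stated assumptions: the five dimension names come from the summary, but the weights, the 0 to 1 scale, and the alert threshold are hypothetical values chosen for illustration, not the article's.

```python
from typing import Mapping

# Hypothetical weights; the listing does not give the article's actual values.
WEIGHTS = {
    "parseability": 0.25,
    "deduplication": 0.15,
    "relevance": 0.30,
    "chunk_quality": 0.20,
    "freshness": 0.10,
}


def quality_score(dimensions: Mapping[str, float],
                  weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 1]."""
    missing = set(weights) - set(dimensions)
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= dimensions[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {dimensions[name]}")
    total_weight = sum(weights.values())
    return sum(weights[n] * dimensions[n] for n in weights) / total_weight


def should_alert(score: float, threshold: float = 0.8) -> bool:
    """True when a batch falls below the (assumed) alert threshold."""
    return score < threshold


# Made-up scores for one incremental batch.
batch = {
    "parseability": 0.9,
    "deduplication": 0.8,
    "relevance": 0.7,
    "chunk_quality": 0.6,
    "freshness": 0.5,
}
batch_score = quality_score(batch)
```

Normalizing by the total weight keeps the score in [0, 1] even if the weights are later retuned and no longer sum to one.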
Data Thinking Notes
Nov 2, 2025 · Artificial Intelligence

Why Data Governance Is the Key to Trustworthy AI in the Large Model Era

The article explains how the rapid rise of large‑model AI has shifted the focus from models to data, outlines the concept and stages of AI‑specific data governance, identifies challenges such as low‑quality data, privacy leaks, and bias, and proposes a comprehensive framework of principles, processes, and technologies to ensure high‑quality, secure, and ethical AI deployment.

AI · Data Governance · Data Quality
0 likes · 40 min read
Big Data Tech Team
Oct 28, 2025 · Big Data

From Data Chaos to Decision Engine: A Step‑by‑Step Guide to Offline Data Warehouse Governance

This article walks you through why unmanaged data warehouses fail, outlines three golden governance principles, details five practical implementation steps—from building a data lineage map to creating business‑driven quality dashboards—and shares real‑world case studies and common pitfalls to help turn your data warehouse into a trusted decision‑making engine.

Business Intelligence · Data Quality · data-warehouse
0 likes · 11 min read
Data Party THU
Oct 28, 2025 · Artificial Intelligence

Can Low‑Quality Data Cause Irreversible ‘Brain Rot’ in Large Language Models?

Researchers from Texas A&M and UT Austin demonstrate that prolonged pre‑training on low‑quality, short‑form web content causes large language models to suffer irreversible cognitive decline—manifested as attention loss, broken reasoning chains, and personality distortion—highlighting data quality as a critical training‑time safety issue.

Artificial Intelligence · Cognitive Safety · Data Quality
0 likes · 7 min read
DataFunTalk
Oct 18, 2025 · Big Data

Inside Ant Group’s Big Data Governance: Key Practices and Insights

This article shares Ant Group’s practical experience in large-scale data governance, outlining four main topics—overall governance overview, data quality management, data storage-processing governance, and future considerations—while emphasizing the five critical aspects of architecture, security, compliance, quality, and value that drive effective big-data operations.

Data Architecture · Data Governance · Data Quality
0 likes · 4 min read
DataFunTalk
Oct 6, 2025 · Big Data

What Ant Group Learned: 5 Pillars of Effective Data Governance

Ant Group shares its practical experience in big data governance, outlining five key focus areas—architecture, security, compliance, quality, and value—through four structured sections and detailed discussions on data quality and storage governance, while also exploring future challenges and the economics of data.

Ant Group · Big Data · Data Architecture
0 likes · 4 min read
AntTech
Sep 13, 2025 · Artificial Intelligence

Why High‑Quality Data Is the New Breakthrough for Large‑Scale AI Models

At the 2025 Inclusion·Bund Conference forum, leading scholars and industry experts revealed how high‑quality data and AI form a dual‑engine that reshapes model training, improves performance, and drives the next evolution of intelligent systems.

AI training data · Data Infrastructure · Data Quality
0 likes · 7 min read
AntTech
Sep 12, 2025 · Artificial Intelligence

Breaking the AGI Wall: Scaling Laws, Multi‑Agent Collaboration & RL Insights

The Inclusion·Bund Conference forum explored how diminishing returns from massive models demand a shift toward cognitive reasoning, autonomous evolution, multi‑agent coordination, reinforcement learning, high‑quality data, and MoE diffusion models to bridge digital AI with the physical world.

AGI · AI applications · Data Quality
0 likes · 7 min read
Baidu Geek Talk
Sep 3, 2025 · Big Data

How Baidu’s TDS Platform Achieves End‑to‑End Data Governance and Smart Operations

This article details Baidu MEG’s TDS (Turing Data Studio) platform, explaining its three‑pillar governance framework—process standardization, quality controllability, and intelligent operations—along with concrete mechanisms, automation, and measurable results that dramatically improve data reliability, operational efficiency, and fault‑tolerance in large‑scale data production.

Data Governance · Data Quality · DevOps
0 likes · 20 min read
Bilibili Tech
Jul 25, 2025 · Big Data

How Unified Metadata Lineage Transforms Big Data Governance and Security

This article introduces the comprehensive design and evolution of a unified metadata lineage platform for big data, covering background, data processing chain, lineage models, system architecture, quality metrics, application scenarios, and future plans to enhance data governance, quality, and security.

Architecture · Big Data · Data Governance
0 likes · 27 min read
JD.com Experience Design Center
Jul 3, 2025 · Fundamentals

Why Paid Online Surveys Often Yield Bad Data—and How Professionals Ensure Quality

This article explores the evolution of questionnaire surveys from costly offline methods to modern online panels, reveals how monetary incentives create professional respondents and data fraud, and outlines rigorous methodologies—including diversified sampling, balanced reward design, and multi‑layered quality controls—to obtain high‑quality market research data.

Data Quality · market research · online panels
0 likes · 15 min read
Big Data Tech Team
Jun 9, 2025 · Industry Insights

How AI Large Models Transform Data Governance: 2025 Insights & Best Practices

This article examines the essence of data governance, outlines its four core domains, proposes a strategic and technical implementation roadmap, evaluates effectiveness with the DCAM model, and explores how AI large models can enhance metadata, data quality, and compliance while highlighting practical limitations and future trends.

AI Large Models · Data Quality · Future Trends
0 likes · 9 min read
AI2ML AI to Machine Learning
Jun 6, 2025 · Artificial Intelligence

Tackling the Top Challenges of Retrieval‑Augmented Generation (RAG)

The article enumerates common pitfalls of Retrieval‑Augmented Generation—such as missing content, low‑rank document misses, context limits, format errors, incomplete answers, scalability bottlenecks, complex PDF extraction, data‑quality issues, domain adaptation gaps, hallucinations, and feedback‑loop deficiencies—and offers concrete mitigation strategies ranging from data cleaning and prompt design to hybrid search, hierarchical retrieval, document compression, and automated evaluation.

Data Quality · Hybrid Search · LLM
0 likes · 9 min read
Continuous Delivery 2.0
May 30, 2025 · Artificial Intelligence

Data Quality and Diversity: The Critical Battlefield Beyond AI Models

The article explains why high‑quality, diverse data—rather than just advanced models—has become the decisive factor for enterprise AI success, outlining key dimensions of data quality, strategies for building diverse datasets, and practical steps for establishing a data‑first AI strategy.

AI · Data Governance · Data Quality
0 likes · 12 min read
Big Data Tech Team
May 20, 2025 · Industry Insights

Mastering 2025 Enterprise Data Quality Governance: Goals, Framework & Roadmap

This guide presents a comprehensive 2025 enterprise data quality governance strategy, covering objectives, common challenges, a three‑dimensional governance model, control mechanisms, organizational structures, phased implementation roadmaps, recommended technical tools, and industry best‑practice case studies.

AI · Data Quality · Enterprise
0 likes · 9 min read
Big Data Tech Team
May 18, 2025 · Industry Insights

How AI Is Revolutionizing Data Governance: Six Real‑World Scenarios and Solutions

This article examines how artificial‑intelligence techniques such as natural‑language processing, knowledge graphs, federated learning and automated ETL are applied across six core data‑governance scenarios—standardization, asset management, master data, data‑warehouse automation, security/privacy, and real‑time quality monitoring—showing measurable efficiency gains and business impact.

AI · Data Quality · Enterprise Analytics
0 likes · 10 min read
Big Data Tech Team
Apr 21, 2025 · Industry Insights

8 Practical Ways DeepSeek Boosts Data Quality for Better Governance

This guide outlines eight concrete methods DeepSeek uses to improve data quality—including automated cleaning, validation, classification, monitoring, governance standards, anomaly detection, integration, and intelligent analysis—providing actionable steps for organizations to enhance data accuracy, completeness, consistency, and usability.

Data Integration · Data Quality · DeepSeek
0 likes · 5 min read
dbaplus Community
Apr 16, 2025 · Backend Development

How Ctrip’s Kafka Gatekeeper Boosts FinOps Data Quality and Automates Cost Governance

This article explains how Ctrip’s hybrid‑cloud FinOps billing system uses a custom Kafka Gatekeeper to detect, locate, and automatically remediate data‑quality issues across dozens of self‑built PaaS services, improving coverage, timeliness, and responsibility attribution while supporting high‑availability deployments.

Backend · Cloud Native · Data Quality
0 likes · 19 min read
Big Data Tech Team
Feb 17, 2025 · Industry Insights

How DeepSeek Transforms Data Warehouse Development: 5 Game-Changing Benefits

DeepSeek, the popular Chinese large‑language model, boosts data‑warehouse engineers' productivity by offering free, open‑source AI assistance across code writing, model design, metadata management, data quality monitoring, and governance, ultimately maximizing enterprise data asset value.

Artificial Intelligence · Data Quality · DeepSeek
0 likes · 5 min read
Ctrip Technology
Jan 3, 2025 · Big Data

Design and Implementation of a Kafka Gatekeeper for FinOps Billing Data Quality Governance

This article describes the challenges of data quality in Ctrip’s hybrid‑cloud FinOps billing system and presents the design, implementation, and high‑availability deployment of a custom Kafka Gatekeeper proxy that performs pre‑validation, configurable rules, self‑service dashboards, and automated alerts to improve coverage, timeliness, and responsibility attribution.

Big Data · Cloud Native · Data Quality
0 likes · 17 min read
DataFunSummit
Jan 1, 2025 · Big Data

Douyin Group Data Asset Management Platform: Full‑Stack Data Lineage Evolution and Applications

This article introduces Douyin Group’s end‑to‑end data asset management platform, explains the evolution and architecture of its large‑scale data lineage system, presents quality metrics and ecosystem components, and outlines practical applications and future directions for data governance, development, and security.

Data Asset Platform · Data Governance · Data Lineage
0 likes · 16 min read
Huolala Tech
Dec 24, 2024 · Information Security

How Huolala Accelerated Risk‑Control Testing with Automated Tools

This article details Huolala's challenges in risk‑control testing amid rapid business growth, outlines the inefficiencies of manual configuration verification, and explains how a suite of automated tools and a full‑scope interception strategy dramatically improved testing efficiency, data quality assurance, and cross‑team collaboration.

Data Quality · Testing Automation · process optimization
0 likes · 21 min read
Model Perspective
Dec 23, 2024 · Fundamentals

Mastering Mathematical Modeling: 5 Stages & Common Pitfalls to Avoid

From the excitement of first encountering mathematical modeling to becoming a seasoned practitioner, this guide outlines five progressive stages, reveals typical misconceptions at each level, and offers practical advice to help learners avoid common traps and develop both technical and soft skills.

Data Quality · Model Evaluation · common pitfalls
0 likes · 8 min read
Architects' Tech Alliance
Dec 23, 2024 · Artificial Intelligence

Why High‑Quality, Massive, Diverse Data Fuels AI Breakthroughs

The article explains how breakthroughs in artificial intelligence depend on high‑quality, large‑scale, and diverse training data, outlines the data‑centric AI movement, details a six‑step workflow for building datasets, and surveys the data industry ecosystem supporting large language model development.

AI data · Data Quality · Data‑Centric AI
0 likes · 7 min read
ByteDance Data Platform
Nov 6, 2024 · Big Data

How Douyin’s Data Platform Overcomes EB‑Scale Metric Challenges

This article explains how Douyin Group tackles massive data volume, quality, and efficiency issues by building a four‑layer intelligent platform, standardizing metric management, automating metric decomposition, and creating reusable metric services that boost agility, stability, and cross‑team collaboration.

Big Data · Data Platform · Data Quality
0 likes · 20 min read
Baobao Algorithm Notes
Oct 30, 2024 · Artificial Intelligence

How to Choose High-Quality Instruction Data for LLM Fine‑Tuning: Methods Compared

This article surveys and categorizes instruction data selection techniques for large language model fine‑tuning, explaining metric‑based, trainable‑LLM, powerful‑LLM, and small‑model approaches, detailing representative papers, their pipelines, and empirical findings on data quality and diversity.

AI research · Data Quality · Instruction Tuning
0 likes · 15 min read
DataFunSummit
Sep 1, 2024 · Artificial Intelligence

Data Management in Large Language Model Training: Overview, Pre‑training, SFT, and Future Challenges

This article surveys data management for large language model training, covering an overview, pre‑training data composition, scaling‑law‑driven quantity control, quality filtering, deduplication, harmful‑content removal, instruction fine‑tuning strategies, dynamic data selection, and emerging research challenges such as bias mitigation, multimodal data handling, and synthetic‑data filtering.

Data Quality · instruction fine-tuning · pretraining
0 likes · 18 min read
Data Thinking Notes
Aug 20, 2024 · Artificial Intelligence

How Large AI Models Transform Data Governance: Strategies and Challenges

This article explores how the rise of massive AI models reshapes data governance, detailing model fundamentals, architectural types, emerging challenges, a five‑domain governance framework, and practical AI‑driven applications for data standards, metadata, quality, and security, while also looking ahead to future trends.

AI · Data Governance · Data Quality
0 likes · 14 min read
DataFunSummit
Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Data Quality · Flink · real-time data
0 likes · 15 min read
Data Thinking Notes
Jun 27, 2024 · Fundamentals

How to Build Effective Data Standards for Enterprise Governance

This article explains the concept of data standards, outlines the three main categories of data standards, describes a four‑stage implementation process, and provides a real‑world bank case study to illustrate how enterprises can establish and apply data standards for better data quality and value.

Data Governance · Data Quality · Enterprise Data
0 likes · 11 min read
DataFunTalk
Jun 23, 2024 · Big Data

Building Full-Chain Data Lineage for E‑commerce Scenarios

This article explains how to construct a full‑chain data lineage system for e‑commerce, covering the concepts of data lineage, the design of a lineage foundation, quality measurement, application‑level lineage, and practical use cases such as table migration, field‑level tracing, and automated metric decomposition.

Data Lineage · Data Quality · e‑commerce
0 likes · 12 min read
Baidu Tech Salon
Jun 12, 2024 · Big Data

Event Tracking Governance: Concepts, Challenges, and Platform Solutions

Event‑tracking governance ensures accurate, consistent user‑behavior data by managing the full lifecycle of logging points through defined quality standards, a digitized workflow, and supporting tools such as rule editors, real‑time testing, and compliance monitoring, while the platform’s page‑scene tree model and metrics improve visibility, reduce duplication, and drive business insight.

Analytics · Data Quality · Tooling
0 likes · 13 min read
Baidu Geek Talk
Jun 12, 2024 · Big Data

Event Tracking Governance and Logging Platform Solutions

The article explains event tracking, its data‑quality challenges, and presents a logging platform that enforces quality standards, an end‑to‑end online workflow, and specialized design, testing, and validation tools—including extended field types—to govern, monitor, and improve tracking point compliance across applications.

Data Quality · Metrics · event tracking
0 likes · 13 min read
Big Data Technology & Architecture
Jun 4, 2024 · Big Data

Ant Group's Data Governance Practices: Quality, Storage, and Future Directions

This article presents Ant Group's comprehensive data governance experience, covering data quality management, storage governance, architectural design, operational strategies, case studies, and forward‑looking thoughts on integrated lake‑warehouse governance, data value realization, and AI‑driven automation.

Ant Group · Big Data · Data Quality
0 likes · 19 min read
DataFunTalk
May 25, 2024 · Big Data

Data Quality Governance: From Compliance to Reasonableness and the Quality Review Tool System

This article explains how to assess and improve data quality by moving from simple compliance checks to deeper reasonableness analysis, using visual dashboards, a comprehensive quality‑review tool suite, intelligent judgement rules, self‑diagnosis utilities, and key technical components such as sample libraries and a three‑layer architecture.

Data Quality · intelligent detection · visualization
0 likes · 25 min read
Data Thinking Notes
May 21, 2024 · Fundamentals

Master Data Management: Building a Unified, High‑Quality Data Backbone for Enterprises

Enterprises facing fragmented systems like OA, HR, CRM, and ERP must adopt a unified master data management framework that defines standards, governance, organizational structures, and integrated platforms to ensure data consistency, accuracy, and real‑time availability, thereby reducing maintenance costs and supporting digital transformation.

Data Governance · Data Quality · Digital Transformation
0 likes · 16 min read
DataFunSummit
May 21, 2024 · Operations

Bilibili Data Governance Operational Framework Practice

This article presents Bilibili's practical data governance operational framework, introducing the DAMA‑DMBOK methodology, detailing two real‑world cases on storage‑level risk and data‑loss post‑mortem, and outlining the organizational, metadata, and embedded governance mechanisms that drive cost and quality improvements.

DAMA-Bok · Data Quality · cost governance
0 likes · 19 min read
Data Thinking Notes
Apr 25, 2024 · Fundamentals

Mastering Data Governance: Build High‑Quality, Secure, Traceable Business Data

This article explains how a comprehensive data governance framework—covering data quality, metadata, master data, asset management, security, and standards—can ensure high‑quality, safe, and traceable business data while outlining implementation steps, organizational roles, platform features, and assessment methods.

Data Governance · Data Quality · Master Data
0 likes · 12 min read
dbaplus Community
Apr 4, 2024 · Artificial Intelligence

10 Guiding Principles for Building LLM‑Powered Software Applications

This article outlines ten practical principles for designing applications with large language models, emphasizing a model‑first mindset, precision through interactive disambiguation, clear division of code and model responsibilities, data quality, handling uncertainty, and recognizing the limits of LLMs to build robust, maintainable software.

AI design · Data Quality · LLM
0 likes · 13 min read
Baobao Algorithm Notes
Mar 21, 2024 · Artificial Intelligence

Can the CaR Method Achieve Better LLM Performance with Only 1.96% of Training Data?

This article explains how the CaR (Clustering and Ranking) approach evaluates data quality with a scoring model and selects diverse samples via PCA‑reduced sentence embeddings and K‑Means clustering, achieving comparable or superior large‑model performance while using just 1.96% of the original dataset.

CaR method · Data Quality · LLM training
0 likes · 8 min read
DataFunSummit
Mar 18, 2024 · Big Data

Scenario‑Based Data Governance Practices in the Securities Industry

This article presents a comprehensive, scenario-driven data governance practice at Guoxin Securities, covering the industry's pain points, a three‑layer governance framework, detailed implementations for data standards, metadata, data quality, data modeling, and data security, and outlines future directions for intelligent and measurable governance.

Big Data · Data Quality · data security
0 likes · 30 min read
Bitu Technology
Mar 15, 2024 · Artificial Intelligence

Monitoring Quality Issues in Tubi’s Recommendation System

This article explains how Tubi monitors the quality of its recommendation system by identifying potential failure points, tracking key data streams such as model input, final recommendation output, and training data, and designing a scalable, real‑time monitoring solution with clear protocols and extensible metrics.

Data Quality · Real-Time · Scalability
0 likes · 11 min read
Aikesheng Open Source Community
Mar 14, 2024 · Databases

SQLE vs Yearning: Detailed Feature, Architecture, and Use‑Case Comparison

This article provides an in‑depth comparison of the open‑source SQL quality management platforms SQLE and Yearning, covering their architecture, supported data sources, UI design, SQL workbench capabilities, user management, ticket workflow, system settings, and overall suitability for different database environments.

Data Quality · Database Management · Open-source
0 likes · 10 min read
DataFunTalk
Mar 1, 2024 · Fundamentals

Data Quality Governance: Overview, Challenges, and Practices

This presentation by Zhou Jie, a senior data R&D expert at Ant Financial, outlines the scope of data governance, examines the challenges of ensuring high‑quality financial data, and shares practical architectures, solutions, and case studies to help attendees understand data quality risks and mitigation strategies.

Data Quality · financial data
0 likes · 2 min read
JavaEdge
Feb 20, 2024 · Big Data

Designing a Scalable Data Quality Center for Offline Big‑Data Pipelines

This article describes the design and implementation of a platform‑wide Data Quality Center for offline big‑data pipelines, covering research of existing solutions, design goals, system architecture based on DolphinScheduler, rule definition language, binding and execution mechanisms, and future enhancements such as lineage monitoring and real‑time checks.

Apache Griffin · Big Data · Data Quality
0 likes · 18 min read
DataFunSummit
Feb 19, 2024 · Big Data

Yipay Data Warehouse Construction and Data Governance Practices

This presentation by senior data warehouse engineer Huang Luo details Yipay's end‑to‑end data warehouse build, covering background challenges, governance framework, platform development, layered architecture, naming standards, monitoring, and future plans, offering practical insights for data engineers, architects, and business stakeholders.

Big Data · Data Architecture · Data Quality
0 likes · 14 min read
DataFunSummit
Feb 3, 2024 · Artificial Intelligence

Practical Application of Large Language Models in MaShang Consumer Finance: From Model Building to Deployment

This article details how MaShang Consumer Finance leverages large language models for sales, collection, and customer service, covering company background, AI research achievements, model training infrastructure, data‑quality and compliance challenges, prompt engineering, inference acceleration, evaluation methods, and lessons learned from real‑world deployment.

Data Quality · LLM · Model Deployment
0 likes · 21 min read
Data Thinking Notes
Jan 2, 2024 · Big Data

How a Three-Dimensional Data Governance Model Breaks Silos and Boosts Efficiency

Enterprise data governance faces challenges like information silos, departmental walls, and unclear responsibilities; adopting a three‑dimensional “business‑technology‑organization” framework—setting standards, optimizing processes, and innovating structures—helps eliminate these obstacles, enhance collaboration, improve data quality, and drive cost‑saving, efficiency, and innovation.

Big Data · Data Governance · Data Quality
0 likes · 10 min read
Weimob Technology Center
Jan 2, 2024 · Big Data

How to Efficiently Test BI Reports in a Hive‑StarRocks Data Warehouse

This article details practical methods for testing BI reports built on Hive and StarRocks, covering the report creation workflow, testing characteristics, SQL writing techniques, impact analysis, data warehouse simplification, and the application of data quality tools to ensure accurate and efficient reporting.

BI testing · Data Quality · StarRocks
0 likes · 9 min read
Data Thinking Notes
Nov 26, 2023 · Fundamentals

How a Large Enterprise Overcame Master Data Chaos: A Practical Case Study

This article outlines a real‑world enterprise master data project, detailing the definition of master data, the four critical data‑quality challenges faced, the comprehensive solution framework with executive backing, and the six measurable outcomes that improved data governance, efficiency, and decision‑making across the organization.

Data Governance · Data Quality · Master Data
0 likes · 10 min read
DataFunSummit
Nov 23, 2023 · Information Security

How DCMM Supports Digital Transformation and Data Governance at XCMG Mining Machinery Co., Ltd.

This article details how XCMG Mining Machinery leveraged the DCMM framework to drive digital transformation, improve data governance, address data quality and security challenges, and establish a sustainable data-driven culture across the organization, highlighting the background, implementation steps, lessons learned, and future outlook.

DCMM · Data Quality · Digital Transformation
0 likes · 25 min read
DataFunSummit
Nov 22, 2023 · Big Data

Bilibili Data Quality Assurance System: Architecture, Practices, and Case Study

This article presents Bilibili's data quality assurance system, detailing its evolution across four stages, the architectural framework, core capabilities such as a quality data warehouse, monitoring, collaborative safeguards, digital-driven optimization, and efficient incident handling, along with practical case studies and future outlooks.

Big Data · Data Quality · data-warehouse
0 likes · 22 min read
Data Thinking Notes
Nov 19, 2023 · Fundamentals

How to Build an Effective Data Asset Management Framework for Enterprises

This article explains why enterprises need a data asset framework, outlines its key components such as catalog management, policy support, and development trends, and provides a step‑by‑step guide with visual diagrams for constructing and operating a comprehensive data asset management system.

Data Catalog · Data Governance · Data Quality
0 likes · 5 min read
DataFunTalk
Nov 19, 2023 · Big Data

Design and Evolution of Zhihu's Event‑Tracking System

This article presents a comprehensive overview of Zhihu's event‑tracking system, covering its evolution from early Hadoop‑based pipelines to cloud‑native architectures, detailing toolsets for requirement management, validation, data collection, querying, and service design, and concluding with a practical Q&A on best practices and optimization.

Data Quality · data pipeline · event tracking
0 likes · 12 min read
DaTaobao Tech
Nov 15, 2023 · Industry Insights

Inside the E‑Commerce Product Domain: Roles, Challenges, and Cutting‑Edge Solutions

This article systematically outlines the e‑commerce product team's responsibilities, the users and consumer pain points it addresses, the core technical challenges such as high‑concurrency reads/writes and AI‑driven automation, and the innovative solutions the team has implemented to keep the product domain healthy, efficient, and intelligent.

AI · Data Quality · Industry Insights
0 likes · 19 min read
HomeTech
Nov 15, 2023 · Industry Insights

How to Build Accurate Data Asset Lineage for Data Warehouse Governance

This article explains the challenges of data asset lineage in large data warehouses, presents a comprehensive approach using business‑level instrumentation, SQL interceptor plugins, and ETL script parsing to generate fine‑grained lineage graphs, and demonstrates measurable improvements in coverage and zombie‑table cleanup.

Data Governance · Data Lineage · Data Quality
0 likes · 18 min read
Data Thinking Notes
Nov 14, 2023 · Big Data

How Financial Institutions Master Data Governance for Digital Transformation

This article examines why data governance has become a critical pillar for Chinese financial institutions, outlining external regulations and internal business drivers, describing a comprehensive governance architecture, and presenting a detailed case study of a securities company's data‑asset inventory, platform implementation, and quality management.

Big Data · Data Governance · Data Quality
0 likes · 16 min read
DataFunSummit
Nov 14, 2023 · Big Data

Integrated Business‑Finance Data Governance and Its Role in Driving Financial Digital Transformation

This article explains the concept, framework, common data problems, three-stage integration model, implementation methods, and the strategic significance of business‑finance data governance for improving data quality, managing data assets, and accelerating enterprise financial digital transformation.

Data Quality · Enterprise Data Management · Financial Digital Transformation
0 likes · 14 min read
Architects Research Society
Nov 13, 2023 · Fundamentals

Why Ongoing Data Maintenance Is Crucial for an Outcome‑Driven Enterprise Data Strategy

The article explains why continuous, proactive data maintenance is essential for an outcome‑driven enterprise data strategy, outlines the risks of poor data quality, and provides practical steps—including business rules, service agreements, KPIs, and ownership—to establish an always‑on data‑maintenance process.

Continuous Improvement · Data Quality · data maintenance
0 likes · 7 min read
Data Thinking Notes
Nov 5, 2023 · Fundamentals

Why Poor Data Quality Costs Companies $15M Annually and How to Fix It

Low‑quality data can cost an enterprise up to $15 million a year, which makes data quality management essential for accurate decision‑making, compliance, and operational efficiency. This article explains why data quality matters, its evaluation dimensions, common issues, monitoring metrics, and responsible roles, and presents a three‑phase management framework of prevention, control, and remediation.

Big Data · Business Intelligence · Data Governance
0 likes · 32 min read
Data Thinking Notes
Nov 2, 2023 · Operations

How Bilibili Built a Scalable Data Quality Assurance System for Its Data Warehouse

This article details Bilibili's data quality assurance framework, covering its evolution across four data platform stages, the architecture of its quality data warehouse, core capabilities such as a complete assurance system, digital‑driven continuous optimization, and efficient incident handling, plus case studies, future plans, and a Q&A session.

Big Data · Bilibili · Data Platform
0 likes · 27 min read
Big Data Technology & Architecture
Oct 23, 2023 · Big Data

Bilibili Data Quality Assurance: Architecture, Goals, Core Capabilities, and Future Outlook

This article outlines Bilibili's data quality assurance framework, detailing its evolution across four development stages, the current data platform architecture, identified pain points, four key quality objectives, core capabilities such as a quality data warehouse, comprehensive monitoring, digital optimization, fault handling, and future directions.

Big Data · Data Governance · Data Platform
0 likes · 22 min read
Architects Research Society
Oct 21, 2023 · Fundamentals

Information Governance: Roles, Responsibilities, and Key Processes

The article explains information governance as a business‑driven program that ensures data accuracy, completeness, consistency, accessibility, and security, outlines three essential roles, describes the data administrator’s duties, and details the key procedures and their relationship to corporate and IT governance.

Data Management · Data Quality · data stewardship
0 likes · 11 min read
Tencent Tech
Sep 20, 2023 · Artificial Intelligence

Why Do Large Language Models Hallucinate and How to Reduce It?

The article explains why large language models generate hallucinations—due to data errors, training conflicts, and inference uncertainty—and outlines data‑cleaning, model‑level feedback, knowledge augmentation, constraint techniques, and post‑processing methods such as the “Truth‑seeking” algorithm to mitigate the issue.

AI Safety · Data Quality · Knowledge Retrieval
0 likes · 8 min read
Data Thinking Notes
Sep 3, 2023 · Big Data

How to Build an Effective Data Governance Framework: Steps & Best Practices

This article outlines a comprehensive data governance framework for Chinese enterprises, covering organizational structures, data asset inventory, six‑stage methodology, and the creation of unified data standards and quality rules to support effective digital transformation and data‑driven decision making.

Big Data · Data Governance · Data Management
0 likes · 13 min read
Data Thinking Notes
Aug 30, 2023 · Fundamentals

Mastering Data Governance: A Complete Guide to Metadata, Standards, Quality, and Security

Data governance is a comprehensive framework spanning metadata, master data, standards, quality, assets, exchange, security, and lifecycle management that keeps data accurate, consistent, and valuable across an organization. This guide offers step‑by‑step guidance, best‑practice models, and visual references for implementing it effectively.

Data Governance · Data Lifecycle · Data Quality
0 likes · 19 min read
DeWu Technology
Aug 28, 2023 · Operations

Real-time Data Warehouse Business-Side Chaos Engineering Practice

The article describes how a real‑time data warehouse supporting ad‑delivery metrics adopts both technical and business‑side chaos‑engineering, using red‑blue team drills to inject faults, monitor indicator anomalies, and refine response procedures, thereby enhancing early risk detection, system resilience, and overall data stability for the advertising platform.

Data Quality · Data Warehousing · Ops
0 likes · 16 min read
DataFunSummit
Aug 28, 2023 · Big Data

Building Data Production Pipelines with DataOps: Concepts, Practices, and a Six‑Stage Workflow

This article introduces DataOps, outlines its background and the problems it addresses, describes NetEase’s big‑data product ecosystem, details a six‑stage data production pipeline (coding, orchestration, testing, code review, release approval, and deployment), and shares insights from two pipeline explorations.

Big Data · Data Quality · DataOps
0 likes · 15 min read
Data Thinking Notes
Aug 13, 2023 · Big Data

How to Successfully Deliver a Data Governance Project: Step‑by‑Step Guide

This article outlines a comprehensive methodology for delivering a data governance project, covering planning, blueprint design, implementation, and acceptance phases, with detailed guidance on team formation, stakeholder roles, requirement analysis, platform architecture, management processes, and post‑deployment operations.

Big Data · Data Governance · Data Platform
0 likes · 12 min read
dbaplus Community
Aug 2, 2023 · Backend Development

How WeChat Built a Scalable Security Data Warehouse for Billions of Requests

This article explains the evolution of WeChat's security data warehouse—from its business background and the need for unified feature storage to the architectural designs, multi‑IDC synchronization, operation system, and data‑quality safeguards that enable reliable, high‑performance security policy development for over a trillion daily feature reads and writes.

Data Quality · Feature Management · Real-time Processing
0 likes · 12 min read
Data Thinking Notes
Jul 26, 2023 · Big Data

How to Build an Effective Data Asset Catalog for Enterprise Data Governance

This article explains what data assets are, why a data asset catalog is essential for data governance, and provides a step‑by‑step framework—including identification criteria, value dimensions, construction phases, tool support, and core functional modules—to help enterprises systematically create, manage, and leverage a data asset catalog.

Data Asset · Data Catalog · Data Governance
0 likes · 16 min read