How Kuaishou’s Life Services Data Center Boosted Warehouse Efficiency with AI Agents
In a rapidly growing, data-driven environment, Kuaishou's Life Services Data Center tackled exploding demand with limited manpower by replacing siloed, manual data-warehouse practices with AI-driven intelligent review, intelligent DQC, and an intelligent customer-service chatbot, delivering an 11.34% productivity gain and dramatically improving data quality.
Project Background
Kuaishou's Life Services Data Center has seen a seven-fold increase in data-warehouse demand since its 2022 launch, while the development team has stayed at around ten engineers. This growth exposed three core problems: siloed development, low delivery efficiency, and weak data-quality control. Manual processes, lengthy review meetings, and periodic governance campaigns could not keep up.
AI‑Agent‑Powered Capabilities
The team built three AI‑Agent services to automate the most labor‑intensive parts of the data‑warehouse lifecycle:
Intelligent Review: large-model agents automatically enforce naming-standard rules and perform code-level validation on DDL and SQL.
Intelligent DQC: a hybrid rule-based and model-driven data-quality checking system that auto-generates check configurations and blocks violations in real time.
Intelligent Customer Service: a retrieval-augmented generation (RAG) chatbot that answers table and field queries from a curated knowledge base.
Implementation Details
Knowledge‑Base Construction
The knowledge base is built in four stages:
Manual curation of ~1,200 term roots (business nouns, atomic metrics, modifiers) to ensure semantic consistency.
Creation of a data white paper documenting high-quality warehouse assets.
Integration of metadata (tables, fields, lineage, usage frequency, asset tier, SQL code, execution plans) and implementation of a SQL‑fragment splitter.
Definition of standard policies (naming, DQ configuration, masking) to protect sensitive data.
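The SQL-fragment splitter mentioned in stage three is not described in detail. As a rough sketch of the idea, a minimal splitter might break a script into statement fragments on top-level semicolons while ignoring semicolons inside string literals; the function name and logic below are assumptions, not Kuaishou's implementation:

```python
def split_sql_fragments(script: str) -> list[str]:
    """Split a SQL script into statement fragments on top-level semicolons,
    ignoring semicolons that appear inside single-quoted string literals."""
    fragments, buf = [], []
    in_str = False
    for ch in script:
        if ch == "'":
            in_str = not in_str  # toggle on quote boundaries
            buf.append(ch)
        elif ch == ";" and not in_str:
            stmt = "".join(buf).strip()
            if stmt:
                fragments.append(stmt)
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:
        fragments.append(tail)
    return fragments
```

Each fragment can then be indexed individually, which lets the knowledge base retrieve a single relevant statement instead of an entire script.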
Intelligent Review Engine
The engine replaces manual review documents and architect meetings. It extracts DDL and SQL from each commit, then splits the review into two independent pipelines:
Naming-standard check: validates object names against the term-root dictionary.
Development check: parses SQL into an abstract syntax tree (AST) using a custom SQL-lineage parser, then runs static-analysis rules (e.g., SELECT * usage, date-format consistency, INSERT ordering).
Results are pushed to developers via the internal Kim messaging system, enforcing a “review‑as‑required” policy.
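As a hedged sketch of how the naming-standard check could validate names against the term-root dictionary (the roots and function below are illustrative, not the actual ~1,200-entry dictionary):

```python
# Illustrative subset of the curated term-root dictionary; the real one holds
# ~1,200 roots covering business nouns, atomic metrics, and modifiers.
TERM_ROOTS = {"dws", "trd", "order", "pay", "cnt", "1d"}

def check_name(name: str) -> list[str]:
    """Split an underscore-delimited object name into segments and return
    those that are not registered term roots; an empty list means compliant."""
    return [seg for seg in name.lower().split("_") if seg not in TERM_ROOTS]
```

A name composed entirely of registered roots, such as dws_trd_order_pay_cnt_1d, would pass, while any unregistered segment would be flagged and pushed back to the developer via Kim.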
Core Tooling
Two key libraries support the review process:
SQLLineageParser: builds ASTs, extracts column-level lineage, and aligns execution-plan actions with logical operations.
DuplicateConstructionDetector: computes lineage-similarity scores and invokes a large-model verifier to confirm true duplicates at the field and table levels.
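The lineage-similarity scoring is not specified in the talk. One plausible sketch scores each pair of tables by the Jaccard similarity of their upstream column sets and forwards high-scoring pairs to the large-model verifier; the names and the 0.8 threshold here are assumptions:

```python
def lineage_similarity(cols_a: set[str], cols_b: set[str]) -> float:
    """Jaccard similarity between the upstream column-lineage sets of two tables."""
    if not cols_a or not cols_b:
        return 0.0
    return len(cols_a & cols_b) / len(cols_a | cols_b)

def candidate_duplicates(lineage: dict[str, set[str]], threshold: float = 0.8):
    """Return table pairs whose lineage similarity meets the threshold;
    these candidates would then go to the large-model verifier."""
    tables = sorted(lineage)
    return [(a, b, lineage_similarity(lineage[a], lineage[b]))
            for i, a in enumerate(tables)
            for b in tables[i + 1:]
            if lineage_similarity(lineage[a], lineage[b]) >= threshold]
```

Set-based similarity is cheap enough to run over every table pair, so the expensive large-model verification is only spent on a short candidate list.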
Intelligent DQC
DQC is divided into:
Rule-based checks: automatically generated from configurable templates (e.g., null-value ratios, primary-key uniqueness).
Business-logic checks: model-driven rules such as exposure > clicks or sum(subsidy) = total_subsidy. Violations trigger immediate pipeline blocking.
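A minimal sketch of how the two check families might be combined, assuming a list of row dicts and hypothetical field names (the real system generates such checks from templates and model-derived rules rather than hard-coding them):

```python
def run_dqc(rows: list[dict]) -> list[str]:
    """Run illustrative rule-based and business-logic checks; a non-empty
    result would block the pipeline, mirroring the real-time blocking above."""
    violations = []

    # Rule-based: null-value ratio on the primary key (template-generated).
    null_ratio = sum(r.get("order_id") is None for r in rows) / len(rows)
    if null_ratio > 0:
        violations.append(f"order_id null ratio {null_ratio:.2%} exceeds 0%")

    # Rule-based: primary-key uniqueness (template-generated).
    keys = [r["order_id"] for r in rows if r.get("order_id") is not None]
    if len(keys) != len(set(keys)):
        violations.append("order_id is not unique")

    # Business-logic: exposure must not fall below clicks (model-derived rule).
    for r in rows:
        if r["exposure"] < r["clicks"]:
            violations.append(f"exposure < clicks for order {r['order_id']}")
    return violations
```

In production the equivalent of a non-empty violation list is what blocks the offending pipeline before bad data propagates downstream.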
Intelligent Customer Service (RAG)
The RAG pipeline works as follows:
User submits a free‑text SQL query.
The parsing engine converts the query into structured metadata (target table, fields, lineage).
Structured metadata is used to construct a prompt that retrieves relevant passages from the knowledge base (white‑paper, metadata, policy docs).
A large‑model agent generates a concise answer.
In production the system achieved 93% user acceptance and 90% answer accuracy for table/field queries.
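Under the assumption of a simple keyword-overlap retriever and with generation left out (the real system retrieves via structured SQL metadata and answers with a large model), the retrieval-and-prompt stage of the pipeline above might look like:

```python
# Illustrative knowledge-base passages; the real KB holds the white paper,
# metadata, and policy documents.
KNOWLEDGE_BASE = [
    "dws_order_pay_1d: daily order payment summary, partitioned by dt",
    "dim_shop: shop dimension table with shop_id, city, category",
    "masking policy: phone numbers must be masked before exposure",
]

def retrieve(question: str, kb: list[str], k: int = 2) -> list[str]:
    """Rank passages by keyword overlap with the question; a stand-in for
    the structured-metadata retrieval described above."""
    terms = set(question.lower().replace(",", " ").split())
    return sorted(kb, key=lambda p: -len(terms & set(p.lower().split())))[:k]

def build_prompt(question: str) -> str:
    """Assemble the top-ranked context passages and the question into a prompt
    for the large-model agent."""
    context = "\n".join(retrieve(question, KNOWLEDGE_BASE))
    return f"Answer using only this context:\n{context}\nQ: {question}"
```

Constraining the model to answer only from retrieved passages is what keeps table and field answers grounded in curated assets rather than model guesswork.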
Results
Naming‑standard violations reduced by 90%.
Duplicate‑construction detection accuracy reached 90%, identifying >900 duplicate cases for remediation.
DQC configuration coverage increased from 33% to 90%.
AI‑generated code coverage reached 20.83% of total SQL assets.
Customer‑service query accuracy maintained at 90% with 93% acceptance.
Overall developer productivity improved by 11.34%.
Future Roadmap
Modeling : develop AI‑driven evaluation metrics for high‑cohesion, low‑coupling models and automate mapping from business requirements to metric‑dimension matrices.
Quality : add automated DQC inspection, risk‑based alert classification, and smart testing (primary‑key consistency, row‑count drift, metric drift).
Customer Service : extend the chatbot to handle business‑logic questions, metric inconsistencies, and data‑visibility issues.
Key Q&A Highlights
Q1: How should organizations with weak data governance get started? Begin by establishing naming standards and a minimal review workflow, then gradually introduce AI-assisted reviews.
Q2: What goes into the knowledge base? Term roots and historical code, with raw SQL and execution plans ingested through daily incremental updates.
Q3: How is high review accuracy achieved? By combining robust standards with AI assistance and prioritizing high-value assets in the knowledge base.
Q4: How are the data white paper and knowledge base kept current? Through monthly manual updates plus automated capture of new development artifacts.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.