From Hand‑Written SQL to One‑Click Validation: Alibaba’s Verify‑Data Agent Skill Design Review
The article details how Alibaba’s production‑grade Verify‑Data Agent Skill replaces manual, multi‑SQL data validation with a single natural‑language command, automating table discovery, SQL generation, execution, and review‑level reporting, achieving up to 30‑minute turnaround, comprehensive coverage, and robust risk controls for big‑data pipelines.
Background and Pain Points
Data development teams often verify tables manually by writing several SQL statements, executing them one by one, and visually comparing results. This approach suffers from limited coverage, baseline‑table selection errors, code‑understanding gaps, lack of evidential support, and high maintenance cost.
Opportunity for an Agent Skill
Since 2025, AI agents have been used for code generation and ops automation, but data‑validation scenarios still lack systematic solutions. The goal is to create an Agent Skill that, with a single natural‑language request, can fetch metadata, discover baseline tables, generate verification SQL, run it, and publish a review‑grade report.
What is Verify‑Data?
Verify‑Data is an end‑to‑end data‑validation Agent Skill. Users provide a research table name, and the agent automatically performs:
Information collection
Baseline‑table discovery
Code‑logic analysis
SQL generation (10 standardized templates)
Execution (batch strategy)
Report assembly
Document publishing
The entire flow is triggered by a single sentence and produces a structured, review‑level evidence package.
Core Capabilities
Baseline‑table automatic discovery : Two‑stage strategy—lineage‑based candidate set followed by metric‑based scoring (score = lineage × 0.5 + dimension × 0.3 + metric × 0.2). Top‑1 score ≥ 0.7 selects a baseline; otherwise multi‑baseline or downgrade strategies apply.
10 verification SQL templates : Include total‑volume comparison, data‑quality checks, dimension‑by‑dimension comparison, CUBE integrity, aggregation consistency, logical relationship checks, historical trend comparison, and association‑inflation detection.
Code‑Diff driven risk scanning : In iterative‑validation (S2) scenarios, the agent diff‑analyzes DEV and PROD code, detects eight risk signals (e.g., JOIN inflation, string‑numeric comparison, NULL assumptions, date window mismatches), and generates quantitative V‑class SQL to prove or refute each risk.
Downgrade strategies : When no suitable baseline exists, the agent falls back to four predefined strategies, each enforcing mandatory code review and V‑class verification to avoid false confidence.
Three mandatory red lines : (1) All verification SQL must be template‑generated; (2) Field‑mapping cannot rely on name guessing; (3) Downgrade paths must include code‑review and quantitative proof.
Workflow Details (7‑9 steps, up to 17 with conditional branches)
Step 1 Collect information → [Step 1.5 Baseline discovery]
Step 2 Get table schema → [Step 2.5 Partition pre‑check]
Step 3 Retrieve research table code → [Step 3.5 Structured diff] (forced) → [Step 3.6 Baseline applicability] (user‑specified) → [Step 3.7 CUBE detection] (JOIN present)
Step 4 Analyze dimension/metric mapping → [Step 4.5 Dimension matching] → [Step 4.8 Baseline vs research logic comparison] (forced)
Step 5 Generate verification SQL (10 templates) → [Step 5.5 Downgrade strategy] (fallback)
Step 6 Execute SQL (three‑batch strategy)
Step 7 Assemble local report
Step 8 Publish collaborative document
Step 9 Root‑cause analysis (triggered on FAIL)Key Conditional Steps Often Missed
Step 3.6 – Baseline applicability pre‑check (forces similarity calculation;
similarity = overlapping_dimensions / max(research_dims, baseline_dims)).
Step 3.7 – CUBE detection on dimension tables (detects one‑to‑many relationships via GROUP BY key HAVING COUNT(1) > 1).
Step 4.8 – Detailed baseline vs research logic comparison to prevent mismatched metrics.
Real‑World Scenarios
Scenario 1: New CUBE Table Launch (S1)
Trigger:
Help me validate my_project.ads_activity_cube_1d, partition 20260419Agent discovers three candidate baselines (scores 0.56, 0.56, 0.59), selects multi‑baseline strategy, generates 9 baseline‑comparison SQL and 5 downgrade SQL, runs them, and produces a PASS report.
Scenario 2: DEV vs PROD Mismatch (S2)
Trigger:
Compare my_project_dev.ads_campaign_1d and my_project.ads_campaign_1dAgent detects two new fields, one JOIN change, one filter change, flags one high‑risk signal (JOIN type change), generates V‑class SQL for each change, and outputs a detailed diff report with expected and unexpected differences.
Scenario 3: Completely New Metric (S1.c)
No baseline found; agent falls back to downgrade strategy, runs CUBE consistency checks, DWD upstream comparison, and data‑quality checks, resulting in a PASS with sub‑percent differences.
Scenario 4: Dimension Table Validation (DIM)
Agent identifies a dimension table (prefix dim_), runs static checks: primary‑key uniqueness, NULL rate, business‑rule consistency, tag distribution, and data completeness, yielding a WARNING due to missing runtime permissions.
Design Principles and Red Lines
Never skip template‑generated SQL.
Never infer field mapping without schema and code analysis.
Never produce a report with only D‑class SQL; V‑class proof is mandatory.
All JOINs must be checked for inflation; all date dimensions must pass association checks.
Conclusion Judgment System
Three outcomes: PASS (all SQL pass, no risk), WARNING (main items pass but conditional issues exist), FAIL (critical SQL fails or unexplained differences). The system combines D‑class consistency results with V‑class logical proof to decide the final status.
Current Challenges
Execution time and interaction cost (15‑30 min with multiple confirmations).
Permission fragmentation across project spaces causing workflow interruptions.
Trust in downgrade conclusions requiring extra explanation.
Steep onboarding for new users due to module installation and configuration.
Future Directions
Extreme UX : Asynchronous execution, silent mode, and cached metadata to turn validation into a “no‑feel” background task.
Platform Integration : Embed verification into release pipelines, pre‑check permissions, and block deployments on failure.
Intelligent Evolution : Full‑chain root‑cause localization, confidence scoring for downgrade outcomes, and incremental validation for monitoring scenarios.
Appendix
Common trigger phrases (one‑line commands) are listed in the article’s appendix.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
