Big Data 28 min read

From Hand‑Written SQL to One‑Click Validation: Alibaba’s Verify‑Data Agent Skill Design Review

The article details how Alibaba’s production‑grade Verify‑Data Agent Skill replaces manual, multi‑SQL data validation with a single natural‑language command, automating table discovery, SQL generation, execution, and review‑level reporting, achieving up to 30‑minute turnaround, comprehensive coverage, and robust risk controls for big‑data pipelines.

dbaplus Community
dbaplus Community
dbaplus Community
From Hand‑Written SQL to One‑Click Validation: Alibaba’s Verify‑Data Agent Skill Design Review

Background and Pain Points

Data development teams often verify tables manually by writing several SQL statements, executing them one by one, and visually comparing results. This approach suffers from limited coverage, baseline‑table selection errors, code‑understanding gaps, lack of evidential support, and high maintenance cost.

Opportunity for an Agent Skill

Since 2025, AI agents have been used for code generation and ops automation, but data‑validation scenarios still lack systematic solutions. The goal is to create an Agent Skill that, with a single natural‑language request, can fetch metadata, discover baseline tables, generate verification SQL, run it, and publish a review‑grade report.

What is Verify‑Data?

Verify‑Data is an end‑to‑end data‑validation Agent Skill. Users provide a research table name, and the agent automatically performs:

Information collection

Baseline‑table discovery

Code‑logic analysis

SQL generation (10 standardized templates)

Execution (batch strategy)

Report assembly

Document publishing

The entire flow is triggered by a single sentence and produces a structured, review‑level evidence package.

Core Capabilities

Baseline‑table automatic discovery : Two‑stage strategy—lineage‑based candidate set followed by metric‑based scoring (score = lineage × 0.5 + dimension × 0.3 + metric × 0.2). Top‑1 score ≥ 0.7 selects a baseline; otherwise multi‑baseline or downgrade strategies apply.

10 verification SQL templates : Include total‑volume comparison, data‑quality checks, dimension‑by‑dimension comparison, CUBE integrity, aggregation consistency, logical relationship checks, historical trend comparison, and association‑inflation detection.

Code‑Diff driven risk scanning : In iterative‑validation (S2) scenarios, the agent diff‑analyzes DEV and PROD code, detects eight risk signals (e.g., JOIN inflation, string‑numeric comparison, NULL assumptions, date window mismatches), and generates quantitative V‑class SQL to prove or refute each risk.

Downgrade strategies : When no suitable baseline exists, the agent falls back to four predefined strategies, each enforcing mandatory code review and V‑class verification to avoid false confidence.

Three mandatory red lines : (1) All verification SQL must be template‑generated; (2) Field‑mapping cannot rely on name guessing; (3) Downgrade paths must include code‑review and quantitative proof.

Workflow Details (7‑9 steps, up to 17 with conditional branches)

Step 1   Collect information → [Step 1.5 Baseline discovery]
Step 2   Get table schema → [Step 2.5 Partition pre‑check]
Step 3   Retrieve research table code → [Step 3.5 Structured diff] (forced) → [Step 3.6 Baseline applicability] (user‑specified) → [Step 3.7 CUBE detection] (JOIN present)
Step 4   Analyze dimension/metric mapping → [Step 4.5 Dimension matching] → [Step 4.8 Baseline vs research logic comparison] (forced)
Step 5   Generate verification SQL (10 templates) → [Step 5.5 Downgrade strategy] (fallback)
Step 6   Execute SQL (three‑batch strategy)
Step 7   Assemble local report
Step 8   Publish collaborative document
Step 9   Root‑cause analysis (triggered on FAIL)

Key Conditional Steps Often Missed

Step 3.6 – Baseline applicability pre‑check (forces similarity calculation;

similarity = overlapping_dimensions / max(research_dims, baseline_dims)

).

Step 3.7 – CUBE detection on dimension tables (detects one‑to‑many relationships via GROUP BY key HAVING COUNT(1) > 1).

Step 4.8 – Detailed baseline vs research logic comparison to prevent mismatched metrics.

Real‑World Scenarios

Scenario 1: New CUBE Table Launch (S1)

Trigger:

Help me validate my_project.ads_activity_cube_1d, partition 20260419

Agent discovers three candidate baselines (scores 0.56, 0.56, 0.59), selects multi‑baseline strategy, generates 9 baseline‑comparison SQL and 5 downgrade SQL, runs them, and produces a PASS report.

Scenario 2: DEV vs PROD Mismatch (S2)

Trigger:

Compare my_project_dev.ads_campaign_1d and my_project.ads_campaign_1d

Agent detects two new fields, one JOIN change, one filter change, flags one high‑risk signal (JOIN type change), generates V‑class SQL for each change, and outputs a detailed diff report with expected and unexpected differences.

Scenario 3: Completely New Metric (S1.c)

No baseline found; agent falls back to downgrade strategy, runs CUBE consistency checks, DWD upstream comparison, and data‑quality checks, resulting in a PASS with sub‑percent differences.

Scenario 4: Dimension Table Validation (DIM)

Agent identifies a dimension table (prefix dim_), runs static checks: primary‑key uniqueness, NULL rate, business‑rule consistency, tag distribution, and data completeness, yielding a WARNING due to missing runtime permissions.

Design Principles and Red Lines

Never skip template‑generated SQL.

Never infer field mapping without schema and code analysis.

Never produce a report with only D‑class SQL; V‑class proof is mandatory.

All JOINs must be checked for inflation; all date dimensions must pass association checks.

Conclusion Judgment System

Three outcomes: PASS (all SQL pass, no risk), WARNING (main items pass but conditional issues exist), FAIL (critical SQL fails or unexplained differences). The system combines D‑class consistency results with V‑class logical proof to decide the final status.

Current Challenges

Execution time and interaction cost (15‑30 min with multiple confirmations).

Permission fragmentation across project spaces causing workflow interruptions.

Trust in downgrade conclusions requiring extra explanation.

Steep onboarding for new users due to module installation and configuration.

Future Directions

Extreme UX : Asynchronous execution, silent mode, and cached metadata to turn validation into a “no‑feel” background task.

Platform Integration : Embed verification into release pipelines, pre‑check permissions, and block deployments on failure.

Intelligent Evolution : Full‑chain root‑cause localization, confidence scoring for downgrade outcomes, and incremental validation for monitoring scenarios.

Appendix

Common trigger phrases (one‑line commands) are listed in the article’s appendix.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataData QualityData ValidationSQL GenerationAgent Automation
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.