AI Coding Meets Data Warehousing: From Conversational Help to a Harness Pipeline
The article recounts how a data‑warehouse team built the Harness framework to turn AI‑generated SQL assistance into a fully engineered, end‑to‑end pipeline, addressing four key pain points—semantic drift, precision, rollback cost, and SLA constraints—through a seven‑layer architecture, skill registry, state persistence, and evidence‑based human‑in‑the‑loop checks.
Problem and Pain Points
Internal experiments showed that large language models (LLMs) could generate syntactically correct SQL (≈90%) but failed to meet production‑grade requirements. When strict constraints were added – primary‑key uniqueness, no SELECT *, partition naming, metric drift < 0.5%, and CRC32 fingerprint consistency – the pass rate dropped to 8.6%, and to 0% when rollback SQL, data‑quality checks (DQC), back‑fill scripts and change‑log entries were required.
The failures stem from four data‑warehouse‑specific pain points:
Cross‑layer semantic drift : definitions change from ODS → DWD → DWS and the model “guesses” incorrectly at each layer.
Metric‑level sensitivity : a single LEFT JOIN vs INNER JOIN or an off‑by‑one date can break downstream reports.
Rollback cost asymmetry : fixing a 50 GB table may require terabytes of re‑processing across dozens of downstream tables.
SLA hard constraints : tables must be ready by 04:00 am; any delay forces overnight firefighting.
LLMs lose long‑range constraints because their context window compresses older tokens. The solution is to move “guardrails” out of the model and into a deterministic engineering framework called Harness .
Harness Overview – Seven‑Layer Defensive‑by‑Default Skeleton
Harness treats the LLM as a creative engine while the framework enforces constraints, validation and accountability. Each layer assumes the LLM may err and provides a fallback:
L1 – Master Orchestrator : explicit state machine, stage gates and cross‑session continuation to prevent drift in long processes.
L2 – SKILL Registry : pre‑loaded markdown .md skill files with triggers, boundaries, resources, steps and known pitfalls.
L3 – MCP Tool Bus : sandboxed adapters for SQL, TAPD, Feishu, Tableau so the LLM never touches production data directly.
L4 – Human‑in‑the‑Loop (HITL) Gates : five ASK checkpoints placed where the cost of fixing later is highest.
L5 – State Persistence : a single source of truth JSON file ( _state.json) plus an artifact file tree to survive context loss.
L6 – Anti‑Pattern Library : catalog of historical AI mistakes (AP‑001…AP‑024) injected into the prompt to avoid repeat errors.
L7 – Mandatory Self‑Check : evidence‑based Q&A that forces the model to provide concrete proof before advancing.
Five Crucial Design Details
1. SKILL Registry – Why Not IDE Hooks?
IDE hooks can only enforce surface‑level rules after the file is written and provide no explanation to the model. Instead, each SKILL is a markdown file (e.g., dw-dqc-validation.md) containing:
Trigger keywords (e.g., "DQC", "PK unique")
Capability boundaries (what the skill can and cannot do)
Dependent resources (required MCP adapters)
Execution steps (six DQC SQL templates)
Required artifacts (validation report + SQL files)
Common pitfalls (negative examples)
The Master reloads the markdown on demand, guaranteeing that the LLM sees the full specification even after context compression.
2. State Machine – From TODO List to Persistent JSON
Data‑warehouse projects typically span 4–6 chat sessions. To keep progress across sessions, the cursor‑based TODO list was replaced by a persistent state file and a Git‑tracked artifact tree. Example _state.json:
{
"req_id": "Indonesia three‑party log sync",
"phase": "M3_done",
"feishu_enabled": true,
"impacted_cols": ["business_type", "third_party_scene_code"],
"unchanged_cols_fp": "crc32_sum=8843718233",
"hitl_gates_passed": ["S1","S5","M1","M3"],
"hitl_gates_pending": ["T1"],
"current_master_skill_version": "v1.6",
"anti_pattern_hits": ["feishu_changelog_skipped"],
"last_updated": "20260601T17:42:00+08:00"
}The Master reads this file at the start of every session, reconstructing the full workflow regardless of chat history.
3. HITL Gates – Minimal‑Cost Human Intervention
Five ASK checkpoints (G1–G5) are placed where the “decision value × correction cost” ratio is optimal:
G1 (after requirement analysis) : human confirms that the interpreted requirement matches the original.
G2 (after DDL & mapping) : human validates the DDL and dependency graph.
G3 (after code blueprint) : human approves the five code artifacts (DDL, ETL, backfill, rollback, DQC).
G4 (after self‑testing) : human checks that dev vs prod CRC32 fingerprints are identical.
G5 (after delivery testing) : human verifies that the Feishu changelog URL and revision ID are recorded.
Each gate requires concrete evidence (e.g., CRC32 sum, query IDs, URLs) before the model can proceed.
4. Anti‑Pattern Library – Teaching the Model Its Own Mistakes
Each entry records error code, scenario, observed AI output, root cause and mitigation. Representative examples:
AP‑007 : AI guessed the meaning of userStatus incorrectly; mitigation – require explicit enum definition from business owners.
AP‑013 : Adding columns to DWT broke downstream SELECT * statements; mitigation – run impact analysis via dw‑impact‑analysis skill.
AP‑019 : AI skipped the Feishu changelog step; mitigation – mandatory self‑check question and anti‑pattern entry.
When a skill is invoked, relevant AP entries are injected into the LLM context, reducing repeat error rate from 47% to 6%.
5. Mandatory Self‑Check – Evidence‑Based Q&A
At the final delivery stage (T1) the model must answer five concrete questions, each demanding proof:
Q1: Are all six DQC rules PASS? – provide <code>lumina-sql</code> query IDs, status=FINISHED and row counts.
Q2: Do existing fields have zero side‑effects? – provide dev vs prod CRC32 fingerprint sum for the partition.
Q3: Is the change record written to Feishu? – provide <code>documentRevisionId</code> and URL.
Q4: Is the historical back‑fill script complete? – provide script path and covered partition range.
Q5: Have downstream owners been notified? – provide DataMap query results and notification method.This eliminates “virtual done” reports; after deployment the false‑positive completion rate dropped to near zero.
End‑to‑End Case Study: Indonesia Three‑Party Log Sync
The team delivered a 1,800‑row wide table with 500+ existing fields and two new metrics, back‑filled from 20260525. Total wall‑clock time was 47 minutes (33 min AI work, 14 min human evidence review). Key observations:
AI phases (S1–T1) averaged 2–5 minutes each.
Human review focused on G2 and G3 gates (DDL & code blueprint), consuming the bulk of the 14 minutes.
Overall effort dropped from ~2 person‑days to 47 minutes (≈25× speed‑up) while guaranteeing zero side‑effects.
Zero side‑effects were verified with a CRC32 fingerprint query:
SELECT COUNT(*) AS row_cnt,
SUM(CRC32(CONCAT_WS('|',
COALESCE(CAST(uuid AS STRING), ''),
COALESCE(CAST(user_id AS STRING), ''),
...
COALESCE(CAST(create_time AS STRING), ''
))) AS fp_sum
FROM ${target_table}
WHERE dt = '${p_date}';Result: dev and prod tables produced identical fp_sum=8843718233 for the 500+ existing fields, confirming no unintended changes.
ROI and Quality Metrics
Comparing manual effort (average 2–3 hours per stage) with Harness‑assisted effort (minutes) yields an overall time reduction of about 94%:
Requirement analysis: 2–3 h → 3 min + 5 min gate (≈95% reduction)
Upstream profiling: 1–2 h → 4 min (≈92% reduction)
Model design: 2–4 h → 3 min + 5 min gate (≈94% reduction)
ETL coding: 2–3 h → 4 min + 8 min gates (≈90% reduction)
Self‑testing & DQC: 3–5 h → 6 min + 6 min gates (≈92% reduction)
CR & release: 1–2 h → 4 min + 4 min gates (≈87% reduction)
Delivery testing: 1–2 h → 5 min (≈92% reduction)
Total : ≈14 h → 47 min (≈94% reduction)
Zero‑side‑effect success rate rose from 38% (pre‑Harness) to 96% after version v1.6; the remaining 4% involve cases where the CRC32 fingerprint definition does not apply.
Productivity impact: a ten‑engineer team (≈112 h of monthly demand work) saved roughly 10 person‑months per month, freeing engineers for technical debt reduction, architecture upgrades and higher‑value analytics.
Design Evolution and Trade‑offs
Four cognitive pivots guided the evolution:
From AI‑generated SQL to AI‑delivered requirement : the goal shifted from producing a .sql file to delivering a complete, production‑ready artifact bundle.
From prompt‑only to framework‑backed : soft prompt constraints cannot guarantee 100% correctness; deterministic engineering layers provide hard guarantees.
From single‑session to cross‑session state machine : persistent JSON state eliminates context loss across the typical 4–6 chat sessions of a data‑warehouse task.
From prompt engineering to Harness engineering : the focus moved from “make the AI do the right thing now” to “build a collaborative scaffold that keeps the AI reliable over long, multi‑tool workflows.”
Discarded ideas illustrate the guiding principle “maximise human‑machine collaboration, not raw AI capability”:
Automatic AI‑driven production deployment – removed because responsibility must stay with humans.
Direct user‑to‑AI requirement dialogue – removed because product owners cannot answer low‑level clarification questions in real time.
Multi‑agent racing – removed because coordination cost outweighed any marginal quality gain.
Future Directions
Agent‑in‑Agent architecture : split the monolithic Master into specialized sub‑agents (verifier, reporter, etc.) for better isolation.
Visual Harness Console : replace the raw _state.json with a web UI showing progress, gate status and artifact previews.
Cross‑domain reuse : extend the skill library to finance, marketing and other high‑sensitivity domains by adding domain‑specific sub‑skills.
Harness demonstrates that AI can accelerate data‑warehouse development, but only when wrapped in a rigorous engineering scaffold that enforces constraints, provides transparent evidence and keeps humans in the decision loop.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
