How Data Agents Transform Data Querying: Semantic Layer Integration and Decision‑Making (Part 1)
This article details the engineering journey of building enterprise‑grade Data Agents. It covers the semantic‑layer integration that resolves NL‑to‑SQL inconsistencies, the skill‑based architecture that enables query, attribution, forecasting, and cash‑flow actions, and the multiplication formula that defines success once AI‑driven decision making moves into deeper, higher‑stakes territory.
Phase 1: Exploring the Semantic Layer
With the surge of large language models (LLMs) and generative AI, enterprises increasingly demand natural‑language data access for decision support. Running NL‑to‑SQL directly on raw tables leads to core contradictions: "same question, different answer", ignorance of business definitions, hallucinations, uncontrolled permissions, and unexplainable results.
We identify three conflict categories:
LLMs are "smart" but lack knowledge of internal business metric definitions.
Business analysis requires deterministic numbers, yet LLM outputs are probabilistic and vary across calls.
Although natural language lowers the query barrier, unstable results force extensive back‑and‑forth verification with IT, negating the efficiency gains.
To address these issues, the industry adopts a "metric semantic layer" as the foundation for intelligent querying. By pre‑defining metric logic and dimensions, the model first understands the business intent, then generates a metric query language (MQL) request instead of raw SQL, ensuring semantic consistency, accurate SQL generation, flexible composition, and controlled permissions.
The overall architecture splits the pipeline into two parts: the LLM handles user interaction, intent understanding, and result explanation; the semantic query engine performs stable, accurate, and secure data retrieval.
Concrete workflow:
User asks a question.
The LLM matches the request to known metrics, dimensions, and filters using the semantic layer.
An MQL request is built and sent to the engine.
The engine checks permissions, translates MQL to SQL, and executes the query.
The LLM receives the result, interprets it, and presents charts or tables.
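The engine side of this workflow (steps 3 and 4) can be sketched as follows. This is a minimal illustration, not the actual MQL format or engine: the catalog contents, the role table, the MqlRequest shape, and the compile_mql function are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass, field

# Hypothetical semantic-layer catalog: metric name -> (SQL expression, source table).
METRICS = {
    "gmv": ("SUM(order_amount)", "fact_orders"),
    "order_count": ("COUNT(DISTINCT order_id)", "fact_orders"),
}
DIMENSIONS = {"region", "channel", "order_date"}

# Hypothetical permission table: role -> metrics that role may read.
ROLE_METRICS = {"analyst": {"gmv", "order_count"}, "intern": {"order_count"}}


@dataclass
class MqlRequest:
    """Assumed shape of an MQL request: one metric, group-by dimensions, equality filters."""
    metric: str
    dimensions: list = field(default_factory=list)
    filters: dict = field(default_factory=dict)


def compile_mql(request, role):
    """Check the request against the catalog and permissions, then build parameterized SQL."""
    if request.metric not in METRICS:
        raise ValueError(f"unknown metric: {request.metric!r}")
    if request.metric not in ROLE_METRICS.get(role, set()):
        raise PermissionError(f"role {role!r} may not read metric {request.metric!r}")
    unknown = (set(request.dimensions) | set(request.filters)) - DIMENSIONS
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")

    expression, table = METRICS[request.metric]
    select = ", ".join([*request.dimensions, f"{expression} AS {request.metric}"])
    sql = f"SELECT {select} FROM {table}"

    # Filter values travel as parameters, never as inlined text.
    params = []
    if request.filters:
        clauses = []
        for column, value in sorted(request.filters.items()):
            clauses.append(f"{column} = ?")
            params.append(value)
        sql += " WHERE " + " AND ".join(clauses)
    if request.dimensions:
        sql += " GROUP BY " + ", ".join(request.dimensions)
    return sql, params


sql, params = compile_mql(
    MqlRequest("gmv", dimensions=["region"], filters={"channel": "app"}),
    role="analyst",
)
print(sql)
print(params)
```

Because the LLM only emits the small structured request and the SQL text comes from pre‑defined expressions, the same request always compiles to the same query, which is the consistency property the semantic layer is meant to provide.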
The technical "iron triangle" that guarantees 100 % accuracy consists of:
Logical metric definition – pre‑defined metric formulas ensure business meaning.
Dynamic SQL assembly – mapping metric components to SQL templates with optimization (e.g., Archer DB full‑text index).
Result explainability – the system surfaces the metric, dimensions, filters, and calculation logic for user verification.
Underlying guarantees include data security (row/column level permissions) and query performance.
Technical Solution Overview
The data‑warehouse Data Agent pipeline follows: NL → keyword extraction → ES/vector retrieval → LLM generates MQL → rule post‑processing → validation → SQL compilation → execution → explanation. A ReAct loop (think‑act‑observe) lets the LLM use short‑term context (previous round results) and long‑term knowledge (semantic layer, business rules) to plan actions, invoke tools, and iterate until a complete answer is produced.
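The think‑act‑observe loop can be sketched in a few lines. This is only an illustration under stated assumptions: run_react, scripted_planner, and the two tools are hypothetical, and the planner is a scripted stand‑in for the LLM so that the example runs without any model.

```python
def run_react(question, plan_step, tools, max_steps=5):
    """Think-act-observe loop: plan an action, call a tool, feed the observation back."""
    history = []  # short-term context: one (action, argument, observation) per round
    for _ in range(max_steps):
        action, argument = plan_step(question, history)      # think
        if action == "finish":
            return argument, history
        observation = tools[action](argument)                # act
        history.append((action, argument, observation))      # observe
    raise RuntimeError("no complete answer within max_steps")


def scripted_planner(question, history):
    """Stand-in for the LLM: retrieve a metric, query it, then answer."""
    if not history:
        return "retrieve", question
    if len(history) == 1:
        return "query", history[-1][2]
    return "finish", f"{history[0][2]} = {history[1][2]}"


# Hypothetical tools: retrieval maps a question to a metric name,
# query returns a canned value for that metric.
TOOLS = {
    "retrieve": lambda q: "gmv" if "sales" in q.lower() else "order_count",
    "query": lambda metric: {"gmv": 1200, "order_count": 35}[metric],
}

answer, trace = run_react("What were sales last week?", scripted_planner, TOOLS)
print(answer)
for step in trace:
    print(step)
```

The max_steps bound is a design choice in this sketch: it keeps a confused planner from looping indefinitely, at the cost of failing loudly on questions that need more rounds.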
Phase 2: From Query to Decision
Most enterprise agents stall at low‑complexity tasks (emails, simple forms) that bypass core data assets. By integrating the open‑source OpenClaw runtime with the Qilin Cloud Data Warehouse (CDW) semantic layer, we injected a hierarchical skill set that moves agents from basic querying to active diagnosis and decision loops.
Skill 1 – metric‑query (Query Skill)
Translates natural language into MQL requests. Before each query it validates metric and dimension existence, eliminating hallucinated fields. Supported capabilities include basic queries, period‑over‑period comparisons, ratios, rankings, dimension filters, result filters, temporary metric definitions, time constraints, and multi‑level aggregations.
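The existence check that runs before each query might look like the sketch below. The metric and dimension names are hypothetical, and suggesting the closest known name through the standard‑library difflib module is an addition made for this example, not something the skill is stated to do.

```python
import difflib

# Hypothetical names registered in the semantic layer.
KNOWN_METRICS = {"available_funds", "pending_settlement", "cash_turnover_days"}
KNOWN_DIMENSIONS = {"channel", "settlement_date", "merchant_id"}


def validate_fields(metrics, dimensions):
    """Return a list of problems; an empty list means the query may proceed."""
    problems = []
    checks = (
        ("metric", metrics, KNOWN_METRICS),
        ("dimension", dimensions, KNOWN_DIMENSIONS),
    )
    for kind, names, known in checks:
        for name in names:
            if name in known:
                continue
            # Offer the closest registered name so the agent can retry with a real field.
            hint = difflib.get_close_matches(name, sorted(known), n=1)
            suggestion = f"; did you mean {hint[0]!r}?" if hint else ""
            problems.append(f"unknown {kind} {name!r}{suggestion}")
    return problems


print(validate_fields(["available_funds"], ["channel"]))
print(validate_fields(["available_fund"], ["xyz"]))
```

Rejecting the request before any SQL is generated is what turns a hallucinated field into a recoverable error instead of a wrong answer.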
Skill 2 – metric‑attribution (Attribution Skill)
Implements a five‑step diagnostic flow for "why did it drop" questions. Each step calls metric‑query for data, then performs local attribution calculations. Simple queries use only metric‑query; complex root‑cause analyses trigger the full attribution pipeline.
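The article does not spell out the five steps, so the sketch below illustrates only one plausible "local attribution calculation": splitting the total change of an additive metric across the members of one dimension. The function name and the sample figures are hypothetical.

```python
def attribute_change(previous, current):
    """Split the total change of an additive metric across dimension members.

    previous and current map a dimension member (e.g. a channel) to the metric value.
    Returns the total change and rows of (member, delta, share of total change),
    ordered by the size of the delta.
    """
    members = sorted(set(previous) | set(current))
    total_delta = sum(current.values()) - sum(previous.values())
    rows = []
    for member in members:
        delta = current.get(member, 0) - previous.get(member, 0)
        share = delta / total_delta if total_delta else 0.0
        rows.append((member, delta, share))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return total_delta, rows


# Hypothetical values that two metric-query calls might return.
last_week = {"app": 100, "web": 80, "store": 50}
this_week = {"app": 60, "web": 82, "store": 48}

total, breakdown = attribute_change(last_week, this_week)
print(total)
for member, delta, share in breakdown:
    print(member, delta, round(share, 2))
```

A share above 1 or below 0 is expected when members move in opposite directions; it signals that one member more than explains the drop while another partly offsets it.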
Experiment: Cash‑Flow Health Diagnosis
We simulated a real e‑commerce merchant’s cash‑flow health check using six skill‑driven rounds:
Global anomaly scan: Agent fetched recent balances of "available funds", "pending settlement", and "cash‑turnover days". Result – all metrics within 3σ, turnover days ~12 days (healthy).
Trend extrapolation: Forecasted 30‑day cash trajectory based on daily consumption, predicting a decline to ¥1.5 M.
What‑if stress test: Simulated a 30 % spend surge during a mid‑year promotion. If repayment stays unchanged, cash after 30 days remains above the safety line; with a 3‑day repayment delay, cash falls to ¥84 k, indicating breach risk.
High‑dimensional drill‑down: Split the global average, revealing that "platform pending settlement" has a 45‑day turnover (red) while other channels remain healthy.
Report generation & scheduling: Agent assembled an HTML cash‑flow health report and scheduled a weekly rerun (cron 0 10 * * 1).
Actionable strategy: Agent produced a structured action plan (e.g., accelerate receivables, apply for supply‑chain financing) with priority tags and expected cash‑recovery amounts.
These rounds demonstrate the transition from raw numbers to precise risk locations and concrete business recommendations.
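The first three rounds (3σ scan, trend extrapolation, what‑if stress test) can be approximated with a small simulation. All figures below are hypothetical and do not reproduce the merchant's data; the flat daily‑spend model, the function names, and the safety line are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def within_three_sigma(history, latest):
    """True if the latest value lies within three standard deviations of the history."""
    mu, sigma = mean(history), pstdev(history)
    return abs(latest - mu) <= 3 * sigma


def project_cash(balance, daily_spend, inflows, days=30, spend_surge=0.0, delay_days=0):
    """Project the balance day by day.

    inflows maps a day number to an expected repayment; delay_days shifts every
    repayment later, and spend_surge scales daily spend (0.3 means +30 %).
    Returns the balance after the last day.
    """
    shifted = {}
    for day, amount in inflows.items():
        shifted[day + delay_days] = shifted.get(day + delay_days, 0) + amount
    for day in range(1, days + 1):
        balance += shifted.get(day, 0) - daily_spend * (1 + spend_surge)
    return balance


# Hypothetical inputs.
turnover_history = [12, 11, 13, 12, 12, 11, 13]
SAFETY_LINE = 1_500_000
inflows = {10: 1_000_000, 20: 1_000_000, 29: 500_000}

print(within_three_sigma(turnover_history, 12))
baseline = project_cash(3_000_000, 100_000, inflows)
surge = project_cash(3_000_000, 100_000, inflows, spend_surge=0.3)
surge_delayed = project_cash(3_000_000, 100_000, inflows, spend_surge=0.3, delay_days=3)
for label, value in [("baseline", baseline), ("surge", surge), ("surge + delay", surge_delayed)]:
    print(label, round(value), "OK" if value >= SAFETY_LINE else "BREACH")
```

In this toy setup the delay matters because it pushes the last repayment outside the 30‑day window, which mirrors how a short repayment slip can turn a tolerable spend surge into a breach.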
Industry Knowledge Graph & Decision Loop
Beyond diagnosis, the "inventory‑strategy" skill acts as a prescribing engine. It reuses lower‑level skills but adds three layers of domain‑specific decision logic, turning a diagnostic report into executable strategy commands (e.g., financing requests, channel‑level credit limits).
Cash‑Flow Activation Example
When asked to "activate cash flow for ¥12 M pending funds", the agent performed:
Full scan & channel identification: Detected total pending ¥12.56 M, overall collection rate 32 %, and a dominant "B2B large‑client distribution" channel consuming 65 % of funds.
Quadrant projection & risk correction: Although the channel shows strong gross profit, its DSO exceeds 120 days, triggering a risk engine to downgrade it to a bad‑debt alert.
Human‑in‑the‑loop confirmation: Agent asked for explicit approval before issuing collection or financing instructions.
Structured action output: Generated a table of recommended actions (e.g., 3 % discount for rapid repayment, financing requests) with priority levels and expected recovery amounts.
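The risk correction in the second step can be sketched as a rule that overrides a profit‑based classification once days sales outstanding (DSO) passes a limit. The DSO formula is the standard one (receivables divided by credit sales, times the days in the period); the 120‑day limit comes from the text, while the margin threshold, the labels other than the bad‑debt alert, and the sample figures are hypothetical.

```python
def days_sales_outstanding(receivables, credit_sales, period_days):
    """Standard DSO: average number of days it takes to collect credit sales."""
    return receivables / credit_sales * period_days


def classify_channel(gross_margin, dso, margin_threshold=0.2, dso_limit=120):
    """Classify a channel by margin, then let the DSO rule override the result."""
    if dso > dso_limit:
        # Risk correction: slow collection outweighs an attractive margin.
        return "bad-debt alert"
    return "priority" if gross_margin >= margin_threshold else "monitor"


# Hypothetical quarter for a B2B distribution channel.
dso = days_sales_outstanding(receivables=8_000_000, credit_sales=5_400_000, period_days=90)
print(round(dso, 1))
print(classify_channel(gross_margin=0.35, dso=dso))
print(classify_channel(gross_margin=0.35, dso=40))
```

Putting the DSO check first makes the override explicit: a high‑margin channel never reaches the "priority" label while its collections are this slow.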
Final Multiplication Formula
From the successful experiments we distilled a multiplication model for enterprise‑grade Data Agents. Because the factors multiply rather than add, any missing or weak factor drags down the overall business impact, and a factor near zero cancels the rest. The four decisive factors are:
Deterministic entropy reduction (consistent metric paths under the semantic layer).
Context decoupling (LLM does not need full schema in its prompt).
Architectural barrier (single semantic‑layer mapping shields agents from underlying schema changes).
Depth of Skill ecosystem (number and sophistication of domain‑specific skills).
Thus, the semantic layer is not a substitute for AI but a prerequisite; the true competitive moat lies in the richness of the Skill library.
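A small numeric illustration shows why a product behaves differently from a sum. The article gives no scoring scale, so the 0 to 1 scores, the factor keys, and the business_impact function below are assumptions made purely for this example.

```python
import math


def business_impact(factors):
    """Multiply factor scores; each score is assumed to lie between 0 and 1."""
    for name, score in factors.items():
        if not 0 <= score <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {score}")
    return math.prod(factors.values())


# Hypothetical scores for the four factors.
strong = {
    "deterministic_paths": 0.9,
    "context_decoupling": 0.9,
    "architectural_barrier": 0.9,
    "skill_depth": 0.9,
}
thin_skills = {**strong, "skill_depth": 0.2}
no_semantic_layer = {**strong, "deterministic_paths": 0.0}

for label, scores in [
    ("all strong", strong),
    ("thin skill library", thin_skills),
    ("no semantic layer", no_semantic_layer),
]:
    print(label, round(business_impact(scores), 4))
```

Under these assumed scores, one weak factor cuts the result by the same proportion regardless of how strong the others are, and a zero removes the impact entirely, which is the sense in which the semantic layer acts as a prerequisite.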
Conclusion
LLMs provide general knowledge, but without a well‑engineered semantic layer and a deep Skill framework, agents remain shallow. By reducing randomness, decoupling context, and protecting against schema drift, Data Agents can evolve from reporting tools to precise, action‑oriented business weapons.
The next article will cover Phase 3 (ontology‑level semantic layer) and Phase 4 (team technical progress).
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
