From Skill to Ontology: Building a Trustworthy Data Agent Semantic Layer

The article analyzes why Data Agents need to extend the Skill system with an ontology‑based semantic layer. It compares metric‑centric and ontology‑centric approaches, outlines the technical evolution from NL2SQL to NL2LF2SQL, and proposes a step‑by‑step implementation roadmap for enterprises.

360 Zhihui Cloud Developer

Stage 3: From Skills to an Ontology‑Based Semantic Layer

In the previous stages, Skills enabled the Agent to move from simple data retrieval to decision‑making, but as the number of Skills grows, maintaining consistent business rules across scattered prompts becomes impossible.

The core limitation of Skills is the material they can access. When business rules are embedded in dozens of Skill prompts, the same metric (e.g., "sales") may be calculated differently across contexts, and adding a new dimension (e.g., a sales channel) requires updating multiple prompts.

1. Skill Capability Boundary – Why "Finding Data" ≠ "Understanding Business"

Skills turn an Agent into a junior analyst capable of anomaly detection, trend prediction, quadrant diagnosis, and recommendation. However, the ceiling of a Skill is determined by the data it can fetch.

Using a medical analogy, Skills are the diagnostic process and the semantic layer is the lab report. If the report's definitions, dimensions, and rules are scattered across prompts, inconsistencies arise (different definitions of "sales", varying criteria for "new customer", and so on). No amount of diagnostic skill can compensate for an unreliable report.
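To make the drift concrete, the following Python sketch assumes that the business rules embedded in each Skill prompt can be extracted as a metric → formula mapping. The Skill names and formulas are hypothetical. The sketch flags any metric that two or more Skills compute differently, which is the inconsistency described above.

```python
"""Hypothetical audit: find metrics that different Skills define differently."""
from collections import defaultdict

# Hypothetical rules extracted from Skill prompts: skill -> {metric: formula}.
SKILL_RULES = {
    "anomaly_detection": {"sales": "SUM(order_amount)"},
    "trend_forecast": {"sales": "SUM(order_amount) - SUM(refund_amount)"},
    "weekly_report": {
        "sales": "SUM(order_amount)",
        "new_customer": "first_order_date >= period_start",
    },
}


def find_conflicts(skill_rules: dict) -> dict:
    """Return {metric: {formula: [skills]}} for metrics with more than one formula."""
    by_metric = defaultdict(lambda: defaultdict(list))
    for skill, rules in skill_rules.items():
        for metric, formula in rules.items():
            # Collapse and trim whitespace runs so repeated or trailing
            # spaces are not reported as conflicts.
            by_metric[metric][" ".join(formula.split())].append(skill)
    return {
        metric: dict(formulas)
        for metric, formulas in by_metric.items()
        if len(formulas) > 1
    }


if __name__ == "__main__":
    for metric, formulas in find_conflicts(SKILL_RULES).items():
        print(f"{metric}: {len(formulas)} competing definitions")
        for formula, skills in formulas.items():
            print(f"  {formula!r} used by {skills}")
```

In this toy input, "sales" comes back with two competing formulas, one of which subtracts refunds. Once a semantic layer holds the single definition, Skills reference it instead of restating it, and this class of conflict cannot arise.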

2. Two Types of Semantic Layers

2.1 Metric‑Centric Semantic Layer (Stages 1‑2)

Focuses on "how is this number calculated?" – metrics, dimensions, and definitions.

Solves standard, high‑frequency query scenarios with high ROI.

Struggles when questions shift from "what" to "why", requiring business context beyond pure metrics.
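A metric‑centric layer can be sketched as a registry of metric definitions plus a compiler for queries of the form (metric, dimensions, time range). This is a minimal Python illustration under assumed names: `Metric`, `fact_orders`, and `compile_query` are hypothetical and do not correspond to any specific product's API.

```python
"""Minimal metric-centric semantic layer sketch (all names hypothetical)."""
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str        # aggregation formula: the single source of truth
    table: str             # fact table the metric is computed from
    time_column: str       # column used for time filtering
    dimensions: frozenset  # dimensions the metric may be grouped by
    definition: str        # human-readable business definition


METRICS = {
    "gmv": Metric(
        name="gmv",
        expression="SUM(order_amount)",
        table="fact_orders",
        time_column="order_date",
        dimensions=frozenset({"region", "channel"}),
        definition="Total order amount, including orders later refunded.",
    ),
}


def compile_query(metric: str, dims: list, start: str, end: str) -> str:
    """Translate (metric, dimensions, [start, end)) into SQL via the shared definition."""
    m = METRICS[metric]
    unknown = set(dims) - m.dimensions
    if unknown:
        raise ValueError(f"{metric} cannot be grouped by {sorted(unknown)}")
    select = ", ".join([*dims, f"{m.expression} AS {m.name}"])
    # Dates are inlined for readability; a real engine would use bind parameters.
    sql = (
        f"SELECT {select} FROM {m.table} "
        f"WHERE {m.time_column} >= '{start}' AND {m.time_column} < '{end}'"
    )
    if dims:
        sql += " GROUP BY " + ", ".join(dims)
    return sql
```

The fixed query shape is the limitation noted above. A question that needs a filter across other objects, such as "customers who bought during a campaign", has no slot in `compile_query`.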

2.2 Ontology‑Based Semantic Layer

Models objects (customer, order, product, store, channel, warehouse, organization), events (order, payment, shipment, refund, inventory movements), relationships (customer places order, order contains product, product participates in campaign, campaign runs on channel), and rules.

Answers not only "how the number is computed" but also "which business fact it represents", "which objects and relationships are involved", "how a change propagates through the business chain", and "what actions to take next".
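The difference is easiest to see in code. The sketch below makes these choices, all of them assumptions:

- A few objects from the list above are modeled as nodes.
- The relationships are named edges bound to hypothetical join conditions.
- One explicit business rule is included.
- Events are left out for brevity.
- All table and column names are hypothetical.

A breadth‑first search over the relationships answers "which objects and relationships are involved" between two business concepts.

```python
"""Toy ontology: objects, named relationships, and one explicit rule (names hypothetical)."""
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    name: str    # business verb, e.g. "places"
    source: str  # object type on the left
    target: str  # object type on the right
    join: str    # hypothetical mapping onto physical tables


RELATIONSHIPS = [
    Relationship("places", "customer", "order", "customers.id = orders.customer_id"),
    Relationship("contains", "order", "product",
                 "orders.id = order_items.order_id AND order_items.product_id = products.id"),
    Relationship("participates_in", "product", "campaign",
                 "products.id = campaign_products.product_id "
                 "AND campaign_products.campaign_id = campaigns.id"),
    Relationship("runs_on", "campaign", "channel", "campaigns.channel_id = channels.id"),
]


def relationship_path(start: str, goal: str) -> list:
    """Breadth-first search, treating relationships as undirected, from start to goal."""
    adjacency = {}
    for rel in RELATIONSHIPS:
        adjacency.setdefault(rel.source, []).append((rel.target, rel))
        adjacency.setdefault(rel.target, []).append((rel.source, rel))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor, rel in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, path + [rel]))
    raise LookupError(f"no relationship path from {start} to {goal}")


def is_new_customer(first_order_date: str, period_start: str, period_end: str) -> bool:
    """Hypothetical explicit rule: first order falls in [period_start, period_end) (ISO dates)."""
    return period_start <= first_order_date < period_end
```

`relationship_path("customer", "channel")` returns places → contains → participates_in → runs_on. That is the business chain "customer places order, order contains product, product participates in campaign, campaign runs on channel", together with the join conditions needed to query it.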

2.3 Six‑Dimension Comparison (metric‑centric vs. ontology‑based)

Modeling Unit: metrics vs. objects, events, relationships, and rules.

Expression Capability: aggregation only vs. aggregation plus cross‑object filtering, event chains, state changes, and dynamic grouping.

Problem Coverage: standardized queries vs. the full chain of query → attribution → explanation → recommendation → action.

Explainability: "how the metric is derived" vs. "why these objects and events produced the result".

Maintenance Focus: metric pool and formulas vs. object models, event chains, relationship graphs, and explicit rules.

Ceiling: limited by how many problems can be expressed as metrics vs. limited by how completely the business world can be modeled.

3. Why Ontology‑Based Semantic Layer Is the New Foundation for Data Agents

3.1 Industry Consensus (2026)

Microsoft Fabric IQ – offers ontology as a preview capability that binds entities, properties, and relationships to data assets.

Snowflake Cortex Analyst – emphasizes Semantic Views.

Databricks Genie – requires domain experts to maintain datasets, sample queries, guidelines, and a knowledge store.

dbt Semantic Layer – continues to push metric definitions and semantic graphs into a unified layer.

Google Looker – exposes its semantic layer to Gemini CLI and other Agents via MCP.

Key Insight: Large models must rely on a machine‑readable, governable, reusable business semantic base rather than guessing.

3.2 Traditional Data Warehouses Fill Gaps with Human Knowledge

Analysts have historically relied on tacit knowledge, such as which table holds the month‑end definitions or how "new customer" is defined. Agents lack this implicit context, so they hallucinate when definitions are missing or inconsistent.

3.3 Prompt‑Only Approaches Are Like Taping Over a Weak Foundation

Typical projects inject a schema, add a prompt, iterate on errors, and then layer on department‑specific rules and permissions. This may work for short demos but fails in production. Prompts control only the process flow, not the correctness of the underlying material.

3.4 Ontology‑Based Layer Provides the Agent’s "World Model"

Reduces hallucination sources by delivering clearer, structured, trustworthy material.

Constrains reasoning to defined objects, relationships, metrics, permissions, and actions.

Supports role‑specific views (finance, operations, sales) without bloating prompts.

Enables attribution and actionable recommendations beyond raw metric differences.

Integrates with RBAC, auditing, and data lineage for full traceability.
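To illustrate the constraint and RBAC points, here is a minimal Python sketch built on hypothetical sets of known objects and metrics and a hypothetical role → metric mapping. Every request an Agent makes is checked against the modeled world and the caller's role view, and each decision is appended to an audit log.

```python
"""Constrain Agent requests to the modeled world and the caller's role view (hypothetical names)."""
from dataclasses import dataclass, field
from datetime import datetime, timezone

KNOWN_OBJECTS = {"customer", "order", "product", "campaign", "channel"}
KNOWN_METRICS = {"gmv", "gross_profit", "repurchase_rate", "average_order_value"}

# Role-specific views: each role may read only a subset of metrics.
ROLE_METRICS = {
    "finance": {"gmv", "gross_profit"},
    "operations": {"gmv", "repurchase_rate", "average_order_value"},
    "sales": {"gmv", "average_order_value"},
}


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, role: str, request: dict, allowed: bool, reason: str) -> None:
        # Append-only trail so every answer can be traced back to who asked what.
        self.entries.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "request": request,
            "allowed": allowed,
            "reason": reason,
        })


def check_request(role: str, request: dict, audit: AuditLog) -> tuple:
    """Reject requests outside the ontology or the role's view; log every decision."""
    unknown = set(request.get("objects", [])) - KNOWN_OBJECTS
    metric = request.get("metric")
    if unknown:
        result = (False, f"unknown objects {sorted(unknown)}")
    elif metric not in KNOWN_METRICS:
        result = (False, f"unknown metric {metric!r}")
    elif metric not in ROLE_METRICS.get(role, set()):
        result = (False, f"role {role!r} may not read {metric!r}")
    else:
        result = (True, "ok")
    audit.record(role, request, *result)
    return result
```

Keeping role views in data rather than in prompts means that adding a role changes one mapping, not every Skill prompt.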

4. Technical Evolution Path: NL2SQL → NL2MQL2SQL → NL2LF2SQL

Direct NL2SQL on complex schemas suffers from accuracy loss as joins and definitions multiply.

Stage 1 introduced NL2MQL2SQL: an intermediate Metric Query Language (MQL) improves stability but still expresses only metrics, dimensions, conditions, and time.

The ontology‑based semantic layer advances to NL2LF2SQL, which separates language understanding (NL → Logical Form) from structured execution (Logical Form → SQL). The model parses intent into a Logical Form, the semantic layer enforces constraints and mappings, and a deterministic engine translates the result into SQL. Errors can therefore be pinpointed to a specific stage: intent, mapping, metric definition, join path, or data quality.
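The following Python sketch shows the NL2LF2SQL split under these assumptions:

- The model's intent parsing is stubbed out by writing the Logical Form by hand.
- The semantic model (`METRICS`, `OBJECT_TABLES`, `ATTRIBUTES`, `JOINS`) is hypothetical.
- Data‑quality checks, which run after execution, are not shown.

Each validation step raises an error tagged with its stage. A failure therefore points at mapping, metric definition, or join path rather than at "the SQL was wrong".

```python
"""NL2LF2SQL sketch: a hand-written Logical Form is validated and compiled deterministically."""
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalForm:
    metric: str        # what to compute, e.g. "gmv"
    group_by: str      # "object.attribute" to group by
    filter_on: str     # "object.attribute" to filter on
    filter_value: str  # value the filter attribute must equal
    time_range: tuple  # (start, end) as ISO dates, end exclusive


class LFError(Exception):
    """Validation failure tagged with the pipeline stage that rejected the Logical Form."""

    def __init__(self, stage: str, message: str):
        super().__init__(f"[{stage}] {message}")
        self.stage = stage


# Hypothetical semantic model: metric -> (base table, formula, time column).
METRICS = {"gmv": ("orders", "SUM(orders.amount)", "orders.order_date")}
OBJECT_TABLES = {"order": "orders", "customer": "customers",
                 "campaign": "campaigns", "product": "products"}
ATTRIBUTES = {
    "orders": {"channel"},
    "customers": {"region", "segment"},
    "campaigns": {"type"},
    "products": {"category"},
}
# Approved join paths from a base table; orders -> products is deliberately missing.
JOINS = {
    ("orders", "customers"): "JOIN customers ON customers.id = orders.customer_id",
    ("orders", "campaigns"): "JOIN campaigns ON campaigns.id = orders.campaign_id",
}


def resolve(ref: str) -> tuple:
    """Mapping stage: turn "object.attribute" into (table, qualified column)."""
    obj, _, attr = ref.partition(".")
    table = OBJECT_TABLES.get(obj)
    if table is None:
        raise LFError("mapping", f"object {obj!r} is not in the ontology")
    if attr not in ATTRIBUTES.get(table, set()):
        raise LFError("mapping", f"{ref!r} is not a bound attribute")
    return table, f"{table}.{attr}"


def compile_lf(lf: LogicalForm) -> str:
    """Validate a Logical Form stage by stage, then translate it deterministically to SQL."""
    if lf.metric not in METRICS:
        raise LFError("metric definition", f"unknown metric {lf.metric!r}")
    base, formula, time_col = METRICS[lf.metric]
    group_table, group_col = resolve(lf.group_by)
    filter_table, filter_col = resolve(lf.filter_on)
    joins = []
    for table in dict.fromkeys([group_table, filter_table]):  # de-duplicate, keep order
        if table == base:
            continue
        if (base, table) not in JOINS:
            raise LFError("join path", f"no approved join from {base} to {table}")
        joins.append(JOINS[(base, table)])
    start, end = lf.time_range
    # Values are inlined for readability; a production engine should use bind parameters.
    return " ".join([
        f"SELECT {group_col}, {formula} AS {lf.metric}",
        f"FROM {base}",
        *joins,
        f"WHERE {filter_col} = '{lf.filter_value}'",
        f"AND {time_col} >= '{start}' AND {time_col} < '{end}'",
        f"GROUP BY {group_col}",
    ])
```

A request for GMV by customer region for `flash_sale` campaigns compiles to a single SQL statement. Filtering on `product.category` fails at the join‑path stage, because this toy model has no approved join from `orders` to `products`.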

5. Practical Enterprise Adoption Path

Instead of a full‑scale rollout, start with a high‑value domain:

Step 1: Identify Core Objects, Events, Relationships, and Metrics

Sales domain – objects: customer, order, product, campaign, channel, organization; events: order, payment, shipment, refund; metrics: GMV, gross profit, repurchase rate, average order value.

Clarify finance/operations/sales definitions and data access permissions.

Step 2: Build a Maintainable Semantic Model

Bind table fields to business concepts; version metrics and definitions.

Make object relationships explicit and link to role‑based permissions.

Expose key business rules to query chains and the Agent runtime.
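Versioning can be sketched as a list of dated definitions per metric plus a table of field bindings. The Python below is a hypothetical illustration; the `dw.fact_orders` columns, owners, and dates are assumptions. It resolves the definition effective on a given date and renders it against the physical columns. A report rerun for an earlier period therefore uses the definition that was valid at the time.

```python
"""Versioned metric definitions bound to physical fields (all names and dates hypothetical)."""
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetricVersion:
    metric: str
    version: int
    effective_from: date
    expression: str   # written against business concepts, not physical columns
    owner: str        # team accountable for the definition
    change_note: str


# Business concept -> physical column binding.
FIELD_BINDINGS = {
    "order_amount": "dw.fact_orders.pay_amount",
    "refund_amount": "dw.fact_orders.refund_amount",
}

GMV_HISTORY = [
    MetricVersion("gmv", 1, date(2025, 1, 1), "SUM({order_amount})",
                  "finance", "initial definition"),
    MetricVersion("gmv", 2, date(2026, 1, 1), "SUM({order_amount}) - SUM({refund_amount})",
                  "finance", "exclude refunds"),
]


def definition_as_of(history: list, as_of: date) -> MetricVersion:
    """Return the highest version already effective on `as_of`, so past reports stay reproducible."""
    candidates = [v for v in history if v.effective_from <= as_of]
    if not candidates:
        raise LookupError(f"no definition effective on {as_of}")
    return max(candidates, key=lambda v: v.version)


def render(version: MetricVersion) -> str:
    """Substitute bound columns for business concepts; an unbound concept raises KeyError."""
    return version.expression.format_map(FIELD_BINDINGS)
```

Writing expressions against business concepts keeps a column rename local to `FIELD_BINDINGS`. Each version also records an owner and a change note, which gives reviewers a trail to audit.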

Step 3: Introduce an Intermediate Logical Form

The model converts user questions into a checkable business‑logic expression.

The deterministic engine maps the logical form to queries and actions.

Step 4: Establish an Evaluation Suite

Curate golden question sets, standard answers, follow‑up queries, edge cases, and regression tests.

Reference Snowflake's verified examples and Databricks' Inspect practices.
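A golden‑question suite can start as a small harness with three jobs:

- Run each curated question through the Agent.
- Compare selected output fields with the expected answer.
- Diff the failures against the previous run to surface regressions.

This Python sketch assumes the Agent is a callable that returns a dict, for example its Logical Form. `GoldenCase`, `run_suite`, and `regressions` are hypothetical names.

```python
"""Golden-question regression suite sketch (cases and agent are hypothetical)."""
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    question: str
    expected: dict    # fields the Agent's output must match, e.g. its Logical Form
    tags: tuple = ()  # e.g. ("follow_up",) or ("edge_case",)


def run_suite(agent: Callable[[str], dict], cases: list) -> dict:
    """Run every golden case; report the pass rate and each failing question."""
    failures = []
    for case in cases:
        try:
            actual = agent(case.question)
        except Exception as exc:  # an Agent error is a failed case, not a crashed suite
            failures.append((case.question, f"error: {exc}"))
            continue
        mismatched = {
            key: (want, actual.get(key))
            for key, want in case.expected.items()
            if actual.get(key) != want
        }
        if mismatched:
            failures.append((case.question, mismatched))
    passed = len(cases) - len(failures)
    return {
        "passed": passed,
        "total": len(cases),
        "pass_rate": passed / len(cases) if cases else 1.0,
        "failures": failures,
    }


def regressions(previous: dict, current: dict) -> set:
    """Questions that fail in the current run but did not fail in the previous one."""
    before = {question for question, _ in previous["failures"]}
    after = {question for question, _ in current["failures"]}
    return after - before
```

Comparing only the fields listed in `expected` lets a case pin down the parts that matter, such as metric and grouping, without breaking on harmless differences elsewhere in the output.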

Step 5: Build Harness Engineering on the Trusted Semantic Base

Plan, Tool Calling, MCP, Skill, Prompt, Guardrail – all become effective only when grounded in reliable material.

6. Three‑Layer Architecture Overview

The complete Data Agent architecture evolves from Stage 1 to Stage 3, integrating NL2LF2SQL, the ontology platform, and the Skill orchestration layer.

7. One‑Sentence Summary

Metric‑centric semantic layers let AI fetch numbers; ontology‑centric layers let AI understand the business. Without a unified factual base, Skills eventually fall into scattered definitions, duplicated rules, and unmaintainable knowledge.

Team Progress

Deployed Modules

Metric‑centric Data Agent – NL→MQL→SQL pipeline in production, supporting intelligent queries, unified definitions, permission checks, and explainable results.

Skill System – deployed for data retrieval, attribution, anomaly detection, trend forecasting, reporting, and scheduling.

Ontology Platform – core capabilities (object model, event model, relationship model, rule engine) completed; supports Logical Form and deterministic mapping.

Ongoing Incremental Roadmap

The team is progressively migrating business rules from Skill prompts to the ontology layer, turning Skills into lightweight orchestrators while the ontology becomes the trusted “new foundation”.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by 360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.