How a Unified Metric Service Transforms Data Queries with Headless BI

Facing inconsistent metrics and low reuse in siloed data services, the team built a unified metric service using a headless BI semantic layer and virtual data models, enabling consistent metric definitions, reusable data models, AI-friendly queries, and faster, scalable reporting across the organization.

Youzan Coder

Background

Rapid business growth produced many data products, but they were built in silos and suffered from inconsistent metric definitions, low reuse, high maintenance costs, and unclear data lineage. The unified metric service was designed to fix these problems and to support flexible, AI-driven queries.

Design Goals

Consistent metric definitions across all services.

Reusable metric and data model definitions (define once, use many times).

Higher development efficiency and scalability.

AI‑friendly metadata and flexible query capabilities.

Existing Architecture and Drawbacks

The legacy stack consisted of:

OneService (OS): a table-level data service built on MyBatis templates.

Data Service Unit: a product-level service that stitches together multiple OS services.

Table‑Level Service Issues

Metrics and data source selection were hard‑coded in SQL templates, leading to:

Inconsistent metric calculations.

Low reuse of SQL fragments.

Complex, hard‑to‑maintain queries.

Missing data lineage.

A typical SQL template of this kind:

SELECT
    SUM(xxx) AS PV,
    COUNT(DISTINCT xxx) AS UV,
    COUNT(DISTINCT xxx) AS NEW_UV,
    SUM(xxx) AS STAY_TIME_AVG,
    SUM(xxx) * 1.0 / COUNT(DISTINCT xxx) AS PER_CAPITA_BROWSING,
    COUNT(DISTINCT xxx) AS SHARE_TOTAL_UV,
    COUNT(DISTINCT xxx) AS ADD_CART_UV,
    COUNT(DISTINCT xxx) AS ORDER_UV,
    SUM(xxx) AS ORDER_AMOUNT,
    SUM(xxx) AS PAY_AMOUNT,
    COUNT(DISTINCT xxx) AS PAY_CNT,
    COUNT(DISTINCT xxx) AS ORDER_CNT
FROM dm.xxx
WHERE partition >= #{startDay}
  AND partition <= #{endDay}
  AND team_id = #{teamId}  /* bound as a prepared-statement parameter, like the date range */
  /* optional channel filters */

Business‑Centric Data Service Model

From a business perspective, a data service should expose only three core elements:

Metric – what to measure.

Dimension – how to slice the metric.

Filter conditions – constraints such as date range or channel.

Underlying tables and APIs remain internal implementation details.
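
To make the three elements concrete, here is a minimal sketch of a request object that carries only metrics, dimensions, and filters. The class name `MetricQuery` and its field layout are assumptions made for illustration; the article does not show the service's actual types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class MetricQuery:
    """A request expressed only in business terms (no tables, no SQL)."""
    metrics: tuple[str, ...]                                # what to measure
    dimensions: tuple[str, ...] = ()                        # how to slice it
    filters: dict[str, Any] = field(default_factory=dict)   # constraints

    def __post_init__(self) -> None:
        # A query without any metric has nothing to compute.
        if not self.metrics:
            raise ValueError("a query needs at least one metric")


# Hypothetical usage: names mirror the request example later in the article.
query = MetricQuery(
    metrics=("payment_amount", "visitor_count"),
    dimensions=("store",),
    filters={"team_id": 123},
)
```

Nothing in the object names a table or an API, which is the point: the caller states what it wants and the service decides where to get it.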

Metric Service Architecture

Headless BI and Semantic Layer

The semantic layer is decoupled from visualization, providing a unified, high‑performance metric interface for downstream consumers.

Metadata Management

Public dimensions and measures registry.

Data model registry (source, freshness, field mapping, priority).

Metric definitions (atomic, derived, composite) independent of physical tables.
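
As a rough illustration of defining metrics independently of physical tables, the sketch below registers two atomic metrics and one metric computed from them, then expands each into a SQL expression over public measure names. The class names, the registry contents, and the measure names `page_views` and `visitor_id` are all hypothetical; only the ratio pattern is taken from the legacy template shown earlier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    name: str
    aggregation: str   # "SUM" or "COUNT_DISTINCT" in this sketch
    measure: str       # public measure name, not a physical column


@dataclass(frozen=True)
class RatioMetric:
    name: str
    numerator: str     # name of another registered metric
    denominator: str   # name of another registered metric


# Hypothetical registry entries, for illustration only.
REGISTRY = {
    "pv": AtomicMetric("pv", "SUM", "page_views"),
    "uv": AtomicMetric("uv", "COUNT_DISTINCT", "visitor_id"),
    "per_capita_browsing": RatioMetric("per_capita_browsing", "pv", "uv"),
}


def to_sql(metric_name: str) -> str:
    """Expand a registered metric into a SQL expression over public measures."""
    metric = REGISTRY[metric_name]
    if isinstance(metric, AtomicMetric):
        if metric.aggregation == "COUNT_DISTINCT":
            return f"COUNT(DISTINCT {metric.measure})"
        return f"{metric.aggregation}({metric.measure})"
    # A ratio reuses the definitions of the metrics it references,
    # so changing "pv" or "uv" changes every metric built on them.
    return f"{to_sql(metric.numerator)} * 1.0 / {to_sql(metric.denominator)}"
```

Because the ratio is resolved through the registry rather than copied into each template, a change to an atomic definition propagates to every metric built on it.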

Virtual Data Model

A virtual wide table is generated from public dimensions and measures. At query time the engine automatically selects matching physical models, builds a logical view, and executes cross‑source SQL with optimization.
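
The selection step can be sketched as a greedy cover over registered models. This is a simplification under stated assumptions: it uses only a static priority, whereas the service described here also applies cost estimation, and the `PhysicalModel` type and the model names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalModel:
    name: str
    fields: frozenset[str]   # public dimensions and measures it can serve
    priority: int            # lower value = preferred


def select_models(required: set[str], models: list[PhysicalModel]) -> list[PhysicalModel]:
    """Pick models in priority order until every required field is covered."""
    remaining = set(required)
    chosen: list[PhysicalModel] = []
    for model in sorted(models, key=lambda m: m.priority):
        if not remaining:
            break
        if remaining & model.fields:
            chosen.append(model)
            remaining -= model.fields
    if remaining:
        raise LookupError(f"no registered model provides: {sorted(remaining)}")
    return chosen


# Hypothetical registry of physical models.
MODELS = [
    PhysicalModel("trade_daily", frozenset({"store", "payment_amount"}), 1),
    PhysicalModel("traffic_daily", frozenset({"store", "visitor_count"}), 2),
    PhysicalModel("wide_backup", frozenset({"store", "payment_amount", "visitor_count"}), 3),
]

selected = select_models({"store", "payment_amount", "visitor_count"}, MODELS)
```

With these sample priorities the two preferred models together cover the request, so the lower-priority wide table is never touched.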

Query Flow and DSL

Clients send a DSL request containing a metric list, a dimension list, and filter parameters. The service then:

Parses the DSL and determines required dimensions/measures.

Selects optimal physical models based on cost estimation and rule‑based priorities.

Constructs a logical view (virtual table) that abstracts the underlying physical tables.

Executes the query across heterogeneous engines, merges results, and returns a unified metric table.
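
To illustrate how a parsed request might turn into SQL against one selected model, here is a small sketch. The function `build_sql`, the table name, and the column names are assumptions; the real engine also handles multiple sources and optimization, which this example leaves out.

```python
def build_sql(
    table: str,
    dimensions: list[str],
    metric_sql: dict[str, str],
    filters: dict[str, object],
) -> tuple[str, list[object]]:
    """Render one aggregate query; filter values are bound, never inlined.

    Table, column, and metric names are expected to come from trusted
    metadata, not from user input.
    """
    select_items = dimensions + [f"{expr} AS {name}" for name, expr in metric_sql.items()]
    sql = f"SELECT {', '.join(select_items)} FROM {table}"
    params: list[object] = []
    if filters:
        sql += " WHERE " + " AND ".join(f"{column} = ?" for column in filters)
        params = list(filters.values())
    if dimensions:
        sql += " GROUP BY " + ", ".join(dimensions)
    return sql, params


# Hypothetical call for a single metric sliced by store.
sql, params = build_sql(
    table="dm_trade_daily",
    dimensions=["store"],
    metric_sql={"payment_amount": "SUM(pay_amount)"},
    filters={"team_id": 123},
)
```

The caller never supplies SQL text, only names that the metadata layer has already resolved, and filter values travel separately as bound parameters.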

Interface Example

Request example (JSON‑style DSL):

{
  "metrics": ["payment_amount", "visitor_count"],
  "dimensions": ["store"],
  "date_range": {"start": "2023-11-01", "end": "2023-11-09"},
  "filters": {"team_id": 123}
}

Response is a tabular result where each row corresponds to a dimension value and columns contain the requested metrics.
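
The response shape can be sketched as a merge of per-model result sets keyed by the dimension value. The function name and the sample rows below are made up for illustration and are not taken from the service.

```python
def merge_results(
    dimension: str,
    metrics: list[str],
    partials: list[list[dict]],
) -> list[dict]:
    """Combine rows from several sources into one row per dimension value."""
    merged: dict[object, dict] = {}
    for rows in partials:
        for row in rows:
            key = row[dimension]
            # Start each output row with every requested metric set to None,
            # so a source that lacks a dimension value leaves a visible gap.
            out = merged.setdefault(key, {dimension: key, **{m: None for m in metrics}})
            out.update({m: row[m] for m in metrics if m in row})
    return list(merged.values())


# Made-up rows standing in for results from two different engines.
trade_rows = [
    {"store": "A", "payment_amount": 100.0},
    {"store": "B", "payment_amount": 40.0},
]
traffic_rows = [{"store": "A", "visitor_count": 12}]

table = merge_results("store", ["payment_amount", "visitor_count"], [trade_rows, traffic_rows])
```

Each output row corresponds to one store, and a metric that no source reported for that store stays `None` rather than being silently dropped.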

LLM Integration for Conversational Queries

An LLM translates natural-language questions into the metric-service DSL instead of generating raw SQL. This narrows the room for hallucination: the model only has to pick registered metrics, dimensions, and filters, and the service computes the result from the unified metric definitions.

Example NLQ → DSL conversion:

NLQ: “Show payment amount, payment count and visitor count for each store on 2023‑11‑09.”

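Generated DSL: the same structure as the request example above, with the three requested metrics, dimensions set to ["store"], and a date range that starts and ends on 2023-11-09.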
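
One way to enforce this in practice is to check the LLM output against the metadata registry before running it. The sketch below is an assumption about how such a guard could look; the vocabulary sets and the function name are hypothetical.

```python
import json

# Hypothetical vocabulary; in practice this would come from the metadata registry.
KNOWN_METRICS = {"payment_amount", "visitor_count"}
KNOWN_DIMENSIONS = {"store"}


def parse_llm_dsl(raw: str) -> dict:
    """Parse LLM output and reject anything outside the registered vocabulary."""
    dsl = json.loads(raw)
    if not isinstance(dsl, dict) or not dsl.get("metrics"):
        raise ValueError("DSL must be an object with at least one metric")
    unknown = (set(dsl["metrics"]) - KNOWN_METRICS) | (
        set(dsl.get("dimensions", [])) - KNOWN_DIMENSIONS
    )
    if unknown:
        # The model invented a name: fail loudly instead of guessing.
        raise ValueError(f"unknown names: {sorted(unknown)}")
    return dsl


dsl = parse_llm_dsl('{"metrics": ["payment_amount"], "dimensions": ["store"]}')
```

A name the model invents is rejected before any query runs, which is much harder to guarantee when the model writes SQL directly.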

Benefits and Applications

Rapid report building: pre-defined metrics and dimensions reduce report creation from days to hours.

Self-service data extraction: merchants configure desired metrics and dimensions without seeing underlying tables.

Metric reuse and consistency: a single definition guarantees identical calculations across all downstream reports.

Development efficiency: new data services are assembled by composing metadata rather than writing new SQL.

Future Outlook

The platform currently focuses on metric definition, metadata‑driven virtual models, and query execution. Future work will extend the service to cover data production pipelines and automated materialization to further accelerate data availability.
