
How to Build a Near‑Real‑Time Metric Management System with Flink, Kafka, and Trino

This article outlines the design and implementation of a near‑real‑time metric management platform at Liulishuo, covering its data flow from Kafka ingestion through Flink SQL processing into Hudi tables and Trino querying, along with metric configuration, lineage, visualization, alerting, scheduling, and planned optimizations.

Liulishuo Tech Team

Background

In Liulishuo's offline data‑warehouse development, warehouse engineers maintain shared dimension and detail tables while analysts compute downstream data. Real‑time metrics add complexity beyond offline calculations, making it difficult for analysts without development skills to perform flexible analysis. To address this, a near‑real‑time metric management system was built.

Engineering Practice

The overall data‑flow architecture is shown below:

Data Flow Diagram

Metric query process:

Business and log data are centralized in Kafka.

Flink SQL jobs consume the Kafka streams and write them to Hudi copy‑on‑write (COW) tables.

The near‑real‑time system dispatches jobs to Trino according to configured rules.

Trino queries the Hudi tables and returns results.
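The dispatch step in this flow can be pictured as rendering a Trino query from a metric's configured rule and time window. The sketch below is a minimal, hypothetical illustration: the metric dict, its fields (`sql_template`, `window_minutes`), and the table name `hudi.dwd.events` are invented for the example and are not the actual configuration schema; in production the rendered SQL would be submitted to Trino.

```python
from datetime import datetime, timedelta

def build_trino_query(metric: dict, window_end: datetime) -> str:
    """Render the Trino SQL for one metric over its configured time window."""
    window_start = window_end - timedelta(minutes=metric["window_minutes"])
    return metric["sql_template"].format(
        start=window_start.strftime("%Y-%m-%d %H:%M:%S"),
        end=window_end.strftime("%Y-%m-%d %H:%M:%S"),
    )

# Hypothetical metric configuration (field names are assumptions).
metric = {
    "code": "3gdlYrl7",
    "window_minutes": 60,
    "sql_template": (
        "SELECT COUNT(DISTINCT user_id) FROM hudi.dwd.events "
        "WHERE event_time >= TIMESTAMP '{start}' "
        "AND event_time < TIMESTAMP '{end}'"
    ),
}
sql = build_trino_query(metric, datetime(2022, 1, 1, 12, 0, 0))
```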

Metric Production and Maintenance

3.1 Configuring Basic Metric Information

After data sources are ready, analysts define aggregation logic directly with Trino SQL. Shared data‑cleaning logic can be encapsulated in views created by warehouse engineers and referenced by metric definitions.

Additional metadata required for each metric includes:

Metric unit

Metric tags

Domain

Business process

Calculation grain

Other extensible information

For creators who are not analysts, an approver must be designated to review whether the metric definition is reasonable. Creators can also set an initialization start date to back‑fill historical data.

Metric Creation Diagram

Each metric receives a unique code after configuration. This code can be referenced in Aviator expressions to build composite metrics, which can be reused in further composites.

Example:

-- Aggregated metric
SELECT COUNT(DISTINCT xxx)
FROM DB.TABLE
WHERE xxx;

-- Composite metric
${3gdlYrl7} / ${FMxGYvw3}
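In the real system the composite expression is evaluated by Aviator on the JVM. The sketch below mimics the idea in Python: placeholders of the form `${code}` are resolved recursively (so composites can reference other composites), with Python's `eval()` standing in for the Aviator engine. The metric codes and values are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def evaluate(code: str, definitions: dict, base_values: dict) -> float:
    """Resolve a metric code to a number, expanding composite expressions."""
    if code in base_values:          # aggregated metric: value comes from Trino
        return base_values[code]
    expr = definitions[code]         # composite metric: an expression over codes
    resolved = PLACEHOLDER.sub(
        lambda m: str(evaluate(m.group(1), definitions, base_values)), expr
    )
    return eval(resolved)            # Aviator plays this role in the real system

definitions = {"conversionRate": "${3gdlYrl7} / ${FMxGYvw3}"}
base_values = {"3gdlYrl7": 30, "FMxGYvw3": 120}
rate = evaluate("conversionRate", definitions, base_values)  # 30 / 120 = 0.25
```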

3.2 Metric Lineage

The chain of metric references forms a lineage graph. When an abnormal metric value is detected, the lineage helps trace back to the problematic source data.

Metric Lineage Diagram
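Conceptually, tracing an abnormal metric back to its sources is a walk over the placeholder references in the composite definitions. A minimal sketch, assuming definitions are stored as expression strings keyed by metric code (the codes below are invented):

```python
import re

PLACEHOLDER = re.compile(r"\$\{(\w+)\}")

def upstream_sources(code: str, definitions: dict) -> set:
    """Walk the reference chain down to the aggregated (leaf) metrics."""
    if code not in definitions:      # leaf: defined directly by Trino SQL
        return {code}
    sources = set()
    for ref in PLACEHOLDER.findall(definitions[code]):
        sources |= upstream_sources(ref, definitions)
    return sources

definitions = {
    "conversionRate": "${3gdlYrl7} / ${FMxGYvw3}",
    "weightedRate": "${conversionRate} * 0.8",   # composite of a composite
}
leaves = upstream_sources("weightedRate", definitions)
```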

3.3 Metric Visualization

Creators can configure view permissions for each metric. Users with permission can customize their watch list, order, and highlight colors, enabling flexible analysis.

Metric Dashboard

3.4 Metric Q&A

Users with view rights can post questions on a metric’s Q&A page. Creators answer them, and the answers are visible to all, reducing communication overhead.

Metric Q&A

3.5 Configuring Alerts

After a metric goes live, creators can set alert rules using Aviator expressions and placeholders for N‑day historical data. When the expression evaluates to a negative value, the system triggers an alert via pre‑configured channels such as email or webhook.

Alert Configuration
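As a rough illustration of the negative-value convention, the sketch below evaluates a hypothetical rule that fires when today's value falls more than 20% below the 7‑day average. The names `today` and `avg7`, and the rule string itself, are invented for the example, and Python's `eval()` again stands in for Aviator.

```python
def should_alert(rule_expr: str, today: float, history: list) -> bool:
    """Evaluate an alert expression; a negative result fires the alert."""
    avg7 = sum(history) / len(history)
    # In the real system the rule is an Aviator expression whose
    # placeholders are filled with today's value and N-day history.
    result = eval(rule_expr, {"today": today, "avg7": avg7})
    return result < 0

# Fire when today drops more than 20% below the 7-day average.
rule = "today - 0.8 * avg7"
fired = should_alert(rule, today=70, history=[100] * 7)   # 70 - 80 < 0
quiet = should_alert(rule, today=90, history=[100] * 7)   # 90 - 80 >= 0
```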

Scheduling and Computation

The near‑real‑time metric job runs every five minutes; each hour’s results overwrite the previous hour’s data. Results are stored in MySQL for user queries. After each computation, data is batch‑written to avoid temporary inconsistencies during reads.

Upon completion, a trigger ends the scheduling cycle and initiates alert handling.

If data re‑processing or supplemental data for new metrics is required, a re‑run task is submitted to a monitoring thread, which then executes the necessary back‑fill.
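One way to picture this re-run path is a task queue consumed by the monitoring thread, issuing one computation per missing day. The sketch below is simplified: the task tuple layout, day granularity, and the `results` list standing in for the Trino query plus MySQL write are all assumptions.

```python
import queue
import threading
from datetime import date, timedelta

backfill_queue = queue.Queue()
results = []

def backfill_worker():
    """Monitoring thread: pull re-run tasks and compute each missing day."""
    while True:
        task = backfill_queue.get()
        if task is None:             # sentinel: stop the worker
            break
        code, start, end = task
        day = start
        while day <= end:
            results.append((code, day))  # stand-in for Trino query + MySQL write
            day += timedelta(days=1)
        backfill_queue.task_done()

t = threading.Thread(target=backfill_worker, daemon=True)
t.start()
backfill_queue.put(("3gdlYrl7", date(2022, 1, 1), date(2022, 1, 3)))
backfill_queue.put(None)
t.join()
# results now holds one (code, day) entry per back-filled day
```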

Outlook

Current limitations include the lack of persisted public‑layer data for near‑real‑time metrics, leading to redundant computation when multiple metrics share the same cleaning logic. Future work aims to parse and merge identical SQL fragments during metric upload, compute shared parts once, and combine results, thereby improving performance.
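One possible shape for the planned fragment merging: normalize each metric's cleaning SQL and group metrics by a hash of the normalized text, so each distinct fragment is computed once and its result shared. This is a sketch under that assumption (the metric codes and SQL strings are invented; real SQL equivalence checking would need proper parsing, not just whitespace/case normalization).

```python
import hashlib
import re

def fragment_key(sql: str) -> str:
    """Normalize whitespace and case so identical fragments hash the same."""
    normalized = re.sub(r"\s+", " ", sql.strip().lower())
    return hashlib.sha256(normalized.encode()).hexdigest()

metrics = {
    "a": "SELECT user_id FROM dwd.events   WHERE dt = today",
    "b": "select user_id from dwd.events where dt = today",   # same as "a"
    "c": "SELECT order_id FROM dwd.orders WHERE dt = today",
}

# Group metrics by fragment so each shared part is computed only once.
groups = {}
for code, sql in metrics.items():
    groups.setdefault(fragment_key(sql), []).append(code)
# 3 metrics collapse into 2 distinct computations
```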

Tags: data pipeline, Flink, Kafka, Trino, Hudi, real-time metrics