
How a Unified Metadata Platform Boosts SRE Efficiency and Cuts Costs

This article describes how Huya built a unified metadata platform to break data silos across its SRE systems, enabling standardized data ingestion, correlation, and analysis that improve resource governance, root‑cause diagnosis, and overall cost‑efficiency for large‑scale live streaming services.

Huya Tech Engineering

Kuang Lingxuan, head of the SRE observability platform at Huya Live, leads the design and implementation of a unified metadata platform that integrates resource delivery, containerization, build and release, monitoring, and alerting systems.

Project Background

Pain Points

Separate SRE systems created severe data silos with no unified metadata model, hindering data understanding and usage.

Business cost control became difficult due to lack of insight into resource and cross‑region traffic usage.

Root‑cause analysis was hampered by missing correlations among monitoring metrics, traces, and alerts.

Key Insight

Horizontal linkage: connect application‑to‑application call relationships.

Vertical linkage: connect applications to the resources they consume.

Combined, they form a comprehensive metadata association network.
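
As a rough illustration of the two linkage directions, the sketch below stores both kinds of relationship as labeled edges in one in-memory graph. The class `MetadataGraph` and the edge labels `calls` and `runs_on` are assumptions made for this example; the article does not name the platform's actual edge types.

```python
from collections import defaultdict

# Hypothetical edge labels; the article does not name the real ones.
CALLS = "calls"      # horizontal: application -> application
RUNS_ON = "runs_on"  # vertical: application -> resource


class MetadataGraph:
    """Tiny in-memory stand-in for a metadata association network."""

    def __init__(self):
        self._edges = defaultdict(list)  # source -> [(label, target)]

    def add_edge(self, source, label, target):
        self._edges[source].append((label, target))

    def neighbors(self, source, label):
        """Return targets reachable from source over edges with this label."""
        return [target for edge_label, target in self._edges[source]
                if edge_label == label]


graph = MetadataGraph()
graph.add_edge("GiftServer", CALLS, "AuthServer")
graph.add_edge("GiftServer", CALLS, "MoneyServer")
graph.add_edge("GiftServer", RUNS_ON, "container(192.168.1.1)")

horizontal = graph.neighbors("GiftServer", CALLS)
vertical = graph.neighbors("GiftServer", RUNS_ON)
```

Keeping both directions in one structure is what lets a single query start at a service and move either sideways to its peers or downward to its resources.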

Metadata Types

Application services: service name, IP/Port, API, dependencies, frameworks, code repo.

Monitoring metrics: CPU, memory, network utilization, request volume, latency, error rates.

Infrastructure: containers, data centers, domains, network types, resource usage.

Middleware: databases, caches, message queues, real‑time and batch compute.
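
One way to picture these metadata types is as typed records. The sketch below covers only the first two categories, and every class name, field name, and sample value (including the port) is a hypothetical choice for illustration rather than the platform's real schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApplicationService:
    """Application service metadata (field names are assumptions)."""
    name: str
    endpoints: List[str]                      # "ip:port" strings
    apis: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)
    framework: str = ""
    code_repo: str = ""


@dataclass
class MetricSample:
    """One monitoring data point attached to a service."""
    service: str
    metric: str       # e.g. "cpu_utilization", "latency_ms", "error_rate"
    value: float
    timestamp: int    # Unix seconds


gift = ApplicationService(
    name="GiftServer",
    endpoints=["192.168.1.1:8080"],           # port is made up
    dependencies=["AuthServer", "MoneyServer"],
)
sample = MetricSample(service=gift.name, metric="error_rate",
                      value=0.02, timestamp=1700000000)
```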

Solution Practice

Design Thinking

Use application services as the core of metadata association and build a unified metadata network.

Metadata Network Overview

a) Trace analysis generates client‑to‑service call chains, e.g. Huya App → GiftServer → AuthServer / MoneyServer.

b) Deployment data links services to resources, e.g. GiftServer → container(192.168.1.1) → physical machine → Guangzhou data center.

c) Monitoring metrics are correlated across business, application, and infrastructure layers.

d) Service‑to‑middleware links connect each service to the middleware it uses and to the machines hosting that middleware, e.g. MoneyServer → MySQL/Redis/Kafka.
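
The deployment chain in (b) can be resolved by following placement links until none remain. This is a minimal sketch: the mapping `LOCATED_IN` and the function `placement_chain` are hypothetical names, and a real deployment would have many instances per service rather than a single link.

```python
# child -> parent placement links, taken from the example chain in (b)
LOCATED_IN = {
    "GiftServer": "container(192.168.1.1)",
    "container(192.168.1.1)": "physical machine",
    "physical machine": "Guangzhou data center",
}


def placement_chain(node, located_in):
    """Follow placement links upward from node and return the full path."""
    chain = [node]
    seen = {node}
    while chain[-1] in located_in:
        parent = located_in[chain[-1]]
        if parent in seen:  # guard against cyclic data
            break
        chain.append(parent)
        seen.add(parent)
    return chain


chain = placement_chain("GiftServer", LOCATED_IN)
data_center = chain[-1]
```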

Design Summary

Define metadata ingestion standards and association models.

Connect applications, resources, and middleware across the entire network.

Provide visualization, search, and analysis capabilities.

Metadata Architecture

a) Output: web UI for visualizing metadata and an open platform for data access.

b) Coverage: the Meta Hub platform ingests metadata from all SRE systems.

c) Core modules include data conversion, association storage in a graph DB, SDK/OpenAPI/Gremlin for queries, and resource replay for usage statistics.

Graph DB stores the vertex/edge model of the metadata network; OLAP DB keeps multi‑dimensional snapshots for large‑scale analysis.
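
The split between the two stores might look roughly like this: the graph keeps vertices and edges, and a periodic job flattens them into dated, denormalized rows that an OLAP engine can aggregate. The row layout, the function `snapshot_rows`, and the sample date are assumptions for illustration only.

```python
from collections import Counter
from datetime import date

# Graph side: vertices with properties, edges as (source, label, target).
vertices = {
    "GiftServer": {"type": "service"},
    "MoneyServer": {"type": "service"},
    "container(192.168.1.1)": {"type": "container"},
}
edges = [
    ("GiftServer", "calls", "MoneyServer"),
    ("GiftServer", "runs_on", "container(192.168.1.1)"),
]


def snapshot_rows(vertices, edges, day):
    """Flatten the graph into one denormalized row per edge for a given day."""
    return [
        {
            "day": day.isoformat(),
            "src": src,
            "src_type": vertices[src]["type"],
            "label": label,
            "dst": dst,
            "dst_type": vertices[dst]["type"],
        }
        for src, label, dst in edges
    ]


# OLAP side: rows can now be grouped by any column without graph traversal.
rows = snapshot_rows(vertices, edges, date(2024, 1, 1))
edges_by_label = Counter(row["label"] for row in rows)
```

Storing one snapshot per day is what makes historical questions answerable, since the graph itself only reflects the current state.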

Application Scenarios

Multi‑Dimensional Resource Analysis

Shows historical resource usage and utilization trends for each application service, so teams can check whether allocations are reasonable and govern resources accordingly.
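
A reasonableness check of this kind could be as simple as comparing used against allocated capacity over a window of snapshots. The numbers, the 30% threshold, and the function names below are all invented for the sketch.

```python
from collections import defaultdict

# (service, day, allocated CPU cores, used CPU cores); made-up numbers
snapshots = [
    ("GiftServer", "2024-01-01", 16, 4.0),
    ("GiftServer", "2024-01-02", 16, 4.0),
    ("MoneyServer", "2024-01-01", 8, 6.0),
    ("MoneyServer", "2024-01-02", 8, 6.0),
]


def average_utilization(snapshots):
    """Return used/allocated CPU per service across all snapshot days."""
    allocated = defaultdict(float)
    used = defaultdict(float)
    for service, _day, alloc, use in snapshots:
        allocated[service] += alloc
        used[service] += use
    return {service: used[service] / allocated[service] for service in allocated}


def over_provisioned(utilization, threshold=0.3):
    """List services whose average utilization falls below the threshold."""
    return sorted(s for s, u in utilization.items() if u < threshold)


utilization = average_utilization(snapshots)
candidates = over_provisioned(utilization)
```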

Cross‑Data‑Center Traffic Governance

Detects and visualizes cross‑region calls, pinpointing which services, instances, and interfaces cause inter‑data‑center traffic.
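
Detecting cross‑region calls amounts to joining call records with instance placement and keeping the pairs whose data centers differ. In this sketch the IP addresses, the Shanghai data center, the interface names, and the byte counts are all hypothetical.

```python
from collections import Counter

# instance IP -> (service, data center); values are hypothetical
INSTANCES = {
    "10.0.0.1": ("GiftServer", "Guangzhou"),
    "10.0.0.2": ("MoneyServer", "Guangzhou"),
    "10.0.1.1": ("AuthServer", "Shanghai"),
}
# (caller instance, callee instance, interface, bytes transferred)
CALLS = [
    ("10.0.0.1", "10.0.0.2", "/pay", 1000),
    ("10.0.0.1", "10.0.1.1", "/verify", 500),
    ("10.0.0.1", "10.0.1.1", "/verify", 700),
]


def cross_dc_traffic(calls, instances):
    """Sum bytes per (caller, callee, interface, src DC, dst DC) across DCs."""
    totals = Counter()
    for caller, callee, interface, nbytes in calls:
        src_service, src_dc = instances[caller]
        dst_service, dst_dc = instances[callee]
        if src_dc != dst_dc:
            key = (src_service, dst_service, interface, src_dc, dst_dc)
            totals[key] += nbytes
    return totals


traffic = cross_dc_traffic(CALLS, INSTANCES)
```

Keeping the interface in the grouping key is what allows the result to point at a specific endpoint rather than only at a service pair.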

Multi‑Tag Classification

Implements hierarchical tags stored in a graph model. Tags are generated from trace links and from AIOps‑derived application profiles, and they support flexible queries across any level of the hierarchy.
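
A hierarchical tag query can be sketched as a walk up the tag tree: a service matches a tag if any of its own tags is that tag or a descendant of it. The tag names and `ChatServer` are invented for this example, and the article does not describe how the platform actually encodes the hierarchy.

```python
# tag -> parent tag (None for the root); all tag names are hypothetical
TAG_PARENT = {
    "live": None,
    "live/gift": "live",
    "live/gift/payment": "live/gift",
    "live/chat": "live",
}
SERVICE_TAGS = {
    "GiftServer": {"live/gift"},
    "MoneyServer": {"live/gift/payment"},
    "ChatServer": {"live/chat"},
}


def is_under(tag, ancestor, tag_parent):
    """True if tag equals ancestor or sits anywhere below it in the tree."""
    while tag is not None:
        if tag == ancestor:
            return True
        tag = tag_parent.get(tag)
    return False


def services_with_tag(ancestor, service_tags, tag_parent):
    """Return services carrying the tag or any of its descendants."""
    return sorted(
        service for service, tags in service_tags.items()
        if any(is_under(tag, ancestor, tag_parent) for tag in tags)
    )


gift_services = services_with_tag("live/gift", SERVICE_TAGS, TAG_PARENT)
all_live = services_with_tag("live", SERVICE_TAGS, TAG_PARENT)
```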

Full‑Link Root‑Cause Localization

Combines business, application, and infrastructure metrics with resource relationships to locate root causes, e.g., diagnosing low gift‑sending success rates.
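
One simple heuristic for this kind of localization, offered as a sketch and not as the platform's actual algorithm, is to start at the service with the failing business metric and follow only anomalous dependencies downward. The deepest anomalous nodes, those with no anomalous dependencies of their own, become root‑cause candidates. The dependency map and the set of anomalous nodes are made‑up inputs.

```python
# service -> things it depends on (services and middleware); illustrative
DEPENDS_ON = {
    "GiftServer": ["AuthServer", "MoneyServer"],
    "MoneyServer": ["MySQL", "Redis"],
    "AuthServer": [],
    "MySQL": [],
    "Redis": [],
}
# Nodes whose metrics currently look abnormal (assumed to come from alerting)
ANOMALOUS = {"GiftServer", "MoneyServer", "MySQL"}


def root_cause_candidates(entry, depends_on, anomalous):
    """Walk anomalous dependencies from entry; return the deepest ones."""
    candidates, seen, stack = [], set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen or node not in anomalous:
            continue
        seen.add(node)
        bad_deps = [d for d in depends_on.get(node, []) if d in anomalous]
        if bad_deps:
            stack.extend(bad_deps)   # the problem likely lies further down
        else:
            candidates.append(node)  # nothing below is abnormal: suspect this
    return candidates


suspects = root_cause_candidates("GiftServer", DEPENDS_ON, ANOMALOUS)
```

Here a low gift‑sending success rate on `GiftServer` is traced through `MoneyServer` to `MySQL`, the only anomalous node with healthy dependencies.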

Future Outlook

The next step is to extend the platform to cover the entire DevOps lifecycle, from code repository through build and release to runtime, so that metadata can assist in security patch tracking (e.g., Log4j) and change‑impact analysis.
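
With build metadata linked to services, patch tracking and impact analysis could reduce to two lookups: which services ship a given artifact, and which callers sit upstream of them. The sketch below assumes a hypothetical service‑to‑library mapping; the artifact versions are sample inputs, not data from the article.

```python
from collections import deque

# service -> library artifacts it ships (hypothetical build metadata)
SERVICE_LIBRARIES = {
    "GiftServer": {"log4j-core:2.14.1"},
    "MoneyServer": {"log4j-core:2.17.1"},
    "AuthServer": set(),
}
# caller -> callees, following the article's example call chain
CALLS = {
    "Huya App": ["GiftServer"],
    "GiftServer": ["AuthServer", "MoneyServer"],
}


def services_shipping(artifacts, service_libraries):
    """Return services that ship at least one of the given artifacts."""
    return sorted(s for s, libs in service_libraries.items() if libs & artifacts)


def upstream_of(targets, calls):
    """Return every caller that can reach a target, directly or indirectly."""
    callers = {}
    for src, dsts in calls.items():
        for dst in dsts:
            callers.setdefault(dst, []).append(src)
    impacted, queue = set(), deque(targets)
    while queue:
        node = queue.popleft()
        for caller in callers.get(node, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return sorted(impacted)


to_patch = services_shipping({"log4j-core:2.14.1"}, SERVICE_LIBRARIES)
blast_radius = upstream_of(to_patch, CALLS)
```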
