How NetEase Cloud Music Built a Real‑Time Data Warehouse with Flink & Calcite

This article details NetEase Cloud Music's evolution of a real‑time data warehouse built on Flink 1.9 and Calcite, covering platform scale, architectural design, metadata management, SDK simplifications, monitoring improvements, and concrete use cases such as AB‑testing, live reporting, and feature serving.

ITPUB

Oct 16, 2020

Background

By 2020, NetEase Cloud Music's real‑time computing platform operated on more than 150 machines, ran over 700 tasks, and handled peak QPS of 4 million, with roughly 180 developers using the system. The platform, launched in early 2018, underwent two major version upgrades, expanding task count by nearly 200% by mid‑2020.

Limitations of the First Version (Flink 1.7)

The initial version was built on Flink 1.7, which lacked native SQL DDL support. To compensate, the team wrote a custom ANTLR‑based SQL grammar that added DDL and dimension‑table JOIN capabilities. The platform still lacked critical features such as data‑lineage tracking, metadata governance, and comprehensive task monitoring, which made troubleshooting difficult.
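To make the dimension‑table JOIN concrete, here is a minimal sketch of what such a join does conceptually: each stream event is enriched with a row looked up from a slowly changing table. This is an illustration only, not the platform's implementation; `dimension_join`, the `SONGS` table, and the field names are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def dimension_join(
    events: Iterable[dict],
    lookup: Callable[[str], Optional[dict]],
    key: str,
) -> Iterator[dict]:
    """Enrich each stream event with the dimension row matching event[key]."""
    cache: Dict[str, Optional[dict]] = {}
    for event in events:
        k = event[key]
        if k not in cache:
            cache[k] = lookup(k)  # one external lookup per distinct key
        dim_row = cache[k]
        if dim_row is not None:  # inner-join semantics: drop unmatched events
            yield {**event, **dim_row}


# Hypothetical dimension table standing in for an external store.
SONGS = {"s1": {"artist": "A"}, "s2": {"artist": "B"}}

plays = [{"song_id": "s1", "user": "u1"}, {"song_id": "s9", "user": "u2"}]
enriched = list(dimension_join(plays, SONGS.get, "song_id"))
print(enriched)
```

The cache here never expires, which keeps the sketch short. A production lookup would normally bound the cache and refresh entries so that dimension updates become visible.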

Real‑Time Data Warehouse Construction (Flink 1.9)

The next generation, based on Flink 1.9, introduced several key enhancements:

Integration with a centralized metadata hub, allowing users to avoid manually defining data formats.

Provision of both SQL and a Java/Scala SDK for developers.

End‑to‑end data‑lineage collection.

Rich source‑ and task‑level monitoring, including MQ data‑volume metrics.

Architecture Overview

Jobs enter the system as SQL statements or SDK calls, which the Planner parses. The Planner works with a Catalog that injects metadata from the MetaHub (the metadata center). The MetaHub manages all metadata and offers plug‑in modules for MQ metadata, unified data types, and metadata search.
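The Catalog and MetaHub relationship described above can be sketched as follows. This is an assumption‑laden toy, not the real system: `TableMeta`, `register`, `resolve`, the table name, and the topic name are hypothetical, and the MetaHub is reduced to an in‑memory dictionary. It shows why a user no longer needs to declare data formats by hand: the planner asks the catalog, and the catalog asks the metadata center.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TableMeta:
    fields: List[Tuple[str, str]]  # (column name, unified type)
    source: str                    # e.g. the MQ topic backing the table


class MetaHub:
    """Hypothetical in-memory stand-in for the metadata center."""

    def __init__(self) -> None:
        self._tables: Dict[str, TableMeta] = {}

    def register(self, name: str, meta: TableMeta) -> None:
        self._tables[name] = meta

    def get(self, name: str) -> TableMeta:
        return self._tables[name]


class Catalog:
    """Resolves table names against the MetaHub on behalf of the planner."""

    def __init__(self, hub: MetaHub) -> None:
        self._hub = hub

    def resolve(self, name: str) -> TableMeta:
        try:
            return self._hub.get(name)
        except KeyError:
            raise LookupError(f"unknown table: {name}") from None


hub = MetaHub()
hub.register(
    "rt.music.play_log",  # hypothetical table name
    TableMeta([("user_id", "BIGINT"), ("song_id", "BIGINT")], "play_log_topic"),
)
catalog = Catalog(hub)
meta = catalog.resolve("rt.music.play_log")
print(meta.fields, meta.source)
```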

Data Warehouse Layers

The warehouse is divided into three parts: a unified table naming convention (catalog.db.table), layered storage (offline → real‑time), and table‑level permission management. The real‑time warehouse mirrors the offline model, replicating tables to provide low‑latency access.
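As a rough sketch of how a catalog.db.table name and table‑level permissions might fit together, the snippet below parses a qualified name and checks a grant. The grant model, user names, and table names are hypothetical; the article does not describe how permissions are actually stored.

```python
from typing import NamedTuple, Set, Tuple


class TableName(NamedTuple):
    catalog: str
    db: str
    table: str


def parse_table_name(qualified: str) -> TableName:
    """Split 'catalog.db.table' into its three parts, rejecting other shapes."""
    parts = qualified.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected catalog.db.table, got {qualified!r}")
    return TableName(*parts)


# A grant is (user, table, action); this flat set is an assumed model.
Grants = Set[Tuple[str, TableName, str]]


def can_access(grants: Grants, user: str, table: TableName, action: str) -> bool:
    """Table-level check: the exact (user, table, action) triple must be granted."""
    return (user, table, action) in grants


play_log = parse_table_name("realtime.music.play_log")
grants: Grants = {("alice", play_log, "read")}

print(can_access(grants, "alice", play_log, "read"))
print(can_access(grants, "alice", play_log, "write"))
```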

SDK Simplification

The SDK encapsulates internal SQL execution, exposing a concise API and automatic lineage capture. A real‑world demo reduced the implementation from over 190 lines of code to just a dozen lines, dramatically improving developer productivity.
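One way an SDK can capture lineage automatically is to record every table a job reads or writes as a side effect of the API calls themselves. The sketch below illustrates that idea only; the `Job` class, its `read` and `write` methods, and the table names are hypothetical and are not the platform's SDK.

```python
from typing import List, Tuple


class Job:
    """Toy job builder that records lineage as tables are declared."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sources: List[str] = []
        self.sinks: List[str] = []

    def read(self, table: str) -> "Job":
        self.sources.append(table)  # lineage captured without user effort
        return self

    def write(self, table: str) -> "Job":
        self.sinks.append(table)
        return self

    def lineage(self) -> List[Tuple[str, str, str]]:
        """Return (source table, job name, sink table) edges."""
        return [(s, self.name, t) for s in self.sources for t in self.sinks]


job = (
    Job("play_count_per_minute")
    .read("realtime.music.play_log")
    .write("realtime.music.play_count_1m")
)
edges = job.lineage()
print(edges)
```

Because the edges are derived from the same calls that define the job, the lineage cannot drift from what the job actually declares.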

Monitoring Enhancements

Fine‑grained metrics are collected at the task level, and MQ data volumes are tracked. According to the contributor, robust cluster‑level monitoring becomes indispensable once the platform reaches a certain scale.
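Tracking MQ data volume can be as simple as counting messages per topic in fixed time windows and flagging windows that received nothing. The following is a minimal sketch under that assumption; the `VolumeMetrics` class, the 60‑second window, and the topic name are hypothetical, and a real deployment would export these counts to a metrics system rather than keep them in memory.

```python
from collections import defaultdict
from typing import Dict, Tuple


class VolumeMetrics:
    """Counts messages per (topic, window) using fixed-size time windows."""

    def __init__(self, window_seconds: int = 60) -> None:
        self.window_seconds = window_seconds
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def _window_start(self, timestamp: float) -> int:
        return int(timestamp // self.window_seconds) * self.window_seconds

    def record(self, topic: str, timestamp: float, n: int = 1) -> None:
        self._counts[(topic, self._window_start(timestamp))] += n

    def volume(self, topic: str, window_start: int) -> int:
        return self._counts.get((topic, window_start), 0)

    def is_silent(self, topic: str, window_start: int) -> bool:
        """True when a window saw no messages, a common alerting signal."""
        return self.volume(topic, window_start) == 0


metrics = VolumeMetrics(window_seconds=60)
for ts in (5.0, 59.9, 60.0):
    metrics.record("play_log_topic", ts)

print(metrics.volume("play_log_topic", 0))
print(metrics.volume("play_log_topic", 60))
print(metrics.is_silent("play_log_topic", 120))
```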

Practical Use Cases

AB‑Testing

In the earlier pipeline, raw data was first stored in Hive, then cleaned and aggregated with Spark. The new real‑time AB‑Test pipeline writes to real‑time tables instead, eliminating the Hive + Spark batch process and delivering faster feedback and better resource utilization.

Real‑Time Reporting

Live dashboards, such as the real‑time playback count for NetEase Cloud Music live streams, are built on the warehouse. The streamlined task creation and clearer data‑issue tracing simplify operations.

Real‑Time Feature Serving

Feature reuse and lineage are supported. By analyzing feature generation across algorithm teams, the platform identified significant duplication, leading to resource waste. Features are now layered, isolated by business domain, and searchable, enabling teams to discover and reuse existing features efficiently.
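A searchable feature registry that discourages duplication might look like the sketch below. It is an illustration under stated assumptions: the `Feature` fields, the whitespace and case normalization used to spot duplicates, and the example features are all hypothetical, and the article does not say how the platform detects duplicated features.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    name: str
    domain: str      # business domain used for isolation
    layer: str       # e.g. "raw" or "aggregated"
    definition: str  # the expression that produces the feature


def _normalize(definition: str) -> str:
    return " ".join(definition.lower().split())


class FeatureRegistry:
    def __init__(self) -> None:
        self._features: List[Feature] = []

    def register(self, feature: Feature) -> Feature:
        """Return the existing feature if an equivalent definition is registered."""
        for existing in self._features:
            if _normalize(existing.definition) == _normalize(feature.definition):
                return existing  # reuse instead of computing it twice
        self._features.append(feature)
        return feature

    def search(self, keyword: str, domain: Optional[str] = None) -> List[Feature]:
        kw = keyword.lower()
        return [
            f for f in self._features
            if kw in f.name.lower() and (domain is None or f.domain == domain)
        ]


registry = FeatureRegistry()
first = registry.register(
    Feature("user_play_count_1h", "recommend", "aggregated",
            "COUNT(*) OVER last 1 hour BY user_id")
)
second = registry.register(
    Feature("plays_per_user_hourly", "search", "aggregated",
            "count(*)  over last 1 hour by user_id")
)
print(second.name)
print([f.name for f in registry.search("play", domain="recommend")])
```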
