
How Tencent Music Cut Kafka Costs by 50% with Cloud‑Native AutoMQ

Tencent Music replaced its traditional Kafka clusters with the cloud‑native AutoMQ platform, slashing infrastructure costs by over half, achieving second‑level partition migration, and dramatically simplifying operations while maintaining high‑throughput, low‑latency data streams for its massive music services.

High Availability Architecture

Background

Tencent Music Entertainment Group runs several nationwide music apps that generate massive volumes of user‑behavior and business data every day. A robust, stable, and efficient Kafka streaming system underpins precision recommendations, user growth, and monetization. Rapid business growth, however, exposed the limits of self‑managed Kafka clusters in both operational complexity and cost.

Technical Architecture

The team adopted the cloud‑native AutoMQ solution. Data flows from sources through a unified "Data Channel" platform, into AutoMQ clusters (Cluster A, B, C), then to real‑time computation (Flink) and finally to storage (OLAP databases, Elasticsearch) before serving observability and analytics applications.

Architecture diagram

Kafka Challenges

High resource reservation cost (30‑40% idle servers) due to compute‑storage coupling.

Expensive storage because each broker needs multiple high‑performance disks.

Additional overhead from multi‑replica synchronization.

Scaling operations require days of manual partition migration and carry significant operational risk.

Hotspot handling needs manual producer reconfiguration.
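The last point is worth unpacking: in a self‑managed cluster, working around a hot partition typically means hand‑editing the producer's partitioning logic and redeploying every producer. A minimal sketch of that workaround (the function name and the excluded‑partition set are illustrative, not from Tencent Music's setup):

```python
# Sketch of the manual hotspot workaround: a custom partitioner that
# steers traffic away from partitions on an overloaded broker.
# The excluded partition IDs are illustrative; in practice they would
# come from cluster monitoring.
from zlib import crc32

def pick_partition(key: bytes, num_partitions: int, excluded: set) -> int:
    """Hash the key to a partition, skipping partitions flagged as hot."""
    candidates = [p for p in range(num_partitions) if p not in excluded]
    if not candidates:
        raise ValueError("all partitions excluded")
    return candidates[crc32(key) % len(candidates)]

# Every producer must pick up this rule and be redeployed -- the
# operational burden the challenges above describe.
print(pick_partition(b"user-42", 6, excluded={2, 5}))
```

Because the rule lives in producer code, every traffic shift means another coordinated rollout, which is exactly what a self‑balancing broker layer removes.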

Kafka challenges

Why Choose AutoMQ

Stateless brokers with compute‑storage separation enable second‑level partition migration and automated scaling, reducing expansion time from a day to minutes.

Independent scaling of compute and object storage cuts both compute reservation and storage costs, lowering total cost of ownership by more than 50%.

Full Kafka‑protocol compatibility ensures zero code changes for producers and consumers.

Native Kubernetes integration allows AutoMQ to be scheduled like any other pod, unlocking cloud‑native benefits.

Built‑in Table Topic feature writes streams directly to Apache Iceberg tables, simplifying data‑lake ingestion.
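The >50% TCO claim can be reasoned about with a back‑of‑the‑envelope model. The ~35% idle‑reservation figure and triple replication come from the challenges above; the object‑storage price ratio is an assumed illustrative number, not a quoted one:

```python
# Back-of-the-envelope TCO comparison (illustrative assumptions).
# From the article: ~30-40% of broker capacity sits idle as headroom,
# and data is kept as 3 replicas on broker-attached disks.
def kafka_cost(compute_needed, disk_per_copy, idle_frac=0.35, replicas=3):
    # Compute is over-provisioned to leave headroom for spikes, and
    # every replica needs its own high-performance disk.
    return compute_needed / (1 - idle_frac) + replicas * disk_per_copy

def automq_cost(compute_needed, disk_per_copy, object_store_ratio=0.2):
    # Stateless brokers scale on demand (no idle reservation); a single
    # logical copy lives in object storage, assumed ~5x cheaper per GB.
    return compute_needed + disk_per_copy * object_store_ratio

old, new = kafka_cost(100, 100), automq_cost(100, 100)
print(f"relative cost: {new / old:.0%}")
```

Under these assumptions the cloud‑native layout comes in well under half the traditional cost, consistent with the reported outcome; the exact savings obviously depend on workload shape and cloud pricing.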

Evaluation and Migration Process

The migration was split into two phases: load verification and production migration.

Load Verification

Two scenarios were tested:

June 2025 – high‑throughput, low‑QPS workload.

July 2025 – high‑QPS, low‑payload workload.

AutoMQ met or exceeded all performance targets, giving confidence for production rollout.

Production Migration

Switch producers to AutoMQ endpoints via rolling updates.

Drain old Kafka clusters by letting consumers finish processing existing data.

Switch consumers to AutoMQ after the old data is fully consumed.
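The three steps above amount to a producer‑first cutover: producers move, the old cluster drains, then consumers follow. A hypothetical sketch (endpoint names and the lag callable are placeholders; real deployments would poll a consumer‑lag metric):

```python
# Hypothetical cutover sketch for the dual-cluster migration:
# 1) producers switch to AutoMQ, 2) old cluster drains, 3) consumers switch.
def migrate(producers, consumers, old_cluster_lag):
    """old_cluster_lag: callable returning remaining unconsumed messages."""
    for p in producers:                      # step 1: rolling update
        p["bootstrap.servers"] = "automq-cluster:9092"
    # Step 2: consumers keep reading the old cluster until it drains.
    while old_cluster_lag() > 0:
        pass  # in practice: poll a lag metric and sleep between checks
    for c in consumers:                      # step 3: consumers follow
        c["bootstrap.servers"] = "automq-cluster:9092"
    return producers, consumers

backlog = [2, 1, 0]  # simulated lag readings from the old cluster
migrate([{"bootstrap.servers": "old-kafka:9092"}],
        [{"bootstrap.servers": "old-kafka:9092"}],
        lambda: backlog.pop(0))
```

Because AutoMQ speaks the Kafka protocol, the only client change in either step is the bootstrap address, which is what makes the rolling update safe.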

Launch Results and Benefits

Six AutoMQ clusters now serve production traffic with peak write throughput of 1.6 GiB/s and peak QPS of 480 K.

Production cluster monitoring

Overall Kafka cost reduced by more than 50%.

Second‑level partition migration and self‑balancing traffic redistribution allow adding 1 GiB/s of capacity within seconds.
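This speed follows from brokers being stateless: "moving" a partition is a metadata update, not a data copy. A toy sketch of redistribution after adding a broker (the round‑robin policy is an assumption for illustration; AutoMQ's actual balancer weighs real traffic):

```python
# Toy sketch: with stateless brokers backed by object storage, moving a
# partition is just an ownership change, so adding a broker can
# rebalance the cluster in seconds rather than days.
def rebalance(partitions, brokers):
    """Round-robin partitions across brokers (real balancers weigh traffic)."""
    assignment = {b: [] for b in brokers}
    for i, p in enumerate(partitions):
        assignment[brokers[i % len(brokers)]].append(p)
    return assignment

before = rebalance(list(range(6)), ["broker-0", "broker-1"])
after = rebalance(list(range(6)), ["broker-0", "broker-1", "broker-2"])
print(before)
print(after)
```

In a traditional Kafka cluster the same reassignment would trigger replica copies of every moved partition's data, which is why scaling there takes days instead.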

Cost reduction chart

Future Outlook

Complete migration of all remaining Kafka clusters to AutoMQ.

Deploy Table Topic to stream data directly into Iceberg tables.

Standardize AutoMQ as an internal infrastructure component and promote it across business lines.

Fully relocate Kafka services to Kubernetes to achieve end‑to‑end cloud‑native operation.

Tags: cloud native, operations, AutoMQ, data streaming