Overview of TubeMQ: Principles, Architecture, Performance, and Open‑Source Strategy for Big‑Data Message Queues

TubeMQ is a Java‑based distributed message‑queue middleware built for massive‑data ingestion at the scale of trillions of records. It reports 140 k TPS with sub‑5 ms latency, along with high reliability, low cost, and horizontal scalability. Tencent is donating it to the Apache Software Foundation to foster community collaboration and to expand it beyond traditional MQ functions.

Tencent Cloud Developer

Jan 6, 2020

Recently, the Cloud+ Community Tech Salon "Tencent Open‑Source Technology" concluded successfully. Several Tencent technical experts discussed open‑source projects such as TencentOS tiny, TubeMQ, Kona JDK, TARS, and MedicalNet. This article summarizes the talk by Zhang Guocheng.

Key points:

Principles and characteristics of Message Queues (MQ).

Implementation principles and usage of TubeMQ.

Future development and discussion of TubeMQ.

1. Message Queue Introduction

According to Wikipedia, a Message Queue (MQ) is a communication method between different processes or threads. MQ is adopted because it can integrate multiple systems, decouple components, and provide buffering for peak loads. Popular MQs such as Kafka, RocketMQ, and Pulsar share these traits.
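The decoupling and buffering roles described above can be illustrated with a minimal in-process sketch. It uses Python's standard `queue.Queue` as a stand-in for a real MQ; the function names and the `None` sentinel are assumptions made for this illustration and are not part of TubeMQ.

```python
import queue
import threading

def producer(q, n):
    """Publish n messages, then a sentinel. It never calls the consumer directly."""
    for i in range(n):
        q.put(f"event-{i}")   # blocks while the bounded buffer is full
    q.put(None)               # sentinel: no more messages

def consumer(q, out):
    """Drain the queue at its own pace until the sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:
            break
        out.append(msg.upper())

def run_demo(n=5, capacity=2):
    """Run one producer and one consumer that share only the queue."""
    q = queue.Queue(maxsize=capacity)   # bounded buffer between the two sides
    received = []
    t_cons = threading.Thread(target=consumer, args=(q, received))
    t_prod = threading.Thread(target=producer, args=(q, n))
    t_cons.start()
    t_prod.start()
    t_prod.join()
    t_cons.join()
    return received
```

The producer and consumer know only about the queue, not about each other, which is the decoupling property; the bounded buffer lets the producer run ahead of the consumer by up to `capacity` messages before it has to wait.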

In big‑data scenarios, MQ should have high throughput, low latency, high stability, low cost, simple protocols, and especially strong horizontal scalability. Massive data volumes (hundreds of billions to trillions) demand such scalability.

2. TubeMQ Implementation Principles and Usage

2.1 Features of TubeMQ

TubeMQ is a trillion‑level distributed messaging middleware focused on massive data transmission and storage. It offers advantages in performance, reliability, and cost.

In a benchmark with 1,000 topics of 10 partitions each, TubeMQ achieved 140,000 TPS with message latency under 5 ms. Kafka can reach millions of TPS under other configurations, but TubeMQ’s figure was measured under a realistic large‑scale scenario with many topics and partitions.

The system has run stably in production for seven years. It uses a thin‑client, server‑controlled model and is suited to real‑time ad recommendation, massive data reporting, metrics and monitoring, stream processing, and IoT data ingestion.

2.2 System Architecture

TubeMQ interacts with external clients via an SDK or a custom TCP protocol. It is written entirely in Java, features a Master HA coordination node, and uses a lightweight ZooKeeper‑like service for offset management. Storage uses RAID‑10 multi‑replica disks and supports fast consumption, and metadata is self‑managed.

2.3 Development History

Since 2013, TubeMQ has gone through four stages: introduction, improvement, self‑development, and innovation. Daily data volume grew from 20 billion records in 2013 to an estimated 40 trillion in 2020.

When data reaches hundreds of billions to trillions, challenges such as system stability, performance, hardware cost, and O&M cost become critical. TubeMQ now runs on 1,500 machines with minimal operational staff.

2.4 Horizontal Comparison with Other MQs

Compared with Kafka, RocketMQ, and JD’s JMQ, TubeMQ offers comparable performance with far fewer machines, about 1/4 to 1/5 of the count. At roughly 100,000 CNY per commercial server, this amounts to savings of several hundred million CNY.
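The "several hundred million CNY" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes that the 1,500‑machine fleet mentioned in section 2.3 is the baseline for the 1/4 to 1/5 ratio, which the talk does not state explicitly; the function name is hypothetical.

```python
TUBEMQ_MACHINES = 1_500      # fleet size quoted in section 2.3
SERVER_COST_CNY = 100_000    # approximate cost per commercial server

def estimated_savings_cny(multiplier: int) -> int:
    """Hardware savings if an alternative MQ needs `multiplier` times as many machines.

    Assumption: TubeMQ's 1,500 machines are the baseline for the ratio.
    """
    extra_machines = (multiplier - 1) * TUBEMQ_MACHINES
    return extra_machines * SERVER_COST_CNY

low = estimated_savings_cny(4)    # TubeMQ uses 1/4 of the machines
high = estimated_savings_cny(5)   # TubeMQ uses 1/5 of the machines
print(f"{low:,} to {high:,} CNY")
```

Under that assumption the estimate is 450 to 600 million CNY, which is consistent with the claim of several hundred million.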

3. Storage Model and Control Measures

The core of any MQ is its storage model. TubeMQ uses a per‑Topic memory‑plus‑file scheme: data is first written to primary memory, then to backup memory, and finally flushed asynchronously to disk. Consumption offsets determine whether data is read from memory or disk, reducing system load and increasing storage capacity.
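The write and read paths described above can be sketched as a toy model. This is not TubeMQ's implementation: the class and method names are hypothetical, a Python list stands in for the on-disk file, and the flush runs synchronously on block rotation rather than asynchronously, to keep the example deterministic.

```python
class TopicStore:
    """Toy per-topic store: primary memory block, backup block, then 'disk'."""

    def __init__(self, block_size=3):
        self.block_size = block_size
        self.primary = []   # block currently accepting writes
        self.backup = []    # full block waiting to be flushed
        self.disk = []      # stand-in for the topic's on-disk file

    def append(self, msg):
        """Write to primary memory and return the message's offset."""
        if len(self.primary) >= self.block_size:
            self._rotate()
        self.primary.append(msg)
        return len(self.disk) + len(self.backup) + len(self.primary) - 1

    def _rotate(self):
        """Flush the old backup block, then demote the full primary to backup."""
        self.flush()
        self.backup, self.primary = self.primary, []

    def flush(self):
        """Move the backup block to disk (asynchronous in the real system)."""
        self.disk.extend(self.backup)
        self.backup = []

    def read(self, offset):
        """Serve from memory if the offset is still buffered, else from disk."""
        disk_end = len(self.disk)
        if offset < disk_end:
            return ("disk", self.disk[offset])
        in_memory = self.backup + self.primary
        return ("memory", in_memory[offset - disk_end])
```

A short usage example: with `block_size=2`, appending five messages leaves the two oldest on disk and the three newest in memory, so a consumer near the tail never touches the disk.

```python
store = TopicStore(block_size=2)
offsets = [store.append(f"m{i}") for i in range(5)]
print(offsets)          # [0, 1, 2, 3, 4]
print(store.read(0))    # ('disk', 'm0')
print(store.read(4))    # ('memory', 'm4')
```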

TubeMQ tolerates some data loss in extreme cases to achieve high performance and low cost, which is acceptable for scenarios like ad recommendation and IoT data reporting.

4. Why Open‑Source?

Reasons include internal collaboration, external technical influence, providing practical value to the community, and breaking barriers. By donating TubeMQ to the Apache Software Foundation, the project gains neutral governance, long‑term maintenance, and broader ecosystem integration.

5. Future Development

In 2020, daily ingestion is expected to exceed 40 trillion records. Hardware upgrades (e.g., from TS60 to BX2) will shift CPU and I/O bottlenecks, prompting further research on resource optimization.

The community will continue to grow, with contributions from both internal and external developers, and the project will eventually expand beyond MQ to include aggregation, collection, and management layers.

6. Q&A

Q: Why does TubeMQ outperform Kafka despite similar storage structures?

A: TubeMQ organizes storage per Topic rather than per Partition and uses a primary‑backup memory model, which reduces read/write contention and improves throughput.

Other MQs like RocketMQ write all data into a single shared file, which becomes a write bottleneck under high traffic. TubeMQ’s design avoids this issue.
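One way to see what per‑Topic storage changes is to count how many files a broker appends to concurrently. The sketch below is a simplification with hypothetical names: it assumes all topics live on one broker and ignores segment rolling, index files, and replicas.

```python
def active_log_count(topics: int, partitions_per_topic: int, layout: str) -> int:
    """Number of files being appended to at once under a given storage layout."""
    if layout == "per_partition":
        return topics * partitions_per_topic   # one log per partition
    if layout == "per_topic":
        return topics                          # one log per topic
    raise ValueError(f"unknown layout: {layout}")

# Benchmark shape from section 2.1: 1,000 topics with 10 partitions each.
print(active_log_count(1000, 10, "per_partition"))   # 10000
print(active_log_count(1000, 10, "per_topic"))       # 1000
```

Fewer concurrently written files generally means less scattered disk I/O, which is one plausible reading of the contention argument in the answer above.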

Speaker

Zhang Guocheng, Senior Engineer at Tencent, has led TubeMQ development since 2015, handling data growth from trillions to 35 trillion and gaining extensive experience in massive‑data ingestion.
