
How PhxQueue Achieves High‑Availability, High‑Throughput Distributed Queuing with Paxos

PhxQueue is a Tencent‑open‑source, Paxos‑based distributed queue that delivers at‑least‑once delivery, synchronous disk flushing, strict ordering, multi‑subscription, and high throughput, outperforming Kafka in reliability and failover scenarios while supporting massive workloads such as WeChat Pay.

21CTO

Sep 14, 2017

PhxQueue is an open‑source, Paxos‑based distributed queue developed by Tencent. It provides at‑least‑once delivery and is widely used internally for WeChat Pay, the Official Accounts platform, and other critical services.

Message Queue Overview

Message queues, as a mature asynchronous communication pattern, offer several advantages over synchronous calls:

Decoupling: callers and callees communicate through the queue instead of through a growing web of direct API calls, which keeps instability from spreading and reduces the load each side places on the other.

Peak‑shaving and flow control: producers are not blocked by slow consumers; bursts are buffered in the queue while consumers process at their own pace.

Reuse: one publish can serve multiple subscribers.

Background of PhxQueue

Old Queue

WeChat’s early distributed queue (the “old queue”) was a self‑developed component used across many business scenarios, providing decoupling, caching, and asynchronous capabilities. It employed Quorum NRW (N=3, W=R=2) with asynchronous disk flushing, balancing performance and availability.
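To make the NRW parameters concrete, the following is a minimal sketch of the quorum overlap condition W + R > N. The function name is illustrative and not taken from the old queue's code; it simply enumerates every possible write and read quorum and checks that each pair shares at least one replica.

```python
from itertools import combinations

def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True if every write quorum of size w intersects every read quorum of size r."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)   # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

print(quorums_overlap(3, 2, 2))  # the N=3, W=R=2 setting from the text
print(quorums_overlap(3, 1, 1))  # a setting where W + R <= N
```

Overlap means a read quorum always contains at least one replica that saw the latest acknowledged write. On its own it does not fix a single order for concurrent writes.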

New Requirements

As business grew, the old queue showed limitations:

Asynchronous flushing raised data reliability concerns, especially for payment‑related services that demand synchronous flushing.

Ordering issues: some workloads required strict order, which NRW could not guarantee.

Additional problems such as dequeue deduplication and load balancing also needed improvement.

Shortcomings of Existing Industry Solutions

Kafka, a popular queue in the big‑data domain, offers high throughput, automatic failover, and ordered enqueue/dequeue. However, for scenarios emphasizing data reliability, it has several drawbacks:

Performance vs. Synchronous Flush

Enabling log.flush.interval.messages=1 for synchronous flushing dramatically reduces throughput due to SSD write amplification and the mismatch between typical message size (~1 KB) and SSD page size (4 KB).
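The size mismatch can be put into numbers with a deliberately simplified model: assume each flush writes whole pages and ignore filesystem metadata and SSD‑internal behavior. The function below is an illustrative helper, not part of Kafka or PhxQueue.

```python
import math

def write_amplification(message_bytes: int, batch_size: int = 1,
                        page_bytes: int = 4096) -> float:
    """Bytes written to disk per byte of payload when a batch is flushed as whole pages."""
    payload = message_bytes * batch_size
    pages = math.ceil(payload / page_bytes)   # a partial page still costs a full page write
    return pages * page_bytes / payload

print(write_amplification(1024))                # one ~1 KB message per flush
print(write_amplification(1024, batch_size=4))  # four messages share one page
```

Under this model, flushing every 1 KB message individually writes four times the payload, while filling a page before flushing removes the overhead. This is the intuition behind batching at the storage layer.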

Producer Batch Inefficiency

Kafka’s batch model works well for log‑type data but not for business requests that require per‑request context, making batch gains limited.

Replica Synchronization Issues

Kafka’s ISR‑based replication prioritizes sync efficiency but suffers from availability problems: leader or follower failures can drop read/write success rates to zero until a new leader is elected, and sync latency depends on the slowest node.

PhxQueue Introduction

PhxQueue is now widely used within WeChat, handling billions of enqueues daily with peak rates of 100 million per minute. Its design goals are high data reliability, high availability, and high throughput, while supporting common queue features:

Synchronous flushing with built‑in real‑time reconciliation (no data loss).

Strict ordering of enqueue and dequeue.

Multi‑subscription.

Dequeue rate limiting.

Dequeue replay.

Parallel scalability of all modules.

Batch flushing and sync in the storage layer for high throughput.

Multi‑datacenter deployment support.

Automatic storage‑layer failover and load balancing.

Consumer automatic failover and load balancing.

PhxQueue Design

Overall Architecture

The system consists of five modules: Store, Producer, Consumer, Scheduler, and Lock.

Store – Queue Storage

Store uses the PhxPaxos library to achieve replica synchronization via Paxos, providing linearizable reads/writes as long as a majority of nodes are operational.
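The majority condition can be sketched with two small helpers. These are illustrative functions for reasoning about cluster size, not PhxPaxos APIs.

```python
def has_majority(total_nodes: int, alive_nodes: int) -> bool:
    """True if enough nodes are up for a Paxos group to keep serving requests."""
    return alive_nodes >= total_nodes // 2 + 1

def tolerated_failures(total_nodes: int) -> int:
    """Largest number of simultaneous node failures that still leaves a majority."""
    return (total_nodes - 1) // 2

print(has_majority(3, 2), has_majority(3, 1))
print(tolerated_failures(3), tolerated_failures(5))
```

A three‑node group therefore survives one failure, and a five‑node group survives two.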

Synchronous flushing is enabled by default, delivering performance comparable to asynchronous flushing. Multiple independent Paxos groups ensure master nodes are evenly distributed, balancing load and enabling automatic master failover.

Producer – Message Producer

Producers route messages based on a key; messages with the same key are sent to the same queue, guaranteeing order consistency between enqueue and dequeue.
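Key‑based routing of this kind is commonly implemented by hashing the key and taking the result modulo the number of queues. The article does not say which hash function PhxQueue uses, so the MD5‑based choice and the function name below are assumptions made only for illustration.

```python
import hashlib

def route_to_queue(key: str, num_queues: int) -> int:
    """Map a key to a queue index; the same key always maps to the same queue."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # A stable hash is used instead of Python's built-in hash(), which is
    # randomized per process for strings.
    return int.from_bytes(digest[:8], "big") % num_queues

first = route_to_queue("order-1001", 8)
second = route_to_queue("order-1001", 8)
print(first == second)
```

Because every message for a given key lands in one queue, the order in which they were enqueued is the order in which a consumer of that queue sees them.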

Consumer – Message Consumer

Consumers pull messages in batches from Store and can process them with multiple coroutines. Users implement callbacks per topic and handler type to define processing logic.
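The callback model can be pictured as a registry keyed by topic and handler type. The class and method names below are hypothetical and do not mirror PhxQueue's C++ interfaces; the sketch only shows how a pulled batch might be dispatched to the user's callback.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[bytes], bool]  # returns True if the message was processed

class CallbackRegistry:
    """Hypothetical registry mapping (topic, handler type) to a user callback."""

    def __init__(self) -> None:
        self._handlers: Dict[Tuple[str, int], Handler] = {}

    def register(self, topic: str, handler_id: int, fn: Handler) -> None:
        self._handlers[(topic, handler_id)] = fn

    def dispatch_batch(self, topic: str, handler_id: int,
                       batch: List[bytes]) -> List[bool]:
        """Run the registered callback on each message of a pulled batch, in order."""
        fn = self._handlers[(topic, handler_id)]
        return [fn(message) for message in batch]

registry = CallbackRegistry()
registry.register("payments", 1, lambda message: len(message) > 0)
print(registry.dispatch_batch("payments", 1, [b"a", b"", b"c"]))
```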

Scheduler – Consumer Manager (optional)

Scheduler collects global consumer load and provides consumer failover and load balancing. When deployed, the Scheduler leader maintains heartbeats with consumers and adjusts the queue‑to‑consumer mapping dynamically. If the Scheduler leader fails, a new leader is elected through the distributed lock service, and consumers keep running in the meantime.
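A rough sketch of heartbeat‑driven rebalancing follows. It is a simplification: queues are spread round‑robin by count, whereas the real Scheduler is described as using consumer load, and all names here are hypothetical.

```python
from typing import Dict, List

def live_consumers(last_heartbeat: Dict[str, float], now: float,
                   timeout: float) -> List[str]:
    """Consumers whose most recent heartbeat is within the timeout."""
    return sorted(c for c, t in last_heartbeat.items() if now - t <= timeout)

def assign_queues(queues: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread queues round-robin across the given consumers."""
    mapping: Dict[str, List[int]] = {c: [] for c in consumers}
    if not consumers:
        return mapping
    for i, queue in enumerate(sorted(queues)):
        mapping[consumers[i % len(consumers)]].append(queue)
    return mapping

heartbeats = {"c1": 100.0, "c2": 90.0, "c3": 99.0}  # c2 has gone quiet
alive = live_consumers(heartbeats, now=101.0, timeout=5.0)
print(assign_queues([0, 1, 2, 3], alive))
```

When a consumer stops sending heartbeats, the next assignment hands its queues to the remaining consumers.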

Lock – Distributed Lock Service (optional)

Lock offers a generic distributed lock API. PhxQueue uses it for two purposes: electing the Scheduler leader when Scheduler is deployed, and preventing multiple consumers from processing the same queue simultaneously. If Scheduler is not deployed and the application tolerates duplicate consumption, Lock can be omitted.
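The mutual‑exclusion idea can be illustrated with a lease‑based lock. This is a single‑process stand‑in under the assumption that the lock is lease‑based; the real Lock module is a replicated service, and the class below is not its API. Time is passed in explicitly to keep the behavior deterministic.

```python
from typing import Optional

class LeaseLock:
    """In-memory stand-in for a lease-based lock (hypothetical, not PhxQueue's API)."""

    def __init__(self) -> None:
        self._owner: Optional[str] = None
        self._expires_at = 0.0

    def acquire(self, client: str, now: float, lease_seconds: float) -> bool:
        """Grant or renew the lock if it is free, expired, or already held by client."""
        free = self._owner is None or now >= self._expires_at
        if free or self._owner == client:
            self._owner = client
            self._expires_at = now + lease_seconds
            return True
        return False

    def holder(self, now: float) -> Optional[str]:
        """Current owner, or None if the lease has expired."""
        return self._owner if now < self._expires_at else None

lock = LeaseLock()
print(lock.acquire("consumer-a", now=0.0, lease_seconds=5.0))  # granted
print(lock.acquire("consumer-b", now=3.0, lease_seconds=5.0))  # refused, lease still valid
print(lock.acquire("consumer-b", now=5.0, lease_seconds=5.0))  # granted after expiry
```

If the holder crashes and stops renewing, the lease expires and another consumer can take over the queue.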

Store Replication Process

Store replicates via PhxPaxos, which consists of three layers: the app layer handles business requests, the Paxos layer performs consensus, and the state‑machine layer applies the Paxos log to update state, ensuring strong consistency across nodes.
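The role of the state‑machine layer can be sketched as follows: once the Paxos layer has agreed on a log, every replica applies the same entries in the same order and so reaches the same state. The class is a toy model with illustrative names, not the PhxPaxos state‑machine interface.

```python
from typing import List

class QueueStateMachine:
    """Toy state machine: applying a log entry appends it to the queue contents."""

    def __init__(self) -> None:
        self.items: List[bytes] = []
        self.applied_index = -1

    def apply(self, index: int, entry: bytes) -> None:
        # Entries must be applied strictly in log order, with no gaps.
        if index != self.applied_index + 1:
            raise ValueError(f"expected index {self.applied_index + 1}, got {index}")
        self.items.append(entry)
        self.applied_index = index

def replay(log: List[bytes]) -> QueueStateMachine:
    """Build a replica's state by applying an agreed log from the beginning."""
    machine = QueueStateMachine()
    for index, entry in enumerate(log):
        machine.apply(index, entry)
    return machine

agreed_log = [b"enqueue:a", b"enqueue:b", b"enqueue:c"]
replica_1, replica_2 = replay(agreed_log), replay(agreed_log)
print(replica_1.items == replica_2.items)
```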

Store Group Commit – Efficient Flushing and Replication

Unoptimized Paxos suffers from write amplification due to per‑log synchronous flushes. PhxQueue mitigates this by deploying multiple Paxos groups and using a Group Commit mechanism: several queues within a group batch their enqueues, triggering a single Paxos sync and disk flush when a timeout or size threshold is reached.
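A minimal sketch of the size‑or‑timeout trigger is shown below. The `commit` callback stands in for one Paxos round plus one disk flush; the class name, parameters, and thresholds are assumptions for illustration and are not taken from PhxQueue's source.

```python
from typing import Callable, List

class GroupCommitBuffer:
    """Collects enqueues and commits them as one batch (hypothetical helper)."""

    def __init__(self, max_items: int, max_wait_seconds: float,
                 commit: Callable[[List[bytes]], None]) -> None:
        self.max_items = max_items
        self.max_wait_seconds = max_wait_seconds
        self.commit = commit              # stands in for one Paxos sync + one flush
        self._pending: List[bytes] = []
        self._first_at = 0.0

    def add(self, item: bytes, now: float) -> None:
        if not self._pending:
            self._first_at = now          # timeout is measured from the oldest pending item
        self._pending.append(item)
        self._maybe_flush(now)

    def tick(self, now: float) -> None:
        """Called periodically so a small batch is not held past the timeout."""
        self._maybe_flush(now)

    def _maybe_flush(self, now: float) -> None:
        if not self._pending:
            return
        full = len(self._pending) >= self.max_items
        timed_out = now - self._first_at >= self.max_wait_seconds
        if full or timed_out:
            batch, self._pending = self._pending, []
            self.commit(batch)

commits: List[List[bytes]] = []
buffer = GroupCommitBuffer(max_items=3, max_wait_seconds=0.01, commit=commits.append)
for i in range(3):
    buffer.add(b"m%d" % i, now=0.0)   # the third add reaches the size threshold
buffer.add(b"late", now=1.0)
buffer.tick(now=1.02)                 # the timeout flushes the lone message
print([len(batch) for batch in commits])
```

Four enqueues cost two commits here instead of four. Under heavier load the batches grow toward the size threshold, so the cost of each flush is shared by more messages.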

PhxQueue vs. Kafka Comparison

Design Comparison

Both systems share a similar distributed queue architecture, but PhxQueue’s Paxos‑based replication and synchronous flushing give it stronger data reliability guarantees.

Performance Comparison

[Figure/table omitted: test environment and benchmark configurations]

Results indicate that PhxQueue’s performance is on par with Kafka; under the same QPS, PhxQueue’s average latency is slightly better because it does not wait for the slowest node. With producer batching disabled, PhxQueue can achieve up to twice Kafka’s throughput in synchronous‑flush scenarios due to storage‑layer batching.
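The latency argument can be illustrated with a simple model: a write acknowledged only after every replica in the sync set responds takes as long as the slowest replica, while a majority‑based write completes once the median replica responds. The latency figures below are made up for illustration and are not benchmark data.

```python
from typing import List

def ack_latency_all(replica_latencies_ms: List[float]) -> float:
    """Latency when the write waits for every replica in the sync set."""
    return max(replica_latencies_ms)

def ack_latency_majority(replica_latencies_ms: List[float]) -> float:
    """Latency when the write completes as soon as a majority has responded."""
    majority = len(replica_latencies_ms) // 2 + 1
    return sorted(replica_latencies_ms)[majority - 1]

latencies = [2.0, 3.0, 40.0]  # hypothetical: one replica is briefly slow
print(ack_latency_all(latencies), ack_latency_majority(latencies))
```

In this model a single slow replica dominates the wait‑for‑all case but does not affect the majority case.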

Storage‑Layer Failover Comparison

When a storage node fails, Kafka’s enqueue success rate drops to 0‑33% and recovery depends on a 10‑second lease, whereas PhxQueue’s success rate stays between 66% and 100% with a 5‑second lease. Enabling the queue‑retry feature raises PhxQueue’s success rate above 90% during failover.

Conclusion

PhxQueue implements master auto‑switching while preserving linearizability, achieves high‑throughput synchronous flushing, and provides a rich set of queue features suitable for diverse business scenarios. It is open‑source under the BSD license, written in C/C++, and already deployed at massive scale within WeChat.
