Evolution of Financial‑Grade Message Queues at Ant Financial

The article reviews the ten‑year evolution of Ant Financial's message queue, detailing its core reliability, consistency, availability and performance requirements, the architectural mechanisms built to meet them, the shift to pull‑mode and API‑mode designs, and the recent integration of compute capabilities to create a smart data transmission platform.

AntTech

Ant Financial's message queue has been in production for over a decade. It initially relied on ESB‑style mechanisms; to address message loss, Ant Financial later co‑built a new system with Taobao.

The financial‑grade environment imposes four critical demands: extremely high reliability (no message loss), strong consistency (transactional correctness), continuous availability (the service must stay up during peak events), and ultra‑high performance (handling billions of messages at a throughput in the millions of TPS).

To satisfy these demands, the system combines several mechanisms. For reliability, it uses TCP‑inspired ACK mechanisms, retry logic, and persistent storage guarantees. For consistency, it uses two‑phase transactional messaging. For availability, its strategies range from thread‑pool isolation and rate limiting to multi‑data‑center active‑active deployments and the LDC (multi‑active) architecture, which provides flexible zone‑aware routing.
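The ACK‑and‑retry idea can be illustrated with a minimal sketch. It is not Ant Financial's implementation: the `Broker` class, the `max_attempts` limit, and the dead‑letter list are hypothetical names chosen for illustration. The only point it shows is that the broker forgets a message once the consumer acknowledges it, and redelivers it otherwise.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    msg_id: int
    body: str
    attempts: int = 0


class Broker:
    """Toy in-memory broker (hypothetical): a message is dropped only after an ACK."""

    def __init__(self, max_attempts=3):
        self.pending = deque()       # messages waiting for (re)delivery
        self.dead_letters = []       # messages that exhausted their retries
        self.max_attempts = max_attempts

    def publish(self, msg_id, body):
        self.pending.append(Message(msg_id, body))

    def deliver_all(self, consumer):
        """Deliver until every message is ACKed or moved to dead letters."""
        while self.pending:
            msg = self.pending.popleft()
            msg.attempts += 1
            try:
                acked = bool(consumer(msg))
            except Exception:
                acked = False        # a consumer crash counts as a missing ACK
            if acked:
                continue             # ACK received: safe to forget the message
            if msg.attempts < self.max_attempts:
                self.pending.append(msg)        # no ACK: schedule a retry
            else:
                self.dead_letters.append(msg)   # give up after max_attempts


def ack_retry_demo():
    broker = Broker(max_attempts=3)
    broker.publish(1, "debit account A")
    broker.publish(2, "credit account B")
    seen = []

    def flaky_consumer(msg):
        seen.append((msg.msg_id, msg.attempts))
        # Withhold the ACK on the first attempt for message 1 to force a retry.
        return not (msg.msg_id == 1 and msg.attempts == 1)

    broker.deliver_all(flaky_consumer)
    return seen, broker.dead_letters
```

Because a retry can deliver the same message more than once, consumers in such a scheme generally need to tolerate duplicates.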

With the rise of big‑data (OLAP) workloads, Ant Financial introduced a pull‑mode message queue based on log semantics, deploying it on physical machines to efficiently serve analytical scenarios.
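As a rough illustration of log semantics in pull mode, the sketch below models a partition as an append‑only list and lets each consumer track its own read offset. `PartitionLog` and `PullConsumer` are hypothetical names, not part of any Ant Financial API; the sketch only shows why pull mode suits analytical readers, which can fetch in batches and re‑read from any offset.

```python
class PartitionLog:
    """Append-only log (hypothetical): each record is addressed by its offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1   # offset of the record just written

    def read(self, offset, max_records):
        # Reading never removes data, so many consumers can share one log.
        return self._records[offset:offset + max_records]


class PullConsumer:
    """Consumer that owns its offset and asks the log for the next batch."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.offset = start_offset

    def poll(self, max_records=2):
        batch = self.log.read(self.offset, max_records)
        self.offset += len(batch)       # advance only by what was returned
        return batch


def pull_demo():
    log = PartitionLog()
    for i in range(5):
        log.append(f"event-{i}")
    consumer = PullConsumer(log)
    first = consumer.poll(2)
    second = consumer.poll(2)
    # A second consumer can replay from an arbitrary offset.
    replay = PullConsumer(log, start_offset=1).poll(3)
    return first, second, consumer.offset, replay
```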

Subsequently, a compute‑storage separation strategy was adopted in two steps. The first was a disk‑mount mode that relied on a distributed file system for durability. The second was an API mode that moved commit logs into queues and added global fixed partitions, idempotent sending, and strong ordering, improving performance and configurability.
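One common way to combine fixed partitions with idempotent sending is to route each key to a stable partition and have the partition reject any producer sequence number it has already appended. The sketch below assumes that approach; the source does not describe the actual mechanism, and `NUM_PARTITIONS`, `partition_for`, and `IdempotentPartition` are hypothetical names.

```python
import zlib

NUM_PARTITIONS = 4  # assumed fixed partition count, chosen for illustration


def partition_for(key):
    # crc32 is stable across processes, unlike Python's built-in hash() on str.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class IdempotentPartition:
    """Partition that appends each (producer, sequence) pair at most once."""

    def __init__(self):
        self.log = []
        self.last_seq = {}   # producer_id -> highest sequence number appended

    def append(self, producer_id, seq, payload):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return False     # duplicate of an earlier send: ignore it
        if seq != last + 1:
            raise ValueError("sequence gap: an earlier message is missing")
        self.log.append(payload)
        self.last_seq[producer_id] = seq
        return True


def idempotent_demo():
    partitions = [IdempotentPartition() for _ in range(NUM_PARTITIONS)]
    target = partitions[partition_for("account-42")]
    results = [
        target.append("producer-1", 0, "open"),
        target.append("producer-1", 1, "deposit"),
        target.append("producer-1", 1, "deposit"),  # retry after a lost ACK
    ]
    return results, target.log
```

Since every message for a key lands in the same partition and gaps are rejected, the partition's log order matches the producer's send order, which is one way to obtain per‑key ordering.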

Finally, computation capabilities were embedded into the queue using a lightweight, non‑centralized streaming framework that supports common operators and window semantics, turning the queue into a platform that not only transports and stores data but also processes it.
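Window semantics can be sketched with a tumbling (fixed, non‑overlapping) window that counts events per key. This is an illustrative assumption about what a window operator does in general, not the framework's actual API; the function name and the `(timestamp, key)` event shape are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per key in fixed windows.

    events: iterable of (timestamp, key) pairs with integer timestamps.
    Returns {window_start: {key: count}} ordered by window_start.
    """
    windows = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Each timestamp belongs to exactly one window: [start, start + size).
        window_start = (ts // window_size) * window_size
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


def window_demo():
    events = [(1, "pay"), (3, "refund"), (7, "pay"), (12, "pay"), (14, "pay")]
    return tumbling_window_counts(events, window_size=10)
```

With a window size of 10, the first three events fall into the window starting at 0 and the last two into the window starting at 10.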

Overall, the message queue has evolved from a simple data conduit to a smart transmission‑computation service, continuously expanding its role in Ant Financial's distributed architecture.
