
Uber's Multi-Region Kafka Architecture and Disaster Recovery

This article explains how Uber built a multi‑region Kafka infrastructure with disaster‑recovery capabilities, detailing its replication topology, active/active and active/passive consumption modes, offset‑management service, and the challenges of ensuring reliable, low‑latency data streaming across regions.

Architecture Digest

Uber runs one of the largest Apache Kafka deployments in the world. It processes trillions of messages a day and serves as a backbone of Uber's tech stack.

To achieve scalability, reliability, high performance, and ease of use, Uber designed a multi‑region Kafka architecture that provides data redundancy and supports regional failover.

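Producers write to a regional cluster in their own region. Each region also hosts an aggregate cluster, and messages from every regional cluster are replicated into the aggregate clusters, so each aggregate cluster holds a global view of the data, as illustrated in Figure 2.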
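To make the topology concrete, here is a minimal in-memory sketch. It assumes two hypothetical regions, models each cluster as a Python list, and uses an illustrative `replicate()` helper in place of real replication; none of these names come from Uber's system. It shows one consequence of the design: because the flows into each aggregate cluster run independently, both aggregate clusters end up with the same messages, possibly in a different order.

```python
from collections import defaultdict

# Hypothetical in-memory model: region names, list-based "topics" and the
# replicate() helper are illustrative assumptions, not Uber's code.
REGIONS = ["region-a", "region-b"]

regional = {r: [] for r in REGIONS}   # one regional cluster per region
aggregate = {r: [] for r in REGIONS}  # one aggregate cluster per region
cursor = defaultdict(int)             # (src, dst) -> next regional offset to copy


def produce(region: str, message: str) -> None:
    """Producers only write to the regional cluster in their own region."""
    regional[region].append(message)


def replicate(src: str, dst: str) -> None:
    """Copy new messages from regional[src] into aggregate[dst], tagged by source."""
    start = cursor[(src, dst)]
    for msg in regional[src][start:]:
        aggregate[dst].append((src, msg))
    cursor[(src, dst)] = len(regional[src])


produce("region-a", "a1")
produce("region-b", "b1")

# The replication flows into each aggregate cluster run independently,
# so they can pick up the sources in a different order.
replicate("region-a", "region-a")
replicate("region-b", "region-a")
replicate("region-b", "region-b")
replicate("region-a", "region-b")

print(aggregate["region-a"])  # [('region-a', 'a1'), ('region-b', 'b1')]
print(aggregate["region-b"])  # [('region-b', 'b1'), ('region-a', 'a1')]
```

The same message sits at a different offset in each aggregate cluster, which is why a consumer's position in one region cannot simply be reused in another.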

Message replication is handled by uReplicator, Uber's enhanced version of Kafka MirrorMaker, which is designed to guarantee zero data loss and to be easy to operate.
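The article does not describe uReplicator's internals. As a hedged illustration of one common way a MirrorMaker-style replicator avoids losing messages, the sketch below commits the source offset only after the batch has been written to the destination. The `Log` class and `mirror_batch()` function are hypothetical, and a simulated crash shows the trade-off: duplicates are possible, loss is not.

```python
class Log:
    """Hypothetical append-only partition standing in for a Kafka topic."""

    def __init__(self, messages=None):
        self.messages = list(messages or [])

    def append(self, msg: str) -> None:
        self.messages.append(msg)


def mirror_batch(source: Log, dest: Log, committed: int, batch_size: int,
                 crash_before_commit: bool = False) -> int:
    """Copy one batch, committing the source offset only after the write.

    Returns the new committed source offset. If the process "crashes" after
    writing but before committing, the old offset is kept, so the batch is
    copied again on restart: duplicates are possible, loss is not.
    """
    batch = source.messages[committed:committed + batch_size]
    for msg in batch:
        dest.append(msg)  # a real replicator would wait for destination acks here
    if crash_before_commit:
        return committed
    return committed + len(batch)


source = Log(["m0", "m1", "m2", "m3", "m4"])
dest = Log()

# First batch is written but the commit is lost to a simulated crash.
committed = mirror_batch(source, dest, 0, 3, crash_before_commit=True)
while committed < len(source.messages):
    committed = mirror_batch(source, dest, committed, 3)

print(dest.messages)
# ['m0', 'm1', 'm2', 'm0', 'm1', 'm2', 'm3', 'm4']
```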

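Consumption can follow two patterns. In active/active mode, a consumer in each region reads the same topic from its local aggregate cluster and processes the full global stream independently. Because every region computes the same state, a regional failure is handled by redirecting traffic to a surviving region, as shown in Figure 3.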
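A minimal sketch of active/active consumption, assuming hypothetical aggregate-cluster contents (per-city trip events) and a hypothetical `serve()` router. Each region builds its state from its own aggregate cluster; failover just means answering from another healthy region, which already holds the same state.

```python
from collections import Counter

# Hypothetical contents of two aggregate clusters: the same (city, trips)
# events, interleaved in a different order by independent replication.
aggregate = {
    "region-a": [("sf", 1), ("nyc", 1), ("sf", 1)],
    "region-b": [("nyc", 1), ("sf", 1), ("sf", 1)],
}


def consume(messages):
    """Each region's consumer builds its own state from the full global stream."""
    state = Counter()
    for city, trips in messages:
        state[city] += trips
    return state


state = {region: consume(msgs) for region, msgs in aggregate.items()}


def serve(city, preferred, healthy):
    """Answer from the preferred region, or fail over to a healthy one."""
    region = preferred if preferred in healthy else sorted(healthy)[0]
    return region, state[region][city]


print(serve("sf", "region-a", {"region-a", "region-b"}))  # ('region-a', 2)
print(serve("sf", "region-a", {"region-b"}))              # ('region-b', 2)
```

This works cleanly here because counting does not depend on message order; order-sensitive processing would need more care, since the two regions see the stream in different orders.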

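In active/passive mode, only one consumer per logical group is active at a time. It reads from the aggregate cluster in a primary region while its progress is synchronized to the other regions. Because replication interleaves messages from the regional clusters independently in each region, the same message generally sits at a different offset in each aggregate cluster, so the committed offset cannot simply be reused elsewhere. On failure, the consumer switches to a standby region and resumes from a translated offset that preserves continuity, as illustrated in Figures 4-6.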
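The sketch below, built on hypothetical aggregate-cluster contents and a hypothetical `ActivePassiveConsumer` class, shows why naively reusing the committed offset after failover goes wrong: the consumer skips one message and reprocesses another. The `translate` hook is where a proper offset mapping, like the one the offset-management service in the next paragraph provides, would plug in.

```python
# Hypothetical aggregate clusters holding the same messages in different orders.
aggregate = {
    "region-a": ["a1", "b1", "a2", "b2"],
    "region-b": ["b1", "b2", "a1", "a2"],
}


class ActivePassiveConsumer:
    """A consumer group that is active in exactly one region at a time."""

    def __init__(self, primary: str):
        self.region = primary
        self.offset = 0          # committed offset: next message to read
        self.processed = []

    def poll(self, n: int) -> None:
        msgs = aggregate[self.region][self.offset:self.offset + n]
        self.processed.extend(msgs)
        self.offset += len(msgs)

    def fail_over(self, standby: str, translate) -> None:
        """Switch to `standby`, resuming at translate(primary, offset, standby)."""
        self.offset = translate(self.region, self.offset, standby)
        self.region = standby


def naive(src_region, offset, dst_region):
    return offset  # wrong: reuses the offset as-is in another cluster


consumer = ActivePassiveConsumer("region-a")
consumer.poll(2)                       # reads a1, b1
consumer.fail_over("region-b", naive)  # resumes at offset 2 in region-b
consumer.poll(10)                      # reads a1, a2
print(consumer.processed)  # ['a1', 'b1', 'a1', 'a2']  (b2 skipped, a1 duplicated)
```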

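To make this failover precise, Uber built an offset-management service. uReplicator periodically reports checkpoints that map offsets in each regional cluster to the corresponding offsets in each aggregate cluster; the service stores these mappings in an active-active database and uses them to compute the offset at which a consumer should resume in the new region.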
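Here is a simplified sketch of one way such checkpoints can drive offset translation; Uber's actual algorithm may differ. It assumes a hypothetical checkpoint format of `(regional_offset, aggregate_offset)` pairs per (source region, aggregate region), and it relies on each source's messages keeping their relative order in every aggregate cluster. For each source, it finds the latest message the consumer has provably passed, locates that message in the standby cluster, and resumes at the earliest such position across sources, so messages may be redelivered but none are skipped.

```python
def translate_offset(checkpoints, sources, primary, committed, standby):
    """Map a committed offset in `primary`'s aggregate cluster to a resume
    offset in `standby`'s that may re-deliver messages but never skips one.

    checkpoints[(src, agg)] holds (regional_offset, aggregate_offset) pairs:
    message `regional_offset` of region `src` landed at `aggregate_offset`
    in region `agg`'s aggregate cluster (hypothetical format).
    """
    resume = None
    for src in sources:
        # Latest checkpointed message from src that the consumer has passed.
        consumed = [r for r, a in checkpoints.get((src, primary), []) if a < committed]
        if not consumed:
            return 0  # nothing provably consumed from src: replay from the start
        upto = max(consumed)
        # Latest position in the standby holding a src message at or before `upto`.
        positions = [a for r, a in checkpoints.get((src, standby), []) if r <= upto]
        if not positions:
            return 0
        # That message is already processed, so this source is safe from pos + 1;
        # the slowest source bounds the resume point.
        candidate = max(positions) + 1
        resume = candidate if resume is None else min(resume, candidate)
    return 0 if resume is None else resume


# Aggregate clusters: region-a holds [a1, b1, a2, b2], region-b holds [b1, b2, a1, a2];
# region-a produced a1, a2 and region-b produced b1, b2.
checkpoints = {
    ("region-a", "region-a"): [(0, 0), (1, 2)],
    ("region-b", "region-a"): [(0, 1), (1, 3)],
    ("region-a", "region-b"): [(0, 2), (1, 3)],
    ("region-b", "region-b"): [(0, 0), (1, 1)],
}
sources = ["region-a", "region-b"]

# The consumer committed offset 2 in region-a (it has read a1 and b1).
print(translate_offset(checkpoints, sources, "region-a", 2, "region-b"))  # 1
```

Resuming region-b at offset 1 reads b2, a1, a2: a1 is redelivered, but b2 is not skipped. Every message is checkpointed here for brevity; with sparser, periodic checkpoints the result stays safe but becomes more conservative.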

The article concludes that reliable multi-region Kafka is essential to Uber's business continuity. Future work will focus on finer-grained recovery that does not require failing over an entire region.

Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
