
How Apache Pulsar’s Multi‑Replica I/O Modes Enable Layered Storage

This article explains how Apache Pulsar’s layered architecture separates I/O modes, uses BookKeeper for durable storage, and leverages multi‑replica write quorums and hierarchical caches to provide efficient tail‑read, catch‑up read, and long‑term storage options such as S3, Azure and GCS.

Tencent Cloud Middleware

Overview

Apache Pulsar is a cloud‑native distributed messaging platform that separates compute from storage and supports multi‑tenant persistent topics, cross‑region replication, strong consistency, high throughput, and low latency.

Layered Architecture and I/O Isolation

Pulsar’s layered design isolates each I/O mode (write, tail‑read, catch‑up read) so that reads never block writes. Adding new storage layers does not affect client code.

Messaging Model

Clients act as producers, consumers, or both. Producers publish to a broker; consumers read from the broker. Topics are assigned to brokers. Pulsar guarantees total‑order atomic broadcast per topic: once a broker acknowledges a publish, the message is immutable, never lost or reordered, and all consumers see the same order.
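The per‑topic total‑order guarantee can be sketched with a toy model (this is illustrative Python, not the Pulsar client API): the broker assigns a monotonically increasing offset at acknowledgement time, the log entry is never changed afterwards, and every consumer reading from the same offset replays the same sequence.

```python
from dataclasses import dataclass, field


@dataclass
class TopicBroker:
    """Toy model of one topic's broker: offsets are assigned in publish order."""
    log: list = field(default_factory=list)

    def publish(self, payload: bytes) -> int:
        # The offset is fixed when the publish is acknowledged;
        # the entry is immutable from that point on.
        offset = len(self.log)
        self.log.append(payload)
        return offset

    def read(self, from_offset: int = 0) -> list:
        # Any consumer reading from the same offset sees the same order.
        return self.log[from_offset:]


broker = TopicBroker()
broker.publish(b"a")
broker.publish(b"b")
assert broker.read() == broker.read() == [b"a", b"b"]
```

The point of the sketch is that ordering is decided once, by the broker at acknowledgement time, rather than per consumer.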

BookKeeper as Backend Store

Topic backlogs are stored in Apache BookKeeper. The broker is a stateless service that writes each incoming message to a BookKeeper ledger. After the ledger node(s) acknowledge the write, the broker acknowledges the producer, making the message readable.

I/O Modes and Write Quorum

Write: publish new messages to a topic.

Tail read: deliver newly written messages to active subscribers.

Catch‑up read: a new or long‑offline consumer reads a large backlog from the log suffix.

Write operations use a configurable write quorum of BookKeeper nodes. When the ack quorum is satisfied, the message is committed, assigned a fixed log offset, and becomes immutable.
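The quorum rule above can be illustrated with a small sketch (illustrative only; real brokers use the BookKeeper client, and the function names here are hypothetical): a message is committed once the ack quorum, a subset of the write quorum, has acknowledged it.

```python
def is_committed(acks_received: int, ack_quorum: int) -> bool:
    """A write is committed once the ack quorum is satisfied."""
    return acks_received >= ack_quorum


def commit_check(acked_bookies: set, write_quorum: int, ack_quorum: int) -> bool:
    # Sanity check: acks can only come from bookies the entry was written to.
    assert len(acked_bookies) <= write_quorum
    return is_committed(len(acked_bookies), ack_quorum)


# Write quorum 3, ack quorum 2: two bookie acks are enough to commit,
# even while the third replica is still in flight.
assert commit_check({"bookie-1", "bookie-2"}, write_quorum=3, ack_quorum=2)
assert not commit_check({"bookie-1"}, write_quorum=3, ack_quorum=2)
```

Separating the write quorum (how many replicas exist) from the ack quorum (how many must confirm) lets the broker trade durability against publish latency.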

Cache Hierarchy

1. Broker cache (level‑1) serves tail reads directly from memory, avoiding disk I/O.

2. Ledger storage on BookKeeper nodes (level‑2) receives writes from an in‑memory buffer that is periodically flushed to disk. This storage supports catch‑up reads while keeping read and write paths isolated.
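The two‑level read path can be sketched as follows (a deliberate simplification with hypothetical names, not broker internals): tail reads hit the in‑memory broker cache, while a cache miss falls through to ledger storage, so catch‑up readers never compete with tail readers for the hot path.

```python
class ReadPath:
    """Toy two-level read path: broker cache first, ledger storage on miss."""

    def __init__(self):
        self.broker_cache = {}    # level 1: recent entries, in memory
        self.ledger_storage = {}  # level 2: all entries, on BookKeeper disks

    def write(self, offset: int, payload: bytes) -> None:
        self.ledger_storage[offset] = payload   # durable copy
        self.broker_cache[offset] = payload     # hot copy for tail readers

    def read(self, offset: int) -> bytes:
        if offset in self.broker_cache:         # tail read: no disk I/O
            return self.broker_cache[offset]
        return self.ledger_storage[offset]      # catch-up read from storage


path = ReadPath()
path.write(0, b"m0")
path.broker_cache.clear()        # simulate eviction of older entries
assert path.read(0) == b"m0"     # catch-up read still served from storage
```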

Layered (Long‑Term) Storage

When layered storage is enabled, older topic segments are off‑loaded to object storage (Amazon S3, S3‑compatible services, Azure Blob, or Google Cloud Storage – supported from Pulsar 2.2.0). A segment is a contiguous block of messages (default 50 000 messages). Only the active segment remains writable; closed segments become immutable and are copied to long‑term storage based on time‑ or size‑based policies defined in the namespace. After successful off‑load, the original segments are deleted from BookKeeper to reclaim space.
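The roll‑and‑off‑load cycle can be sketched like this (the retention window and helper are illustrative assumptions; only the 50 000‑message segment default comes from the text above): segments fill to a fixed size, every segment except the last is closed and immutable, and closed segments beyond the retention window become off‑load candidates.

```python
SEGMENT_SIZE = 50_000  # default messages per segment, per the text above


def plan_offload(total_messages: int, keep_closed_segments: int = 2):
    """Return (closed, offloadable) segment counts for a topic.

    Only the last, active segment stays writable; closed segments beyond
    the retention window are candidates for object storage."""
    segments = total_messages // SEGMENT_SIZE + 1   # last one is active
    closed = segments - 1
    offloadable = max(0, closed - keep_closed_segments)
    return closed, offloadable


# 175,000 messages -> 4 segments: 3 closed, 1 active; with a window of
# 2 closed segments kept locally, 1 segment is ready to off-load.
assert plan_offload(175_000, keep_closed_segments=2) == (3, 1)
```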

Configuration Example

# Example namespace policies for tiered storage
# (flag names reflect recent pulsar-admin releases; verify against yours)
pulsar-admin namespaces set-retention \
  my-tenant/my-namespace \
  --time 7d --size 10G

# Off-load ledgers to object storage once the topic exceeds ~50 MB
pulsar-admin namespaces set-offload-threshold \
  --size 50M \
  my-tenant/my-namespace

# Keep off-loaded ledgers on BookKeeper for one day before deleting them
pulsar-admin namespaces set-offload-deletion-lag \
  --lag 1d \
  my-tenant/my-namespace

Supported Object Storage Backends

Pulsar can use:

Amazon S3 and S3‑compatible services

Azure Blob Storage

Google Cloud Storage (available from version 2.2.0)

[Figure: Apache Pulsar overview diagram]

[Figure: Pulsar I/O mode illustration]

[Figure: Cache hierarchy and layered storage diagram]
Tags: cloud-native, Messaging, Apache Pulsar, BookKeeper, Layered Storage
Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
