
Apache Pulsar Overview: Architecture, Components, and Usage

This article provides a comprehensive overview of Apache Pulsar, detailing its multi‑tenant architecture, core components such as brokers and BookKeeper, message handling features, producer and consumer configurations, comparison with Kafka, and operational considerations for deployment.

Big Data Technology & Architecture

Introduction

Apache Pulsar is a server‑to‑server messaging system originally developed by Yahoo and now an Apache top‑level project. It offers multi‑tenant, high‑performance messaging with native support for multiple clusters, low latency, and scalability to millions of topics.

Cluster Architecture

A Pulsar cluster consists of three main parts: one or more brokers, a BookKeeper cluster made up of bookies, and a ZooKeeper ensemble. Brokers receive and load-balance incoming messages from producers, dispatch them to consumers, and write them to BookKeeper for durable storage. ZooKeeper stores cluster metadata and handles coordination.

Brokers

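Brokers are stateless: message data lives in BookKeeper, not on the broker. Each broker runs two services: an HTTP server that exposes a REST API for administration and topic lookup, and a dispatcher, an asynchronous TCP server that carries all data transfer over Pulsar's binary protocol.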

Apache BookKeeper

BookKeeper provides a distributed write‑ahead log (WAL) storage layer. Each ledger is an append‑only structure written by a single writer and replicated across multiple bookies, ensuring durability and read consistency.
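
To make "replicated across multiple bookies" concrete, here is a minimal sketch of round-robin striping, assuming a ledger is spread over an ensemble of bookies and each entry is written to a fixed number of them. The class name, method name, and the parameters ensembleSize and writeQuorum are illustrative choices for this toy model; it is not BookKeeper client code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: shows which bookies would hold each entry under round-robin striping.
class LedgerStriping {
    // Returns the bookie indices that store the given entry (entryId must be >= 0).
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        if (entryId < 0 || ensembleSize <= 0 || writeQuorum <= 0 || writeQuorum > ensembleSize) {
            throw new IllegalArgumentException("need entryId >= 0 and 0 < writeQuorum <= ensembleSize");
        }
        List<Integer> bookies = new ArrayList<>();
        int first = (int) (entryId % ensembleSize);
        for (int i = 0; i < writeQuorum; i++) {
            // Consecutive bookies, wrapping around the end of the ensemble.
            bookies.add((first + i) % ensembleSize);
        }
        return bookies;
    }

    public static void main(String[] args) {
        // Three bookies, two copies of every entry.
        for (long entry = 0; entry < 4; entry++) {
            System.out.println("entry " + entry + " -> bookies " + writeSet(entry, 3, 2));
        }
    }
}
```

With three bookies and two copies per entry, every bookie ends up holding roughly two thirds of the entries, and losing one bookie still leaves a copy of each entry.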

Ledgers and Managed Ledgers

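Ledgers store the messages themselves. A managed ledger sits on top of them and represents the storage of a single topic as one message stream: a single writer appends to the end, and multiple cursors track how far each consumer has read. If a writer fails, a recovery process closes the ledger and settles its final state, so every reader sees the same content.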
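
The relationship between one message stream and several cursors can be sketched with a small in-memory model. This is only an illustration: the class ToyManagedLedger and its methods are hypothetical, cursors here always start at the beginning of the stream, and nothing is persisted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: an append-only list of messages with independent named read positions.
class ToyManagedLedger {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> cursors = new HashMap<>();

    // Appends a message to the end of the stream and returns its position.
    int append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    // Registers a cursor at the start of the stream if it does not exist yet.
    void openCursor(String name) {
        cursors.putIfAbsent(name, 0);
    }

    // Returns the next unread message for this cursor and advances it, or null if caught up.
    String readNext(String name) {
        Integer position = cursors.get(name);
        if (position == null) {
            throw new IllegalArgumentException("unknown cursor: " + name);
        }
        if (position >= entries.size()) {
            return null;
        }
        cursors.put(name, position + 1);
        return entries.get(position);
    }

    public static void main(String[] args) {
        ToyManagedLedger ledger = new ToyManagedLedger();
        ledger.append("m1");
        ledger.append("m2");
        ledger.openCursor("fast");
        ledger.openCursor("slow");
        ledger.readNext("fast");
        ledger.readNext("fast");
        // The slow cursor is unaffected by how far the fast cursor has read.
        System.out.println(ledger.readNext("slow"));
    }
}
```

The point of the model is that reading through one cursor never moves another, which is what lets several subscriptions consume the same topic at different speeds.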

Comparison with Kafka

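Unlike Kafka, where each broker both serves traffic and stores its partitions on local disk, Pulsar separates compute (brokers) from storage (bookies). A topic's data is split into segments that are spread across the BookKeeper cluster, so brokers and bookies can be scaled independently of each other.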

Producer Features

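Producers can send messages synchronously or asynchronously. They support batching, chunking of messages that are too large to send in one piece, and, when broker-side deduplication is enabled, effectively-once publishing: the broker persists each message only once even if the producer retries the send.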
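
Broker-side deduplication can be pictured with the following sketch, which assumes every producer has a name and stamps each message with an increasing sequence ID. The class ToyDeduplicatingBroker and its methods are hypothetical and exist only for illustration; a real broker also persists this state so it survives restarts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: drops any message whose sequence ID was already seen for that producer.
class ToyDeduplicatingBroker {
    private final Map<String, Long> lastSequenceId = new HashMap<>();
    private int stored = 0;

    // Returns true if the message was stored, false if it was discarded as a duplicate.
    boolean publish(String producerName, long sequenceId, String payload) {
        Long last = lastSequenceId.get(producerName);
        if (last != null && sequenceId <= last) {
            return false; // a retry of something already persisted
        }
        lastSequenceId.put(producerName, sequenceId);
        stored++;
        return true;
    }

    int storedCount() {
        return stored;
    }

    public static void main(String[] args) {
        ToyDeduplicatingBroker broker = new ToyDeduplicatingBroker();
        broker.publish("producer-a", 0, "hello");
        broker.publish("producer-a", 1, "world");
        // The producer timed out and resends sequence ID 1; the broker ignores it.
        boolean accepted = broker.publish("producer-a", 1, "world");
        System.out.println("retry accepted: " + accepted + ", stored: " + broker.storedCount());
    }
}
```

Because the check is keyed by producer name, two different producers can reuse the same sequence IDs without interfering with each other.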

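Consumer Features

Consumers can receive messages synchronously or asynchronously, or register a listener that is invoked for each incoming message. A consumer acknowledges a message once it has been processed, or negatively acknowledges it to request redelivery. Pulsar also supports dead-letter topics, message retention, and delayed delivery.

The listener below acknowledges each message it handles and negatively acknowledges it if processing fails:

import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.MessageListener;

// Assumes an existing PulsarClient instance named client.
MessageListener<byte[]> myMessageListener = (consumer, msg) -> {
  try {
    System.out.println("Message received: " + new String(msg.getData()));
    consumer.acknowledge(msg);
  } catch (Exception e) {
    // Ask the broker to redeliver the message later.
    consumer.negativeAcknowledge(msg);
  }
};
Consumer<byte[]> consumer = client.newConsumer()
    .topic("my-topic")
    .subscriptionName("my-subscription")
    .messageListener(myMessageListener)
    .subscribe();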
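
How negative acknowledgment and a dead-letter topic interact can be sketched with a small model, assuming a message is redelivered after each negative acknowledgment until a configured limit is exceeded and is then set aside. The class ToyRedeliveryQueue, its methods, and the way the limit is counted are assumptions made for this illustration, not the client library's implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: messages are plain strings identified by their content.
class ToyRedeliveryQueue {
    private final int maxRedeliverCount; // hypothetical limit chosen by the caller
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<String, Integer> failures = new HashMap<>();
    private final List<String> deadLetters = new ArrayList<>();

    ToyRedeliveryQueue(int maxRedeliverCount) {
        this.maxRedeliverCount = maxRedeliverCount;
    }

    void publish(String message) {
        pending.addLast(message);
    }

    // Returns the next pending message, or null when nothing is waiting.
    String receive() {
        return pending.pollFirst();
    }

    // Successful processing: forget any failure history for the message.
    void acknowledge(String message) {
        failures.remove(message);
    }

    // Failed processing: redeliver until the limit is exceeded, then dead-letter.
    void negativeAcknowledge(String message) {
        int count = failures.merge(message, 1, Integer::sum);
        if (count > maxRedeliverCount) {
            failures.remove(message);
            deadLetters.add(message);
        } else {
            pending.addLast(message);
        }
    }

    List<String> deadLetters() {
        return deadLetters;
    }

    public static void main(String[] args) {
        ToyRedeliveryQueue queue = new ToyRedeliveryQueue(2);
        queue.publish("order-1");
        String msg;
        while ((msg = queue.receive()) != null) {
            queue.negativeAcknowledge(msg); // processing always fails in this demo
        }
        System.out.println("dead letters: " + queue.deadLetters());
    }
}
```

A message that keeps failing therefore stops blocking the subscription after a bounded number of attempts and can be inspected separately.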

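Retention is configured per namespace through the admin API, for example:

// Assumes an existing PulsarAdmin instance named admin; the namespace name is illustrative.
String namespace = "my-tenant/my-namespace";
int retentionTime = 10; // 10 minutes
int retentionSize = 500; // 500 megabytes
RetentionPolicies policies = new RetentionPolicies(retentionTime, retentionSize);
admin.namespaces().setRetention(namespace, policies);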

Tenant and Namespace Management

Pulsar is multi-tenant: a tenant contains namespaces, and a namespace groups related topics. Policies such as retention, backlog quota, and message TTL can be set at the namespace level, and change events are logged for audit.

$ bin/pulsar-admin tenants create my-tenant \
  --admin-roles my-admin-role \
  --allowed-clusters us-west,us-east
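
Tenant and namespace also appear in every fully qualified topic name, which has the form persistent://tenant/namespace/topic (or non-persistent://... for topics that are not stored). The sketch below splits such a name into its parts. The class TopicNameParts is a hypothetical helper written for this article, not part of the Pulsar client, and the namespace and topic names in the example are made up.

```java
// Toy helper only: splits "domain://tenant/namespace/topic" into its four parts.
class TopicNameParts {
    final String domain;
    final String tenant;
    final String namespace;
    final String topic;

    private TopicNameParts(String domain, String tenant, String namespace, String topic) {
        this.domain = domain;
        this.tenant = tenant;
        this.namespace = namespace;
        this.topic = topic;
    }

    static TopicNameParts parse(String fullName) {
        int schemeEnd = fullName.indexOf("://");
        if (schemeEnd <= 0) {
            throw new IllegalArgumentException("missing domain prefix: " + fullName);
        }
        String domain = fullName.substring(0, schemeEnd);
        // Limit of 3 keeps any further slashes inside the topic part.
        String[] parts = fullName.substring(schemeEnd + 3).split("/", 3);
        if (parts.length != 3 || parts[0].isEmpty() || parts[1].isEmpty() || parts[2].isEmpty()) {
            throw new IllegalArgumentException("expected tenant/namespace/topic: " + fullName);
        }
        return new TopicNameParts(domain, parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        TopicNameParts name = parse("persistent://my-tenant/my-namespace/my-topic");
        System.out.println(name.tenant + " / " + name.namespace + " / " + name.topic);
    }
}
```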

Operational Configuration

Key broker parameters include delayedDeliveryEnabled, retention settings, and batch acknowledgment controls.

# Whether to enable the delayed delivery for messages.
# If disabled, messages are immediately delivered and there is no tracking overhead.
delayedDeliveryEnabled=true

# Control the ticking time for the retry of delayed message delivery,
# affecting the accuracy of the delivery time compared to the scheduled time.
# Default is 1 second.
delayedDeliveryTickTimeMillis=1000
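
Why the tick time affects delivery accuracy can be illustrated with a deliberately simplified model. It assumes a timer that fires only at multiples of the tick interval and releases a delayed message at the first tick at or after its scheduled time. That is an assumption for illustration; the broker's actual tracker may behave differently in detail, and the class ToyDelayTick is hypothetical.

```java
// Toy model only: a timer that fires at multiples of tickMillis.
class ToyDelayTick {
    // Returns the first tick boundary at or after the scheduled time.
    static long firstTickAtOrAfter(long scheduledMillis, long tickMillis) {
        if (scheduledMillis < 0 || tickMillis <= 0) {
            throw new IllegalArgumentException("need scheduledMillis >= 0 and tickMillis > 0");
        }
        long remainder = scheduledMillis % tickMillis;
        return remainder == 0 ? scheduledMillis : scheduledMillis + (tickMillis - remainder);
    }

    public static void main(String[] args) {
        long scheduled = 2500;
        // A coarser tick means a larger possible gap between scheduled and actual time.
        System.out.println("tick 1000 ms: " + firstTickAtOrAfter(scheduled, 1000));
        System.out.println("tick 100 ms:  " + firstTickAtOrAfter(scheduled, 100));
    }
}
```

In this model the deviation from the scheduled time is always smaller than one tick, so a shorter tick gives tighter timing at the cost of more frequent timer work.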

Conclusion

By separating storage from compute, Pulsar lets brokers and bookies scale independently of each other. Combined with built-in multi-tenancy, this makes it a compelling choice for modern data pipelines.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
