Apache Pulsar Overview: Architecture, Components, and Usage
This article provides a comprehensive overview of Apache Pulsar, detailing its multi‑tenant architecture, core components such as brokers and BookKeeper, message handling features, producer and consumer configurations, comparison with Kafka, and operational considerations for deployment.
Introduction
Apache Pulsar is a server‑to‑server messaging system originally developed by Yahoo and now an Apache top‑level project. It offers multi‑tenant, high‑performance messaging with native support for multiple clusters, low latency, and scalability to millions of topics.
Cluster Architecture
A Pulsar cluster consists of three main parts: one or more brokers, a BookKeeper cluster made up of bookies, and a ZooKeeper ensemble. Brokers receive and load-balance incoming messages from producers, dispatch them to consumers, and persist them in BookKeeper for durable storage, while ZooKeeper stores cluster metadata and handles coordination.
Brokers
Brokers are stateless components made up of two parts: an HTTP server that exposes a REST API for administrative tasks and topic lookup, and a dispatcher, an asynchronous TCP server that carries all message traffic over Pulsar's binary protocol.
Apache BookKeeper
BookKeeper provides a distributed write‑ahead log (WAL) storage layer. Each ledger is an append‑only structure written by a single writer and replicated across multiple bookies, ensuring durability and read consistency.
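As a rough illustration of what "replicated across multiple bookies" can mean in practice, the sketch below models round-robin striping: each entry goes to a fixed number of consecutive bookies, starting from a position derived from the entry ID. This is a toy model, not BookKeeper code; the class name `LedgerStripingSketch` and the parameters `ensembleSize` and `writeQuorum` are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of spreading a ledger's entries over bookies. Not BookKeeper code. */
final class LedgerStripingSketch {

    /**
     * Returns the indices of the bookies that would store the given entry,
     * assuming round-robin striping: start at (entryId mod ensembleSize)
     * and take writeQuorum consecutive bookies, wrapping around.
     */
    static List<Integer> writeSet(long entryId, int ensembleSize, int writeQuorum) {
        if (entryId < 0) {
            throw new IllegalArgumentException("entryId must be non-negative");
        }
        if (writeQuorum <= 0 || writeQuorum > ensembleSize) {
            throw new IllegalArgumentException("need 0 < writeQuorum <= ensembleSize");
        }
        List<Integer> bookies = new ArrayList<>();
        for (int i = 0; i < writeQuorum; i++) {
            bookies.add((int) ((entryId + i) % ensembleSize));
        }
        return bookies;
    }

    public static void main(String[] args) {
        // Three bookies, each entry stored on two of them.
        for (long entryId = 0; entryId < 4; entryId++) {
            System.out.println("entry " + entryId + " -> bookies " + writeSet(entryId, 3, 2));
        }
    }
}
```

With three bookies and two copies per entry, every bookie holds roughly two thirds of the entries, so both storage and write load are spread over the whole set.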
Ledgers and Managed Ledgers
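Ledgers store messages. A managed ledger is a layer on top of BookKeeper ledgers that represents the message stream of a single topic: one writer appends to the end of the stream, and multiple cursors track how far each subscription has consumed. Ledger read consistency is maintained even after failures: when a writer crashes, a recovery process closes the ledger so that all readers see the same content.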
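The relationship between one append-only stream and several independent cursors can be sketched with a small in-memory model. This is only an illustration of the idea, not Pulsar's managed ledger implementation; `ManagedLedgerSketch` and its methods are hypothetical names, and cursors are assumed to start at the beginning of the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model: one append-only log, many named cursors. Not Pulsar code. */
final class ManagedLedgerSketch {
    private final List<String> entries = new ArrayList<>();        // append-only log
    private final Map<String, Integer> cursors = new HashMap<>();  // cursor name -> next index to read

    /** Appends a message and returns its position in the log. */
    int append(String message) {
        entries.add(message);
        return entries.size() - 1;
    }

    /** Opens a cursor at the start of the log (assumption made for this sketch). */
    void openCursor(String name) {
        cursors.putIfAbsent(name, 0);
    }

    /** Reads the next entry for this cursor and advances only this cursor. */
    Optional<String> readNext(String name) {
        Integer position = cursors.get(name);
        if (position == null) {
            throw new IllegalArgumentException("unknown cursor: " + name);
        }
        if (position >= entries.size()) {
            return Optional.empty(); // this cursor has caught up with the writer
        }
        cursors.put(name, position + 1);
        return Optional.of(entries.get(position));
    }

    public static void main(String[] args) {
        ManagedLedgerSketch ledger = new ManagedLedgerSketch();
        ledger.append("m1");
        ledger.append("m2");
        ledger.openCursor("sub-a");
        ledger.openCursor("sub-b");
        System.out.println(ledger.readNext("sub-a")); // Optional[m1]
        System.out.println(ledger.readNext("sub-a")); // Optional[m2]
        System.out.println(ledger.readNext("sub-b")); // Optional[m1]
    }
}
```

Because each cursor keeps its own position, a slow subscription does not hold back a fast one; both read the same stored entries.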
Comparison with Kafka
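In a classic Kafka deployment, each broker both serves traffic and stores partition data on its own disks. Pulsar instead separates compute (brokers) from storage (bookies). A topic's data is split into ledgers that are distributed across bookies, so brokers and storage can be scaled independently of each other.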
Producer Features
Producers can send messages synchronously or asynchronously and support batching, chunking of large messages, and broker-side deduplication, which provides effectively-once delivery when a producer retries a send.
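The idea behind broker-side deduplication can be sketched as follows: the broker remembers the highest sequence ID it has accepted from each producer and drops anything at or below it. This is a simplified in-memory model under that assumption, not Pulsar's implementation; `DedupSketch` and `accept` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of deduplication by (producer name, sequence ID). Not Pulsar code. */
final class DedupSketch {
    // Highest sequence ID accepted so far, per producer name.
    private final Map<String, Long> lastSequenceId = new HashMap<>();

    /** Returns true if the message is accepted, false if it is treated as a duplicate. */
    boolean accept(String producerName, long sequenceId) {
        Long last = lastSequenceId.get(producerName);
        if (last != null && sequenceId <= last) {
            return false; // already seen: a retry of an earlier send
        }
        lastSequenceId.put(producerName, sequenceId);
        return true;
    }

    public static void main(String[] args) {
        DedupSketch broker = new DedupSketch();
        System.out.println(broker.accept("producer-1", 0)); // true
        System.out.println(broker.accept("producer-1", 1)); // true
        System.out.println(broker.accept("producer-1", 1)); // false (retry of the same send)
    }
}
```

A retried send therefore reaches the log at most once, which is why a stable producer name and monotonically increasing sequence IDs matter for this feature.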
Consumer Features
Consumers can receive messages synchronously or asynchronously, register a message listener, and acknowledge or negatively acknowledge messages. Pulsar also supports dead-letter topics, message retention, and delayed delivery.
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.MessageListener;

// Assumes an already-built PulsarClient named `client`.
// Acknowledge each message once processed; request redelivery on failure.
MessageListener<byte[]> myMessageListener = (consumer, msg) -> {
    try {
        System.out.println("Message received: " + new String(msg.getData()));
        consumer.acknowledge(msg);
    } catch (Exception e) {
        consumer.negativeAcknowledge(msg);
    }
};

Consumer<byte[]> consumer = client.newConsumer()
        .topic("my-topic")
        .subscriptionName("my-subscription")
        .messageListener(myMessageListener)
        .subscribe();
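Delayed delivery, listed among the consumer features above, can be pictured as a queue ordered by delivery time that is checked periodically. The sketch below is a toy model under that assumption, not Pulsar's tracker; `DelayedDeliverySketch`, `schedule`, and `tick` are hypothetical names, and the caller is assumed to invoke `tick` at a fixed interval.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Toy model of delayed delivery driven by periodic ticks. Not Pulsar code. */
final class DelayedDeliverySketch {
    private static final class Delayed {
        final String payload;
        final long deliverAtMillis;

        Delayed(String payload, long deliverAtMillis) {
            this.payload = payload;
            this.deliverAtMillis = deliverAtMillis;
        }
    }

    // Messages waiting for their delivery time, earliest first.
    private final PriorityQueue<Delayed> pending =
            new PriorityQueue<>(Comparator.comparingLong((Delayed d) -> d.deliverAtMillis));

    /** Holds a message back until the given timestamp. */
    void schedule(String payload, long deliverAtMillis) {
        pending.add(new Delayed(payload, deliverAtMillis));
    }

    /** Called on every tick: releases all messages whose delivery time has passed. */
    List<String> tick(long nowMillis) {
        List<String> due = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek().deliverAtMillis <= nowMillis) {
            due.add(pending.poll().payload);
        }
        return due;
    }

    public static void main(String[] args) {
        DelayedDeliverySketch tracker = new DelayedDeliverySketch();
        tracker.schedule("a", 1500);
        tracker.schedule("b", 500);
        System.out.println(tracker.tick(1000)); // [b]
        System.out.println(tracker.tick(2000)); // [a]
    }
}
```

In this model, message "a" is scheduled for 1500 ms but only released at the 2000 ms tick, which shows how a coarser tick interval trades delivery accuracy for less frequent checking.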
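import org.apache.pulsar.common.policies.data.RetentionPolicies;

// Assumes an existing PulsarAdmin named `admin` and a namespace name in `namespace`.
int retentionTime = 10;   // retention time, in minutes
int retentionSize = 500;  // retention size, in megabytes
RetentionPolicies policies = new RetentionPolicies(retentionTime, retentionSize);
admin.namespaces().setRetention(namespace, policies);
Tenant and Namespace Management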
Pulsar is multi-tenant: each tenant contains namespaces, and each namespace groups related topics. Policies such as retention, quotas, and TTL can be set at the namespace level, and change events are logged for audit.
$ bin/pulsar-admin tenants create my-tenant \
    --admin-roles my-admin-role \
    --allowed-clusters us-west,us-east
Operational Configuration
Key broker parameters include delayedDeliveryEnabled, retention settings, and batch acknowledgment controls.
# Whether to enable the delayed delivery for messages.
# If disabled, messages are immediately delivered and there is no tracking overhead.
delayedDeliveryEnabled=true
# Control the ticking time for the retry of delayed message delivery,
# affecting the accuracy of the delivery time compared to the scheduled time.
# Default is 1 second.
delayedDeliveryTickTimeMillis=1000
Conclusion
By separating storage from compute, Pulsar lets brokers and bookies scale independently, which gives it more flexibility than messaging systems that couple the two and makes it a compelling choice for modern data pipelines.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
