Introduction to Apache Pulsar: Architecture, Core Concepts, and Consumption Modes

This article provides an overview of Apache Pulsar, detailing its enterprise‑grade features, architectural components such as brokers, BookKeeper and ZooKeeper, core concepts like topics and metadata, and the supported consumption modes, illustrating why it is considered a next‑generation distributed messaging system.

Big Data Technology & Architecture

Pulsar Overview

Apache Pulsar is an enterprise-grade distributed messaging system originally developed at Yahoo, open-sourced in 2016, and later contributed to the Apache Software Foundation. It has run in production at Yahoo for years, serving applications such as Mail, Finance, Sports, Flickr, the Gemini Ads platform, and Sherpa, Yahoo's distributed key-value store.

The project advertises the following next-generation features: linear scalability to hundreds of nodes, throughput of millions of messages per second, publish latency under 5 ms, a persistence layer built on Apache BookKeeper, geo-replication across regions, flexible deployment (bare metal, Docker, Kubernetes, cloud), and multiple subscription modes (exclusive, shared, failover).

Architecture Overview

A Pulsar instance consists of one or more clusters, which can replicate data among each other. Each cluster has three main components:

Broker: Receives messages from producers and dispatches them to consumers, coordinates with other clusters through a global ZooKeeper ensemble, stores messages in BookKeeper, and keeps cluster-level metadata in a local ZooKeeper cluster.

BookKeeper Cluster: Contains many bookies that persist messages.

ZooKeeper Cluster: Stores configuration and coordination data.

Broker

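Unlike brokers in Kafka or RocketMQ, Pulsar brokers are stateless: they do not keep message data on their own local disks. Each broker exposes a REST API for admin commands, runs an asynchronous TCP server that speaks a binary protocol built on Protocol Buffers, and replicates messages to clusters in other regions for geo-replication. Incoming messages are first persisted to BookKeeper and then cached in broker memory, so consumers reading near the tail of a topic are served quickly.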
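The persist-first, cache-second write path can be illustrated with a small toy model. This is a sketch for intuition only, not Pulsar code: the names BookieStore and ToyBroker, the cache size, and the eviction rule are all assumptions made for the example.

```python
class BookieStore:
    """Stand-in for the BookKeeper layer: a shared, durable, append-only log."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        self._entries.append(payload)
        return len(self._entries) - 1  # entry id of the stored payload

    def read(self, entry_id):
        return self._entries[entry_id]


class ToyBroker:
    """Hypothetical broker that owns no durable state, only a bounded cache."""

    def __init__(self, store, cache_size=2):
        self.store = store
        self.cache_size = cache_size
        self.cache = {}        # entry id -> payload, in insertion order
        self.store_reads = 0   # reads that had to fall back to storage

    def publish(self, payload):
        entry_id = self.store.append(payload)  # persist first
        self.cache[entry_id] = payload         # then cache for recent reads
        while len(self.cache) > self.cache_size:
            del self.cache[next(iter(self.cache))]  # evict the oldest entry
        return entry_id

    def read(self, entry_id):
        if entry_id in self.cache:
            return self.cache[entry_id]
        self.store_reads += 1
        return self.store.read(entry_id)


store = BookieStore()
broker_a = ToyBroker(store)
ids = [broker_a.publish(f"msg-{i}") for i in range(3)]

# A replacement broker starts with an empty cache but serves the same data,
# because the messages live in the shared store, not in the broker.
broker_b = ToyBroker(store)
recovered = [broker_b.read(i) for i in ids]
```

In this model, losing broker_a loses only a cache. That is the property that makes a stateless broker cheap to replace.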

BookKeeper

BookKeeper is a horizontally scalable, fault-tolerant, low-latency storage service. Data is stored as records (entries) within ledgers, and each ledger is replicated across multiple bookies. In Pulsar, each topic, or each partition of a partitioned topic, is backed by a sequence of ledgers. A ledger is append-only and has a single writer. Once a ledger is closed it becomes read-only, and ledgers whose data is no longer needed are eventually deleted.

Because only one writer can append to a ledger at a time, there is no write contention, and BookKeeper can guarantee consistent reads after a failure. When a broker fails, another broker can take over its topics by looking up the topic's ledgers in ZooKeeper metadata, which keeps recovery efficient.
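The ledger rules described above (append-only, single writer, read-only once closed, a topic as a sequence of ledgers) can be sketched as a toy model. The class names and the entry-count rollover threshold are assumptions made for illustration; this is not the BookKeeper client API.

```python
class LedgerClosedError(Exception):
    """Raised when something tries to append to a closed ledger."""


class Ledger:
    def __init__(self, ledger_id, writer):
        self.ledger_id = ledger_id
        self.writer = writer    # the only identity allowed to append
        self.entries = []
        self.closed = False

    def append(self, writer, payload):
        if self.closed:
            raise LedgerClosedError(f"ledger {self.ledger_id} is read-only")
        if writer != self.writer:
            raise PermissionError(f"{writer} is not the writer of this ledger")
        self.entries.append(payload)
        return len(self.entries) - 1  # entry id within this ledger

    def close(self):
        self.closed = True


class TopicLog:
    """Toy topic backed by a sequence of ledgers (hypothetical rollover rule)."""

    def __init__(self, writer, max_entries=2):
        self.writer = writer
        self.max_entries = max_entries
        self.ledgers = [Ledger(0, writer)]

    def append(self, payload):
        current = self.ledgers[-1]
        if len(current.entries) >= self.max_entries:
            current.close()  # the full ledger becomes read-only
            current = Ledger(current.ledger_id + 1, self.writer)
            self.ledgers.append(current)
        return (current.ledger_id, current.append(self.writer, payload))

    def trim(self, first_needed_ledger_id):
        """Drop closed ledgers that precede the first one still needed."""
        self.ledgers = [
            ledger for ledger in self.ledgers
            if ledger.ledger_id >= first_needed_ledger_id or not ledger.closed
        ]


log = TopicLog(writer="broker-1")
positions = [log.append(f"m{i}") for i in range(5)]
```

Each position is a (ledger id, entry id) pair, which gives a feel for why a message can be located without a single global offset.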

Metadata

Metadata that spans clusters, such as cross-region replication configuration, is stored in the global ZooKeeper ensemble. Each cluster's local ZooKeeper stores cluster-specific details, such as which ledgers a topic writes to and broker load metrics.

Core Concepts

Topic

A topic is a logical channel to which producers publish messages and from which consumers read them; a single topic can have multiple consumers. Topics can be partitioned. Within each partition, messages keep the order in which they were written, and each message has a unique message ID. Topics are non-partitioned by default, and partitioned topics can be created through the CLI or the admin API. Pulsar automatically balances producers and consumers across partitions, and producers pick a partition according to a routing rule: single-partition, round-robin, key hash, or a custom implementation.
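The routing rules above can be sketched in a few lines. This is a simplified model, not the Pulsar client: ToyRouter and its parameters are hypothetical, and CRC32 stands in for whatever hash function the real client uses.

```python
import itertools
import zlib


class ToyRouter:
    """Hypothetical partition chooser illustrating the routing rules."""

    MODES = {"round_robin", "single"}

    def __init__(self, num_partitions, mode="round_robin", fixed_partition=0):
        if mode not in self.MODES:
            raise ValueError(f"unknown routing mode: {mode}")
        if not 0 <= fixed_partition < num_partitions:
            raise ValueError("fixed_partition is out of range")
        self.num_partitions = num_partitions
        self.mode = mode
        self.fixed_partition = fixed_partition
        self._counter = itertools.count()

    def choose(self, key=None):
        if key is not None:
            # Keyed messages: the same key always lands on the same partition,
            # which preserves per-key ordering.
            return zlib.crc32(key.encode("utf-8")) % self.num_partitions
        if self.mode == "single":
            return self.fixed_partition
        # Round-robin: spread unkeyed messages evenly across partitions.
        return next(self._counter) % self.num_partitions


round_robin = ToyRouter(3)
rr_choices = [round_robin.choose() for _ in range(5)]

single = ToyRouter(3, mode="single", fixed_partition=2)
single_choices = [single.choose() for _ in range(3)]

keyed_first = round_robin.choose(key="user-42")
keyed_again = round_robin.choose(key="user-42")
```

Round-robin favors throughput, while key hashing trades some balance for per-key ordering. That is the usual reason to choose one rule over the other.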

Consumption Modes

Pulsar supports three subscription types:

Exclusive: Only one consumer can attach to the subscription (default).

Shared: Multiple consumers share the subscription; messages are dispatched round‑robin, and unacknowledged messages are redelivered if a consumer fails.

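Failover: Multiple consumers can attach, and they are kept in a sorted order; the first one (the master) receives all messages, and if it disconnects, the next consumer takes over.

Pulsar also offers a Reader API, which lets an application start reading from a specific message ID, such as the earliest available message (MessageId.earliest in the Java client).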
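The difference between the three subscription types is easiest to see in how one batch of messages is spread over the attached consumers. The function below is a toy model under simplifying assumptions (no acknowledgements or redelivery, consumers ordered by name for failover); dispatch is a hypothetical helper, not part of any Pulsar client.

```python
def dispatch(mode, consumers, messages):
    """Return {consumer name: [messages received]} for a toy subscription."""
    if not consumers:
        raise ValueError("at least one consumer is required")
    received = {name: [] for name in consumers}
    if mode == "exclusive":
        if len(consumers) > 1:
            raise ValueError("an exclusive subscription allows one consumer")
        received[consumers[0]].extend(messages)
    elif mode == "shared":
        # Messages are handed out round-robin across all consumers.
        for index, message in enumerate(messages):
            received[consumers[index % len(consumers)]].append(message)
    elif mode == "failover":
        # Only the first consumer in sorted order (the master) receives data.
        master = sorted(consumers)[0]
        received[master].extend(messages)
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return received


shared = dispatch("shared", ["c1", "c2"], ["m0", "m1", "m2", "m3"])

# Failover: c1 is the master while it is connected ...
before = dispatch("failover", ["c2", "c1"], ["m0", "m1"])
# ... and c2 takes over once c1 has disconnected.
after = dispatch("failover", ["c2"], ["m2"])
```

Shared subscriptions scale consumption but give up ordering across consumers, while exclusive and failover keep a single active reader and therefore a single delivery order.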

Conclusion

Pulsar, as a next‑generation distributed message queue, offers attractive features such as geo‑replication, multi‑tenant support, linear scalability, and read‑write isolation, addressing many shortcomings of competing solutions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
