Why Apache Pulsar Beats Kafka and RocketMQ for Scalable Messaging Platforms

This article details how Lakala built a distributed, cloud‑native messaging platform using Apache Pulsar, covering functional requirements, architectural advantages, performance testing, and real‑world integration scenarios such as OGG adapters, TiDB pipelines, OpenMessaging, custom sources, functions, Flink connectors, and future plans.

ITFLY8 Architecture Home

Functional Requirements

Lakala has many project teams, each running its own messaging system. This couples business logic tightly to specific queues, raises maintenance cost, fragments operational expertise, and leaves resources underutilized. The goal is a distributed, foundational messaging platform that offers high reliability, low coupling, tenant isolation, easy horizontal scaling, simple operations, unified management, and on-demand provisioning, and that supports both traditional queues and streaming queues.

Why Choose Apache Pulsar

Among major open‑source platforms, Kafka and RocketMQ use a combined compute‑and‑storage architecture, while Pulsar separates compute and storage. A comparative test (see Table 2) shows Pulsar better meets Lakala’s needs.

Pulsar Architecture Advantages

Pulsar is a cloud-native distributed messaging platform that originated at Yahoo!, where it served over 1.4 million topics and processed more than 100 billion messages per day. It became an Apache top-level project in 2018.

Key features include multi‑tenant support, high throughput, low latency, native multi‑cluster replication, scalability to millions of topics, multi‑language clients (Java, Go, Python, C++), and various subscription modes (exclusive, shared, failover, key_shared).

Broker Architecture

The broker consists of four modules that can be customized as needed:

Dispatcher: handles protocol conversion, serialization, and deserialization.

Load balancer: controls traffic flow.

Global replicator: provides asynchronous cross‑cluster replication.

Service discovery: selects stateless leaders for each topic.

Persistence Layer (BookKeeper) Architecture

BookKeeper provides independent storage nodes (bookies) and uses ZooKeeper for metadata management and service discovery. It follows a slave-slave architecture: all bookies are identical slaves and the client acts as the leader, which enables fast failover.

Isolation Architecture

Pulsar achieves superior performance through:

IO isolation: separates write, tail‑read, and catch‑up read.

Sequential disk writes to maximize bandwidth.

Parallel disk usage for high‑throughput reads.

Multi‑level caching to reduce latency for both tail‑read and catch‑up read.

Comparison Summary

Traditional broker designs (Kafka, RabbitMQ) combine compute and storage, which can cause cache pollution when topic counts grow. Pulsar’s split architecture (broker + BookKeeper) reduces coupling, simplifies scaling and failover, and offers clearer partitioning characteristics (see Table 3).

Pulsar in the Basic Messaging Platform Practice

The platform architecture (Figure 7) shows green components built on Pulsar. Several real‑world scenarios illustrate its usage.

Scenario 1: Streaming Queue

1. OGG For Pulsar Adapter

Oracle GoldenGate captures change data (CDC) from Oracle tables. Since OGG lacks a Pulsar sink, a custom OGG‑For‑Pulsar component was developed. It reads each DML operation, uses the primary key as the message key, and ensures ordered delivery by partitioning on the key.
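The key-based ordering described above can be illustrated with a minimal sketch. It is not the OGG-For-Pulsar code: the record layout, the `partition_for` and `route` helpers, and the use of CRC32 are assumptions made for illustration (Pulsar clients apply their own hashing scheme). The point is only that a deterministic hash of the primary key sends every change for one row to the same partition, where arrival order is preserved.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index deterministically."""
    # CRC32 is only a stable stand-in hash for this sketch.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(changes, num_partitions):
    """Group DML change records by partition, keeping arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for change in changes:
        partitions[partition_for(change["pk"], num_partitions)].append(change)
    return partitions


# Hypothetical change records; "pk" plays the role of the message key.
changes = [
    {"pk": "order-1", "op": "INSERT"},
    {"pk": "order-2", "op": "INSERT"},
    {"pk": "order-1", "op": "UPDATE"},
    {"pk": "order-1", "op": "DELETE"},
]
routed = route(changes, num_partitions=4)
```

All three operations on `order-1` land in one partition in their original order, regardless of where `order-2` is routed.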

2. Pulsar To TiDB Component

This component consumes Pulsar messages through a failover subscription, hashes each message key to assign it to a persistence thread, enables Pulsar's deduplication, and writes the data into TiDB. Schema information is stored in a dedicated schema topic to keep data payloads small.
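A rough sketch of the key-to-thread assignment follows. Everything in it is hypothetical: the worker count, the `persist_all` helper, and the in-memory dictionary that stands in for TiDB. It shows why hashing the key to a fixed worker keeps writes for the same key in order even though several threads persist in parallel.

```python
import queue
import threading
import zlib

NUM_WORKERS = 4  # hypothetical number of persistence threads


def worker_index(key: str) -> int:
    """Pick the worker that owns this key (stable across calls)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_WORKERS


def persist_all(messages):
    """Dispatch (key, value) messages to per-key workers and return the
    values applied for each key, in the order they were written."""
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]
    applied = {}              # stand-in for TiDB: key -> values in write order
    lock = threading.Lock()   # protects the shared stand-in store

    def run(q):
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work for this worker
                break
            key, value = item
            with lock:
                applied.setdefault(key, []).append(value)

    threads = [threading.Thread(target=run, args=(q,)) for q in queues]
    for t in threads:
        t.start()
    for key, value in messages:
        queues[worker_index(key)].put((key, value))
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return applied


result = persist_all([("k1", 1), ("k2", 1), ("k1", 2), ("k1", 3), ("k2", 2)])
```

Because each key is served by exactly one FIFO queue and one thread, per-key order survives; order across different keys is not guaranteed and is not needed here.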

3. Message Persistence Process

OGG‑For‑Pulsar calls the Pulsar producer API.

The client queries a broker for the topic’s leader.

The client sends the message to the selected broker.

The broker persists the message through BookKeeper, respecting replication settings.

4. Dynamic Table‑Structure Transmission

When using AVRO, a wrapper schema (table_name, schema_fingerprint, payload) remains constant while the table schema changes. Each table schema is published to a schema topic, and messages on the data topics carry only its fingerprint, which reduces message size. Consumers cache schemas by fingerprint and use them for deserialization.
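The fingerprint lookup can be sketched as below. This is an illustration under stated assumptions, not the platform's implementation: JSON and a positional payload list stand in for AVRO so that the example needs only the standard library, SHA-256 is an assumed fingerprint function, and `SchemaCache` is a hypothetical name. The wrapper fields (table_name, schema_fingerprint, payload) are the ones named in the text.

```python
import hashlib
import json


def fingerprint(schema: dict) -> str:
    """Derive a stable fingerprint from a canonical form of the schema."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SchemaCache:
    """Consumer-side cache of schemas, keyed by fingerprint."""

    def __init__(self):
        self._by_fingerprint = {}

    def on_schema_message(self, schema: dict) -> str:
        """Handle a schema definition read from the schema topic."""
        fp = fingerprint(schema)
        self._by_fingerprint[fp] = schema
        return fp

    def decode(self, wrapper: dict) -> dict:
        """Decode a data message using the schema its fingerprint names."""
        schema = self._by_fingerprint[wrapper["schema_fingerprint"]]
        names = [field["name"] for field in schema["fields"]]
        return dict(zip(names, wrapper["payload"]))


# Two versions of a hypothetical table schema (a column was added).
schema_v1 = {"table": "merchant", "fields": [{"name": "id"}, {"name": "name"}]}
schema_v2 = {"table": "merchant",
             "fields": [{"name": "id"}, {"name": "name"}, {"name": "city"}]}

cache = SchemaCache()
fp1 = cache.on_schema_message(schema_v1)
fp2 = cache.on_schema_message(schema_v2)

row_old = cache.decode({"table_name": "merchant", "schema_fingerprint": fp1,
                        "payload": [1, "acme"]})
row_new = cache.decode({"table_name": "merchant", "schema_fingerprint": fp2,
                        "payload": [2, "bolt", "Shanghai"]})
```

Messages written before and after the table change decode correctly side by side, because each one names the schema it was written with.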

5. Consistency Guarantees

Ordering and deduplication depend on settings at several levels: the namespace (deduplication enabled), the broker, the producer (batching disabled, block when the send queue is full), and the consumer (interceptors, ack timeout, cumulative acknowledgments).
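To show what deduplication buys in this setup, here is a toy model of the idea behind it: the topic remembers the highest sequence id it has persisted for each producer and drops anything at or below it. The `DedupTopic` class and the producer name are hypothetical, and the model ignores persistence and broker restarts; it is a sketch of the principle rather than Pulsar's code.

```python
class DedupTopic:
    """Toy model of deduplication keyed by producer name and sequence id."""

    def __init__(self):
        self.log = []          # messages accepted, in order
        self._last_seq = {}    # producer name -> highest sequence id persisted

    def publish(self, producer: str, seq: int, payload: str) -> bool:
        """Append the message unless its sequence id was already persisted."""
        if seq <= self._last_seq.get(producer, -1):
            return False       # duplicate, e.g. a retry after a lost ack
        self.log.append(payload)
        self._last_seq[producer] = seq
        return True


topic = DedupTopic()
accepted = [
    topic.publish("ogg-adapter", 0, "INSERT row 1"),
    topic.publish("ogg-adapter", 1, "UPDATE row 1"),
    topic.publish("ogg-adapter", 1, "UPDATE row 1"),  # retry of the same send
    topic.publish("ogg-adapter", 2, "DELETE row 1"),
]
```

The retried send is rejected, so the log holds each change exactly once and in the order it was produced.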

6. Message Acknowledgment Modes

Queue consumption prefers individual acknowledgments; streaming consumption prefers cumulative acknowledgments to maintain order.
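The difference between the two modes can be made concrete with a simplified cursor model. Plain integers stand in for message ids and the `Cursor` class is a hypothetical construct for this sketch; it is not how the broker stores subscription state.

```python
class Cursor:
    """Simplified subscription cursor with both acknowledgment modes."""

    def __init__(self):
        self.mark_delete = -1            # every id <= mark_delete is acked
        self.individually_acked = set()  # acked ids beyond mark_delete

    def ack_individual(self, msg_id: int) -> None:
        """Acknowledge one message; others stay pending."""
        self.individually_acked.add(msg_id)
        # Advance mark_delete across any contiguous run of acked ids.
        while self.mark_delete + 1 in self.individually_acked:
            self.mark_delete += 1
            self.individually_acked.discard(self.mark_delete)

    def ack_cumulative(self, msg_id: int) -> None:
        """Acknowledge this message and everything before it."""
        if msg_id > self.mark_delete:
            self.mark_delete = msg_id
            self.individually_acked = {
                i for i in self.individually_acked if i > msg_id
            }

    def is_acked(self, msg_id: int) -> bool:
        return msg_id <= self.mark_delete or msg_id in self.individually_acked


# Queue-style consumption: messages 0..2 are finished out of order.
queue_cursor = Cursor()
queue_cursor.ack_individual(2)
queue_cursor.ack_individual(0)

# Stream-style consumption: one ack covers everything up to message 2.
stream_cursor = Cursor()
stream_cursor.ack_cumulative(2)
```

With individual acks, message 1 remains pending and can be redelivered on its own; with a cumulative ack, the position simply moves forward, which matches in-order stream processing.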

7. Client‑Side Ack‑Timeout Detection

The mechanism keeps unacknowledged messages in multiple hash sets held in a double-ended queue. It processes only one set per poll interval, which avoids global lock contention.
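A single-threaded sketch of this structure is shown below. The class and method names are hypothetical, time is modelled as explicit `tick()` calls rather than a timer, and a real client would also need synchronization. Each tick retires only the oldest set, so the work per poll interval is bounded by one bucket instead of every tracked message.

```python
from collections import deque


class AckTimeoutTracker:
    """Track unacknowledged messages in time-ordered buckets (hash sets)."""

    def __init__(self, ack_timeout_ticks: int):
        # One bucket per poll interval; the leftmost bucket is the oldest.
        self._buckets = deque(set() for _ in range(ack_timeout_ticks))
        self._bucket_of = {}  # msg_id -> the bucket currently holding it

    def add(self, msg_id) -> None:
        """Start tracking a delivered message in the newest bucket."""
        if msg_id not in self._bucket_of:
            newest = self._buckets[-1]
            newest.add(msg_id)
            self._bucket_of[msg_id] = newest

    def ack(self, msg_id) -> None:
        """Stop tracking a message that was acknowledged in time."""
        bucket = self._bucket_of.pop(msg_id, None)
        if bucket is not None:
            bucket.discard(msg_id)

    def tick(self) -> set:
        """Run once per poll interval: expire the oldest bucket only."""
        expired = self._buckets.popleft()
        for msg_id in expired:
            del self._bucket_of[msg_id]
        self._buckets.append(set())
        return expired  # candidates for redelivery


tracker = AckTimeoutTracker(ack_timeout_ticks=3)
tracker.add("m1")
tracker.add("m2")
tracker.ack("m2")               # acknowledged before the timeout
timeline = [tracker.tick()]     # interval 1
tracker.add("m3")
timeline.append(tracker.tick())  # interval 2
timeline.append(tracker.tick())  # interval 3: m1 times out
timeline.append(tracker.tick())  # interval 4: m3 times out
```

Acknowledging a message is a dictionary lookup plus a set removal, and expiry touches a single bucket, so neither path has to walk the full set of outstanding messages.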

Scenario 2: Message Queue – OpenMessaging Protocol

Pulsar implements the OpenMessaging protocol to decouple applications from underlying message systems. Spring‑Boot based frameworks use this protocol for sending and receiving messages.

Scenario 3: Streaming Queue – Custom Kafka 0.8 Source

Since Pulsar IO lacks a Kafka 0.8 source, a custom component was built following Pulsar IO interfaces to ingest data from legacy Kafka clusters.

Scenario 4: Streaming Queue – Function‑Based Message Filtering

Pulsar Functions mask sensitive fields (e.g., ID numbers, phone numbers) before forwarding data to cloud clusters.
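The masking step might look roughly like the function below, which could serve as the body of such a function. The field formats are assumptions (11-digit mobile numbers starting with 1 and 18-character ID numbers ending in a digit or X), as are the choice of which digits to keep and the `mask` name; the source does not specify the actual rules.

```python
import re

# Assumed formats, for illustration only.
PHONE = re.compile(r"(?<!\d)(1\d{2})\d{4}(\d{4})(?!\d)")
ID_NUMBER = re.compile(r"(?<!\d)(\d{6})\d{8}(\d{3}[\dXx])(?!\d)")


def mask(text: str) -> str:
    """Replace the middle digits of ID and phone numbers with asterisks."""
    text = ID_NUMBER.sub(r"\1********\2", text)
    return PHONE.sub(r"\1****\2", text)


masked = mask('{"phone": "13812345678", "id": "11010119900307123X"}')
```

The lookarounds keep the patterns from matching inside longer digit runs, so an 18-character ID is never partially masked as a phone number.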

Scenario 5: Streaming Queue – Pulsar Flink Connector

Flink jobs consume Pulsar streams for real‑time merchant analytics and write results back to TiDB via Pulsar, demonstrating stable performance.

Scenario 6: Streaming Queue – TiDB CDC Adapter

A Go‑based TiDB CDC‑For‑Pulsar component serializes TiDB change events (non‑AVRO) and publishes them to Pulsar.

Future Planning

The platform will gradually retire other messaging systems, fully migrate to Pulsar, and deepen usage of Pulsar’s resource isolation and flow‑control mechanisms. Throughout the practice, Pulsar’s native features and custom components have successfully met all functional requirements.
