Mastering Pulsar Schema: When and How to Use Schemas for Reliable Messaging

This article explains why Apache Pulsar schemas are essential for serializing POJO data, compares producer usage with and without schemas, details primitive and complex schema types, key/value handling, auto‑schema options, versioning, evolution, compatibility strategies, and provides concrete Java code examples for each scenario.

Tencent Cloud Middleware
Tencent Cloud Middleware
Tencent Cloud Middleware
Mastering Pulsar Schema: When and How to Use Schemas for Reliable Messaging

Why Use Pulsar Schema?

When a producer needs to send POJO objects, Pulsar requires a serialization mechanism to convert the object into bytes. Using a schema lets the client handle this conversion automatically, avoiding manual byte‑array handling.

Producer Without a Schema

If a producer is created without specifying a schema, it can only send byte[] messages. The user must serialize the POJO to a byte array before calling producer.send().

Producer<byte[]> producer = client.newProducer()
    .topic(topic)
    .create();
User user = new User("Bill", 40);
byte[] message = /* serialize user */;
producer.send(message);

Producer With a Schema

When a schema is provided, the producer can send the POJO directly; Pulsar handles serialization internally.

Producer<User> producer = client.newProducer(JSONSchema.of(User.class))
    .topic(topic)
    .create();
User user = new User("Bill", 40);
producer.send(user);

Both producer and consumer must use compatible schemas; introducing a schema simplifies this coordination.

Pulsar Schema Types

Schemas are divided into Primitive and Complex types.

Primitive: BOOLEAN, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, BYTES, STRING, TIMESTAMP, INSTANCE, LOCAL_DATE, LOCAL_TIME, LOCAL_DATE_TIME

Complex: key/value, struct (AVRO/JSON/Protobuf)

Key/Value Schema

Key/value schemas store both key and value SchemaInfo. Two encoding modes are supported:

INLINE : key and value are stored together in the message payload.

SEPARATED : the key is stored as the message key, the value as the payload.

Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
    Schema.INT32,
    Schema.STRING,
    KeyValueEncodingType.INLINE);

Struct Schema Usage

Pulsar provides three ways to work with struct schemas:

Static : Use a known POJO class.

public class User {
    String name;
    int age;
}
Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
producer.newMessage().value(User.builder().name("Pulsar-user").age(1).build()).send();
Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
User user = consumer.receive();

Generic : Define a schema at runtime when the POJO type is unknown.

RecordSchemaBuilder builder = SchemaBuilder.record("schemaName");
builder.field("intField").type(SchemaType.INT32);
SchemaInfo schemaInfo = builder.build(SchemaType.AVRO);
Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
producer.newMessage().value(schema.newRecordBuilder().set("intField", 32).build()).send();

SchemaDefinition : Generate a struct schema from a POJO definition.

SchemaDefinition<User> def = SchemaDefinition.builder()
    .withPojo(User.class)
    .build();
Producer<User> producer = client.newProducer(def).create();
producer.newMessage().value(User.builder().name("Pulsar-user").age(1L).build()).send();
Consumer<User> consumer = client.newConsumer(def).subscribe();
User user = consumer.receive();

SchemaInfo Structure

SchemaInfo

contains the fields name, type, schema (byte array), and properties (user‑defined map). Example JSON representation:

{
  "name": "test-string-schema",
  "type": "STRING",
  "schema": "",
  "properties": {}
}

Schema Versioning

Each schema registered under a topic carries a long version. When a producer sends a message, the version is attached, allowing consumers to retrieve the correct SchemaInfo for deserialization.

Schema Evolution and Compatibility

When business requirements change, schemas may evolve. Pulsar defines eight compatibility strategies to ensure downstream consumers can handle both old and new data:

ALWAYS_COMPATIBLE – all changes allowed.

ALWAYS_INCOMPATIBLE – no changes allowed.

BACKWARD – new consumers can read data written with older schemas (add optional fields, delete fields).

BACKWARD_TRANSITIVE – new consumers can read data from any previous version.

FORWARD – older consumers can read data written with newer schemas (add fields, delete optional fields).

FORWARD_TRANSITIVE – older consumers can read data from any newer version.

FULL (default) – both backward and forward compatibility for the latest version.

FULL_TRANSITIVE – full compatibility across all versions.

Auto Schema

If the schema type of a topic is unknown, Pulsar offers two auto‑schema modes:

AUTO_PRODUCE : validates that the bytes a producer wants to send are compatible with the topic’s schema.

AUTO_CONSUME : validates that bytes received from a topic can be deserialized into a generic record (supports AVRO, JSON, Protobuf).

Example of AUTO_PRODUCE:

Producer<byte[]> prod = client.newProducer(Schema.AUTO_PRODUCE())
    .topic(topic)
    .create();
byte[] kafkaMsg = ...;
prod.send(kafkaMsg);

Example of AUTO_CONSUME:

Consumer<GenericRecord> cons = client.newConsumer(Schema.AUTO_CONSUME())
    .topic(topic)
    .subscribe();
Message<GenericRecord> msg = cons.receive();
GenericRecord record = msg.getValue();

Auto Schema Update

When a schema passes compatibility checks, the producer automatically synchronizes its schema version with the topic. The update flow differs for producers and consumers and is illustrated by the following diagrams:

These diagrams show how producers and consumers negotiate schema versions and update their local caches when compatibility is confirmed.

Conclusion

Using Pulsar schemas simplifies POJO serialization, ensures data compatibility across producer and consumer versions, and provides flexible strategies for schema evolution. Auto‑schema modes further reduce the operational burden when the exact schema type is unknown.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaBackend DevelopmentSchemaApache PulsarMessage Serialization
Tencent Cloud Middleware
Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.