Mastering Pulsar Schema: When and How to Use Schemas for Reliable Messaging
This article explains why Apache Pulsar schemas are essential for serializing POJO data, compares producer usage with and without schemas, details primitive and complex schema types, key/value handling, auto‑schema options, versioning, evolution, compatibility strategies, and provides concrete Java code examples for each scenario.
Why Use Pulsar Schema?
When a producer needs to send POJO objects, Pulsar requires a serialization mechanism to convert the object into bytes. Using a schema lets the client handle this conversion automatically, avoiding manual byte‑array handling.
Producer Without a Schema
If a producer is created without specifying a schema, it can only send byte[] messages. The user must serialize the POJO to a byte array before calling producer.send().
Producer<byte[]> producer = client.newProducer()
.topic(topic)
.create();
User user = new User("Bill", 40);
byte[] message = /* serialize user */;
producer.send(message);Producer With a Schema
When a schema is provided, the producer can send the POJO directly; Pulsar handles serialization internally.
Producer<User> producer = client.newProducer(JSONSchema.of(User.class))
.topic(topic)
.create();
User user = new User("Bill", 40);
producer.send(user);Both producer and consumer must use compatible schemas; introducing a schema simplifies this coordination.
Pulsar Schema Types
Schemas are divided into Primitive and Complex types.
Primitive: BOOLEAN, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, BYTES, STRING, TIMESTAMP, INSTANCE, LOCAL_DATE, LOCAL_TIME, LOCAL_DATE_TIME
Complex: key/value, struct (AVRO/JSON/Protobuf)
Key/Value Schema
Key/value schemas store both key and value SchemaInfo. Two encoding modes are supported:
INLINE : key and value are stored together in the message payload.
SEPARATED : the key is stored as the message key, the value as the payload.
Schema<KeyValue<Integer, String>> kvSchema = Schema.KeyValue(
Schema.INT32,
Schema.STRING,
KeyValueEncodingType.INLINE);Struct Schema Usage
Pulsar provides three ways to work with struct schemas:
Static : Use a known POJO class.
public class User {
String name;
int age;
}
Producer<User> producer = client.newProducer(Schema.AVRO(User.class)).create();
producer.newMessage().value(User.builder().name("Pulsar-user").age(1).build()).send();
Consumer<User> consumer = client.newConsumer(Schema.AVRO(User.class)).subscribe();
User user = consumer.receive();Generic : Define a schema at runtime when the POJO type is unknown.
RecordSchemaBuilder builder = SchemaBuilder.record("schemaName");
builder.field("intField").type(SchemaType.INT32);
SchemaInfo schemaInfo = builder.build(SchemaType.AVRO);
Producer<GenericRecord> producer = client.newProducer(Schema.generic(schemaInfo)).create();
producer.newMessage().value(schema.newRecordBuilder().set("intField", 32).build()).send();SchemaDefinition : Generate a struct schema from a POJO definition.
SchemaDefinition<User> def = SchemaDefinition.builder()
.withPojo(User.class)
.build();
Producer<User> producer = client.newProducer(def).create();
producer.newMessage().value(User.builder().name("Pulsar-user").age(1L).build()).send();
Consumer<User> consumer = client.newConsumer(def).subscribe();
User user = consumer.receive();SchemaInfo Structure
SchemaInfocontains the fields name, type, schema (byte array), and properties (user‑defined map). Example JSON representation:
{
"name": "test-string-schema",
"type": "STRING",
"schema": "",
"properties": {}
}Schema Versioning
Each schema registered under a topic carries a long version. When a producer sends a message, the version is attached, allowing consumers to retrieve the correct SchemaInfo for deserialization.
Schema Evolution and Compatibility
When business requirements change, schemas may evolve. Pulsar defines eight compatibility strategies to ensure downstream consumers can handle both old and new data:
ALWAYS_COMPATIBLE – all changes allowed.
ALWAYS_INCOMPATIBLE – no changes allowed.
BACKWARD – new consumers can read data written with older schemas (add optional fields, delete fields).
BACKWARD_TRANSITIVE – new consumers can read data from any previous version.
FORWARD – older consumers can read data written with newer schemas (add fields, delete optional fields).
FORWARD_TRANSITIVE – older consumers can read data from any newer version.
FULL (default) – both backward and forward compatibility for the latest version.
FULL_TRANSITIVE – full compatibility across all versions.
Auto Schema
If the schema type of a topic is unknown, Pulsar offers two auto‑schema modes:
AUTO_PRODUCE : validates that the bytes a producer wants to send are compatible with the topic’s schema.
AUTO_CONSUME : validates that bytes received from a topic can be deserialized into a generic record (supports AVRO, JSON, Protobuf).
Example of AUTO_PRODUCE:
Producer<byte[]> prod = client.newProducer(Schema.AUTO_PRODUCE())
.topic(topic)
.create();
byte[] kafkaMsg = ...;
prod.send(kafkaMsg);Example of AUTO_CONSUME:
Consumer<GenericRecord> cons = client.newConsumer(Schema.AUTO_CONSUME())
.topic(topic)
.subscribe();
Message<GenericRecord> msg = cons.receive();
GenericRecord record = msg.getValue();Auto Schema Update
When a schema passes compatibility checks, the producer automatically synchronizes its schema version with the topic. The update flow differs for producers and consumers and is illustrated by the following diagrams:
These diagrams show how producers and consumers negotiate schema versions and update their local caches when compatibility is confirmed.
Conclusion
Using Pulsar schemas simplifies POJO serialization, ensures data compatibility across producer and consumer versions, and provides flexible strategies for schema evolution. Auto‑schema modes further reduce the operational burden when the exact schema type is unknown.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
