
Kafka Producer: Overview, Detailed Process, Key Challenges, and Configuration

This article walks through Kafka's producer component: its end-to-end workflow (record creation, serialization, partitioning, batching, and the Sender thread), the custom data structures behind it, the three send modes, common exceptions, and the configuration parameters that matter most for performance.

IT Architects Alliance

Kafka producers are responsible for packaging messages into ProducerRecord objects and delivering them to a specific topic in a Kafka cluster.

The high-level flow is: (1) create a ProducerRecord, (2) serialize the record, (3) determine the target partition via a partitioner (or use a user-specified partition), (4) send the record to the broker, and (5) receive a RecordMetadata response, or retry on failure.

In detail, the producer first wraps each incoming message in a ProducerRecord. The record's key and value are then serialized to bytes (Kafka ships serializers for common types such as strings and byte arrays, but Avro, Thrift, or Protobuf are recommended when the schema needs to evolve). After serialization, the partitioner decides which partition the record belongs to, unless the partition is set explicitly. The record is then placed in an internal buffer, where records bound for the same partition are grouped into batches (batch.size defaults to 16 KB). A dedicated Sender thread pulls batches from the buffer and transmits them to the appropriate broker, a design that moved Kafka from per-message sends (pre-0.8) to high-throughput batch processing.
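The partition-then-batch steps above can be sketched with plain JDK collections. This is a simplified model rather than Kafka's actual accumulator: the class name BatchingSketch, the hashCode-based partition choice (Kafka's default partitioner hashes the serialized key with murmur2 instead), and the tiny batch limit used in the usage note are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of "pick a partition, then append to that partition's open batch".
class BatchingSketch {
    private final int numPartitions;
    private final int batchSizeBytes;
    // One queue of batches per partition; each batch is a list of serialized values.
    private final Map<Integer, Deque<List<byte[]>>> batches = new HashMap<>();
    // Bytes already used in the newest (open) batch of each partition.
    private final Map<Integer, Integer> openBatchBytes = new HashMap<>();

    BatchingSketch(int numPartitions, int batchSizeBytes) {
        this.numPartitions = numPartitions;
        this.batchSizeBytes = batchSizeBytes;
    }

    // Stand-in for the partitioner: the same key always maps to the same partition.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Appends a record and returns its partition; a new batch is started
    // when the open batch would exceed the configured size.
    int append(String key, String value) {
        byte[] payload = value.getBytes(StandardCharsets.UTF_8);
        int partition = partitionFor(key);
        Deque<List<byte[]>> queue = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        int used = openBatchBytes.getOrDefault(partition, 0);
        if (queue.isEmpty() || used + payload.length > batchSizeBytes) {
            queue.addLast(new ArrayList<>());
            used = 0;
        }
        queue.peekLast().add(payload);
        openBatchBytes.put(partition, used + payload.length);
        return partition;
    }

    // Number of batches currently queued for a partition.
    int batchCount(int partition) {
        Deque<List<byte[]>> queue = batches.get(partition);
        return queue == null ? 0 : queue.size();
    }
}
```

With a 10-byte limit, appending an 8-byte and a 2-byte value under the same key fills one batch, and the next record opens a second one. A real Sender thread would drain these queues in the background.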

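Kafka uses a custom CopyOnWriteMap to store the batch queues for each partition. Reads take no lock and writes copy the whole map, which suits the producer's highly concurrent, read-heavy, write-light access pattern: every send looks up a partition's queue, but a new partition entry is added only rarely. In addition, a memory pool of reusable buffers (16 KB each by default, matching batch.size) reduces GC pressure.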
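The copy-on-write idea can be illustrated in a few lines. The sketch below is not Kafka's internal class; the name CopyOnWriteMapSketch and its reduced API (get, size, putIfAbsent) are assumptions chosen to show only the core trade-off of lock-free reads against copying writes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Minimal copy-on-write map: lock-free reads, synchronized copy-and-swap writes.
class CopyOnWriteMapSketch<K, V> {
    // Readers always see a complete, immutable snapshot.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    // Read path: a single volatile read, no locking.
    V get(K key) {
        return snapshot.get(key);
    }

    int size() {
        return snapshot.size();
    }

    // Write path: copy the current map, add the entry, publish the new snapshot.
    // Returns the existing value, or null if the new value was stored.
    synchronized V putIfAbsent(K key, V value) {
        V existing = snapshot.get(key);
        if (existing != null) {
            return existing;
        }
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
        return null;
    }
}
```

Each write costs a full copy, so this structure only pays off when writes are rare, as they are when the keys are topic partitions.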

Example Avro schema:

{
  "namespace": "customerManagement.avro",
  "type": "record",
  "name": "Customer",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "name", "type": "string"}
  ]
}

Java class for the value:

class Custom {
    private int customID;
    private String customerName;
    public Custom(int customID, String customerName) {
        this.customID = customID;
        this.customerName = customerName;
    }
    public int getCustomID() { return customID; }
    public String getCustomerName() { return customerName; }
}
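To be sent as a record value, an object like this has to be turned into bytes. The sketch below shows one possible hand-rolled byte layout using only the JDK. In a real producer this logic would sit inside a class implementing Kafka's Serializer interface and be named in value.serializer. The CustomCodec name and the layout (4-byte id, 4-byte name length, UTF-8 name) are assumptions, and a schema-based format such as Avro is usually the better choice when the class is expected to evolve.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Compact copy of the value class so the sketch compiles on its own.
class Custom {
    private final int customID;
    private final String customerName;

    Custom(int customID, String customerName) {
        this.customID = customID;
        this.customerName = customerName;
    }

    int getCustomID() { return customID; }
    String getCustomerName() { return customerName; }
}

// Hypothetical codec: [4-byte id][4-byte name length][UTF-8 name bytes].
class CustomCodec {
    static byte[] serialize(Custom c) {
        if (c == null) {
            return null;
        }
        byte[] name = c.getCustomerName() == null
                ? new byte[0]
                : c.getCustomerName().getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + 4 + name.length);
        buf.putInt(c.getCustomID());
        buf.putInt(name.length);
        buf.put(name);
        return buf.array();
    }

    static Custom deserialize(byte[] data) {
        if (data == null) {
            return null;
        }
        ByteBuffer buf = ByteBuffer.wrap(data);
        int id = buf.getInt();
        byte[] name = new byte[buf.getInt()];
        buf.get(name);
        return new Custom(id, new String(name, StandardCharsets.UTF_8));
    }
}
```

Any change to this layout breaks existing consumers, which is the main reason schema-aware formats are preferred in practice.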

Producer configuration example (Java):

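import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put("bootstrap.servers", "120.27.233.226:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "-1");                  // wait for all in-sync replicas
        props.put("retries", 3);
        props.put("batch.size", 16384);           // 16 KB per batch
        props.put("linger.ms", 10);
        props.put("buffer.memory", 33554432);     // 32 MB total buffer
        props.put("max.request.size", 1048576);   // 1 MB per request
        props.put("request.timeout.ms", 30000);

        // try-with-resources closes the producer, which flushes any buffered records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 100; i++) {
                String msg = "This is Message " + i;
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("test_topic", "test", msg);
                // send() is asynchronous: it only hands the record to the internal buffer.
                producer.send(record);
                System.out.println("Sent: " + msg);
                Thread.sleep(1000);
            }
        }
    }
}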

Kafka supports three send modes:

Fire-and-forget: call producer.send(record) and ignore the returned Future. Retriable errors are retried automatically, but a send that ultimately fails goes unnoticed.

Synchronous send: call producer.send(record).get() to block until a RecordMetadata is returned or an exception is thrown.

Asynchronous send: pass a Callback implementation to send(); it is invoked with the result or the exception once the broker responds.
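The three calling patterns differ only in what the caller does with the result of send(). The sketch below imitates them with JDK futures so that it runs without a broker. SendModesSketch is a hypothetical stand-in, not part of kafka-clients: in the real client, send() returns a Future<RecordMetadata> and the callback type is Callback, whose onCompletion method receives a RecordMetadata and an Exception. The rule that an empty value fails is invented purely to trigger an error path.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.function.BiConsumer;

// Hypothetical stand-in for a producer: each send completes immediately with an
// offset, or fails when the value is empty.
class SendModesSketch {
    private long nextOffset = 0;

    Future<Long> send(String value) {
        return send(value, null);
    }

    Future<Long> send(String value, BiConsumer<Long, Exception> callback) {
        CompletableFuture<Long> result = new CompletableFuture<>();
        if (value.isEmpty()) {
            Exception error = new IllegalArgumentException("empty value");
            result.completeExceptionally(error);
            if (callback != null) callback.accept(null, error);
        } else {
            long offset = nextOffset++;
            result.complete(offset);
            if (callback != null) callback.accept(offset, null);
        }
        return result;
    }

    static String demo() {
        SendModesSketch producer = new SendModesSketch();
        StringBuilder log = new StringBuilder();

        // 1. Fire-and-forget: the Future is ignored, so a failure would go unnoticed.
        producer.send("a");

        // 2. Synchronous: block on get(); failures surface as ExecutionException.
        try {
            log.append("sync offset ").append(producer.send("b").get());
            producer.send("").get();
        } catch (ExecutionException e) {
            log.append("; sync failed: ").append(e.getCause().getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        // 3. Asynchronous: the callback receives either a result or an error.
        producer.send("c", (offset, error) ->
                log.append(error == null ? "; async offset " + offset : "; async failed"));

        return log.toString();
    }
}
```

Synchronous sends are the simplest to reason about but serialize every request, while callbacks keep throughput high and still surface failures.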

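Common producer exceptions include LeaderNotAvailableException, NotControllerException, and NetworkException, which are retriable because the underlying condition usually clears on its own (for example, once a new leader is elected). Others, such as SerializationException or RecordTooLargeException (message too large), are not retriable and fail immediately. When the buffer is exhausted, send() blocks for up to max.block.ms and then throws an exception.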
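The retriable versus non-retriable distinction can be shown with a small retry loop. The producer performs this kind of retry internally, so application code rarely needs it; the sketch only illustrates the behavior. RetrySketch and its two exception classes are hypothetical stand-ins for Kafka's exception families, and the loop omits the pause between attempts that the real client applies (retry.backoff.ms).

```java
import java.util.function.Supplier;

class RetrySketch {
    // Stand-in for transient errors, e.g. a leader election in progress.
    static class RetriableError extends RuntimeException {
        RetriableError(String message) { super(message); }
    }

    // Stand-in for permanent errors, e.g. a record that is too large.
    static class FatalError extends RuntimeException {
        FatalError(String message) { super(message); }
    }

    // Runs the attempt up to 1 + retries times. Only RetriableError triggers
    // another try; any other exception propagates immediately.
    static <T> T sendWithRetries(Supplier<T> attempt, int retries) {
        if (retries < 0) {
            throw new IllegalArgumentException("retries must be >= 0");
        }
        RetriableError last = null;
        for (int i = 0; i <= retries; i++) {
            try {
                return attempt.get();
            } catch (RetriableError e) {
                last = e; // transient condition: worth trying again
            }
        }
        throw last; // retries exhausted
    }
}
```

An attempt that fails twice with a RetriableError and then succeeds returns normally when retries is 3, whereas a FatalError is thrown to the caller on the first attempt.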

Key configuration knobs for performance and reliability are:

acks: "-1" or "all" (wait for all in-sync replicas), "1" (leader only), or "0" (no acknowledgment).

retries: number of times a failed send is retried after a retriable error.

batch.size: maximum size in bytes of each per-partition batch (default 16 KB).

linger.ms: maximum time to wait for more records before sending an under-filled batch.

buffer.memory: total memory available to the producer for buffering unsent records (default 32 MB).

max.request.size: maximum size of a single request (default 1 MB).

request.timeout.ms: how long the producer waits for a broker response before treating the request as failed.
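How these settings relate to each other is easier to see with the numbers side by side. The helper below is a sketch under stated assumptions: ProducerConfigSketch and its method names are hypothetical, the values mirror the example configuration earlier in the article, and the calculation ignores per-batch overhead, so it gives only an upper bound.

```java
import java.util.Properties;

// Hypothetical helper that assembles the settings discussed above.
class ProducerConfigSketch {
    static Properties reliable(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all");                 // equivalent to "-1"
        props.put("retries", 3);
        props.put("batch.size", 16384);           // 16 KB
        props.put("linger.ms", 10);
        props.put("buffer.memory", 33554432);     // 32 MB
        props.put("max.request.size", 1048576);   // 1 MB
        props.put("request.timeout.ms", 30000);   // 30 s
        return props;
    }

    // Upper bound on how many full batches fit in the producer buffer.
    static long fullBatchesInBuffer(Properties props) {
        long bufferMemory = ((Number) props.get("buffer.memory")).longValue();
        int batchSize = ((Number) props.get("batch.size")).intValue();
        return bufferMemory / batchSize;
    }
}
```

With these values the buffer holds at most 33554432 / 16384 = 2048 full batches, so raising batch.size without raising buffer.memory reduces the number of batches that can be in flight before send() starts to block.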
