Kafka Producer: Overview, Detailed Process, Key Challenges, and Configuration
This article provides a comprehensive English guide to Kafka's producer component, covering its end‑to‑end workflow, detailed steps such as record creation, serialization, partitioning, batching, sender thread, custom data structures, send modes, common exceptions, and essential configuration parameters for optimal performance.
Kafka producers are responsible for packaging messages into ProducerRecord objects and delivering them to a specific topic in a Kafka cluster.
The high‑level flow includes: (1) creating a ProducerRecord, (2) serializing the record, (3) determining the target partition via a partitioner (or a user‑specified partition), (4) sending the record to the broker, and (5) receiving a RecordMetadata response or retrying on failure.
In detail, the producer first wraps each incoming message into a ProducerRecord. The record is then serialized (Kafka provides default serializers, but Avro, Thrift, or Protobuf are recommended for schema evolution). After serialization, the partitioner decides which partition the record belongs to, unless the partition is explicitly set. The record is placed into an internal buffer where multiple records are grouped into a batch (default batch size 16 KB). A dedicated Sender thread pulls batches from the buffer and transmits them to the appropriate broker, a design that moved Kafka from per‑message sends (pre‑0.8) to high‑throughput batch processing.
Kafka uses a custom CopyOnWriteMap to store batch information per partition, providing thread‑safe, read‑heavy, write‑light access suitable for the producer’s high‑concurrency environment. Additionally, a memory‑pool for 16 KB buffers reduces GC pressure.
Example Avro integration:
{
"namespace": "customerManagement.avro",
"type": "record",
"name": "Customer",
"fields": [
{"name": "id", "type": "string"},
{"name": "name", "type": "string"}
]
}Java class for the value:
class Custom {
private int customID;
private String customerName;
public Custom(int customID, String customerName) {
this.customID = customID;
this.customerName = customerName;
}
public int getCustomID() { return customID; }
public String getCustomerName() { return customerName; }
}Producer configuration example (Java):
Properties props = new Properties();
props.put("bootstrap.servers", "120.27.233.226:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("acks", "-1");
props.put("retries", 3);
props.put("batch.size", 16384);
props.put("linger.ms", 10);
props.put("buffer.memory", 33554432);
props.put("max.request.size", 1048576);
props.put("request.timeout.ms", 30000);
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
for (int i = 0; i < 100; i++) {
String msg = "This is Message " + i;
ProducerRecord<String, String> record = new ProducerRecord<>("test_topic", "test", msg);
producer.send(record);
System.out.println("Sent:" + msg);
Thread.sleep(1000);
}
producer.close();Kafka supports three send modes:
Fire‑and‑forget : send without waiting for acknowledgment; retries are handled automatically.
Synchronous send : call producer.send(record).get() to block until a RecordMetadata is returned or an exception is thrown.
Asynchronous send : provide a Callback implementation to handle success or failure after the broker responds.
Common producer exceptions include LeaderNotAvailableException, NotControllerException, NetworkException, and serialization or buffer‑exhaustion errors; most are retryable, while some (e.g., message too large) are not.
Key configuration knobs for performance and reliability are:
acks : "-1" (all replicas), "1" (leader only), or "0" (no ack).
retries : number of retry attempts.
batch.size : size of each batch (default 16 KB).
linger.ms : max wait time before sending an under‑filled batch.
buffer.memory : total memory allocated to the producer buffer (default 32 MB).
max.request.size : maximum size of a single request (default 1 MB).
request.timeout.ms : timeout for request‑response cycles.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
