Optimizing Real-Time Kafka Writes in Spark Streaming Using a Broadcasted KafkaProducer
To improve the performance of writing streaming data to Kafka, the article demonstrates how to replace per-partition KafkaProducer creation with a lazily-initialized, broadcasted producer in Scala, reducing overhead and achieving dozens‑fold speed gains.
In many Spark Streaming projects, developers write each RDD partition's data to Kafka by creating a new KafkaProducer inside the partition loop, which leads to severe performance degradation.
The article first shows the naive implementation, in which kafkaStreams.foreachRDD creates a producer for every partition, setting properties such as group.id, acks, retries, the bootstrap servers, and the key/value serializers.
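The naive pattern might look like the following sketch. The broker address and topic name are illustrative placeholders, and `kafkaStreams` is assumed to be the DStream from the article; the point is that a fresh KafkaProducer, with its TCP connections and metadata fetch, is built and torn down for every partition of every micro-batch.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

kafkaStreams.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")  // placeholder broker
    props.put("acks", "all")
    props.put("retries", "3")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")

    // Expensive: a new producer (connections + metadata fetch) per partition
    val producer = new KafkaProducer[String, String](props)
    partition.foreach { msg =>
      producer.send(new ProducerRecord[String, String]("output-topic", msg))
    }
    producer.close()
  }
}
```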
To avoid this overhead, a lazily-initialized wrapper class broadcastKafkaProducer is introduced; it defines a lazy val producer = createProducer() and provides send methods that return a Future[RecordMetadata].
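A minimal sketch of such a wrapper is shown below. The class and method names (broadcastKafkaProducer, send, createProducer) follow the article's description; the companion-object factory and shutdown hook are common implementation details assumed here. The key idea is that only the producer-creating function is serialized to executors, while the producer itself is built lazily, once per executor JVM.

```scala
import java.util.Properties
import java.util.concurrent.Future
import scala.collection.JavaConverters._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord, RecordMetadata}

class broadcastKafkaProducer[K, V](createProducer: () => KafkaProducer[K, V])
    extends Serializable {

  // Built once per executor on first use; never shipped from the driver
  lazy val producer: KafkaProducer[K, V] = createProducer()

  def send(topic: String, key: K, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, key, value))

  def send(topic: String, value: V): Future[RecordMetadata] =
    producer.send(new ProducerRecord[K, V](topic, value))
}

object broadcastKafkaProducer {
  def apply[K, V](config: Map[String, Object]): broadcastKafkaProducer[K, V] = {
    // Only this function is serialized; the producer is created on the executor
    val createProducerFunc = () => {
      val producer = new KafkaProducer[K, V](config.asJava)
      // Flush buffered records when the executor JVM shuts down
      sys.addShutdownHook { producer.close() }
      producer
    }
    new broadcastKafkaProducer(createProducerFunc)
  }
}
```

Making the wrapper Serializable while keeping the producer behind a lazy val is what allows it to cross the driver/executor boundary safely, since KafkaProducer itself is not serializable.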
Next, the wrapper is instantiated and broadcast to all executors using Spark's Broadcast mechanism, with the same producer configuration as before.
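The broadcast step could be sketched as follows, assuming the wrapper class above and a StreamingContext named `ssc`; the config map mirrors the producer properties from the naive version, with a placeholder broker address.

```scala
import org.apache.spark.broadcast.Broadcast

val kafkaProducerConfig: Map[String, Object] = Map(
  "bootstrap.servers" -> "broker1:9092",  // placeholder broker
  "acks"              -> "all",
  "retries"           -> "3",
  "key.serializer"    -> "org.apache.kafka.common.serialization.StringSerializer",
  "value.serializer"  -> "org.apache.kafka.common.serialization.StringSerializer"
)

// Broadcast the (serializable) wrapper once; each executor lazily builds
// its own producer on first send
val kafkaProducerBroadcast: Broadcast[broadcastKafkaProducer[String, String]] =
  ssc.sparkContext.broadcast(
    broadcastKafkaProducer[String, String](kafkaProducerConfig))
```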
Finally, the streaming job uses the broadcasted producer inside foreachRDD and foreachPartition to send records, eliminating per‑partition producer creation and achieving dozens‑fold speed improvements.
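The optimized write path might then look like this sketch, reusing the broadcast variable from the previous step; the topic name is again a placeholder. Dereferencing the broadcast with `.value` inside the partition resolves to the executor-local copy, so every task in the same JVM shares one producer and no producer is constructed per partition.

```scala
kafkaStreams.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    partition.foreach { msg =>
      // No producer construction here; .value resolves the broadcast locally
      kafkaProducerBroadcast.value.send("output-topic", msg)
    }
  }
}
```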