Understanding and Using Broadcast Variables in Apache Flink
This article explains the concept, usage, precautions, and a practical example of broadcast variables in Apache Flink, illustrating how to initialize, broadcast, retrieve, and apply shared data across parallel operators with Java code snippets.
1. Broadcast Variable Introduction
In Flink, parallel operator instances run in separate slots and cannot directly access each other's data; broadcast variables provide a shared, read‑only dataset that is distributed to all tasks, existing as a single copy per node.
2. Usage
1: Initialize data
DataSet<Integer> num = env.fromElements(1, 2, 3)
2: Broadcast data
.withBroadcastSet(toBroadcast, "num");
3: Retrieve data
Collection<Integer> broadcastSet = getRuntimeContext().getBroadcastVariable("num");3. Precautions
Broadcast state does not allow inter‑task communication. Only the side that broadcasts can modify the state, and all parallel instances must apply identical modifications.
Event order may differ across instances. The order of elements in the broadcast stream is not guaranteed to be the same for every task, so state updates must not rely on input order.
All operator tasks are snapshotted. During checkpointing, each task checkpoints its broadcast state, increasing checkpoint size with higher parallelism.
Broadcast variables reside in memory. They should be kept reasonably small (hundreds of MB); multi‑gigabyte datasets are unsuitable.
4. Practical Example
public class BroadCastTest {
public static void main(String[] args) throws Exception {
ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
// 1. Create a DataSet to broadcast
DataSet<Integer> broadcast = env.fromElements(1, 2, 3);
DataSet<String> data = env.fromElements("a", "b");
data.map(new RichMapFunction<String, String>() {
private List<Integer> list = new ArrayList<>();
@Override
public void open(Configuration parameters) throws Exception {
// 3. Retrieve broadcast data as a Collection
Collection<Integer> broadcastSet = getRuntimeContext().getBroadcastVariable("number");
list.addAll(broadcastSet);
}
@Override
public String map(String value) throws Exception {
return value + ": " + list;
}
}).withBroadcastSet(broadcast, "number") // 2. Broadcast the dataset
.printToErr(); // print to stderr for inspection
}
}Output:
a: [1, 2, 3]
b: [1, 2, 3]Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
