Big Data 4 min read

Understanding and Using Broadcast Variables in Apache Flink

This article explains the concept, usage, precautions, and a practical example of broadcast variables in Apache Flink, illustrating how to initialize, broadcast, retrieve, and apply shared data across parallel operators with Java code snippets.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Understanding and Using Broadcast Variables in Apache Flink

1. Broadcast Variable Introduction

In Flink, parallel operator instances run in separate slots and cannot directly access each other's data; broadcast variables provide a shared, read‑only dataset that is distributed to all tasks, existing as a single copy per node.

2. Usage

1: Initialize data
DataSet<Integer> num = env.fromElements(1, 2, 3)
2: Broadcast data
.withBroadcastSet(toBroadcast, "num");
3: Retrieve data
Collection<Integer> broadcastSet = getRuntimeContext().getBroadcastVariable("num");

3. Precautions

Broadcast state does not allow inter‑task communication. Only the side that broadcasts can modify the state, and all parallel instances must apply identical modifications.

Event order may differ across instances. The order of elements in the broadcast stream is not guaranteed to be the same for every task, so state updates must not rely on input order.

All operator tasks are snapshotted. During checkpointing, each task checkpoints its broadcast state, increasing checkpoint size with higher parallelism.

Broadcast variables reside in memory. They should be kept reasonably small (hundreds of MB); multi‑gigabyte datasets are unsuitable.

4. Practical Example

public class BroadCastTest {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // 1. Create a DataSet to broadcast
        DataSet<Integer> broadcast = env.fromElements(1, 2, 3);
        DataSet<String> data = env.fromElements("a", "b");
        data.map(new RichMapFunction<String, String>() {
            private List<Integer> list = new ArrayList<>();
            @Override
            public void open(Configuration parameters) throws Exception {
                // 3. Retrieve broadcast data as a Collection
                Collection<Integer> broadcastSet = getRuntimeContext().getBroadcastVariable("number");
                list.addAll(broadcastSet);
            }
            @Override
            public String map(String value) throws Exception {
                return value + ": " + list;
            }
        }).withBroadcastSet(broadcast, "number") // 2. Broadcast the dataset
          .printToErr(); // print to stderr for inspection
    }
}

Output:

a: [1, 2, 3]
b: [1, 2, 3]
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JavaBig DataFlinkdistributed computingBroadcast Variable
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.