Big Data 9 min read

Comprehensive Guide to Flink Partitioners and Their Implementations

This article explains the eight built‑in Flink partitioners, their distribution strategies, key implementation details, and provides Java code examples illustrating how each partitioner selects downstream channels and determines pointwise or all‑to‑all distribution.

Big Data Technology & Architecture

Aug 15, 2022

Comprehensive Guide to Flink Partitioners and Their Implementations

Flink provides eight built‑in partitioners—RebalancePartitioner, RescalePartitioner, KeyGroupStreamPartitioner, GlobalPartitioner, ShufflePartitioner, ForwardPartitioner, BroadcastPartitioner, and CustomPartitionerWrapper—each extending the abstract StreamPartitioner class.

The partitioners are:

RebalancePartitioner RescalePartitioner KeyGroupStreamPartitioner GlobalPartitioner ShufflePartitioner ForwardPartitioner CustomPartitionerWrapper BroadcastPartitioner

1. RebalancePartitioner

Distributes records evenly by cycling through all downstream channels (all‑to‑all mode). The first channel is chosen randomly, then a round‑robin counter is incremented.

private int nextChannelToSendTo;

@Override
public void setup(int numberOfChannels) {
    super.setup(numberOfChannels);
    nextChannelToSendTo = ThreadLocalRandom.current().nextInt(numberOfChannels);
}

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    nextChannelToSendTo = (nextChannelToSendTo + 1) % numberOfChannels;
    return nextChannelToSendTo;
}

@Override
public boolean isPointwise() { return false; }

2. RescalePartitioner

Chooses downstream channels based on the parallelism of upstream and downstream operators, providing pointwise distribution while allowing some local processing to reduce network I/O.

// round‑robin starting from channel 0
@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    if (++nextChannelToSendTo >= numberOfChannels) {
        nextChannelToSendTo = 0;
    }
    return nextChannelToSendTo;
}

@Override
public boolean isPointwise() { return true; }

3. GlobalPartitioner

Sends all records to the downstream subtask with ID 0 (all‑to‑all mode).

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) { return 0; }

@Override
public boolean isPointwise() { return false; }

4. ForwardPartitioner

Forwards records only to the locally running downstream subtask (pointwise mode).

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) { return 0; }

@Override
public boolean isPointwise() { return true; }

5. BroadcastPartitioner

Broadcasts each record to all downstream channels; channel selection is unsupported.

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    throw new UnsupportedOperationException("Broadcast partitioner does not support select channels.");
}

6. KeyGroupStreamPartitioner

Assigns records to channels based on a key group derived from the record’s key, using a two‑step hash (hashCode then MurmurHash) and modulo arithmetic.

public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    K key = keySelector.getKey(record.getInstance().getValue());
    return KeyGroupRangeAssignment.assignKeyToParallelOperator(key, maxParallelism, numberOfChannels);
}

7. ShufflePartitioner

Randomly selects a downstream channel for each record (all‑to‑all mode).

private Random random = new Random();

@Override
public int selectChannel(SerializationDelegate<StreamRecord<T>> record) {
    return random.nextInt(numberOfChannels);
}

@Override
public boolean isPointwise() { return false; }

8. CustomPartitionerWrapper

Allows users to provide a custom Partitioner and KeySelector to define arbitrary partitioning logic.

public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, KeySelector<T, K> keySelector) {
    return setConnectionType(new CustomPartitionerWrapper<>(clean(partitioner), clean(keySelector)));
}

These partitioners are used by Flink’s StreamingJobGraphGenerator to convert a StreamGraph into a JobGraph, where the isPointwise() method determines whether the edge uses DistributionPattern.POINTWISE or DistributionPattern.ALL_TO_ALL.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

java Big Data Flink Partitioner

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.