
Understanding Hadoop's Circular Buffer in the Shuffle Phase

This article explains how Hadoop's MapReduce shuffle uses a circular buffer to stage serialized key/value pairs and their metadata in memory. It covers the buffer's initialization, its write path, spill handling, and the underlying algorithms that make in-memory sorting and disk spilling efficient.

Big Data Technology & Architecture

During a Hadoop MapReduce job, the shuffle phase stores map output in an in‑memory circular buffer before it is spilled to disk. The buffer is a contiguous byte array that holds serialized key/value data and a parallel int array that stores metadata for each record (value start, key start, partition, and value length).

Circular Buffer Data Structure

The buffer is defined in MapTask.MapOutputBuffer with fields such as kvbuffer (byte array) and kvmeta (IntBuffer). Each record’s metadata occupies four ints (VALSTART, KEYSTART, PARTITION, VALLEN), giving a fixed METASIZE of 16 bytes.
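The layout of those four ints can be sketched as follows; the constant names and values mirror MapTask.MapOutputBuffer:

```java
// Per-record metadata layout, mirroring the constants in MapTask.MapOutputBuffer.
// Each record owns NMETA consecutive ints in kvmeta, i.e. METASIZE bytes of kvbuffer.
public class KvMetaLayout {
    static final int VALSTART = 0;   // offset of the serialized value in kvbuffer
    static final int KEYSTART = 1;   // offset of the serialized key in kvbuffer
    static final int PARTITION = 2;  // partition this record belongs to
    static final int VALLEN = 3;     // length of the serialized value
    static final int NMETA = 4;      // ints of metadata per record
    static final int METASIZE = NMETA * 4; // bytes of metadata per record

    public static void main(String[] args) {
        // metadata for record r lives at kvmeta[r * NMETA .. r * NMETA + NMETA)
        System.out.println("METASIZE = " + METASIZE + " bytes");
    }
}
```

Because each entry is fixed-width, record i's metadata is found by simple index arithmetic, which is what makes a metadata-only sort cheap.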

Initialization

The buffer is created in MapOutputBuffer.init. Its size is derived from the configuration property mapreduce.task.io.sort.mb (default 100 MiB). The method allocates kvbuffer, wraps it as an IntBuffer for kvmeta, and sets the initial equator (the boundary between data and metadata) to zero with setEquator(0). The initial indices (bufstart, bufend, kvstart, kvend) are all set to this equator.
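A condensed sketch of that sizing and equator setup (simplified from MapOutputBuffer.init and setEquator; the kvindex arithmetic follows the Hadoop source, but this class is a reduced stand-in, not the real implementation):

```java
// Simplified sketch of MapOutputBuffer's buffer allocation and equator setup.
public class BufferInit {
    static final int METASIZE = 16;          // 4 metadata ints per record
    final byte[] kvbuffer;                   // shared data + metadata store
    int equator, kvindex;                    // kvindex: next metadata slot (int index)
    int bufstart, bufend, kvstart, kvend;

    BufferInit(int sortMb) {                 // mapreduce.task.io.sort.mb, default 100
        int maxMemUsage = sortMb << 20;      // MiB -> bytes
        maxMemUsage -= maxMemUsage % METASIZE; // align capacity to the metadata size
        kvbuffer = new byte[maxMemUsage];
        setEquator(0);
        bufstart = bufend = equator;
        kvstart = kvend = kvindex;
    }

    void setEquator(int pos) {
        equator = pos;
        // metadata grows downward from the equator: the first slot sits
        // METASIZE bytes below it (modulo the buffer), expressed as an int index
        final int aligned = pos - (pos % METASIZE);
        kvindex = (int)(((long)aligned - METASIZE + kvbuffer.length) % kvbuffer.length) / 4;
    }
}
```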

Writing to the Buffer

When a mapper calls NewOutputCollector.write, the data eventually reaches MapOutputBuffer.collect. The method first reserves space for metadata by decrementing bufferRemaining by METASIZE. If the remaining space is insufficient, a spill is triggered. Otherwise the key and value are serialized into kvbuffer, their start positions are recorded, and the metadata is written into kvmeta. The key may be split across the circular boundary; in that case bb.shiftBufferedKey() re‑orders the bytes to keep the key contiguous for sorting.

public synchronized void collect(K key, V value, final int partition) throws IOException {
    // reserve metadata space up front; going non-positive means the soft limit was crossed
    bufferRemaining -= METASIZE;
    if (bufferRemaining <= 0) {
        // start a spill if one is not already in progress …
    }
    // serialize key and value into kvbuffer, record their start offsets,
    // write the four metadata ints into kvmeta, and advance the indices
}

Spill Handling

If bufferRemaining falls below zero, the buffer prepares for a spill. The method computes the amount of data already written (bUsed) and checks the soft limit (softLimit = kvbuffer.length * spillper, where spillper comes from mapreduce.map.sort.spill.percent, default 0.8). When the limit is reached, startSpill() records the current kvend and bufend, marks spillInProgress true, and signals the spill thread.

private void startSpill() {
    kvend = (kvindex + NMETA) % kvmeta.capacity();
    bufend = bufmark;
    spillInProgress = true;
    spillReady.signal();
}
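The signaling relies on a ReentrantLock with two conditions. The field names (spillLock, spillReady, spillDone, spillInProgress) match MapOutputBuffer, but the loop body below is a reduced sketch of the real spill thread's handshake:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of the lock/condition handshake between collect() and the spill thread.
public class SpillSignal {
    final ReentrantLock spillLock = new ReentrantLock();
    final Condition spillReady = spillLock.newCondition();
    final Condition spillDone = spillLock.newCondition();
    volatile boolean spillInProgress;

    void startSpill() {               // called by collect() at the soft limit
        spillLock.lock();
        try {
            spillInProgress = true;
            spillReady.signal();      // wake the spill thread
        } finally {
            spillLock.unlock();
        }
    }

    void handleOneSpill() throws InterruptedException {
        spillLock.lock();
        try {
            while (!spillInProgress) {
                spillReady.await();   // wait for work; no lost-signal risk
            }
            // sortAndSpill() would run here in the real spill thread
            spillInProgress = false;
            spillDone.signal();       // unblock a collect() waiting on space
        } finally {
            spillLock.unlock();
        }
    }
}
```

Because the `while (!spillInProgress)` check re-tests the flag under the lock, a signal sent before the spill thread starts waiting is never lost.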

The spill thread’s run() method waits for spillInProgress, then calls sortAndSpill(). Sorting is performed only on the metadata array using the configured IndexedSorter, ordering records first by partition and then by key. After sorting, each partition’s records are written to a temporary spill file (an IFile), optionally passing through a combiner.

private void sortAndSpill() throws IOException, ClassNotFoundException, InterruptedException {
    sorter.sort(this, mstart, mend, reporter);
    // write sorted records to spill file …
}
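MapOutputBuffer itself implements IndexedSortable, so sorter.sort permutes only the metadata entries through compare and swap. The sketch below follows that shape; raw unsigned byte comparison stands in for the job's actual key comparator:

```java
// Sketch of the metadata-only compare/swap used during sortAndSpill:
// records are ordered by partition, then by key bytes, without moving
// the serialized data itself.
public class MetaSort {
    static final int VALSTART = 0, KEYSTART = 1, PARTITION = 2, NMETA = 4;
    final int[] kvmeta;    // flattened metadata, NMETA ints per record
    final byte[] kvbuffer; // serialized keys and values

    MetaSort(int[] kvmeta, byte[] kvbuffer) {
        this.kvmeta = kvmeta;
        this.kvbuffer = kvbuffer;
    }

    // compare records i and j (record indices, not raw int offsets)
    int compare(int i, int j) {
        int pi = kvmeta[i * NMETA + PARTITION];
        int pj = kvmeta[j * NMETA + PARTITION];
        if (pi != pj) return pi - pj;
        // the key's bytes run from KEYSTART up to VALSTART in kvbuffer
        int is = kvmeta[i * NMETA + KEYSTART], ie = kvmeta[i * NMETA + VALSTART];
        int js = kvmeta[j * NMETA + KEYSTART], je = kvmeta[j * NMETA + VALSTART];
        for (int a = is, b = js; a < ie && b < je; a++, b++) {
            int d = (kvbuffer[a] & 0xff) - (kvbuffer[b] & 0xff);
            if (d != 0) return d;
        }
        return (ie - is) - (je - js); // shorter key sorts first on a tie
    }

    // swap only the metadata entries; the serialized bytes stay put
    void swap(int i, int j) {
        for (int n = 0; n < NMETA; n++) {
            int t = kvmeta[i * NMETA + n];
            kvmeta[i * NMETA + n] = kvmeta[j * NMETA + n];
            kvmeta[j * NMETA + n] = t;
        }
    }
}
```

Swapping 16-byte metadata entries instead of variable-length records is what keeps the sort cache-friendly and cheap.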

Equator Adjustment and Buffer Reuse

After a spill begins, the buffer's equator is moved to free space for new records. The new position is calculated from the average record size and the circular distance between the current write pointer and the metadata pointer, splitting the free region so that serialized data and metadata each get a proportional share. The calculation takes the minimum of several limits but always reserves roughly 2 * METASIZE bytes so the next record's metadata is guaranteed to fit.
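A sketch of that arithmetic, following the spill branch of MapOutputBuffer.collect (distanceTo is the circular-distance helper; the variable names follow the Hadoop source, but this is a simplification extracted from its context):

```java
// Sketch of the equator-repositioning arithmetic performed when a spill starts.
public class EquatorMove {
    static final int METASIZE = 16;

    // circular distance from i forward to j in a buffer of length mod
    static int distanceTo(int i, int j, int mod) {
        return i <= j ? j - i : mod - i + j;
    }

    // choose a new equator between the data head (bufindex) and the metadata
    // head (kvbidx), sized by the average record so neither region starves
    static int newEquator(int bufindex, int kvbidx, int bufLen, int avgRec) {
        int distkvi = distanceTo(bufindex, kvbidx, bufLen);
        int offset = Math.max(2 * METASIZE - 1,
                Math.min(distkvi / 2,
                         distkvi / (METASIZE + avgRec) * METASIZE));
        return (bufindex + offset) % bufLen;
    }
}
```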

When a spill finishes, resetSpill() restores the buffer pointers to the equator, effectively discarding the spilled region and allowing the buffer to be reused without reallocating memory.

private void resetSpill() {
    final int e = equator;
    bufstart = bufend = e;
    final int aligned = e - (e % METASIZE);
    // point kvstart/kvend at the first metadata slot below the new equator
    kvstart = kvend = (int)(((long)aligned - METASIZE + kvbuffer.length) % kvbuffer.length) / 4;
}

The process repeats: new map output is collected, the buffer may trigger additional spills, and the circular buffer continuously cycles between data and metadata regions, providing an efficient in‑memory staging area for Hadoop’s shuffle.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: MapReduce, Hadoop, Shuffle, circular buffer, in-memory buffer
Written by Wang Zhiwu, a big data expert dedicated to sharing big data technology.