Backend Development 17 min read

Understanding Java Stream Pipeline: Implementation Principles and Execution

This article explains how Java Stream pipelines are built and executed, covering operation recording, the role of stages and sinks, differences between intermediate and terminal operations, and how the library achieves efficient single-pass processing while avoiding common pitfalls.

Architect's Tech Stack

Jun 15, 2020

Understanding Java Stream Pipeline: Implementation Principles and Execution

Before diving into Stream API usage, we explore how the underlying pipeline works, answering questions about iteration, parallelism, and thread management.

We start by reviewing how a container executes a lambda, using

// ArrayList.forEach()
public void forEach(Consumer<? super E> action) {
    ...
    for (int i=0; modCount == expectedModCount && i < size; i++) {
        action.accept(elementData[i]); // callback method
    }
    ...
}

as an example.

Stream operations are divided into intermediate and terminal operations. Intermediate operations are merely markers; only a terminal operation triggers actual computation. Intermediate operations can be stateless or stateful, and terminal operations can be short‑circuit or non‑short‑circuit, influencing how the library handles each case.

A naïve implementation would execute each function call in a separate iteration, generating intermediate results each time. This approach suffers from multiple passes and high memory overhead.

To achieve true pipeline behavior, the library records each user operation as a stage . When a terminal operation is invoked, all recorded stages are combined into a single pass. The Head stage represents the data source, while StatelessOp and StatefulOp represent the two kinds of intermediate operations.

Each stage wraps its operation into a Sink. The Sink interface defines four methods— begin(), accept(), cancellationRequested(), and end() —which together manage processing, forwarding, short‑circuit checks, and finalization.

For example, Stream.map() is implemented as:

// Stream.map(), returns a new Stream
public final <R> Stream<R> map(Function<? super P_OUT, ? extends R> mapper) {
    ...
    return new StatelessOp<P_OUT, R>(this, StreamShape.REFERENCE,
            StreamOpFlag.NOT_SORTED | StreamOpFlag.NOT_DISTINCT) {
        @Override
        Sink<P_OUT> opWrapSink(int flags, Sink<R> downstream) {
            return new Sink.ChainedReference<P_OUT, R>(downstream) {
                @Override
                public void accept(P_OUT u) {
                    R r = mapper.apply(u); // 1. apply mapper
                    downstream.accept(r); // 2. forward result
                }
            };
        }
    };
}

The sorted() operation, a stateful intermediate operation, uses a more complex sink:

class RefSortingSink<T> extends AbstractRefSortingSink<T> {
    private ArrayList<T> list; // stores elements for sorting
    RefSortingSink(Sink<? super T> downstream, Comparator<? super T> comparator) {
        super(downstream, comparator);
    }
    @Override
    public void begin(long size) {
        list = (size >= 0) ? new ArrayList<T>((int)size) : new ArrayList<T>();
    }
    @Override
    public void accept(T t) {
        list.add(t); // collect elements
    }
    @Override
    public void end() {
        list.sort(comparator);
        downstream.begin(list.size());
        if (!cancellationWasRequested) {
            list.forEach(downstream::accept);
        } else {
            for (T t : list) {
                if (downstream.cancellationRequested()) break;
                downstream.accept(t);
            }
        }
        downstream.end();
        list = null;
    }
}

All stages are combined by repeatedly calling opWrapSink from the terminal sink upwards, as shown in AbstractPipeline.wrapSink():

final <P_IN> Sink<P_IN> wrapSink(Sink<E_OUT> sink) {
    for (AbstractPipeline p = this; p.depth > 0; p = p.previousStage) {
        sink = p.opWrapSink(p.previousStage.combinedFlags, sink);
    }
    return (Sink<P_IN>) sink;
}

Execution is performed by AbstractPipeline.copyInto(), which notifies the sink of the start, iterates over the spliterator, and signals the end:

final <P_IN> void copyInto(Sink<P_IN> wrappedSink, Spliterator<P_IN> spliterator) {
    if (!StreamOpFlag.SHORT_CIRCUIT.isKnown(getStreamAndOpFlags())) {
        wrappedSink.begin(spliterator.getExactSizeIfKnown());
        spliterator.forEachRemaining(wrappedSink);
        wrappedSink.end();
    }
}

Terminal operations may return results or have side effects. Using side effects (e.g., collecting into a list inside forEach) is discouraged; instead, collectors should be used:

// Wrong collection
ArrayList<String> results = new ArrayList<>();
stream.filter(s -> pattern.matcher(s).matches())
      .forEach(s -> results.add(s)); // unnecessary side‑effects!
// Correct collection
List<String> results = stream.filter(s -> pattern.matcher(s).matches())
                           .collect(Collectors.toList()); // no side‑effects

The article concludes that understanding the Stream pipeline’s internal design helps write correct and efficient Stream code, alleviating concerns about performance.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Backend Java concurrency Pipeline stream-api

Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.