Unlocking Java Parallel Streams: When to Use, Performance Tips, and Pitfalls
This article explains Java parallel streams, shows how to replace manual thread handling with a one‑line parallelStream call, benchmarks its performance, discusses cases where it slows down, highlights shared‑variable hazards, and provides practical guidelines for effective use.
Java 8 introduced parallel streams, allowing developers to process collections concurrently with a single method call instead of manually splitting data, creating threads, and merging results.
What Is a Parallel Stream?
A parallel stream divides a stream into multiple data blocks and processes each block in a separate thread. For example, given a list of Apple objects where each apple has a weight, the price can be calculated sequentially:
List<Apple> appleList = new ArrayList<>(); // populated elsewhere
for (Apple apple : appleList) {
    apple.setPrice(5.0 * apple.getWeight() / 1000);
}
Using a parallel stream, the same logic becomes a one‑liner:
appleList.parallelStream().forEach(apple ->
    apple.setPrice(5.0 * apple.getWeight() / 1000));
The underlying thread pool is the common ForkJoinPool. Its default parallelism is one less than the number of available processors, because the thread that invokes the stream also takes part in the work. The value can be overridden via the system property java.util.concurrent.ForkJoinPool.common.parallelism.
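A quick way to see what a given JVM actually uses is to print the processor count next to the common pool's parallelism. The sketch below is only an illustration; the class name CommonPoolInfo and its helper methods are hypothetical and not part of the original article.

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical helper class for illustration; not from the article.
class CommonPoolInfo {

    // Target parallelism of the common pool that parallel streams use by default.
    static int commonParallelism() {
        return ForkJoinPool.getCommonPoolParallelism();
    }

    // Number of processors the JVM reports as available.
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Available processors:    " + availableCores());
        System.out.println("Common pool parallelism: " + commonParallelism());
    }
}
```

To change the value, pass the property on the command line, for example -Djava.util.concurrent.ForkJoinPool.common.parallelism=8. The property is read when the common pool is initialized, so it should be set before any parallel stream runs.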
Performance Test
To illustrate the speedup, each price calculation is followed by a 1‑second sleep to simulate I/O. The sequential version logs the total time, then the parallel version does the same:
public static void main(String[] args) throws InterruptedException {
    // Sequential version: one apple at a time on the calling thread
    List<Apple> appleList = initAppleList();
    Date begin = new Date();
    for (Apple apple : appleList) {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
        Thread.sleep(1000); // simulated I/O
    }
    Date end = new Date();
    log.info("Apples: {}, Time: {}s", appleList.size(),
        (end.getTime() - begin.getTime()) / 1000);
}
// Parallel version (fragment): same work, dispatched through parallelStream()
List<Apple> appleList = initAppleList();
Date begin = new Date();
appleList.parallelStream().forEach(apple -> {
    apple.setPrice(5.0 * apple.getWeight() / 1000);
    try {
        Thread.sleep(1000); // simulated I/O
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
});
Date end = new Date();
log.info("Apples: {}, Time: {}s", appleList.size(),
    (end.getTime() - begin.getTime()) / 1000);
The sequential loop needs about one second per apple, because the sleeps run back to back. On a quad‑core i5 machine the parallel version finishes in roughly 1 second, consistent with each core handling one element at a time. A list larger than the available parallelism would take proportionally longer.
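For readers who want to reproduce the comparison, here is a minimal self-contained sketch. It makes several assumptions that are not in the original: the Apple fields, the initAppleList(int) signature, a shortened 100 ms sleep, and the class name PricingBenchmark are all hypothetical. It is a rough timing illustration, not a rigorous benchmark.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the article's Apple type (fields are assumed).
class Apple {
    private final int weight; // grams
    private double price;
    Apple(int weight) { this.weight = weight; }
    int getWeight() { return weight; }
    double getPrice() { return price; }
    void setPrice(double price) { this.price = price; }
}

// Hypothetical benchmark harness for illustration only.
class PricingBenchmark {
    static final long SIMULATED_IO_MILLIS = 100; // shorter than the article's 1 s

    static List<Apple> initAppleList(int count) {
        List<Apple> apples = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            apples.add(new Apple(100 * i));
        }
        return apples;
    }

    // Price one apple, then sleep to simulate blocking I/O.
    static void priceOne(Apple apple) {
        apple.setPrice(5.0 * apple.getWeight() / 1000);
        try {
            Thread.sleep(SIMULATED_IO_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    static long timeSequentialMillis(List<Apple> apples) {
        long start = System.nanoTime();
        for (Apple apple : apples) {
            priceOne(apple);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    static long timeParallelMillis(List<Apple> apples) {
        long start = System.nanoTime();
        apples.parallelStream().forEach(PricingBenchmark::priceOne);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("Sequential: " + timeSequentialMillis(initAppleList(4)) + " ms");
        System.out.println("Parallel:   " + timeParallelMillis(initAppleList(4)) + " ms");
    }
}
```

System.nanoTime() is used instead of Date because it is monotonic and better suited to measuring elapsed time. The parallel figure depends on how many processors the JVM sees, so the numbers will differ from machine to machine.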
When Parallel Streams May Not Be Faster
Not all stream sources split well. Stream.iterate produces boxed Long objects and is inherently sequential, because each element is computed from the previous one. It therefore cannot be partitioned efficiently, and its parallel version can be slower than a simple loop. In contrast, LongStream.rangeClosed produces primitive values over a range of known size that can be divided into independent chunks, which allows a real speedup.
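One way to see the difference in splittability is to inspect each source's Spliterator. The sketch below is an illustration under assumptions: the class name SplitCheck and its methods are hypothetical, and it only reports what the sources advertise rather than measuring speed.

```java
import java.util.Spliterator;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical helper class for illustration; not from the article.
class SplitCheck {

    // A range knows its exact size, and so do the pieces it splits into.
    static boolean rangeIsSized(long n) {
        return LongStream.rangeClosed(1, n).spliterator()
                .hasCharacteristics(Spliterator.SIZED | Spliterator.SUBSIZED);
    }

    // The range can report how many elements remain without traversing them.
    static long rangeEstimatedSize(long n) {
        return LongStream.rangeClosed(1, n).spliterator().estimateSize();
    }

    // An iterate-based source has no known size to divide evenly.
    static boolean iterateIsSized() {
        return Stream.iterate(1L, i -> i + 1).spliterator()
                .hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        System.out.println("rangeClosed SIZED and SUBSIZED: " + rangeIsSized(1_000));
        System.out.println("rangeClosed estimated size:     " + rangeEstimatedSize(1_000));
        System.out.println("iterate SIZED:                  " + iterateIsSized());
    }
}
```

A source that reports SIZED and SUBSIZED can be cut into balanced halves up front, which is what the fork/join framework needs. The summation methods that follow compare the two kinds of source directly.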
import java.util.stream.LongStream;
import java.util.stream.Stream;
// Plain loop: the baseline
public static long iterativeSum(long n) {
    long result = 0;
    for (long i = 0; i <= n; i++) result += i;
    return result;
}
// Boxed, inherently sequential source
public static long sequentialSum(long n) {
    return Stream.iterate(1L, i -> i + 1).limit(n)
        .reduce(Long::sum).get();
}
// Same source in parallel: hard to split, so usually slower
public static long parallelSum(long n) {
    return Stream.iterate(1L, i -> i + 1).limit(n)
        .parallel().reduce(Long::sum).get();
}
// Primitive range: no boxing
public static long rangedSum(long n) {
    return LongStream.rangeClosed(1, n).reduce(Long::sum).getAsLong();
}
// Primitive range in parallel: splits into independent chunks
public static long parallelRangedSum(long n) {
    return LongStream.rangeClosed(1, n).parallel()
        .reduce(Long::sum).getAsLong();
}
Shared‑Variable Pitfalls
Parallel streams simplify multithreading but do not eliminate data‑race problems. The following example shows a mutable accumulator used from a parallel stream, producing nondeterministic results:
public static long sideEffectSum(long n) {
    Accumulator accumulator = new Accumulator();
    LongStream.rangeClosed(1, n).forEach(accumulator::add);
    return accumulator.total;
}
public static long sideEffectParallelSum(long n) {
    Accumulator accumulator = new Accumulator();
    // Data race: several threads update accumulator.total without synchronization
    LongStream.rangeClosed(1, n).parallel().forEach(accumulator::add);
    return accumulator.total;
}
static class Accumulator {
    private long total = 0;
    public void add(long value) { total += value; } // read-modify-write, not atomic
}
For n = 10,000,000, sequential execution consistently yields 50000005000000, while the parallel version returns a different, typically smaller, value from run to run because concurrent updates to total are lost. When mutable shared state is involved, avoid parallel streams or use thread‑safe constructs.
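Two possible fixes are sketched below, as an illustration rather than the article's own solution. The class name SafeSums and both method names are hypothetical. The first lets the stream perform the reduction so no shared state exists; the second keeps the side effect but uses java.util.concurrent.atomic.LongAdder as the thread-safe accumulator.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.stream.LongStream;

// Hypothetical helper class for illustration; not from the article.
class SafeSums {

    // Preferred: a built-in reduction, with no shared mutable state at all.
    static long reducedParallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // If a side effect is unavoidable, accumulate into a thread-safe object.
    static long adderParallelSum(long n) {
        LongAdder adder = new LongAdder();
        LongStream.rangeClosed(1, n).parallel().forEach(adder::add);
        return adder.sum(); // forEach has completed, so no updates are in flight
    }

    public static void main(String[] args) {
        System.out.println(reducedParallelSum(10_000_000L));
        System.out.println(adderParallelSum(10_000_000L));
    }
}
```

The reduction form is generally the better default: it avoids contention entirely, whereas any shared accumulator still pays some coordination cost on every element.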
Best Practices for Using Parallel Streams
Prefer primitive streams (LongStream, IntStream, DoubleStream) to avoid boxing overhead.
Estimate total work as N × Q (number of elements times per‑element cost). Parallelism tends to pay off only when N × Q is large enough to outweigh the cost of splitting the work and merging the results; a higher Q improves the odds.
Use parallel streams for data that can be cleanly split into independent chunks; avoid them for small collections.
The common pool's default parallelism is derived from the number of CPU cores (one less than the available processors, with the calling thread also participating). It can be changed globally via java.util.concurrent.ForkJoinPool.common.parallelism, but this affects all parallel streams and every other user of the common pool.
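The first and third guidelines can be combined in a small sketch: stay on a primitive stream and switch to parallel only above a size cutoff. Everything named here is an assumption for illustration, including the class WeightTotals and the PARALLEL_THRESHOLD value of 10,000, which is a placeholder to be tuned by measurement rather than a recommended number.

```java
import java.util.stream.IntStream;

// Hypothetical helper class for illustration; not from the article.
class WeightTotals {

    // Placeholder cutoff; measure on your own workload before choosing a value.
    static final int PARALLEL_THRESHOLD = 10_000;

    static long totalWeight(int[] weights) {
        IntStream stream = IntStream.of(weights); // primitive stream, no boxing
        if (weights.length >= PARALLEL_THRESHOLD) {
            stream = stream.parallel();           // large input: allow splitting
        }
        return stream.asLongStream().sum();       // widen to long to avoid overflow
    }

    public static void main(String[] args) {
        System.out.println(totalWeight(new int[] {120, 150, 180}));
    }
}
```

Summation has a very low per-element cost, so in practice the cutoff for this particular operation may need to be much higher; the point is only that the decision can be made explicitly from the input size.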
Java Architect Essentials