Understanding ForkJoinPool: Divide‑and‑Conquer, Implementation, and Performance Evaluation
This article explains the Fork/Join model and Java's ForkJoinPool, covering the divide‑and‑conquer algorithm, custom RecursiveTask examples, core source‑code analysis, common pool pitfalls, and performance testing to help developers decide when and how to use ForkJoinPool effectively.
1. Divide‑and‑Conquer Algorithm and Fork/Join Model
The Fork/Join model applies the classic divide‑and‑conquer strategy: a large problem of size N is recursively split into K smaller independent sub‑problems, each solved separately, and the partial results are merged to obtain the final answer.
The basic steps of divide‑and‑conquer are:
Divide : partition the problem into smaller sub‑problems.
Solve : solve each sub‑problem directly when it becomes sufficiently small.
Merge : combine the sub‑problem solutions to form the solution of the original problem.
2. Fork/Join Application Scenario and Experience
Scenario: compute the sum of all integers from 1 to n. A custom TheKingRecursiveSumTask class extending RecursiveTask is used to demonstrate task splitting.
public class TheKingRecursiveSumTask extends RecursiveTask {
private static final AtomicInteger taskCount = new AtomicInteger();
private final int sumBegin;
private final int sumEnd;
/**
* Task splitting threshold; tasks larger than this will be split.
*/
private final int threshold;
public TheKingRecursiveSumTask(int sumBegin, int sumEnd, int threshold) {
this.sumBegin = sumBegin;
this.sumEnd = sumEnd;
this.threshold = threshold;
}
@Override
protected Long compute() {
if ((sumEnd - sumBegin) > threshold) {
// Split the task
TheKingRecursiveSumTask subTask1 = new TheKingRecursiveSumTask(sumBegin, (sumBegin + sumEnd) / 2, threshold);
TheKingRecursiveSumTask subTask2 = new TheKingRecursiveSumTask((sumBegin + sumEnd) / 2, sumEnd, threshold);
subTask1.fork();
subTask2.fork();
taskCount.incrementAndGet();
return subTask1.join() + subTask2.join();
}
// Direct computation
long result = 0L;
for (int i = sumBegin; i < sumEnd; i++) {
result += i;
}
return result;
}
public static AtomicInteger getTaskCount() {
return taskCount;
}
}The driver code creates a pool with parallelism 16, sets the threshold to 100, and compares ForkJoin execution with a single‑threaded loop.
public static void main(String[] args) {
int sumBegin = 0, sumEnd = 10000000;
computeByForkJoin(sumBegin, sumEnd);
computeBySingleThread(sumBegin, sumEnd);
}
private static void computeByForkJoin(int sumBegin, int sumEnd) {
ForkJoinPool forkJoinPool = new ForkJoinPool(16);
long forkJoinStartTime = System.nanoTime();
TheKingRecursiveSumTask task = new TheKingRecursiveSumTask(sumBegin, sumEnd, 100);
long forkJoinResult = forkJoinPool.invoke(task);
System.out.println("======");
System.out.println("ForkJoin task splits: " + TheKingRecursiveSumTask.getTaskCount());
System.out.println("ForkJoin result: " + forkJoinResult);
System.out.println("ForkJoin time (ms): " + (System.nanoTime() - forkJoinStartTime) / 1000000);
}
private static void computeBySingleThread(int sumBegin, int sumEnd) {
long computeResult = 0L;
long startTime = System.nanoTime();
for (int i = sumBegin; i < sumEnd; i++) {
computeResult += i;
}
System.out.println("======");
System.out.println("Single‑thread result: " + computeResult);
System.out.println("Single‑thread time (ms): " + (System.nanoTime() - startTime) / 1000000);
} =====
ForkJoin task splits: 131071
ForkJoin result: 49999995000000
ForkJoin time (ms): 207
=====
Single‑thread result: 49999995000000
Single‑thread time (ms): 40The result shows that with a very low split threshold the ForkJoinPool incurs far more overhead than a simple loop, making it slower.
3. ForkJoinPool Design and Source‑Code Analysis
ForkJoinPool, introduced in Java 7 and widely used since Java 8, implements the Executor and ExecutorService interfaces and extends AbstractExecutorService. It supports two main task types: RecursiveAction (no result) and RecursiveTask (returns a result), both subclasses of ForkJoinTask .
Key constructor parameters:
parallelism : number of worker threads (defaults to the number of available processors).
ForkJoinWorkerThreadFactory : factory for creating worker threads.
UncaughtExceptionHandler : handler for uncaught exceptions.
asyncMode : determines whether the work queue uses FIFO (true) or LIFO (false).
Three ways to construct a pool:
// 1. Default constructor
public ForkJoinPool() {
this(Math.min(MAX_CAP, Runtime.getRuntime().availableProcessors()),
defaultForkJoinWorkerThreadFactory, null, false);
}
// 2. Specify parallelism only
public ForkJoinPool(int parallelism) {
this(parallelism, defaultForkJoinWorkerThreadFactory, null, false);
}
// 3. Full parameter constructor
public ForkJoinPool(int parallelism, ForkJoinWorkerThreadFactory factory,
UncaughtExceptionHandler handler, boolean asyncMode) {
this(checkParallelism(parallelism), checkFactory(factory), handler,
asyncMode ? FIFO_QUEUE : LIFO_QUEUE,
"ForkJoinPool-" + nextPoolId() + "-worker-");
checkPermission();
}Task submission methods include execute (asynchronous, no result), invoke (synchronous, returns result), and submit (returns a ForkJoinTask future). The table below shows which API is used from a non‑fork/join thread versus a fork/join thread.
From non‑fork/join thread
From fork/join thread
Asynchronous execution
execute(ForkJoinTask)
ForkJoinTask.fork()
Wait and get result
invoke(ForkJoinTask)
ForkJoinTask.invoke()
Submit for Future
submit(ForkJoinTask)
ForkJoinTask.fork() (tasks are Futures)
Core methods of ForkJoinTask :
public final ForkJoinTask fork() {
Thread t;
if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
((ForkJoinWorkerThread) t).workQueue.push(this);
else
ForkJoinPool.common.externalPush(this);
return this;
}
public final V join() {
int s;
if ((s = doJoin() & DONE_MASK) != NORMAL)
reportException(s);
return getRawResult();
}RecursiveAction and RecursiveTask are the two concrete subclasses. Example of a sorting action:
static class SortTask extends RecursiveAction {
final long[] array;
final int lo, hi;
SortTask(long[] array, int lo, int hi) { this.array = array; this.lo = lo; this.hi = hi; }
SortTask(long[] array) { this(array, 0, array.length); }
protected void compute() {
if (hi - lo < THRESHOLD) {
sortSequentially(lo, hi);
} else {
int mid = (lo + hi) >>> 1;
invokeAll(new SortTask(array, lo, mid), new SortTask(array, mid, hi));
merge(lo, mid, hi);
}
}
static final int THRESHOLD = 1000;
void sortSequentially(int lo, int hi) { Arrays.sort(array, lo, hi); }
void merge(int lo, int mid, int hi) {
long[] buf = Arrays.copyOfRange(array, lo, mid);
for (int i = 0, j = lo, k = mid; i < buf.length; j++)
array[j] = (k == hi || buf[i] < array[k]) ? buf[i++] : array[k++];
}
}And a classic Fibonacci RecursiveTask example:
class Fibonacci extends RecursiveTask
{
final int n;
Fibonacci(int n) { this.n = n; }
Integer compute() {
if (n <= 1) return n;
Fibonacci f1 = new Fibonacci(n - 1);
f1.fork();
Fibonacci f2 = new Fibonacci(n - 2);
return f2.compute() + f1.join();
}
}4. Caution About ForkJoinPool#commonPool
The static commonPool is a shared pool used internally by CompletableFuture and parallel streams. While it reduces the cost of creating multiple pools, it can become a bottleneck if other components submit blocking tasks, potentially starving compute‑heavy tasks and degrading overall application performance.
5. ForkJoinPool Performance Evaluation
Experiments on macOS with JDK 8 compared the custom sum task (threshold = 100) against a single‑thread loop. The ForkJoin version performed ~5× slower because the split granularity was too fine, creating 131 071 subtasks.
Increasing the threshold to 100 000 reduced the number of splits to 16 383 and made ForkJoin roughly three times faster than the single‑thread version.
=====
ForkJoin task splits: 16383
ForkJoin result: 499999999500000000
ForkJoin time (ms): 143
=====
Single‑thread result: 499999999500000000
Single‑thread time (ms): 410Key factors influencing performance are total task count, per‑task execution time, and parallelism level. Properly tuning the split threshold and avoiding blocking operations are essential for achieving the expected speed‑up.
Conclusion
ForkJoinPool implements the divide‑and‑conquer model for parallel computation. Its advantages stem from task splitting and work‑stealing, but developers must ensure tasks are pure computations, choose an appropriate split granularity, and be cautious with the shared commonPool to avoid performance regressions.
IT Architects Alliance
Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.