
Understanding ForkJoinPool and the Fork/Join Framework in Java

This article explains the Fork/Join model and its divide-and-conquer basis, shows how ForkJoinPool is implemented in Java, demonstrates a custom RecursiveTask that sums a range, and discusses task submission, work-stealing, common pool pitfalls, performance testing, and best-practice recommendations.

Top Architect

Nov 18, 2024

Previously we learned about ThreadPoolExecutor, which manages a task queue and a pool of threads to handle concurrent work. However, ThreadPoolExecutor has two major drawbacks: it has no built-in way to split a large task into subtasks that run in parallel, and all worker threads contend for tasks from one shared queue. Both can hurt performance in high-concurrency scenarios.

To address these issues, the Fork/Join framework provides an alternative. It is based on the divide‑and‑conquer algorithm, recursively breaking a large problem into smaller independent sub‑problems, solving them in parallel, and then merging the results.

1. Divide‑and‑Conquer and Fork/Join Model

The core idea of divide‑and‑conquer is to decompose a problem of size N into K smaller sub‑problems of the same nature, solve each independently, and combine their solutions to obtain the final answer. The steps are:

Divide: split the problem into smaller sub-problems.

Solve: compute each sub-problem directly when it becomes small enough.

Combine: merge the sub-problem results to form the final solution.

In concurrent computing, the Fork/Join model repeatedly applies these steps, creating a tree of tasks that can be processed by multiple threads.

2. Fork/Join Application Example

We implement a RecursiveTask<Long> called TheKingRecursiveSumTask that computes the sum of integers in a given range. The task splits itself when the range size exceeds a threshold, creating two subtasks, forking them, and joining their results.

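import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicInteger;

public class TheKingRecursiveSumTask extends RecursiveTask<Long> {
    // Counts how many times a range was split, across all task instances.
    private static final AtomicInteger taskCount = new AtomicInteger();
    private final int sumBegin;   // inclusive
    private final int sumEnd;     // exclusive
    private final int threshold;  // ranges no larger than this are summed directly

    public TheKingRecursiveSumTask(int sumBegin, int sumEnd, int threshold) {
        this.sumBegin = sumBegin;
        this.sumEnd = sumEnd;
        this.threshold = threshold;
    }

    @Override
    protected Long compute() {
        if ((sumEnd - sumBegin) > threshold) {
            // Range is still too large: split it at the midpoint into two subtasks.
            int mid = sumBegin + (sumEnd - sumBegin) / 2;
            TheKingRecursiveSumTask subTask1 = new TheKingRecursiveSumTask(sumBegin, mid, threshold);
            TheKingRecursiveSumTask subTask2 = new TheKingRecursiveSumTask(mid, sumEnd, threshold);
            // Queue both halves for asynchronous execution.
            subTask1.fork();
            subTask2.fork();
            taskCount.incrementAndGet();
            // Wait for both halves and combine their partial sums.
            return subTask1.join() + subTask2.join();
        }
        // Small enough: sum the range [sumBegin, sumEnd) directly.
        long result = 0L;
        for (int i = sumBegin; i < sumEnd; i++) {
            result += i;
        }
        return result;
    }

    public static AtomicInteger getTaskCount() {
        return taskCount;
    }
}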

The main method creates a pool with parallelism 16, runs the recursive sum from 0 to 10,000,000 with a threshold of 100, and compares the result and execution time with a single‑threaded loop.

import java.util.concurrent.ForkJoinPool;

// The wrapper class name is illustrative; the original shows only the methods.
public class ForkJoinSumDemo {
    public static void main(String[] args) {
        int sumBegin = 0, sumEnd = 10000000;
        computeByForkJoin(sumBegin, sumEnd);
        computeBySingleThread(sumBegin, sumEnd);
    }

    private static void computeByForkJoin(int sumBegin, int sumEnd) {
        ForkJoinPool forkJoinPool = new ForkJoinPool(16);
        long start = System.nanoTime();
        TheKingRecursiveSumTask task = new TheKingRecursiveSumTask(sumBegin, sumEnd, 100);
        // invoke() blocks until the whole task tree has completed.
        long result = forkJoinPool.invoke(task);
        System.out.println("ForkJoin task splits: " + TheKingRecursiveSumTask.getTaskCount());
        System.out.println("ForkJoin result: " + result);
        System.out.println("ForkJoin time (ms): " + (System.nanoTime() - start) / 1_000_000);
        forkJoinPool.shutdown(); // release the pool's worker threads
    }

    private static void computeBySingleThread(int sumBegin, int sumEnd) {
        long result = 0L;
        long start = System.nanoTime();
        for (int i = sumBegin; i < sumEnd; i++) {
            result += i;
        }
        System.out.println("Single-thread result: " + result);
        System.out.println("Single-thread time (ms): " + (System.nanoTime() - start) / 1_000_000);
    }
}

Results show that with a very low threshold (100) the Fork/Join pool performs many splits (131 071) and is slower than the single‑threaded version. Increasing the threshold reduces splits dramatically and yields a clear speed‑up.
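The task above forks both halves and then waits for them. A common variant, sketched here as an illustration rather than as the article's method, forks only the left half and computes the right half on the current worker, so one fewer task is queued per split. The class name ForkOneSumTask and the helper sum(...) are hypothetical names introduced for this sketch.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative variant of the range-sum task: fork one half, compute the other in place.
class ForkOneSumTask extends RecursiveTask<Long> {
    private final int begin;      // inclusive
    private final int end;        // exclusive
    private final int threshold;  // ranges no larger than this are summed directly

    ForkOneSumTask(int begin, int end, int threshold) {
        this.begin = begin;
        this.end = end;
        this.threshold = threshold;
    }

    @Override
    protected Long compute() {
        if (end - begin <= threshold) {
            long result = 0L;
            for (int i = begin; i < end; i++) {
                result += i;
            }
            return result;
        }
        int mid = begin + (end - begin) / 2;   // overflow-safe midpoint
        ForkOneSumTask left = new ForkOneSumTask(begin, mid, threshold);
        ForkOneSumTask right = new ForkOneSumTask(mid, end, threshold);
        left.fork();                           // queue the left half asynchronously
        long rightResult = right.compute();    // run the right half on the current thread
        return left.join() + rightResult;      // then wait for the left half
    }

    // Hypothetical helper: sums [begin, end) in a dedicated pool and shuts it down afterwards.
    static long sum(int begin, int end, int threshold, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new ForkOneSumTask(begin, end, threshold));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sum(0, 10_000_000, 100_000, 4));
    }
}
```

Whether this variant is faster for a given workload is something to measure, not assume; it changes how many tasks are queued, not the result.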

3. ForkJoinPool Design and Source Analysis

ForkJoinPool, introduced in Java 7 and widely used in Java 8, implements the Executor and ExecutorService interfaces. It supports two main task types: RecursiveAction (no result) and RecursiveTask (returns a result), both extending ForkJoinTask.

Key constructor variants:

Default constructor: uses the number of available processors as parallelism.

Parallelism-only constructor: lets you specify the parallelism level.

Full-parameter constructor: lets you set parallelism, thread factory, exception handler, and async mode.
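The three constructor forms can be sketched as follows. The class and method names (PoolConstruction, defaultPool, fixedPool, customPool) are hypothetical; the constructors and ForkJoinPool.defaultForkJoinWorkerThreadFactory are part of the JDK API.

```java
import java.util.concurrent.ForkJoinPool;

class PoolConstruction {
    // Default form: parallelism is taken from Runtime.getRuntime().availableProcessors().
    static ForkJoinPool defaultPool() {
        return new ForkJoinPool();
    }

    // Explicit parallelism level.
    static ForkJoinPool fixedPool(int parallelism) {
        return new ForkJoinPool(parallelism);
    }

    // Full form: parallelism, worker thread factory, uncaught-exception handler, async mode.
    static ForkJoinPool customPool(int parallelism) {
        return new ForkJoinPool(
                parallelism,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory,
                (thread, error) -> System.err.println(thread.getName() + " failed: " + error),
                true); // asyncMode = true: FIFO order for forked tasks that are never joined
    }

    public static void main(String[] args) {
        ForkJoinPool a = defaultPool();
        ForkJoinPool b = fixedPool(4);
        ForkJoinPool c = customPool(2);
        System.out.println("default parallelism: " + a.getParallelism());
        System.out.println("fixed parallelism:   " + b.getParallelism());
        System.out.println("custom async mode:   " + c.getAsyncMode());
        a.shutdown();
        b.shutdown();
        c.shutdown();
    }
}
```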

Task Submission Methods

|  | From non-fork/join thread | From fork/join thread |
| --- | --- | --- |
| Asynchronous execution | execute(ForkJoinTask) | ForkJoinTask.fork() |
| Invoke and get result | invoke(ForkJoinTask) | ForkJoinTask.invoke() |
| Submit and obtain Future | submit(ForkJoinTask) | ForkJoinTask.fork() (tasks are Futures) |

Core methods such as invoke, execute, and submit handle null checks, push tasks to the internal work queue, and return results or futures as appropriate.
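A minimal sketch of the three pool-level submission methods, called from a non-fork/join thread. SubmissionDemo, SquareTask, and runAll are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;

class SubmissionDemo {
    // Trivial illustrative task: returns the square of its input.
    static class SquareTask extends RecursiveTask<Integer> {
        private final int n;
        SquareTask(int n) { this.n = n; }
        @Override protected Integer compute() { return n * n; }
    }

    static int[] runAll() {
        ForkJoinPool pool = new ForkJoinPool(2);
        try {
            // invoke: blocks the caller and returns the result directly.
            int viaInvoke = pool.invoke(new SquareTask(3));

            // submit: returns immediately; the returned ForkJoinTask acts as the Future.
            ForkJoinTask<Integer> future = pool.submit(new SquareTask(4));
            int viaSubmit = future.join();

            // execute: returns nothing; keep a reference to the task to read its result.
            SquareTask async = new SquareTask(5);
            pool.execute(async);
            int viaExecute = async.join();

            return new int[] {viaInvoke, viaSubmit, viaExecute};
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runAll()));
    }
}
```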

Fork/Join Task Lifecycle

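The fork() method arranges asynchronous execution: called from a worker thread, it pushes the task onto that worker's own queue; called from any other thread, it submits the task to the common pool. The join() method waits until the task completes and returns the computed value, or reports the failure if the task ended abnormally. In JDK 8 the two methods look like this: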

public final ForkJoinTask<V> fork() {
    Thread t;
    if ((t = Thread.currentThread()) instanceof ForkJoinWorkerThread)
        ((ForkJoinWorkerThread) t).workQueue.push(this);
    else
        ForkJoinPool.common.externalPush(this);
    return this;
}

public final V join() {
    int s;
    if ((s = doJoin() & DONE_MASK) != NORMAL)
        reportException(s);
    return getRawResult();
}

RecursiveAction is used for tasks that do not return a value (e.g., parallel sorting), while RecursiveTask is for tasks that produce a result (e.g., sum, Fibonacci).
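For contrast with the RecursiveTask shown earlier, here is a small RecursiveAction sketch that doubles every element of an array in place and returns nothing. The class name, the THRESHOLD value, and the doubleAll helper are assumptions made for the example.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class DoubleInPlaceAction extends RecursiveAction {
    private static final int THRESHOLD = 1_000; // illustrative cut-off, not a tuned value
    private final long[] data;
    private final int from;  // inclusive
    private final int to;    // exclusive

    DoubleInPlaceAction(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= THRESHOLD) {
            for (int i = from; i < to; i++) {
                data[i] *= 2;
            }
            return;
        }
        int mid = from + (to - from) / 2;
        // invokeAll runs both halves and returns once both have completed.
        invokeAll(new DoubleInPlaceAction(data, from, mid),
                  new DoubleInPlaceAction(data, mid, to));
    }

    // Hypothetical helper: doubles the whole array using a dedicated pool.
    static void doubleAll(long[] data) {
        ForkJoinPool pool = new ForkJoinPool(4);
        try {
            pool.invoke(new DoubleInPlaceAction(data, 0, data.length));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4};
        doubleAll(values);
        System.out.println(java.util.Arrays.toString(values));
    }
}
```

The two halves write to disjoint index ranges, so no synchronization is needed on the shared array.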

Work‑Stealing Queues

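Each worker thread owns a double-ended queue (deque). The owner pushes and pops tasks at the same end (the head, which holds the most recently forked task), which favors cache locality. Idle workers steal from the opposite end (the tail, which holds the oldest tasks) of other workers' queues. Because the owner and the thieves operate on different ends, contention stays low.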
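The access pattern can be illustrated with a single-threaded toy model built on ArrayDeque. This is only a sketch of the ordering (owner takes the newest task, thief takes the oldest); it is not how ForkJoinPool's internal queues are implemented, and WorkStealingModel is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy, single-threaded model of one worker's deque. Not thread-safe.
class WorkStealingModel {
    private final Deque<String> tasks = new ArrayDeque<>();

    // Owner adds a newly forked task at its own end.
    void push(String task) {
        tasks.addLast(task);
    }

    // Owner takes the most recently pushed task (LIFO); null if empty.
    String pop() {
        return tasks.pollLast();
    }

    // A thief takes the oldest task from the opposite end (FIFO); null if empty.
    String steal() {
        return tasks.pollFirst();
    }

    public static void main(String[] args) {
        WorkStealingModel queue = new WorkStealingModel();
        queue.push("A");
        queue.push("B");
        queue.push("C");
        System.out.println("owner pops:   " + queue.pop());   // newest task
        System.out.println("thief steals: " + queue.steal()); // oldest task
    }
}
```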

4. Caution About the Common Pool

The static ForkJoinPool.commonPool() is shared across the JVM and is used by CompletableFuture and parallel streams. Submitting blocking or long‑running tasks to the common pool can starve computational tasks and degrade overall application performance.
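One way to keep blocking work off the common pool is to pass a dedicated executor to CompletableFuture.supplyAsync. The sketch below assumes a hypothetical blocking call, slowLookup, simulated with Thread.sleep; the pool size and the thread name "blocking-io" are arbitrary choices for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class BlockingOffload {
    // Hypothetical stand-in for a blocking call (I/O, JDBC, remote service).
    // Returns the name of the thread that executed it.
    static String slowLookup() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Thread.currentThread().getName();
    }

    static String runOnDedicatedPool() {
        // Dedicated pool for blocking work, separate from ForkJoinPool.commonPool().
        ExecutorService blockingPool = Executors.newFixedThreadPool(4, runnable -> {
            Thread t = new Thread(runnable, "blocking-io");
            t.setDaemon(true);
            return t;
        });
        try {
            // The second argument overrides the default executor (the common pool).
            return CompletableFuture
                    .supplyAsync(BlockingOffload::slowLookup, blockingPool)
                    .join();
        } finally {
            blockingPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran on: " + runOnDedicatedPool());
    }
}
```

In a real application the executor would normally be created once and reused, not created per call as in this sketch.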

5. Performance Evaluation

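Experiments on macOS with JDK 8 show that with a threshold of 100 the Fork/Join version performed 131 071 splits and took 207 ms, while the single-threaded version took only 40 ms. In a second run with a threshold of 100 000, Fork/Join performed 16 383 splits and was faster (143 ms vs. 410 ms for the single thread). Note that the second set of figures does not match the 10,000,000-element range from the listing: with that range a threshold of 100 000 yields only 127 splits, and the single-threaded time does not depend on the threshold. The numbers are consistent with a larger input in the second run (a range of 1,000,000,000 gives exactly 16 383 splits), although the source does not state this.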

Key factors influencing performance are total task count, per‑task execution time, and the chosen parallelism level. Proper benchmarking is essential before deploying Fork/Join in production.
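The split counts can be checked without running a pool, because they depend only on the range and the threshold. The sketch below mirrors the splitting rule of the sum task (split while end - begin > threshold, cut at the midpoint); SplitCounter and countSplits are hypothetical names.

```java
class SplitCounter {
    // Counts how many splits the range-sum task would perform for [begin, end).
    static long countSplits(int begin, int end, int threshold) {
        if (end - begin <= threshold) {
            return 0; // small enough: this range is summed directly
        }
        int mid = begin + (end - begin) / 2;
        return 1 + countSplits(begin, mid, threshold) + countSplits(mid, end, threshold);
    }

    public static void main(String[] args) {
        System.out.println(countSplits(0, 10_000_000, 100));
        System.out.println(countSplits(0, 10_000_000, 100_000));
        System.out.println(countSplits(0, 1_000_000_000, 100_000));
    }
}
```

For a range of 10,000,000 this gives 131,071 splits at threshold 100 and 127 splits at threshold 100,000; a range of 1,000,000,000 at threshold 100,000 gives 16,383. Comparing such counts against per-task cost is a quick first check on task granularity before benchmarking.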

Conclusion

Fork/Join is a powerful model for pure computational workloads that can be expressed as divide‑and‑conquer problems. Its benefits come from task splitting and work‑stealing, but it requires careful task granularity selection and avoidance of blocking operations. The common pool should be used cautiously, and custom pools are recommended for mixed workloads.
