Master Java Parallelism: From CountDownLatch to Fork/Join and Beyond

This article explores the evolution of parallel computing in Java, explaining why single‑core performance stalls, how multi‑core CPUs and GPUs enable true concurrency, and demonstrating practical implementations using CountDownLatch, CompletableFuture, Fork/Join, parallel streams, and sharding, while highlighting performance considerations and pitfalls.

Programmer DD

1. Introduction

Many of us who love games have dreamed of a Naruto-style "shadow clone" technique that would let us attend class and play games at the same time. Such magic doesn't exist, but in the computer world we can achieve a similar effect through parallelism.

2. Parallelism in Computers

Parallelism didn't appear out of thin air. In 1971 Intel released the 4004, the first commercially available single-chip microprocessor, with about 2,300 transistors. Gordon Moore had already formulated what became Moore's Law in 1965, and revised it in 1975 to say that the number of transistors on a chip doubles roughly every two years (the often-quoted 18 months is a later popularization). Clock speeds have risen from 740 kHz to around 4 GHz today, but higher frequencies bring diminishing returns:

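Power: as a rough rule of thumb, each additional GHz adds about 25 W, and beyond roughly 150 W cooling becomes a problem.

Pipeline depth: reaching higher frequencies requires longer pipelines, which do less work per clock and pay more for every branch misprediction, so a higher-clocked chip can end up slower overall.

Physical limits: Moore himself predicted that his law would stop holding within 10 to 20 years.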

When single-core frequencies hit this ceiling, multi-core CPUs emerged. They offer higher total performance at lower power consumption, but only to software that spreads its work across cores, which is why multithreaded programming became essential.

GPUs represent another form of parallelism. Using CUDA, a single kernel can launch millions of threads to process image pixels in parallel, far exceeding what a multithreaded CPU can achieve for data‑parallel tasks.

3. Parallelism in Applications

High-performance services often rely on asynchronous and parallel techniques. Consider an order-placement scenario that calls five independent services (customer, discount, tenant, food, other) one after another. If each call takes about 50 ms, the sequential version needs at least 250 ms (5 × 50 ms). Because the calls do not depend on each other, they can run in parallel, and the latency drops to roughly that of the slowest call, about 50 ms.

3.1 CountDownLatch/Phaser

CountDownLatch (since JDK 1.5) and Phaser (since JDK 1.7) are synchronization utilities in java.util.concurrent. A CountDownLatch works like a counter: threads that call await() block until the counter reaches zero, and worker threads call countDown() as they finish their work.

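import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CountDownTask {
    private static final int CORE_POOL_SIZE = 4;
    private static final int MAX_POOL_SIZE = 12;
    private static final long KEEP_ALIVE_TIME = 5L;
    private static final int QUEUE_SIZE = 1600;
    protected static ExecutorService THREAD_POOL = new ThreadPoolExecutor(
        CORE_POOL_SIZE, MAX_POOL_SIZE, KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(QUEUE_SIZE));

    public static void main(String[] args) throws InterruptedException {
        // One count per parallel task
        CountDownLatch countDownLatch = new CountDownLatch(5);
        OrderInfo orderInfo = new OrderInfo(); // the order object the tasks fill in (definition not shown)
        // In real code, call countDown() in a finally block so that a failing
        // task cannot leave the main thread waiting until the timeout.
        THREAD_POOL.execute(() -> { /* Customer task */ countDownLatch.countDown(); });
        THREAD_POOL.execute(() -> { /* Discount task */ countDownLatch.countDown(); });
        THREAD_POOL.execute(() -> { /* Food task */ countDownLatch.countDown(); });
        THREAD_POOL.execute(() -> { /* Tenant task */ countDownLatch.countDown(); });
        THREAD_POOL.execute(() -> { /* Other task */ countDownLatch.countDown(); });
        // await returns false if the timeout elapses before the count reaches zero
        boolean finished = countDownLatch.await(1, TimeUnit.SECONDS);
        System.out.println("Main thread: " + Thread.currentThread().getName()
            + ", all tasks finished: " + finished);
        THREAD_POOL.shutdown(); // otherwise the non-daemon pool threads keep the JVM alive
    }
}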
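The section heading also names Phaser, so here is a minimal sketch of the same "wait for all tasks" pattern with it. The class name PhaserOrderSketch, the method runTasks, and the placeholder task bodies are assumptions made for illustration; they are not part of the original example. Unlike a latch, a Phaser lets parties register dynamically, so the number of tasks does not have to be known up front.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Phaser;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class PhaserOrderSketch {

    // Runs one placeholder task per name and waits until all of them have arrived.
    static List<String> runTasks(List<String> names)
            throws InterruptedException, TimeoutException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, names.size()));
        List<String> finished = new CopyOnWriteArrayList<>();
        Phaser phaser = new Phaser(1); // party 1 is the coordinating (calling) thread
        try {
            for (String name : names) {
                phaser.register(); // one extra party per task, registered before it starts
                pool.execute(() -> {
                    try {
                        finished.add(name); // placeholder for the real service call
                    } finally {
                        phaser.arriveAndDeregister(); // always signal, even on failure
                    }
                });
            }
            // Arrive as the coordinator, then wait (bounded) for the phase to advance.
            int phase = phaser.arrive();
            phaser.awaitAdvanceInterruptibly(phase, 1, TimeUnit.SECONDS);
            return finished;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Completion order may differ from run to run.
        System.out.println(runTasks(
            Arrays.asList("customer", "discount", "food", "tenant", "other")));
    }
}
```

For a one-shot wait like this, CountDownLatch is usually the simpler choice; Phaser pays off when the same barrier is reused across several phases or when the set of participants changes.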

3.2 CompletableFuture

CountDownLatch couples business logic with synchronization, which can be error‑prone. Java 8 introduced CompletableFuture, a non‑blocking future that allows composing asynchronous tasks without explicit latch handling.

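import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CompletableFutureParallel {
    private static final int CORE_POOL_SIZE = 4;
    private static final int MAX_POOL_SIZE = 12;
    private static final long KEEP_ALIVE_TIME = 5L;
    private static final int QUEUE_SIZE = 1600;
    protected static ExecutorService THREAD_POOL = new ThreadPoolExecutor(
        CORE_POOL_SIZE, MAX_POOL_SIZE, KEEP_ALIVE_TIME, TimeUnit.SECONDS,
        new LinkedBlockingQueue<>(QUEUE_SIZE));

    public static void main(String[] args) throws Exception {
        OrderInfo orderInfo = new OrderInfo(); // the order object the tasks fill in (definition not shown)
        List<CompletableFuture<?>> futures = new ArrayList<>();
        futures.add(CompletableFuture.runAsync(() -> { /* Customer */ }, THREAD_POOL));
        futures.add(CompletableFuture.runAsync(() -> { /* Discount */ }, THREAD_POOL));
        futures.add(CompletableFuture.runAsync(() -> { /* Food */ }, THREAD_POOL));
        futures.add(CompletableFuture.runAsync(() -> { /* Tenant */ }, THREAD_POOL));
        futures.add(CompletableFuture.runAsync(() -> { /* Other */ }, THREAD_POOL));
        // allOf completes once every future in the array has completed
        CompletableFuture<Void> allDone = CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
        try {
            allDone.get(10, TimeUnit.SECONDS); // throws TimeoutException if the tasks take longer
            System.out.println(orderInfo);
        } finally {
            THREAD_POOL.shutdown(); // otherwise the non-daemon pool threads keep the JVM alive
        }
    }
}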
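The example above uses runAsync, whose tasks return nothing and therefore have to write into a shared object. As a hedged alternative, the sketch below lets each task return a value with supplyAsync and combines the results with thenCombine. The class ComposeSketch, the three load methods, and their return values are hypothetical stand-ins, not part of the original code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ComposeSketch {

    // Hypothetical stand-ins for the remote calls; each returns a value
    // instead of writing into a shared object.
    static String loadCustomer() { return "customer-42"; }
    static int loadDiscountPercent() { return 10; }
    static int loadFoodPrice() { return 200; }

    static String buildOrder(ExecutorService pool) throws Exception {
        CompletableFuture<String> customer =
            CompletableFuture.supplyAsync(ComposeSketch::loadCustomer, pool);
        CompletableFuture<Integer> discount =
            CompletableFuture.supplyAsync(ComposeSketch::loadDiscountPercent, pool);
        CompletableFuture<Integer> price =
            CompletableFuture.supplyAsync(ComposeSketch::loadFoodPrice, pool);

        // thenCombine runs once both inputs are complete; no latch is needed.
        CompletableFuture<Integer> payable =
            price.thenCombine(discount, (p, d) -> p * (100 - d) / 100);
        CompletableFuture<String> order =
            customer.thenCombine(payable, (c, amount) -> c + " pays " + amount);

        return order.get(1, TimeUnit.SECONDS); // bound the wait with a timeout
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            System.out.println(buildOrder(pool)); // prints "customer-42 pays 180"
        } finally {
            pool.shutdown();
        }
    }
}
```

Returning values keeps each task free of shared mutable state, which removes one common source of thread-safety bugs.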

3.3 Fork/Join

In the example above, CompletableFuture still runs on a thread pool whose workers all take tasks from one shared blocking queue. The Fork/Join framework, added in JDK 1.7, provides a work-stealing pool that reduces this contention.

Fork/Join diagram

Each ForkJoinPool thread has its own deque; it processes tasks in LIFO order but can steal tasks from other deques in FIFO order, minimizing lock contention.

Work‑steal illustration

Example implementation for the order scenario:

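import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// CustomerTask, TenantTask, DiscountTask, FoodTask and OtherTask are the
// per-service subtasks (presumably RecursiveTask subclasses; definitions not shown).
public class OrderTask extends RecursiveTask<OrderInfo> {
    @Override
    protected OrderInfo compute() {
        // create the subtasks
        CustomerTask ct = new CustomerTask();
        TenantTask tt = new TenantTask();
        DiscountTask dt = new DiscountTask();
        FoodTask ft = new FoodTask();
        OtherTask ot = new OtherTask();
        // invokeAll forks the subtasks and returns once all of them have completed;
        // if one of them fails, its exception is rethrown here
        invokeAll(ct, tt, dt, ft, ot);
        // the subtasks are done, so join() returns each result without blocking
        return new OrderInfo(ct.join(), tt.join(), dt.join(), ft.join(), ot.join());
    }

    public static void main(String[] args) {
        // parallelism must be at least 1, so guard against single-core machines
        int parallelism = Math.max(1, Runtime.getRuntime().availableProcessors() - 1);
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        System.out.println(pool.invoke(new OrderTask()));
    }
}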
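The order example forks a fixed set of five subtasks. Work stealing matters most when a task keeps splitting itself recursively, so here is a minimal, self-contained sketch of that divide-and-conquer shape. RangeSumTask and its THRESHOLD value are illustrative assumptions, not part of the original article.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class RangeSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // assumed cut-off; tune per workload
    private final long from; // inclusive
    private final long to;   // inclusive

    RangeSumTask(long from, long to) {
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        // Small enough: sum sequentially instead of splitting further.
        if (to - from < THRESHOLD) {
            long sum = 0;
            for (long i = from; i <= to; i++) {
                sum += i;
            }
            return sum;
        }
        long mid = from + (to - from) / 2;
        RangeSumTask left = new RangeSumTask(from, mid);
        RangeSumTask right = new RangeSumTask(mid + 1, to);
        left.fork();                     // pushed onto this worker's deque; idle workers may steal it
        long rightSum = right.compute(); // keep working on the other half in this thread
        return left.join() + rightSum;
    }

    public static void main(String[] args) {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            System.out.println(pool.invoke(new RangeSumTask(1, 1_000_000))); // prints 500000500000
        } finally {
            pool.shutdown();
        }
    }
}
```

Forking one half and computing the other directly keeps the current worker busy and leaves the forked half available for stealing.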

3.4 parallelStream

Java 8 also provides parallel streams, which by default run on the common Fork/Join pool (ForkJoinPool.commonPool()). Example: summing the numbers 1 to 100 in parallel.

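import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

public class ParallelStream {
    public static void main(String[] args) {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= 100; i++) list.add(i);
        // forEach runs on several threads at once, so the accumulator must be thread-safe
        LongAdder sum = new LongAdder();
        list.parallelStream().forEach(i -> sum.add(i));
        System.out.println(sum); // prints 5050
    }
}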
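The example above accumulates through a side effect, which is why it needs a thread-safe LongAdder. A common alternative, sketched here under the assumption that only the total is needed, is to let the stream perform the reduction itself. ParallelSumSketch and sumTo are hypothetical names.

```java
import java.util.stream.IntStream;

class ParallelSumSketch {

    // Sums 1..n with a built-in reduction; no shared mutable state is involved.
    static int sumTo(int n) {
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100)); // prints 5050
    }
}
```

For a range this small the parallel version is unlikely to beat a sequential loop; the splitting and merging overhead only pays off for larger or more expensive workloads.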

3.5 Sharding

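When processing massive data sets (e.g., millions of user IDs), a single machine becomes the bottleneck. Distributing the workload across multiple machines with a sharding strategy (e.g., id % 50, so that each worker handles only the IDs that map to its shard) can dramatically reduce overall execution time.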
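The id % 50 rule can be sketched as follows. This is an illustration only: ShardingSketch, shardOf, and partition are hypothetical names, and the in-memory grouping stands in for what would normally be a per-machine query or job parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShardingSketch {
    static final int SHARD_COUNT = 50; // matches the id % 50 example

    // floorMod keeps the result in [0, SHARD_COUNT) even for negative ids.
    static int shardOf(long userId) {
        return (int) Math.floorMod(userId, (long) SHARD_COUNT);
    }

    // Groups ids by shard; in a real deployment each machine would load only its own shard.
    static Map<Integer, List<Long>> partition(List<Long> userIds) {
        Map<Integer, List<Long>> shards = new HashMap<>();
        for (long id : userIds) {
            shards.computeIfAbsent(shardOf(id), k -> new ArrayList<>()).add(id);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 200; id++) {
            ids.add(id);
        }
        System.out.println(partition(ids).get(7)); // prints [7, 57, 107, 157]
    }
}
```

A plain modulo spreads sequential IDs evenly, but changing the shard count remaps almost every ID, which is worth keeping in mind if the number of machines is expected to change.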

4. Parallelism Considerations

Thread safety: State that several tasks update at once needs a thread-safe structure such as LongAdder or AtomicLong; a plain int, long, Integer, or Long field can silently lose updates in concurrent code.

Parameter tuning: Thread‑pool size, queue capacity, parallelism level, and timeout values must be tuned to the specific workload to avoid bottlenecks.
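One tuning detail that is easy to miss: a ThreadPoolExecutor only creates threads beyond the core size once its queue is full, so a large queue can mean the maximum pool size is rarely reached. The sketch below makes this visible with deliberately tiny, assumed values (core 1, max 2, queue capacity 1); PoolSizingSketch and observe are hypothetical names.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolSizingSketch {

    // Returns {pool size after 2 tasks, pool size after 3 tasks, 1 if the 4th task was rejected}.
    static int[] observe() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 2, 5L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocked = () -> {
            try {
                release.await(); // hold the worker so the pool stays saturated
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] seen = new int[3];
        try {
            pool.execute(blocked);        // starts the single core thread
            pool.execute(blocked);        // core thread busy: the task waits in the queue
            seen[0] = pool.getPoolSize(); // still 1, queueing is preferred over new threads
            pool.execute(blocked);        // queue full: a second, non-core thread starts
            seen[1] = pool.getPoolSize(); // now 2
            try {
                pool.execute(blocked);    // queue full and max reached: default policy rejects
            } catch (RejectedExecutionException e) {
                seen[2] = 1;
            }
        } finally {
            release.countDown(); // let the blocked tasks finish
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.SECONDS);
        }
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(observe())); // prints [1, 2, 1]
    }
}
```

With the article's settings (core 4, max 12, queue 1600), this means the pool would stay at four threads until 1,600 tasks are waiting, so the queue capacity and the rejection policy deserve as much attention as the thread counts.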

5. Conclusion

The article introduced what parallelism is, its historical evolution, how to achieve it in Java, and key pitfalls to watch out for. Two discussion questions remain:

How should we handle exceptions that occur in one of the parallel tasks?

If a task is not a strong dependency, how should we treat its failure during parallel execution?
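As one possible starting point for both questions, and not a definitive answer, the sketch below treats the customer call as a strong dependency whose failure fails the whole order, and the "other" call as a weak dependency that degrades to a default value. FailureHandlingSketch, buildOrder, and the simulated failures are assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class FailureHandlingSketch {

    static String buildOrder(boolean customerDown) {
        // Strong dependency: no fallback, so a failure propagates to the caller.
        CompletableFuture<String> customer = CompletableFuture.supplyAsync(() -> {
            if (customerDown) {
                throw new IllegalStateException("customer service down");
            }
            return "customer-42";
        });

        // Weak dependency: this simulated call always fails, and exceptionally()
        // replaces the failure with a default value.
        CompletableFuture<String> other = CompletableFuture.<String>supplyAsync(() -> {
            throw new IllegalStateException("other service down");
        }).exceptionally(ex -> "other-unavailable");

        // join() rethrows a failure of the strong dependency as a CompletionException.
        return customer.thenCombine(other, (c, o) -> c + " / " + o).join();
    }

    public static void main(String[] args) {
        System.out.println(buildOrder(false)); // prints "customer-42 / other-unavailable"
        try {
            buildOrder(true);
        } catch (CompletionException e) {
            System.out.println("order failed: " + e.getCause().getMessage());
            // prints "order failed: customer service down"
        }
    }
}
```

Whether a failed weak dependency should be logged, retried, or simply defaulted depends on the business rules; the sketch only shows where each decision would go.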

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
