
Why Does ConcurrentHashMap Slow Down with Hundreds of Threads? A Deep Performance Test

This article presents a detailed benchmark of java.util.concurrent.ConcurrentHashMap under high thread counts, explains the test methodology using a Groovy‑based FunTester framework, shares raw performance numbers, and uncovers that CPU limits and random number generation are the primary bottlenecks.

FunTester

Jun 14, 2022

Test Plan

The benchmark evaluates the read performance of java.util.concurrent.ConcurrentHashMap under high concurrency. A fixed number of threads repeatedly call get() on a pre‑populated map. The experiment varies three dimensions:

Thread count (200 to 1000)

Number of operations per thread (expressed in thousands)

Size of the key set (50 – 1000 distinct keys)

All keys and values are integers. The goal is to isolate the cost of map look‑ups from other factors.

Test Implementation

package com.funtest.groovytest;

import com.funtester.base.constaint.FixedThread;
import com.funtester.base.constaint.ThreadBase;
import com.funtester.frame.SourceCode;
import com.funtester.frame.execute.Concurrent;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentHashMapTest extends SourceCode {
    static ConcurrentHashMap<Integer, Integer> maps = new ConcurrentHashMap<>();
    static int times   = 10_000;   // operations per thread
    static int threads = 200;      // default thread pool size
    static int num     = 100;      // number of distinct keys
    static String desc = "ConcurrentHashMap performance test";

    public static void main(String[] args) {
        // populate the map with num entries (keys 1..num)
        for (int i = 1; i <= num; i++) {
            maps.put(i, i);
        }
        ThreadBase.COUNT = false; // disable extra accounting
        RUNUP_TIME = 0;           // no warm‑up period
        new Concurrent(new FunTester(), threads, desc).start();
    }

    private static class FunTester extends FixedThread {
        FunTester() { super(null, times, true); }
        @Override protected void doing() throws Exception {
            // random read from the map
            maps.get(getRandomInt(num));
        }
        @Override public FunTester clone() { return new FunTester(); }
    }
}

Benchmark Parameters

The harness class Concurrent, from the FunTester framework, runs the FunTester task on a fixed number of threads. Each thread executes times iterations, calling maps.get(getRandomInt(num)) on every iteration. The random key is produced by SourceCode#getRandomInt, which internally uses ThreadLocalRandom.current().nextInt(num) + 1.
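For readers without the FunTester framework, the same read loop can be approximated with plain JDK classes. The following is a minimal sketch, not the framework's implementation: the class name PlainReadBenchmark, the latch-based start signal, and the derivation of per-thread QPS from wall-clock time are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for the FunTester harness, using only the JDK.
final class PlainReadBenchmark {

    /** Returns per-thread QPS: operations per thread divided by wall-clock seconds. */
    static double perThreadQps(int threads, int times, int num) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 1; i <= num; i++) {
            map.put(i, i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch ready = new CountDownLatch(threads); // all workers have started
        CountDownLatch go = new CountDownLatch(1);          // common start signal
        CountDownLatch done = new CountDownLatch(threads);  // all workers have finished
        LongAdder sink = new LongAdder();                   // accumulates the values read

        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    ready.countDown();
                    go.await();
                    ThreadLocalRandom rnd = ThreadLocalRandom.current();
                    long sum = 0;
                    for (int i = 0; i < times; i++) {
                        sum += map.get(rnd.nextInt(num) + 1); // keys are 1..num
                    }
                    sink.add(sum);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            ready.await();
            long begin = System.nanoTime();
            go.countDown();
            done.await();
            long elapsedNanos = Math.max(1, System.nanoTime() - begin);
            return times / (elapsedNanos / 1e9);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.printf("per-thread QPS: %.0f%n", perThreadQps(200, 10_000, 100));
    }
}
```

Like the original test, this sketch has no warm-up phase, so short runs include JIT compilation time and the absolute numbers should not be compared directly with the table.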

Results

Threads  Ops(k)  Keys  QPS (per thread)
200      10      100   3038
200      20      100   3539
200      40      100   4066
200      80      100   4334
200      10      200   2823
200      20      200   3587
200      40      200   4736
200      10      400   2919
200      10       50   2873
200      10     1000   3256
300      10      100   1893
300      20      100   2514
300      40      100   3214
300      20      300   1798
300      20      500   2832
500      20      100   1722
500      20     1000   1509
1000     20     1000    816
1000     10      100    724

Key observations:

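Increasing the thread count does not raise total QPS in any consistent way; per‑thread throughput drops sharply once the CPU is saturated (from 3038 QPS at 200 threads to 724 QPS at 1000 threads, with 10k operations and 100 keys).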

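Key‑set size has no consistent effect on QPS in these results: at 200 threads and 10k operations per thread, per‑thread QPS is 2873, 3038, 2823, 2919, and 3256 for 50, 100, 200, 400, and 1000 keys respectively.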

The benchmark quickly reaches the CPU limit, so CPU utilization was not recorded separately.
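The effect of thread count on aggregate throughput can be checked with simple arithmetic on the table. This sketch assumes that the QPS column is a per‑thread figure, so that total QPS is the thread count multiplied by it; the class and method names are illustrative only.

```java
// Hypothetical helper: derives aggregate QPS from rows of the results table.
final class AggregateQps {

    /** Total QPS, assuming the table reports per-thread QPS. */
    static long total(int threads, int perThreadQps) {
        return (long) threads * perThreadQps;
    }

    public static void main(String[] args) {
        // Rows copied from the table: {threads, ops(k), keys, per-thread QPS}
        int[][] rows = {
            {200, 10, 100, 3038},
            {300, 10, 100, 1893},
            {1000, 10, 100, 724},
        };
        for (int[] r : rows) {
            System.out.printf("%d threads x %d QPS = %d total%n", r[0], r[3], total(r[0], r[3]));
        }
    }
}
```

For these three rows (10k operations, 100 keys) the totals come to 607600, 567900, and 724000, so aggregate throughput stays in the same range while per‑thread QPS drops by roughly a factor of four.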

Analysis

The dominant bottleneck is CPU saturation, not the internal synchronization of ConcurrentHashMap. Fewer threads achieve higher per‑thread throughput, while adding threads yields diminishing returns for total throughput.

An additional hidden cost was identified in random‑number generation. The original SourceCode#getRandomInt method calls ThreadLocalRandom.current().nextInt(num) on every iteration, which adds noticeable overhead to each measured operation. Replacing it with a faster generator (or pre‑computing the random indices) increased measured QPS by an order of magnitude, which indicates that random‑number generation, not the map implementation, accounted for most of the per‑operation cost.
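One way to pre‑compute the random indices is to fill an array of keys before timing starts and cycle through it in the measured loop. This is only a sketch of that idea under stated assumptions: the class PrecomputedKeys, its method names, and the array size are hypothetical and do not come from the FunTester framework.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: moves random-number generation out of the timed loop.
final class PrecomputedKeys {

    /** Generates size random keys in [1, num] once, before the measurement. */
    static int[] randomKeys(int size, int num) {
        int[] keys = new int[size];
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < size; i++) {
            keys[i] = rnd.nextInt(num) + 1;
        }
        return keys;
    }

    /** Timed loop: cycles through the pre-computed keys and returns a checksum of the values read. */
    static long readAll(ConcurrentHashMap<Integer, Integer> map, int[] keys, int times) {
        long sum = 0;
        for (int i = 0; i < times; i++) {
            sum += map.get(keys[i % keys.length]);
        }
        return sum;
    }
}
```

Each worker thread would build its own array before the start signal, so the measured loop contains only the array read and the get() call. Returning the checksum gives the caller something to consume, which keeps the reads from being optimized away.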

Random‑Number Helper

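import java.util.concurrent.ThreadLocalRandom;

/**
 * Return a random integer in the inclusive range [1, num].
 * @param num upper bound (inclusive); must be positive
 * @return random integer
 */
public static int getRandomInt(int num) {
    // nextInt(num) yields a value in [0, num), so adding 1 shifts the range to [1, num]
    return ThreadLocalRandom.current().nextInt(num) + 1;
}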
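The range of the helper matters because the map is populated with keys 1 through num, so every lookup should hit an existing entry. The sketch below copies the helper into a hypothetical class, RandomIntRange, and samples it to observe the smallest and largest values returned; the observedRange method is an illustration, not part of the framework.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical wrapper used only to exercise the helper's range.
final class RandomIntRange {

    static int getRandomInt(int num) {
        return ThreadLocalRandom.current().nextInt(num) + 1;
    }

    /** Draws samples values and returns {min, max} of what was observed. */
    static int[] observedRange(int num, int samples) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int i = 0; i < samples; i++) {
            int v = getRandomInt(num);
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        return new int[] {min, max};
    }
}
```

ThreadLocalRandom.nextInt(bound) throws IllegalArgumentException when the bound is not positive, so calling the helper with num set to 0 or a negative value fails rather than returning a key.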
