How to Supercharge Java Backend Performance: Parallelism, Thread Pools, Caching, and More

This article walks through practical Java backend performance techniques—including parallel processing with CompletableFuture, fine‑tuned thread‑pool configuration, transaction scope minimization, cache‑line awareness, object‑pool usage, lock granularity, copy‑on‑write collections, and network payload reduction—backed by concrete code samples, benchmark results, and step‑by‑step analysis of trade‑offs and best practices.

Architect

Jul 3, 2024

1. Parallel Processing

When a price‑query flow needs to fetch multiple independent price components (base price, discount price, merchant activity price, platform activity price, etc.), the author demonstrates using CompletableFuture to run each fetch as a separate asynchronous task. The synchronous version simply calls a(10); b(10); c(10); d(10); in sequence, while the asynchronous version creates four CompletableFuture.runAsync tasks, adds them to a list, converts the list to an array, and waits on CompletableFuture.allOf(...).join(). Timestamps are printed before and after the calls to measure elapsed time.
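The pattern described above can be sketched as follows. This is a minimal illustration, not the author's code: the class name ParallelPriceDemo and the fetch helper (which just sleeps to simulate a lookup) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class ParallelPriceDemo {
    // Hypothetical stand-in for one price-component lookup; sleeps to simulate I/O.
    static void fetch(String component, long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Calls the four lookups one after another on the calling thread.
    static long runSequential(long millis) {
        long start = System.nanoTime();
        for (String component : List.of("a", "b", "c", "d")) {
            fetch(component, millis);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Submits each lookup with runAsync, then blocks until all of them finish.
    static long runParallel(long millis) {
        long start = System.nanoTime();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (String component : List.of("a", "b", "c", "d")) {
            // Without an explicit executor, runAsync uses the JDK's default async pool.
            futures.add(CompletableFuture.runAsync(() -> fetch(component, millis)));
        }
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sequential ms: " + runSequential(10));
        System.out.println("parallel ms:   " + runParallel(10));
    }
}
```

The measured times depend on the machine and on how busy the shared pool is, so treat any single run as indicative only.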

The test results (shown in an image in the original) lead to two takeaways. First, the fewer the methods and the shorter their execution time, the better synchronous execution performs, because the cost of handing work to other threads outweighs the time saved. Second, for mixed workloads, the author suggests a “half‑async, half‑sync” approach: run long‑running tasks asynchronously and short ones synchronously, which reduces P99 latency while saving thread resources.

2. Minimizing Transaction Scope

Locks held by open transactions can severely impact performance under high concurrency. The author explains that the @Transactional annotation only provides method‑level granularity, so a programmatic transaction is needed for finer control. An interface TransactionControlService defines execute(ObjectLogicFunction<T>) and execute(VoidLogicFunction). The implementation uses PlatformTransactionManager to begin a transaction, invoke the business logic, commit on success, and roll back on exception. This pattern lets developers wrap only the critical DB operations, which shortens the time locks are held and reduces contention.
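The begin, invoke, commit or roll back flow described above can be sketched without Spring on the classpath. In this sketch the TxManager interface is a hypothetical stand-in for Spring's PlatformTransactionManager, RecordingTxManager exists only to make the flow observable, and the method name logic() on the two functional interfaces is an assumption; only the names TransactionControlService, ObjectLogicFunction, and VoidLogicFunction come from the text.

```java
import java.util.ArrayList;
import java.util.List;

class TransactionScopeDemo {
    interface ObjectLogicFunction<T> { T logic(); }
    interface VoidLogicFunction { void logic(); }

    // Hypothetical stand-in for Spring's PlatformTransactionManager.
    interface TxManager {
        Object begin();
        void commit(Object tx);
        void rollback(Object tx);
    }

    static final class TransactionControlService {
        private final TxManager txManager;

        TransactionControlService(TxManager txManager) {
            this.txManager = txManager;
        }

        // Runs only the supplied logic inside a transaction and returns its result.
        <T> T execute(ObjectLogicFunction<T> business) {
            Object tx = txManager.begin();
            T result;
            try {
                result = business.logic();
            } catch (RuntimeException | Error e) {
                txManager.rollback(tx); // undo partial work, then rethrow
                throw e;
            }
            txManager.commit(tx);
            return result;
        }

        // Same flow for logic that returns nothing.
        void execute(VoidLogicFunction business) {
            execute(() -> { business.logic(); return null; });
        }
    }

    // Records the calls it receives so the flow can be inspected.
    static final class RecordingTxManager implements TxManager {
        final List<String> events = new ArrayList<>();
        public Object begin() { events.add("begin"); return new Object(); }
        public void commit(Object tx) { events.add("commit"); }
        public void rollback(Object tx) { events.add("rollback"); }
    }
}
```

A caller would do validation, remote calls, and other slow work outside execute(...) and pass in only the statements that must be atomic.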

3. Caching

Caching is described as a universal performance booster across e‑commerce, finance, gaming, and live‑streaming. The author lists practical considerations: expiration time, consistency, capacity limits, load‑balancing, concurrent reads/writes, cache‑penetration, cache‑breakdown, and query time complexity. For each issue, mitigation strategies are provided, such as using appropriate TTLs, employing read‑through/write‑through patterns, pre‑loading hot data, and applying multi‑level caches (e.g., local cache + Redis). The section also covers data compression (e.g., storing int[] instead of Integer[]) and handling cache‑line effects.
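Three of the concerns listed above (expiration, cache penetration, and cache breakdown) can be illustrated with a small local cache. This is a sketch under stated assumptions, not the author's implementation: the class name TtlCache and the injectable clock are invented for the example. Missing values are cached as an empty Optional so repeated lookups for absent keys do not reach the backing store, and reloads go through ConcurrentHashMap.compute so concurrent misses on one key trigger a single load.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

class TtlCache<K, V> {
    private static final class Entry<V> {
        final Optional<V> value;
        final long expiresAtMillis;

        Entry(Optional<V> value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested without sleeping

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, loading it at most once per key per TTL window.
    // The loader must not touch this cache, because it runs inside compute().
    Optional<V> get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> hit = map.get(key);
        if (hit != null && hit.expiresAtMillis > now) {
            return hit.value; // fast path: no locking on a fresh entry
        }
        Entry<V> entry = map.compute(key, (k, old) -> {
            if (old != null && old.expiresAtMillis > now) {
                return old; // another thread reloaded it first
            }
            // A null from the loader is stored as Optional.empty() (negative caching).
            return new Entry<>(Optional.ofNullable(loader.apply(k)), now + ttlMillis);
        });
        return entry.value;
    }
}
```

In production code a shorter TTL for negative entries and a size limit would usually be added; both are left out here to keep the sketch small.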

4. Thread‑Pool Usage

The article revisits CompletableFuture and notes that it already relies on a thread pool. It then details how to create a custom ThreadPoolExecutor instead of using the Executors factory methods. An example configuration sets core pool size 2, max pool size 4, keep‑alive 1 minute, a LinkedBlockingQueue with capacity 100, a named ThreadFactory, and a CallerRunsPolicy. The author explains how to choose corePoolSize based on CPU‑bound vs I/O‑bound tasks, using the formula core = cpu / (1 - blockingFactor), where blockingFactor is the fraction of time a task spends blocked (between 0 and 1). It also discusses maximumPoolSize, keepAliveTime, and appropriate work‑queue sizes, illustrated with a scenario where a service can handle 10 tasks per second but spikes to 20 tasks per second for 10 seconds: the backlog grows by 10 tasks per second for 10 seconds, so the queue needs a length of 100.
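The configuration and the two sizing rules above can be sketched as follows. The class name PoolConfigDemo, the helper method names, and the thread-name prefix "pricing-pool-" are assumptions made for illustration; the numeric settings are the ones quoted in the text.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolConfigDemo {
    // core = cpu / (1 - blockingFactor); blockingFactor 0 means purely CPU-bound.
    static int corePoolSize(int cpus, double blockingFactor) {
        if (blockingFactor < 0.0 || blockingFactor >= 1.0) {
            throw new IllegalArgumentException("blockingFactor must be in [0, 1)");
        }
        return (int) Math.max(1, Math.round(cpus / (1.0 - blockingFactor)));
    }

    // Backlog that builds up while arrivals exceed capacity during a spike.
    static int queueLength(int handledPerSecond, int peakPerSecond, int peakSeconds) {
        return Math.max(0, (peakPerSecond - handledPerSecond) * peakSeconds);
    }

    // Core 2, max 4, keep-alive 1 minute, bounded queue of 100, named threads,
    // and CallerRunsPolicy so an overloaded pool slows the submitter down
    // instead of throwing.
    static ThreadPoolExecutor newPool() {
        AtomicInteger sequence = new AtomicInteger(1);
        ThreadFactory named = r -> new Thread(r, "pricing-pool-" + sequence.getAndIncrement());
        return new ThreadPoolExecutor(
                2, 4,
                1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(100),
                named,
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("core size at blockingFactor 0.5: " + corePoolSize(cpus, 0.5));
        System.out.println("queue length for the spike: " + queueLength(10, 20, 10));
        newPool().shutdown();
    }
}
```

With 8 CPUs and a blocking factor of 0.5 the formula gives 16 threads; the spike scenario gives (20 - 10) * 10 = 100 queued tasks.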

Monitoring is essential: the author provides a ThreadPoolMonitor class that logs pool size, active thread count, queue size, and completed task count, and shows how to expose these metrics via Micrometer to Prometheus.
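A monitor along these lines can be sketched with the JDK alone. This is not the author's ThreadPoolMonitor: the method names snapshot and startLogging and the output format are assumptions, and the Micrometer/Prometheus wiring is left out because it needs external dependencies. The four getters used are standard ThreadPoolExecutor methods.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolMonitor {
    // Formats the four metrics named in the text as a single log line.
    static String snapshot(ThreadPoolExecutor pool) {
        return String.format("poolSize=%d active=%d queueSize=%d completed=%d",
                pool.getPoolSize(),
                pool.getActiveCount(),
                pool.getQueue().size(),
                pool.getCompletedTaskCount());
    }

    // Prints a snapshot every periodSeconds on a daemon thread.
    // The caller is responsible for shutting the returned scheduler down.
    static ScheduledExecutorService startLogging(ThreadPoolExecutor pool, long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "pool-monitor");
            t.setDaemon(true); // do not keep the JVM alive just for monitoring
            return t;
        });
        scheduler.scheduleAtFixedRate(
                () -> System.out.println(snapshot(pool)),
                0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

The values are approximate by design: the pool keeps changing while the getters are being read.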

5. Service Warm‑up

Pre‑warming reduces latency for first‑time requests. The author mentions using ThreadPoolExecutor.prestartAllCoreThreads(), relying on Tomcat’s built‑in thread‑pool warm‑up, pre‑loading database connections, and initializing static blocks or singleton holders at startup. It also suggests pre‑loading hot data into caches and using OS page cache or MySQL’s innodb_buffer_pool for better performance.
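The thread-pool part of warm-up can be shown in a few lines. In this sketch the class name WarmUpDemo and the method warmPool are assumptions; prestartAllCoreThreads() is the JDK method mentioned in the text.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class WarmUpDemo {
    // Creates a fixed-size pool and starts every core thread immediately,
    // so the first requests do not pay the thread-creation cost.
    static ThreadPoolExecutor warmPool(int coreThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreThreads, coreThreads,
                0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        pool.prestartAllCoreThreads();
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = warmPool(4);
        System.out.println("threads alive before any task: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

Without the prestart call, a ThreadPoolExecutor creates its threads lazily as tasks arrive.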

6. Cache‑Line Alignment

CPU caches are organized in 64‑byte lines. A benchmark shows that iterating a 2‑D array row‑wise (sequential memory access) is dramatically faster than column‑wise access because the latter defeats cache‑line locality. The author provides two Java programs—one writing row‑first, the other column‑first—and reports the timing difference.
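The two traversal orders can be compared with a sketch like the one below. The class name TraversalDemo, the method names, and the 4096 x 4096 size are assumptions for illustration; the article's own programs are not reproduced here. In Java a two-dimensional int array is an array of row arrays, so walking one row touches adjacent memory, while walking one column jumps to a different row array on every step.

```java
class TraversalDemo {
    // Writes row by row: consecutive writes hit neighbouring elements of one row array.
    static long fillRowMajor(int[][] grid) {
        long start = System.nanoTime();
        for (int r = 0; r < grid.length; r++) {
            for (int c = 0; c < grid[r].length; c++) {
                grid[r][c] = r + c;
            }
        }
        return System.nanoTime() - start;
    }

    // Writes column by column: every write lands in a different row array.
    // Assumes a rectangular grid (all rows have the length of the first row).
    static long fillColumnMajor(int[][] grid) {
        long start = System.nanoTime();
        int columns = grid.length == 0 ? 0 : grid[0].length;
        for (int c = 0; c < columns; c++) {
            for (int r = 0; r < grid.length; r++) {
                grid[r][c] = r + c;
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 4096;
        System.out.println("row-major ns:    " + fillRowMajor(new int[n][n]));
        System.out.println("column-major ns: " + fillColumnMajor(new int[n][n]));
    }
}
```

Both methods produce identical array contents; only the order of memory accesses differs. For stable numbers a harness such as JMH is a better fit than a single timed run.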

To avoid false sharing, a padded class CacheLinePadding is introduced: a Padding superclass holds seven long fields, and the subclass adds a volatile long x. The seven longs plus x total 64 bytes, so the x fields of two instances are pushed onto different cache lines. A test with two threads updating separate padded objects shows higher throughput than the same test with unpadded objects.
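The padding layout and the two-thread test can be sketched as below. The names FalseSharingDemo, PaddedCounter, PlainCounter, and runPair are assumptions (the article's class is called CacheLinePadding), and the exact field layout is ultimately decided by the JVM, so the padding is a strong hint rather than a guarantee.

```java
class FalseSharingDemo {
    // Seven longs (56 bytes) that exist only to take up space.
    static class Padding {
        long p1, p2, p3, p4, p5, p6, p7;
    }

    // With the inherited padding, x is unlikely to share a cache line
    // with the x of a neighbouring instance.
    static class PaddedCounter extends Padding {
        volatile long x;
    }

    // No padding: two instances allocated back to back may share a line.
    static class PlainCounter {
        volatile long x;
    }

    // Runs both tasks on their own threads and returns the elapsed milliseconds.
    static long runPair(Runnable first, Runnable second) {
        Thread t1 = new Thread(first);
        Thread t2 = new Thread(second);
        long start = System.nanoTime();
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        final int n = 50_000_000;
        PaddedCounter pa = new PaddedCounter(), pb = new PaddedCounter();
        PlainCounter ua = new PlainCounter(), ub = new PlainCounter();
        // Each counter has exactly one writer thread, so x++ on a volatile is safe here.
        long padded = runPair(() -> { for (int i = 0; i < n; i++) pa.x++; },
                              () -> { for (int i = 0; i < n; i++) pb.x++; });
        long plain = runPair(() -> { for (int i = 0; i < n; i++) ua.x++; },
                             () -> { for (int i = 0; i < n; i++) ub.x++; });
        System.out.println("padded ms: " + padded + ", unpadded ms: " + plain);
    }
}
```

Whether the unpadded pair actually collides depends on where the allocator places the two objects, so results vary between runs and JVMs.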

7. Reducing Object Creation

Using primitive types instead of wrapper classes eliminates temporary objects. A benchmark comparing int vs Integer in a tight loop shows a hundred‑fold speed difference and lower CPU/memory usage for primitives. The author also recommends immutable objects (e.g., String literals) and static factory methods (e.g., Boolean.valueOf) to avoid repeated allocations. Object pools (e.g., Apache Commons GenericObjectPool) are presented with a CachePoolUtil enum that configures max total, min idle, max idle, and max wait, and demonstrates borrowing/returning objects in a multi‑threaded test.
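The primitive-versus-wrapper comparison and the static-factory point can be sketched as follows. The class and method names are assumptions, and the sketch does not reproduce the article's benchmark or its numbers; it only shows the two code shapes being compared.

```java
class BoxingDemo {
    // Accumulates into a primitive: no objects are created in the loop.
    static int sumPrimitive(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Accumulates into a wrapper: each += unboxes, adds, and boxes a new Integer
    // (values outside the small-integer cache allocate a fresh object).
    static int sumBoxed(int n) {
        Integer sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Boolean.valueOf returns one of the two shared instances instead of allocating.
    static boolean valueOfReusesInstance() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    public static void main(String[] args) {
        int n = 50_000; // small enough that the int sum does not overflow
        long t0 = System.nanoTime();
        int a = sumPrimitive(n);
        long t1 = System.nanoTime();
        int b = sumBoxed(n);
        long t2 = System.nanoTime();
        System.out.println("primitive ns: " + (t1 - t0) + ", boxed ns: " + (t2 - t1)
                + ", same result: " + (a == b));
    }
}
```

The JIT compiler can sometimes remove the boxing through escape analysis, so the gap observed in a micro-benchmark depends heavily on how the measurement is set up.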

8. Concurrency Control

The article surveys lock granularity: volatile for visibility, CAS for lock‑free updates, synchronized object/class locks, spin locks, segment locks (as used by ConcurrentHashMap before Java 8), read‑write locks (ReentrantReadWriteLock), and copy‑on‑write collections (CopyOnWriteArrayList, CopyOnWriteArraySet). It provides code snippets for each mechanism and discusses trade‑offs: for example, copy‑on‑write is ideal for read‑heavy scenarios but incurs high write cost and memory overhead.
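Two of the mechanisms in this survey, a CAS retry loop and a read-write lock, can be sketched as below. The class names CasCounter and ReadMostlyConfig are assumptions for illustration and are not the article's snippets; AtomicLong.compareAndSet and ReentrantReadWriteLock are standard JDK APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockGranularityDemo {
    // Lock-free counter: retry the compare-and-set until no other thread interfered.
    static final class CasCounter {
        private final AtomicLong value = new AtomicLong();

        long increment() {
            long previous;
            long next;
            do {
                previous = value.get();
                next = previous + 1;
            } while (!value.compareAndSet(previous, next));
            return next;
        }

        long get() {
            return value.get();
        }
    }

    // Many readers may hold the read lock at once; a writer gets exclusive access.
    static final class ReadMostlyConfig {
        private final Map<String, String> data = new HashMap<>();
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        String get(String key) {
            lock.readLock().lock();
            try {
                return data.get(key);
            } finally {
                lock.readLock().unlock();
            }
        }

        void put(String key, String value) {
            lock.writeLock().lock();
            try {
                data.put(key, value);
            } finally {
                lock.writeLock().unlock();
            }
        }
    }
}
```

In everyday code AtomicLong.incrementAndGet() does the same job as the explicit loop; the loop is written out here only to make the retry visible.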

A performance test compares CopyOnWriteArraySet vs ConcurrentHashMap.newKeySet() under read‑heavy and write‑heavy workloads, showing that the former excels when reads dominate, while the latter is faster when writes dominate.
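The behavioural difference behind that result can be seen without a benchmark. In this sketch (class and method names are assumptions) a set is modified while it is being iterated: CopyOnWriteArraySet iterates over the snapshot taken when the iterator was created, because every write copies the backing array, while the set from ConcurrentHashMap.newKeySet() updates in place and its iterator is only weakly consistent.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArraySet;

class SetChoiceDemo {
    // Adds the negation of every visited element while iterating and
    // returns how many elements the loop actually visited.
    static int iterateWhileAdding(Set<Integer> set) {
        int visited = 0;
        for (Integer value : set) {
            set.add(-value);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> copyOnWrite = new CopyOnWriteArraySet<>(List.of(1, 2, 3));
        Set<Integer> keySet = ConcurrentHashMap.newKeySet();
        keySet.addAll(List.of(1, 2, 3));

        // Snapshot iterator: sees exactly the three original elements.
        System.out.println("copy-on-write visited: " + iterateWhileAdding(copyOnWrite));
        // Weakly consistent iterator: may or may not see elements added during the loop.
        System.out.println("key set visited:       " + iterateWhileAdding(keySet));
        System.out.println("final sizes: " + copyOnWrite.size() + " and " + keySet.size());
    }
}
```

Neither set throws ConcurrentModificationException here, which a plain HashSet would.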

9. Asynchronous Design

Asynchrony improves system responsiveness. The author lists common async patterns—threads, message queues, event notifications, reactive programming—and emphasizes the “quick ack, later result” principle (e.g., returning a “processing” status and providing a callback or query endpoint).
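The “quick ack, later result” principle can be sketched as a service that returns a ticket immediately and lets the caller query the status later. Everything named here (AsyncJobService, Status, submit, query) is hypothetical; the Executor is injected so that a real thread pool or a message-queue consumer could sit behind it.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;

class AsyncJobService {
    enum Status { PROCESSING, DONE, FAILED, UNKNOWN }

    private final Executor worker;
    private final ConcurrentHashMap<String, Status> statuses = new ConcurrentHashMap<>();

    AsyncJobService(Executor worker) {
        this.worker = worker;
    }

    // Quick ack: record the job as PROCESSING, hand it off, and return its id at once.
    String submit(Runnable job) {
        String id = UUID.randomUUID().toString();
        statuses.put(id, Status.PROCESSING);
        worker.execute(() -> {
            try {
                job.run();
                statuses.put(id, Status.DONE);
            } catch (RuntimeException e) {
                statuses.put(id, Status.FAILED);
            }
        });
        return id;
    }

    // Later result: the query endpoint the caller polls with the id it was given.
    Status query(String id) {
        return statuses.getOrDefault(id, Status.UNKNOWN);
    }
}
```

A callback-based variant would notify the caller from inside the worker instead of waiting to be polled; finished entries would also need to be evicted eventually.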

10. Loop Optimizations

Reducing loop iterations via efficient algorithms (binary search, quicksort, hash indexing) and batch fetching is illustrated. The author shows a before‑and‑after example where individual DB queries inside a loop are replaced by a single batch query that returns a map, dramatically cutting round‑trips.
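The before-and-after shape of that change can be sketched with an in-memory stand-in for the database. PriceDao, its methods, and the sample prices are hypothetical; the roundTrips counter exists only to make the difference in query count visible.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class BatchFetchDemo {
    // Hypothetical DAO backed by a map; each method call counts as one DB round-trip.
    static final class PriceDao {
        private final Map<Long, Integer> table = new HashMap<>();
        int roundTrips;

        PriceDao() {
            table.put(1L, 100);
            table.put(2L, 250);
            table.put(3L, 75);
        }

        Integer findById(long id) {
            roundTrips++;
            return table.get(id);
        }

        // One query for many ids, e.g. a WHERE id IN (...) statement.
        Map<Long, Integer> findByIds(Collection<Long> ids) {
            roundTrips++;
            Map<Long, Integer> result = new HashMap<>();
            for (Long id : ids) {
                Integer price = table.get(id);
                if (price != null) {
                    result.put(id, price);
                }
            }
            return result;
        }
    }

    // Before: one query per id inside the loop.
    static Map<Long, Integer> loadOneByOne(PriceDao dao, List<Long> ids) {
        Map<Long, Integer> prices = new HashMap<>();
        for (Long id : ids) {
            Integer price = dao.findById(id);
            if (price != null) {
                prices.put(id, price);
            }
        }
        return prices;
    }

    // After: a single batch query that returns a map keyed by id.
    static Map<Long, Integer> loadInBatch(PriceDao dao, List<Long> ids) {
        return dao.findByIds(ids);
    }
}
```

For very large id lists the batch is usually split into chunks so that a single statement does not grow without bound.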

11. Reducing Network Payload

Payload size can be trimmed by selecting only necessary fields in SQL, using compact serialization formats (JSON vs XML vs protobuf), and applying compression (GZIP, zlib). Sample code demonstrates compressing a 2,890‑byte string to 1,474 bytes with GZIP and to 1,518 bytes with zlib using Hutool utilities.
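GZIP compression of a string payload can be sketched with the JDK's own java.util.zip classes. Note that this swaps in the standard library for the Hutool utilities used in the article, and the class name GzipDemo and the sample payload are assumptions; the sketch does not reproduce the article's byte counts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipDemo {
    // Compresses the UTF-8 bytes of the text with GZIP.
    static byte[] compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the original text from GZIP-compressed bytes.
    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            payload.append("{\"sku\":").append(i).append(",\"price\":100},");
        }
        byte[] raw = payload.toString().getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(payload.toString());
        System.out.println("raw bytes: " + raw.length + ", gzip bytes: " + packed.length);
    }
}
```

Compression pays off on repetitive payloads such as JSON lists; very small or already-compressed payloads can come out larger because of the GZIP header and trailer.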

12. Reducing Service Dependencies

Excessive inter‑service calls hurt reliability and performance. The author advises proper microservice boundaries, avoiding duplicate calls, circular dependencies, and mixed‑layer calls. Data redundancy (local copies of dictionaries, wide tables) and result caching are suggested to lower dependency. Message queues are highlighted as a natural decoupling mechanism.
