
Boost Java Service Performance: Code & Design Optimizations Explained

This article explores comprehensive performance optimization techniques for Java services, covering code-level strategies such as preloading classes, cache line alignment, branch prediction, copy‑on‑write, inlining, and design approaches like caching, asynchronous processing, pooling, and pre‑handling, while highlighting trade‑offs and practical examples.

JD Cloud Developers

May 23, 2023

1. Introduction

Service performance refers to response speed, throughput, and resource utilization under specific conditions. Optimizing performance typically consumes 10%–25% of a software development cycle and impacts user experience, system reliability, resource costs, and market competitiveness.

2. Code Optimization

2.1 Preloading Related Classes

Preloading moves class loading and initialization cost from the first request to startup. In Java, the Bootstrap class loader loads core API classes, while the Application class loader loads application classes from the classpath. A class can be preloaded from a static block:

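public class MainClass {
    static {
        try {
            // Preload MyClass, which implements related functionality
            Class.forName("com.example.MyClass");
        } catch (ClassNotFoundException e) {
            // Class.forName declares a checked exception; fail fast if the class is missing
            throw new ExceptionInInitializerError(e);
        }
    }
    // Run related functionality
    // ...
}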
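
When several classes need warming up, the same idea can be wrapped in a small helper. The following is a minimal sketch, not the article's implementation: ClassPreloader, preload, and the class names passed to it are hypothetical, and it assumes that a missing optional class should be logged rather than abort startup.

```java
import java.util.List;

/** Hypothetical helper: eagerly loads and initializes classes by name at startup. */
final class ClassPreloader {
    private ClassPreloader() {}

    /** Returns the number of classes that were loaded successfully. */
    static int preload(List<String> classNames) {
        ClassLoader loader = ClassPreloader.class.getClassLoader();
        int loaded = 0;
        for (String name : classNames) {
            try {
                // initialize = true also runs the class's static initializers
                Class.forName(name, true, loader);
                loaded++;
            } catch (ClassNotFoundException | LinkageError e) {
                // Assumption: a class that cannot be loaded is skipped, not fatal
                System.err.println("Preload skipped: " + name + " (" + e + ")");
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        int n = preload(List.of(
                "java.util.concurrent.ConcurrentHashMap",
                "com.example.MissingClass")); // placeholder name, expected to be absent
        System.out.println("Preloaded " + n + " classes");
    }
}
```

Returning the count makes it easy to log or assert at startup that the expected classes were actually found.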

2.2 Cache Alignment

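Understanding cache lines (typically 64 bytes), false sharing, CPU stalls, and IPC helps identify memory‑intensive versus compute‑intensive workloads. False sharing occurs when threads on different cores write to distinct variables that happen to share a cache line, so the line is repeatedly invalidated and transferred between cores. It can be reduced by padding data so that the contended variables land on different cache lines: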

/**
 * Cache line padding test
 */
public class FalseSharingTest {
    private static final int LOOP_NUM = 1_000_000_000;
    public static void main(String[] args) throws InterruptedException {
        Struct struct = new Struct();
        long start = System.currentTimeMillis();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < LOOP_NUM; i++) {
                struct.x++;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < LOOP_NUM; i++) {
                struct.y++;
            }
        });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("cost time [" + (System.currentTimeMillis() - start) + "] ms");
    }
    static class Struct {
        volatile long x;
        // 7 padding longs to separate x and y onto different cache lines
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long y;
    }
}

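The @Contended annotation can also request cache‑line padding. In Java 8 it lives in sun.misc and takes effect for application classes only when the JVM is started with -XX:-RestrictContended; from Java 9 onward it moved to jdk.internal.vm.annotation, an internal package that is not exported to application code by default: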

import sun.misc.Contended;
public class ContendedTest {
    @Contended
    volatile long a;
    @Contended
    volatile long b;
    public static void main(String[] args) throws InterruptedException {
        ContendedTest c = new ContendedTest();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 100_000_000L; i++) {
                c.a = i;
            }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 100_000_000L; i++) {
                c.b = i;
            }
        });
        long start = System.nanoTime();
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println((System.nanoTime() - start) / 1_000_000);
    }
}

2.3 Branch Prediction

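Branch prediction lets the CPU guess which way a conditional will go and execute speculatively, so that it does not stall waiting for the condition to be evaluated. A misprediction discards the speculative work. Branches that almost always go the same way are predicted well, so keeping cyclomatic complexity low and handling the most common case first can help, although the benefit depends on the hardware and on what the JIT compiler emits.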
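
The effect is often illustrated by running the same loop over unsorted and sorted data. The following is a rough sketch under stated assumptions: BranchPredictionDemo, the array size, and the threshold are illustrative, and the timing gap is not guaranteed because the JIT may replace the branch with a conditional move.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative micro-benchmark; names and sizes are assumptions, not from the article. */
final class BranchPredictionDemo {

    /** Sums all elements >= threshold. The if is a data-dependent branch. */
    static long sumAtLeast(int[] data, int threshold) {
        long sum = 0;
        for (int value : data) {
            if (value >= threshold) {
                sum += value;
            }
        }
        return sum;
    }

    /** Runs the loop several times and returns the elapsed milliseconds. */
    static long timeMillis(int[] data, int rounds) {
        long checksum = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            checksum += sumAtLeast(data, 128);
        }
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        if (checksum < 0) {
            System.out.println(checksum); // keeps the result live so the loop is not eliminated
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int[] unsorted = new Random(42).ints(1 << 20, 0, 256).toArray();
        int[] sorted = unsorted.clone();
        Arrays.sort(sorted);

        // Warm up so the JIT compiles sumAtLeast before timing
        timeMillis(unsorted, 20);
        timeMillis(sorted, 20);

        System.out.println("unsorted: " + timeMillis(unsorted, 100) + " ms");
        System.out.println("sorted:   " + timeMillis(sorted, 100) + " ms");
    }
}
```

With sorted input the branch outcome changes only once over the whole array, which is the predictable pattern described above. For trustworthy numbers a harness such as JMH is a better fit than hand-rolled timing.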

2.4 Copy‑On‑Write (COW)

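COW defers copying data until a write occurs, so readers share one copy and need no locking. In CopyOnWriteArrayList every mutation copies the entire backing array, which makes it a good fit for read‑heavy data that changes rarely and a poor one for frequent writes. Example with CopyOnWriteArrayList:

// Field declaration: every write copies the backing array, reads take no lock
private final List<String> list = new CopyOnWriteArrayList<>();
// Inside a method:
list.add("value");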
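
One visible consequence of copy‑on‑write is that an iterator works on a snapshot. The following is a small sketch (CopyOnWriteDemo and iterateWhileAdding are hypothetical names) that modifies the list while iterating over it, something that would throw ConcurrentModificationException with a plain ArrayList.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustrative only: shows snapshot iteration on a copy-on-write list. */
final class CopyOnWriteDemo {

    /** Adds one element per iterated element and returns how many elements the iterator saw. */
    static int iterateWhileAdding(List<String> list) {
        int seen = 0;
        for (String item : list) {
            // Each add copies the backing array; the running iterator keeps its old snapshot
            list.add(item + "-copy");
            seen++;
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> list = new CopyOnWriteArrayList<>(List.of("a", "b", "c"));
        int seen = iterateWhileAdding(list);
        System.out.println("iterated " + seen + " elements, list now holds " + list.size());
    }
}
```

The loop terminates because the iterator never observes the elements added during iteration.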

2.5 Inline Optimization

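JIT inlining replaces a method call with the method body, which removes call overhead and opens the door to further optimizations. Keeping methods short, making them final, static, or private where appropriate, and tuning JVM options such as -XX:MaxInlineSize, -XX:FreqInlineSize, and -XX:MaxInlineLevel can increase inlining opportunities. Java has no supported annotation that lets application code demand inlining: @ForceInline is an internal JDK annotation (jdk.internal.vm.annotation.ForceInline) that HotSpot honors only for trusted JDK classes, so the snippet below is illustrative. The experimental flags listed with it select a JVMCI‑based JIT compiler such as Graal rather than enabling the annotation, and the commonly documented spelling of the last flag is -XX:+UseJVMCICompiler:

-XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+JVMCICompiler
@ForceInline
public static int add(int a, int b) { return a + b; }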
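
For application code, a more practical route is to write small methods and check what the JIT actually does. The following is a minimal sketch with hypothetical names (InlineCandidateDemo, Point, sumX). Running it with the diagnostic flags -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining should list inlining decisions, though the exact output varies by JVM version.

```java
/** Hypothetical hot-loop example; class and method names are illustrative. */
final class InlineCandidateDemo {

    static final class Point {
        private final int x;

        Point(int x) {
            this.x = x;
        }

        // Tiny accessor on a final class: a typical inlining candidate
        int getX() {
            return x;
        }
    }

    /** Calls getX() in a hot loop so the JIT has a reason to compile and inline it. */
    static long sumX(Point[] points, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Point p : points) {
                sum += p.getX();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Point[] points = new Point[1_000];
        for (int i = 0; i < points.length; i++) {
            points[i] = new Point(i);
        }
        System.out.println(sumX(points, 100_000));
    }
}
```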

2.6 Reflection Optimization

Reflection incurs type checks, access checks, and method lookups on top of the call itself. Mitigate the cost by preferring direct calls where possible, caching reflective lookups, or using bytecode‑generation libraries such as Javassist or Byte Buddy. Example of a reflective utility with caching:

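import java.lang.reflect.Field;
import java.util.Map;
import org.slf4j.Logger;          // assumed: SLF4J logging API
import org.slf4j.LoggerFactory;
import org.springframework.util.ConcurrentReferenceHashMap;

public abstract class BeanUtils {
    private static final Logger LOGGER = LoggerFactory.getLogger(BeanUtils.class);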
    private static final Field[] NO_FIELDS = {};
    private static final Map<Class<?>, Field[]> DECLARED_FIELDS_CACHE = new ConcurrentReferenceHashMap<>(256);
    private static final Map<Class<?>, Field[]> FIELDS_CACHE = new ConcurrentReferenceHashMap<>(256);

    public static Field[] getFields(Class<?> clazz) {
        if (clazz == null) throw new IllegalArgumentException("Class must not be null");
        Field[] result = FIELDS_CACHE.get(clazz);
        if (result == null) {
            Field[] fields = NO_FIELDS;
            Class<?> search = clazz;
            while (Object.class != search && search != null) {
                fields = mergeArray(fields, getDeclaredFields(search));
                search = search.getSuperclass();
            }
            result = fields;
            FIELDS_CACHE.put(clazz, result.length == 0 ? NO_FIELDS : result);
        }
        return result;
    }
    // ... other utility methods (including mergeArray and getDeclaredFields) omitted for brevity
}
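
The same caching idea applies to Method lookups. The following is a minimal sketch using only the JDK: CachedGetterInvoker, lookup, and invoke are hypothetical names, and it assumes public no‑argument methods. The strongly referenced ConcurrentHashMap keyed by class name is a simplification that ignores class loaders and class unloading.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper: caches reflective Method lookups for public no-arg methods. */
final class CachedGetterInvoker {
    // Assumption: keying by class name is sufficient (single class loader)
    private static final Map<String, Method> METHOD_CACHE = new ConcurrentHashMap<>();

    private CachedGetterInvoker() {}

    /** Resolves the method once; later calls return the cached Method instance. */
    static Method lookup(Class<?> type, String methodName) {
        String key = type.getName() + '#' + methodName;
        return METHOD_CACHE.computeIfAbsent(key, k -> {
            try {
                return type.getMethod(methodName);
            } catch (NoSuchMethodException e) {
                throw new IllegalArgumentException("No public no-arg method " + k, e);
            }
        });
    }

    /** Invokes a public no-arg method by name, reusing the cached lookup. */
    static Object invoke(Object target, String methodName) {
        try {
            return lookup(target.getClass(), methodName).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Invocation failed: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke("hello", "length"));
    }
}
```

Only the lookup is cached here; each call still goes through Method.invoke, so direct calls or generated bytecode remain faster on very hot paths.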

2.7 Exception Handling

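Throwing exceptions frequently adds latency, increases memory usage, and raises CPU load, largely because each new exception captures a stack trace. Reserve exceptions for truly exceptional conditions, avoid using them for ordinary control flow, and keep try‑catch blocks narrow.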
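
Input validation is a common place where exceptions end up on the hot path. The following is a sketch of one alternative, assuming input is limited to ASCII digits with an optional leading minus sign: SafeParse and tryParseInt are hypothetical names, and the method returns an empty OptionalInt instead of throwing NumberFormatException.

```java
import java.util.OptionalInt;

/** Hypothetical helper: parses an int without using exceptions for invalid input. */
final class SafeParse {
    private SafeParse() {}

    static OptionalInt tryParseInt(String text) {
        if (text == null || text.isEmpty()) {
            return OptionalInt.empty();
        }
        boolean negative = text.charAt(0) == '-';
        int i = negative ? 1 : 0;
        if (i == text.length()) {
            return OptionalInt.empty(); // a lone "-" is not a number
        }
        long value = 0;
        for (; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
            value = value * 10 + (c - '0');
            if (value > (long) Integer.MAX_VALUE + 1) {
                return OptionalInt.empty(); // magnitude too large for an int
            }
        }
        if (negative) {
            value = -value;
        }
        if (value > Integer.MAX_VALUE) {
            return OptionalInt.empty(); // 2147483648 is only valid when negative
        }
        return OptionalInt.of((int) value);
    }

    public static void main(String[] args) {
        System.out.println(tryParseInt("42"));
        System.out.println(tryParseInt("12a"));
    }
}
```

Callers can then branch on isPresent() for malformed input and keep exceptions for failures that really are unexpected.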

2.8 Temporary Objects

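Creating many short‑lived objects increases allocation pressure and triggers more frequent garbage collection. To reduce temporary allocations, prefer StringBuilder for concatenation in loops, batch collection operations, reuse pre‑compiled Pattern instances, use primitive types instead of boxed ones, and consider object pools for objects that are expensive to create.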
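
A few of these habits fit in one small sketch. The class and method names below (AllocationFriendly, join, splitCsv, sum) are hypothetical, and the comma‑separated format is only an assumed example input.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Illustrative helpers that avoid needless temporary objects. */
final class AllocationFriendly {
    // Compiled once and reused; a regex passed to String.matches is compiled on every call
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    private AllocationFriendly() {}

    /** Joins parts with commas using one StringBuilder instead of repeated String concatenation. */
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    /** Splits a comma-separated line with the shared pre-compiled pattern. */
    static String[] splitCsv(String line) {
        return COMMA.split(line);
    }

    /** Accumulates in a primitive long, so no Integer or Long boxes are allocated. */
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(join(List.of("a", "b", "c")));
        System.out.println(splitCsv("x , y,z").length);
        System.out.println(sum(new int[] {1, 2, 3}));
    }
}
```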

3. Design Optimization

3.1 Caching

Proper caching reduces data access latency and load on downstream services. Local caches (e.g., Caffeine, Guava, Ehcache) complement distributed caches (e.g., Redis, Memcached). A simple LRU local cache example:

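import java.util.LinkedHashMap;
import java.util.Map;

// Not thread-safe: wrap with Collections.synchronizedMap or guard with a lock for concurrent use
public class LRUHashMap<K, V> extends LinkedHashMap<K, V> {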
    private final int maxSize;
    public LRUHashMap(int maxSize) {
        super(maxSize, 0.75f, true);
        this.maxSize = maxSize;
    }
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxSize;
    }
}
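
Eviction order can be checked with a short usage sketch. LruDemo and its nested LruCache are hypothetical names; LruCache is a local copy of the same access‑ordered LinkedHashMap pattern so that the example is self‑contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative usage of an access-ordered LinkedHashMap as a tiny LRU cache. */
final class LruDemo {

    static final class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int maxSize;

        LruCache(int maxSize) {
            super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
            this.maxSize = maxSize;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxSize;
        }
    }

    /** Fills a two-entry cache, touches "a", then inserts a third key. */
    static Map<String, Integer> demo() {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("c", 3); // exceeds maxSize, so the least recently used key "b" is evicted
        return cache;
    }

    public static void main(String[] args) {
        System.out.println(demo().keySet());
    }
}
```

In an access‑ordered LinkedHashMap even get() reorders entries, so concurrent readers need external synchronization as well. Libraries such as Caffeine handle this internally.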

3.2 Asynchronous Processing

Non‑blocking I/O and coroutine‑style virtual threads improve throughput. Example of asynchronous Spring MVC endpoints:

@GetMapping("/async/callable")
public WebAsyncTask<String> asyncCallable() {
    Callable<String> callable = () -> "Async task completed";
    return new WebAsyncTask<>(10000, callable);
}

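@GetMapping("/async/deferredresult")
public DeferredResult<String> asyncDeferredResult() {
    DeferredResult<String> dr = new DeferredResult<>(10000L);
    // Complete the result from another thread (java.util.concurrent.CompletableFuture)
    // so that the request thread is released immediately
    CompletableFuture.runAsync(() -> dr.setResult("DeferredResult task completed"));
    return dr;
}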

Virtual threads (a preview feature in Java 19, finalized in Java 21) provide lightweight threads that are scheduled by the JVM rather than the operating system:

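Runnable runnable = () -> System.out.println("Running in " + Thread.currentThread());
Thread thread = Thread.ofVirtual()
    .name("Virtual Threads")
    .unstarted(runnable);
thread.start();
ThreadFactory factory = Thread.ofVirtual().factory();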

3.3 Parallelism

Parallel processing underlies big‑data frameworks (MapReduce), edge computing, and multi‑stage request handling. Decouple components and execute independent stages concurrently using threads, coroutines, message queues, or non‑blocking I/O.
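
Within a single request, independent stages can be started together and combined once both finish. The following is a minimal sketch with CompletableFuture: ParallelStages, loadUser, loadOrderCount, and buildPage are hypothetical names, and the two loaders only simulate remote calls with a sleep.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: runs two independent stages concurrently and merges the results. */
final class ParallelStages {
    private ParallelStages() {}

    // Stand-ins for independent downstream calls (assumed, roughly 100 ms each)
    static String loadUser(int id) {
        pause(100);
        return "user-" + id;
    }

    static int loadOrderCount(int id) {
        pause(100);
        return id * 2;
    }

    /** Starts both stages on the pool, then combines them when both are complete. */
    static String buildPage(int id, ExecutorService pool) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> loadUser(id), pool);
        CompletableFuture<Integer> orders = CompletableFuture.supplyAsync(() -> loadOrderCount(id), pool);
        return user.thenCombine(orders, (u, n) -> u + " has " + n + " orders").join();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(buildPage(21, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the stages overlap, the total latency approaches that of the slower stage rather than the sum of both.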

3.4 Pooling

Pooling pre‑allocates expensive resources such as threads or database connections so that they are reused instead of being created at request time. For example, a JDBC connection pool reuses established TCP connections, avoiding the roughly 200 ms that opening a new connection can cost.
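
Thread pools show the reuse effect without any external dependency. The following is a sketch (ThreadPoolReuse and runTasks are hypothetical names) that submits many tasks to a small fixed pool and records which worker threads executed them.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: many tasks are served by a small, reused set of threads. */
final class ThreadPoolReuse {
    private ThreadPoolReuse() {}

    /** Runs taskCount tasks on a fixed pool and returns the names of the threads that ran them. */
    static Set<String> runTasks(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(taskCount);
        try {
            for (int i = 0; i < taskCount; i++) {
                pool.execute(() -> {
                    workerNames.add(Thread.currentThread().getName());
                    done.countDown();
                });
            }
            done.await(); // wait until every task has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
        return workerNames;
    }

    public static void main(String[] args) {
        Set<String> names = runTasks(4, 100);
        System.out.println("100 tasks ran on " + names.size() + " threads");
    }
}
```

The number of distinct thread names never exceeds the pool size, which is the point of pooling: the creation cost is paid a few times instead of once per task.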

3.5 Pre‑processing

Pre‑load frequently used data into memory, pre‑compute results, compress payloads, or use prepared statements (e.g., MyBatis) to lower runtime overhead.
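
Pre‑computation can be as simple as a lookup table built once at class initialization. The following is a toy sketch: BitCountTable is a hypothetical name, and counting set bits is only a stand‑in for a more expensive computation, since Integer.bitCount is already highly optimized in practice.

```java
/** Illustrative only: trades 256 bytes of memory for table lookups at request time. */
final class BitCountTable {
    // Number of set bits for every possible byte value, computed once at class load
    private static final byte[] BITS_PER_BYTE = new byte[256];

    static {
        for (int i = 1; i < 256; i++) {
            // bits(i) = bits(i / 2) + lowest bit of i
            BITS_PER_BYTE[i] = (byte) (BITS_PER_BYTE[i >> 1] + (i & 1));
        }
    }

    private BitCountTable() {}

    /** Counts set bits with four table lookups, one per byte of the int. */
    static int bitCount(int value) {
        return BITS_PER_BYTE[value & 0xFF]
                + BITS_PER_BYTE[(value >>> 8) & 0xFF]
                + BITS_PER_BYTE[(value >>> 16) & 0xFF]
                + BITS_PER_BYTE[value >>> 24];
    }

    public static void main(String[] args) {
        System.out.println(bitCount(0b1011));
    }
}
```

The same shape applies to larger cases, such as loading reference data into memory at startup so that request handling only performs lookups.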

4. Summary

Performance optimization is an unavoidable aspect of software development. This article highlighted code‑level tactics (class preloading, cache alignment, branch prediction, COW, inlining, reflection avoidance, exception handling, temporary object reduction) and design‑level strategies (caching, async/virtual threads, parallelism, pooling, pre‑processing). While not exhaustive, the presented patterns aim to inspire further exploration and practical improvement.
