Performance Optimization: Code and Design Techniques for Java Services
This article explains service performance concepts and presents systematic code‑level and architectural optimizations for Java back‑ends, covering class preloading, thread‑pool usage, static variables, cache‑line alignment, false sharing mitigation, branch prediction, copy‑on‑write, method inlining, reflection caching, exception handling, logging practices, lock granularity, and pooling strategies, all illustrated with concrete code examples.
Service performance refers to response speed, throughput and resource utilization under specific conditions; optimizing it can improve user experience, reliability, resource costs and market competitiveness.
Performance optimization is a systematic engineering effort that can be divided into network, service and storage directions, each further split into architecture, design, code, usability and metrics. This article focuses on code and design aspects.
Code Optimization
Related‑code preloading avoids runtime class loading overhead. In Java, the Bootstrap and Application class loaders load core API classes and custom classes respectively. Preloading can be achieved via static blocks:
```java
public class MainClass {
    static {
        try {
            // Preload MyClass, which implements the related functionality.
            Class.forName("com.example.MyClass");
        } catch (ClassNotFoundException e) {
            // Class.forName throws a checked exception, which a static block must handle.
            throw new ExceptionInInitializerError(e);
        }
    }
    // runtime code …
}
```
Creating a thread pool at startup and running related code on it asynchronously avoids the cost of creating threads at request time.
Static variables can cache objects that the related code depends on, giving fast access without repeated loading. Because they are shared by every thread, they must be immutable or otherwise thread-safe.
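The two ideas above can be sketched together. This is a minimal illustration, not the article's implementation: the class name `StartupResources`, the pool size of 4, and the placeholder value computation are all assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical holder class; names and sizes are illustrative only.
final class StartupResources {
    // Built once when the class is initialized, so request threads never pay thread-creation cost.
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    // Static cache shared by all threads; ConcurrentHashMap keeps concurrent access safe.
    private static final Map<String, String> CACHE = new ConcurrentHashMap<>();

    private StartupResources() { }

    // computeIfAbsent runs the (placeholder) expensive computation at most once per key.
    static String lookup(String key) {
        return CACHE.computeIfAbsent(key, k -> "value-for-" + k);
    }

    // Runs the lookup on the pre-built pool instead of spawning a new thread.
    static CompletableFuture<String> lookupAsync(String key) {
        return CompletableFuture.supplyAsync(() -> lookup(key), POOL);
    }
}
```

A caller would use `StartupResources.lookupAsync("a").join()` and shut the pool down with `POOL.shutdown()` when the service stops.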
Cache Alignment
CPU caches (L1, L2, L3) transfer data in cache lines, typically 64 bytes. When two threads modify different variables that share a cache line, each write invalidates the line in the other core's cache, and this false sharing stalls both threads. Mitigation techniques include manual padding, the @Contended annotation (usable in application classes only with -XX:-RestrictContended), and aligning data structures:
```java
public class FalseSharingTest {
    private static final int LOOP_NUM = 1000000000;

    public static void main(String[] args) throws InterruptedException {
        Struct struct = new Struct();
        long start = System.currentTimeMillis();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < LOOP_NUM; i++) { struct.x++; }
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < LOOP_NUM; i++) { struct.y++; }
        });
        t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println("cost time [" + (System.currentTimeMillis() - start) + "] ms");
    }

    static class Struct {
        volatile long x;
        // Padding meant to push y onto another cache line; the JVM may reorder fields, so the layout is not guaranteed.
        long p1, p2, p3, p4, p5, p6, p7;
        volatile long y;
    }
}
```
Using @Contended:
```java
// Java 8 location. From JDK 9 the annotation is jdk.internal.vm.annotation.Contended and
// compiling against it needs --add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED.
import sun.misc.Contended;

public class ContendedTest {
    @Contended public volatile long a;
    @Contended public volatile long b;

    public static void main(String[] args) throws InterruptedException {
        ContendedTest c = new ContendedTest();
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10000000; i++) c.a = i; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10000000; i++) c.b = i; });
        long start = System.nanoTime(); t1.start(); t2.start(); t1.join(); t2.join();
        System.out.println((System.nanoTime() - start) / 1_000_000); // elapsed milliseconds
    }
}
```
Branch Prediction
Complex conditional logic increases branch‑prediction difficulty. Keep hot paths early in if statements and reduce nesting to improve prediction accuracy.
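A small sketch of this advice, assuming a hypothetical `OrderRouter` in which `NORMAL` orders dominate production traffic. The class, enum, and return values are invented for illustration. Because the JIT compiler also arranges branches from its own runtime profile, any gain should be measured, not assumed.

```java
// Hypothetical example: route() is written so the common case is reached first.
final class OrderRouter {
    // Assumption: NORMAL is by far the most frequent kind at run time.
    enum Kind { NORMAL, PRESALE, REFUND }

    private OrderRouter() { }

    static String route(Kind kind, boolean valid) {
        // Guard clause: the rare rejection exits early and keeps the main flow flat.
        if (!valid) {
            return "reject";
        }
        // Most frequent case tested first, so the hot path is one short, predictable branch.
        if (kind == Kind.NORMAL) {
            return "fast-path";
        }
        if (kind == Kind.PRESALE) {
            return "presale-path";
        }
        return "refund-path";
    }
}
```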
Copy‑On‑Write (COW)
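COW lets readers share the same data and creates a copy only when a write occurs, so reads need no locking. This suits read-heavy scenarios; because every write copies the underlying data, it is a poor fit for write-heavy ones. Example with CopyOnWriteArrayList: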
```java
List<String> list = new CopyOnWriteArrayList<>();
list.add("value"); // each write copies the backing array
```
Method Inlining
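Inlining replaces a method call with the method body, removing call overhead and opening the way for further optimizations. The JIT compiler decides what to inline at run time; keeping hot methods short and, where possible, final, static, or private makes them easier to inline. The thresholds can be tuned with JVM parameters such as -XX:MaxInlineSize, -XX:FreqInlineSize, -XX:InlineSmallCode, and -XX:MaxInlineLevel. The @ForceInline annotation (jdk.internal.vm.annotation.ForceInline) is internal to the JDK and is not honored for ordinary application classes, and the experimental options listed with it switch the JIT to a JVMCI compiler such as Graal, so the snippet below illustrates the idea and is not a supported application-level API.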
```java
// JVM options: -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+JVMCICompiler
@ForceInline public static int add(int a, int b) { return a + b; }
```
Reflection Optimization
Reflection incurs type‑checking and method‑lookup overhead. Cache reflective results or use bytecode‑generation libraries (e.g., Javassist, Byte Buddy) to avoid reflection at runtime.
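As a minimal sketch of caching reflective results using only the JDK, the hypothetical `GetterCache` below resolves each public no-argument method once and reuses the `Method` object afterwards. The class and method names are assumptions made for illustration.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: caches Method lookups per class and method name.
final class GetterCache {
    // Strong references keep cached classes reachable; acceptable for a fixed set of types.
    private static final Map<Class<?>, Map<String, Method>> CACHE = new ConcurrentHashMap<>();

    private GetterCache() { }

    static Object invoke(Object target, String methodName) {
        // The costly getMethod lookup happens only on the first call for each (class, name) pair.
        Method method = CACHE
                .computeIfAbsent(target.getClass(), c -> new ConcurrentHashMap<>())
                .computeIfAbsent(methodName, name -> resolve(target.getClass(), name));
        try {
            return method.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("call failed: " + methodName, e);
        }
    }

    private static Method resolve(Class<?> type, String name) {
        try {
            return type.getMethod(name);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no public no-arg method: " + name, e);
        }
    }
}
```

For example, `GetterCache.invoke("abc", "length")` resolves `String.length()` once and reuses it on later calls. The BeanUtils excerpt that follows keeps a similar cache in Spring's ConcurrentReferenceHashMap, whose soft or weak references allow entries to be reclaimed.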
```java
public abstract class BeanUtils {
    private static final Map<Class<?>, Field[]> DECLARED_FIELDS_CACHE = new ConcurrentReferenceHashMap<>(256);

    public static Field[] getFields(Class<?> clazz) { … }
    public static Field[] getDeclaredFields(Class<?> clazz) { … }
    public static <T> T[] mergeArray(final T[] array1, final T... array2) { … }
}
```
Exception Handling
Frequent exceptions add latency, memory, and CPU overhead. Prefer condition checks for expected error paths and reserve exceptions for truly unexpected situations.
```java
// Bad: exceptions are used to reject parameters that are expected to be missing sometimes.
public ServiceResponse<String> badCase4Throw(String p1, String p2) {
    try { Assert.notNull(p1); Assert.notNull(p2); /* ... */ }
    catch (Throwable e) { return new ServiceResponse<>(ResponseCodeEnum.PARAM_ERROR); }
    return new ServiceResponse<>();
}

// Better: a plain condition validates the input; the exception path is left for real failures.
public ServiceResponse<String> normCase4Throw(String p1, String p2) {
    if (StringUtils.isEmpty(p1) || StringUtils.isEmpty(p2)) {
        return new ServiceResponse<>(ResponseCodeEnum.PARAM_ERROR);
    }
    try { /* ... */ }
    catch (Throwable e) { return new ServiceResponse<>(ResponseCodeEnum.SYSTEM_ERROR, e.getMessage()); }
    return new ServiceResponse<>();
}
```
Logging
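Avoid string concatenation in log statements; use parameterized logging so that the message is not assembled when the log level is disabled. The arguments themselves are still evaluated: in the call below, JsonUtil.write2JsonStr(context) runs even when INFO is off, so expensive arguments are worth guarding with a level check such as LOGGER.isInfoEnabled() (available on SLF4J and Log4j loggers).
```java
LOGGER.info("result:{} , logid = {}", JsonUtil.write2JsonStr(context), DigitThreadLocal.getLogId());
```
Temporary Objects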
Minimize short‑lived objects by using StringBuilder, batch collection operations, pre‑compiled regex patterns, primitive types, and object pools.
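Three of these habits can be sketched in a few lines. The `TempObjects` class and its methods are hypothetical names chosen for the example, not part of the article.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical utility illustrating fewer short-lived objects on hot paths.
final class TempObjects {
    // Compiled once and reused; String.matches(regex) would compile a new Pattern on every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    private TempObjects() { }

    static boolean isNumeric(String s) {
        return DIGITS.matcher(s).matches();
    }

    // A single StringBuilder replaces the intermediate Strings that "+=" in a loop would create.
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(parts.get(i));
        }
        return sb.toString();
    }

    // A primitive accumulator avoids boxing a Long on every iteration.
    static long sum(int[] values) {
        long total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}
```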
Lock Granularity
Choose the smallest appropriate lock: volatile for simple visibility (it provides no mutual exclusion), an object lock (synchronized on an instance), a class lock (synchronized on a static method), a read-write lock for read-heavy scenarios, a segment lock for partitioned data structures, a spin lock for tiny critical sections, and a semaphore for controlling access to multiple resources.
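For the read-heavy case, a minimal sketch with ReentrantReadWriteLock is shown here. The `ReadMostlyConfig` class is a hypothetical example: many readers can hold the read lock at once, while a writer takes exclusive access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical read-mostly store guarded by a read-write lock.
final class ReadMostlyConfig {
    private final Map<String, String> values = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so concurrent gets do not block each other.
    String get(String key) {
        lock.readLock().lock();
        try {
            return values.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for active readers and blocks new ones.
    void put(String key, String value) {
        lock.writeLock().lock();
        try {
            values.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```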
Example of double‑checked locking with volatile:
```java
public class Singleton {
    // volatile prevents other threads from seeing a partially constructed instance.
    private static volatile Singleton INSTANCE;

    private Singleton() { /* ... */ }

    public static Singleton getInstance() {
        if (INSTANCE == null) {                  // first check, without locking
            synchronized (Singleton.class) {
                if (INSTANCE == null) { INSTANCE = new Singleton(); } // second check, under the lock
            }
        }
        return INSTANCE;
    }
}
```
Design Optimizations
Effective caching (local L1 and distributed L2) reduces latency and load on data sources. A simple LRU cache example:
```java
import java.util.LinkedHashMap;
import java.util.Map;

// Not thread-safe: wrap with Collections.synchronizedMap or guard with a lock when shared.
public class LRUHashMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;
    public LRUHashMap(int maxSize) { super(maxSize, 0.75f, true); this.maxSize = maxSize; } // true = access order
    @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) { return size() > maxSize; }
}
```
Asynchronous processing (non-blocking I/O, CompletableFuture, DeferredResult) and virtual threads (a preview feature in Java 19, finalized in Java 21) improve throughput for I/O-bound workloads.
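A minimal CompletableFuture sketch of the idea, assuming two independent remote calls that are simulated here with short sleeps. `AsyncAggregation`, `fetchPrice`, and `fetchStock` are hypothetical names. The two calls run concurrently, so the total wait is close to the slower call and not the sum of both.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical aggregation of two independent I/O calls.
final class AsyncAggregation {
    private AsyncAggregation() { }

    // Stand-ins for remote calls; a real service would do network I/O here.
    static String fetchPrice(String sku) { pause(50); return "price(" + sku + ")"; }
    static String fetchStock(String sku) { pause(50); return "stock(" + sku + ")"; }

    static String load(String sku, ExecutorService pool) {
        // Both tasks are submitted before either result is awaited.
        CompletableFuture<String> price = CompletableFuture.supplyAsync(() -> fetchPrice(sku), pool);
        CompletableFuture<String> stock = CompletableFuture.supplyAsync(() -> fetchStock(sku), pool);
        // thenCombine merges the two results once both have completed.
        return price.thenCombine(stock, (p, s) -> p + "," + s).join();
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Spring MVC offers the same pattern at the controller level through WebAsyncTask and DeferredResult, as in the handlers that follow.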
```java
// Runs the Callable on a Spring-managed executor; the first argument is the timeout in milliseconds.
@GetMapping("/async/callable")
public WebAsyncTask<String> asyncCallable() {
    Callable<String> callable = () -> "async task completed";
    return new WebAsyncTask<>(10000, callable);
}

// The result may be set later from any thread; it is set immediately here for brevity.
@GetMapping("/async/deferredresult")
public DeferredResult<String> asyncDeferredResult() {
    DeferredResult<String> dr = new DeferredResult<>(10000L);
    dr.setResult("DeferredResult task completed");
    return dr;
}

// Creates a virtual thread without starting it; call thread.start() to run it.
Thread thread = Thread.ofVirtual().name("Virtual Threads").unstarted(runnable);
```
Pooling (thread pools, connection pools, object pools) reduces the cost of acquiring resources repeatedly.
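The object-pool variant can be sketched with a bounded queue. `SimplePool` is a hypothetical, deliberately small example: it does not reset or validate returned objects, which production pools such as Apache Commons Pool or HikariCP take care of.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical minimal object pool backed by a bounded, thread-safe queue.
final class SimplePool<T> {
    private final BlockingQueue<T> idle;
    private final Supplier<T> factory;

    SimplePool(int capacity, Supplier<T> factory) {
        this.idle = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    // Reuses an idle instance when one is available; otherwise creates a new one.
    T borrow() {
        T obj = idle.poll();
        return obj != null ? obj : factory.get();
    }

    // Returns the instance to the pool; if the pool is full it is dropped and left to the GC.
    void release(T obj) {
        idle.offer(obj);
    }
}
```

A caller borrows with `pool.borrow()`, uses the object, and hands it back with `pool.release(obj)`, ideally in a finally block.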
In summary, performance optimization spans hardware‑level considerations, JVM tuning, cache design, code‑level tricks, and architectural choices; applying the appropriate techniques based on concrete scenarios yields measurable gains.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.