Performance Optimization Practices for the Tongtian Tower Backend System

This article summarizes lessons from optimizing the Tongtian Tower backend: the background, the 10%–30% improvement achieved, the optimization principles, testing methods, metric analysis, and concrete strategies (RPC scheduling, JVM tuning, logging, thread‑pool sizing, and code refinements) for improving latency and throughput.

JD Retail Technology

Jan 7, 2020

Background : The Tongtian Tower platform experienced rapid growth in data volume and user traffic, prompting the formation of a dedicated backend optimization team to address architectural shortcomings and ensure stable operation.

Optimization Effect : Comparative measurements show a 10%–30% performance boost after applying the optimizations.

Optimization Principles : A three‑step approach was adopted, emphasizing data‑driven analysis and building confidence through milestones. It also followed system‑level rules: optimizations must not affect functionality, work is prioritized by cost‑effectiveness, and Occam’s razor applies (prefer the simplest change that works).

Performance Testing : Two testing methods were used – micro‑benchmarks (e.g., JMH) for method‑level metrics and macro‑benchmarks (e.g., internal forcebot, ab, JMeter) for system‑wide metrics such as CPU, memory, I/O, JVM, latency, and throughput.

Performance Metric Analysis : The collected metrics revealed common bottlenecks: high CPU usage (heavy computation, regex backtracking, frequent GC), high heap usage (excessive object creation, memory leaks), low JVM throughput (excessive stop‑the‑world, or STW, pauses), high disk or network I/O, and application‑level latency caused by serial I/O designs and upstream tail latency.

Tuning Strategies :

Design optimization: RPC scheduling was re‑engineered using a DAG‑based parallel task orchestrator (Sirector) to eliminate artificial staging and improve response times.
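The article does not show Sirector's API, so the following is only a minimal sketch of the underlying idea using the JDK's `CompletableFuture` as a stand-in. The task names (`fetchLayout`, `fetchProducts`, `fetchCoupons`) are hypothetical. Each task starts as soon as its own dependencies finish, rather than waiting for every task in an earlier "stage".

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class DagOrchestrationSketch {
    // Hypothetical stand-ins for upstream RPC calls.
    static String fetchLayout(String pageId) { return "layout(" + pageId + ")"; }
    static String fetchProducts(String layout) { return "products[" + layout + "]"; }
    static String fetchCoupons(String pageId) { return "coupons(" + pageId + ")"; }

    static String assemble(String pageId, ExecutorService pool) {
        // Two root tasks with no dependencies start immediately, in parallel.
        CompletableFuture<String> layout =
                CompletableFuture.supplyAsync(() -> fetchLayout(pageId), pool);
        CompletableFuture<String> coupons =
                CompletableFuture.supplyAsync(() -> fetchCoupons(pageId), pool);
        // products depends only on layout, so it does not wait for coupons.
        CompletableFuture<String> products =
                layout.thenApplyAsync(DagOrchestrationSketch::fetchProducts, pool);
        // The final node joins the two branches of the graph.
        return products.thenCombine(coupons, (p, c) -> p + " + " + c).join();
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(assemble("page-1", pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a staged design, `fetchProducts` would wait for both `fetchLayout` and `fetchCoupons`; expressing the dependencies as a graph removes that unnecessary wait.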

Single‑task RPC parallelism: Parallelized previously serial queries within a request when upstream limits allowed.
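As a rough illustration of turning a serial loop of queries into parallel calls, the sketch below fans out one request's lookups on a fixed-size pool. The `queryPrice` call is hypothetical, and using the pool size as the cap on concurrent upstream calls is an assumption about how an upstream limit might be respected, not the article's stated mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.stream.Collectors;

class ParallelQuerySketch {
    // Hypothetical upstream query; a real one would perform an RPC.
    static String queryPrice(long skuId) { return "price-of-" + skuId; }

    // Issues all queries concurrently; the pool size bounds how many
    // upstream calls are in flight at once.
    static List<String> queryAll(List<Long> skuIds, ExecutorService pool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (Long id : skuIds) {
            futures.add(CompletableFuture.supplyAsync(() -> queryPrice(id), pool));
        }
        // Joining in submission order keeps results aligned with the input.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

With this shape, the request's latency approaches that of the slowest single query instead of the sum of all queries, provided the pool is large enough.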

Cold‑data handling: Split hot and cold data, moving rarely accessed items out of the JVM heap and fetching them on‑demand.
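One possible shape for a hot/cold split is a small, bounded on-heap map for hot entries plus a loader that fetches everything else on demand. The sketch below assumes an LRU eviction policy and a caller-supplied `coldLoader` (for example, a remote store lookup); both are illustrative choices, not details taken from the article.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class HotColdStoreSketch<K, V> {
    private final Map<K, V> hot;
    private final Function<K, V> coldLoader; // hypothetical off-heap or remote fetch

    HotColdStoreSketch(int hotCapacity, Function<K, V> coldLoader) {
        this.coldLoader = coldLoader;
        // An access-ordered LinkedHashMap evicts the least recently used entry.
        this.hot = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > hotCapacity;
            }
        };
    }

    // Coarse locking keeps the sketch short; the loader runs under the lock.
    synchronized V get(K key) {
        V value = hot.get(key);
        if (value == null) {
            value = coldLoader.apply(key); // fetch cold data on demand
            if (value != null) {
                hot.put(key, value);
            }
        }
        return value;
    }

    synchronized int hotSize() { return hot.size(); }
}
```

The heap then holds at most `hotCapacity` entries regardless of the total data size, at the cost of a slower first access to cold items.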

Tail‑latency mitigation: Implemented hedged (backup) requests with rate‑limited flow control, inspired by “The Tail at Scale” by Jeffrey Dean and Luiz André Barroso.
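A hedged request can be sketched as follows: send the primary call, and if it has not completed after a short delay, send one backup and take whichever answer arrives first. The class below is an assumption-laden illustration, not the article's implementation. In particular, the `Semaphore` that caps in-flight backups is a simple stand-in for the flow control mentioned above, and `hedgeDelayMs` is a hypothetical parameter.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class HedgedRequestSketch {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    // Assumed flow control: at most this many backup calls in flight at once.
    private final Semaphore backupPermits;

    HedgedRequestSketch(int maxInFlightBackups) {
        this.backupPermits = new Semaphore(maxInFlightBackups);
    }

    <T> CompletableFuture<T> call(Supplier<CompletableFuture<T>> rpc, long hedgeDelayMs) {
        CompletableFuture<T> result = new CompletableFuture<>();
        forward(rpc.get(), result); // primary request
        scheduler.schedule(() -> {
            // Skip the backup if the primary already answered or the budget is spent.
            if (result.isDone() || !backupPermits.tryAcquire()) {
                return;
            }
            CompletableFuture<T> backup = rpc.get();
            backup.whenComplete((value, error) -> backupPermits.release());
            forward(backup, result);
        }, hedgeDelayMs, TimeUnit.MILLISECONDS);
        return result;
    }

    // The first completion wins; later completions are ignored by CompletableFuture.
    private static <T> void forward(CompletableFuture<T> from, CompletableFuture<T> to) {
        from.whenComplete((value, error) -> {
            if (error == null) {
                to.complete(value);
            } else {
                to.completeExceptionally(error);
            }
        });
    }

    void shutdown() { scheduler.shutdown(); }
}
```

A common choice is to set the hedge delay near a high percentile of observed latency so that only a small fraction of requests trigger a backup. Note that in this sketch a fast failure of the primary completes the result exceptionally without trying the backup.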

Component upgrades: Switched logging to asynchronous Log4j2 loggers backed by the LMAX Disruptor, tuned the JVM (G1 GC, biased locking disabled, GC thread count aligned with the container’s CPU count), and resized thread pools based on load‑test data.
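The article sized its pools from load-test data and does not give a formula. As a starting point before such testing, one widely used heuristic is threads ≈ cores × target utilization × (1 + wait time / compute time). The sketch below applies that heuristic and builds a bounded pool; the method names and the choice of `CallerRunsPolicy` for back-pressure are assumptions for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ThreadPoolSizingSketch {
    // Heuristic starting estimate; load tests should confirm or correct it.
    static int estimateThreads(int cores, double targetUtilization,
                               double waitMs, double computeMs) {
        double estimate = cores * targetUtilization * (1 + waitMs / computeMs);
        return Math.max(1, (int) Math.round(estimate));
    }

    // A fixed-size pool with a bounded queue, so overload shows up as
    // back-pressure on the caller rather than unbounded memory growth.
    static ThreadPoolExecutor newBoundedPool(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

For example, 4 cores at 80% target utilization with tasks that wait 90 ms for every 10 ms of computation give an estimate of 32 threads, which would then be adjusted against measured latency and throughput.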

Code refinements: Adopted efficient deep‑copy tools, eliminated unnecessary object creation, used possessive (“exclusive mode”) regex quantifiers to avoid backtracking, and took advantage of JIT inlining, finer lock granularity, and data compression.
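To illustrate the regex point, the sketch below contrasts a greedy pattern with its possessive counterpart in `java.util.regex`. The patterns are illustrative examples, not ones from the Tongtian Tower codebase. A possessive quantifier such as `a++` never gives back characters it has matched, so a near-miss input fails quickly instead of triggering repeated backtracking.

```java
import java.util.regex.Pattern;

class PossessiveRegexSketch {
    // Greedy nested quantifiers can backtrack heavily on near-miss input.
    // Shown for comparison only; it is not executed on the long input below.
    static final Pattern GREEDY = Pattern.compile("^(a+)+$");
    // Possessive inner quantifier: no backtracking into the run of a's.
    static final Pattern POSSESSIVE = Pattern.compile("^(a++)+$");

    static boolean matchesPossessive(String input) {
        return POSSESSIVE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String nearMiss = "aaaaaaaaaaaaaaaaaaaaaaaab"; // a run of a's, then one b
        System.out.println(matchesPossessive("aaaa"));   // true
        System.out.println(matchesPossessive(nearMiss)); // false
    }
}
```

Possessive quantifiers change matching semantics as well as speed (for instance, `a++a` can never match), so each rewritten pattern needs to be checked against its expected inputs.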

Conclusion : The optimization effort met its goals, with design‑level changes delivering the most significant gains for aggregation services, while component and code tweaks provided additional improvements validated through micro‑benchmarks. Stability and correctness remain paramount, requiring rigorous verification before any optimization is deployed.
