Inside JD.com’s Product Detail Page: Architecture, Performance Analysis, and Optimization Strategies

This article examines the backend architecture of JD.com's product detail page, analyzes its memory, thread, and CPU usage, and outlines a three-stage optimization roadmap (parallelization, JVM/Tomcat upgrades, thread-pool tuning, and asynchronous refactoring) for handling massive traffic spikes.

JD Retail Technology

Dec 13, 2019

Introduction

The product detail page is a core component of JD.com's shopping flow. It serves hundreds of business scenarios and carries massive traffic, especially during major promotions such as 618 and Double 11. To meet performance demands while reducing costs, the team undertook a systematic optimization of the backend system.

System Architecture Overview

The overall architecture consists of a client layer, a front‑end service, a middle‑office, and a back‑end. The focus of this article is the front‑end service that aggregates data from various middle‑office services.

System Components

For historical reasons, the detail-page system is split into two services: purewaresoa (the legacy system) and warecoresoa (the newer system). Both expose JSF services and use the in-house concurrency framework sirector for workflow orchestration.

System Analysis

Memory Analysis

A heap dump taken with jmap and analyzed in MAT showed that the dominant memory consumers were the local ZooKeeper cache and the UMP statistics module.

Thread Analysis

Thread stacks were collected via jstack or JD’s internal tool jvm.jd.com. The most heavily used thread pools are jsf, sirector, jimdb, and ump. The raw thread‑pool snapshot is shown below:

{
  "taskProcessorThreadPool": 10, // queue threads
  "nioEventLoopGroup": 10, // Netty threads
  "userTracerWorker": 100, // UMP threads
  "Jst": 7,
  "BrokenConnectionDestroyer": 2,
  "SystemClock": 1,
  "pool": { // 331 threads in total
    "pool-10": 200, // sirector threads
    "pool-4": 128, // jimdb connection pool
    "pool-3": 1,
    "pool-2": 1,
    "pool-1": 1
  },
  "JSF": { // 699 threads in total
    "JSF-CLI-WORKER": 9,
    "JSF-SEV-BOSS": 1,
    "JSF-CLI-RC": 156, // client threads
    "JSF-SEV-WORKER": 2, // JSF IO pool
    "JSF-Future-Checker": 2,
    "JSF-jsfRegistry-HB&Retry": 1,
    "JSF-BZ-22000": 215, // business thread pool
    "JSF-CLI-CANDIDATE": 156, // client threads
    "JSF-CLI-HB": 155, // client threads
    "JSF-FileRegistry-Back": 1,
    "JSF-jsfRegistry-Check": 1
  },
  "UpdateProfile": 1,
  "ContainerBackgroundProcessor[StandardEngine[Catalina]]": 1,
  "CfsHeartbeat": 6, // jimdb heartbeat
  "main": 1,
  "commons": 1,
  "CLIENT_SIDE_RINGBUFFER": 1,
  "Reference Handler": 1,
  "Finalizer": 1,
  "localhost": 6, // Tomcat threads
  "ZkClient": 3,
  "I/O dispatcher 2": 1,
  "PathCache": 1,
  "IoLoopGroup": 4,
  "I/O dispatcher 3": 1,
  "I/O dispatcher 4": 1,
  "I/O dispatcher 5": 1,
  "I/O dispatcher 6": 1,
  "I/O dispatcher 7": 1,
  "I/O dispatcher 8": 1,
  "GC Daemon": 1,
  "RMI TCP Connection(2)": 1,
  "Signal Dispatcher": 1,
  "FailoverEvent": 8,
  "RMI Scheduler(0)": 1,
  "Thread": 1,
  "System_Clock": 1,
  "NioBlockingSelector.BlockPoller": 1,
  "http": 3,
  "UMP": 11,
  "JMX server connection timeout 36": 1,
  "ClearTimeout": 1,
  "ClusterManager": 1,
  "UpdateCluster": 1,
  "ChannelEvent": 8,
  "I/O dispatcher 1": 1,
  "RMI TCP Accept": 3
}
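
The article does not say how the per-pool counts were produced from the raw thread stacks. As a rough illustration only, the sketch below groups the threads of the current JVM by name prefix, which yields a similar kind of summary. The class name ThreadGroupCounter and the suffix-stripping rule in prefixOf are assumptions made for this example, not part of jstack or any JD tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Groups live JVM threads by name prefix, similar in spirit to the snapshot above. */
final class ThreadGroupCounter {

    /**
     * Hypothetical normalization rule: drop one trailing "-thread-N", "-N",
     * "#N", or " N" suffix so that threads from the same pool share a key.
     */
    static String prefixOf(String threadName) {
        return threadName.replaceFirst("(-thread)?[-# ]\\d+$", "");
    }

    /** Counts how many names fall under each normalized prefix. */
    static Map<String, Integer> countByPrefix(Iterable<String> threadNames) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String name : threadNames) {
            counts.merge(prefixOf(name), 1, Integer::sum);
        }
        return counts;
    }

    /** Takes a snapshot of the threads that are alive in this JVM right now. */
    static Map<String, Integer> snapshot() {
        List<String> names = new ArrayList<>();
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            names.add(thread.getName());
        }
        return countByPrefix(names);
    }

    public static void main(String[] args) {
        snapshot().forEach((prefix, count) -> System.out.println(prefix + ": " + count));
    }
}
```

Run inside a service process, a summary like this makes oversized pools easy to spot before reading individual stacks.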

CPU Analysis

Flame graphs generated with Arthas and conventional flame-graph tools were used to locate CPU hotspots. The original article includes two representative graphs: the first shows the CPU share of each internal component, and the second ranks threads by CPU consumption.

Performance Optimization Roadmap

The optimization effort is divided into three chronological stages.

Stage 1 – Parallelization & Static/Dynamic Separation

Introduce the sirector framework to parallelize calls to over a hundred backend services.

Separate static and dynamic resources; cache static data locally, in JimDB, and on CDN.

Deploy Lua scripts for additional static caching.

Upgrade the JVM garbage collector and tune Tomcat’s NIO for asynchronous I/O.
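
sirector is an in-house framework, so its API cannot be shown here. To illustrate the parallelization idea from the first item, the sketch below uses the JDK's CompletableFuture instead: all backend calls are started first and joined afterwards, so total latency approaches that of the slowest call rather than the sum. The class ParallelAggregator, the slowCall helper, and the service names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class ParallelAggregator {

    /** Runs every call on the pool concurrently and collects the results by name. */
    static Map<String, String> aggregate(Map<String, Supplier<String>> calls, ExecutorService pool) {
        // Start every call first so they all run concurrently on the pool.
        Map<String, CompletableFuture<String>> futures = new LinkedHashMap<>();
        calls.forEach((name, call) -> futures.put(name, CompletableFuture.supplyAsync(call, pool)));

        // Then wait for each result. With enough pool threads, the total wait
        // is roughly the latency of the slowest call.
        Map<String, String> results = new LinkedHashMap<>();
        futures.forEach((name, future) -> results.put(name, future.join()));
        return results;
    }

    /** Simulates a remote call that takes the given number of milliseconds. */
    static Supplier<String> slowCall(String value, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return value;
        };
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        Map<String, Supplier<String>> calls = new LinkedHashMap<>();
        calls.put("price", slowCall("99.00", 100));
        calls.put("stock", slowCall("in stock", 100));
        calls.put("promotion", slowCall("10% off", 100));

        long start = System.nanoTime();
        Map<String, String> page = aggregate(calls, pool);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(page + " in about " + elapsedMs + " ms");
        pool.shutdown();
    }
}
```

The benefit depends on the pool having enough threads for the fan-out, which is one reason thread-pool sizing matters so much for a page that aggregates over a hundred services.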

Stage 2 – JDK/Tomcat Upgrade & Thread‑Pool & Heap Tuning

Migrate from JDK 6 to JDK 8 and upgrade to Tomcat 8, yielding noticeable QPS gains.

Analyze thread‑pool metrics and resize pools (JSF, sirector, JimDB, UMP) based on observed usage.

Adjust heap size to reduce young‑generation GC (YGC) frequency.

Improve local‑cache hit rates and refine distributed‑cache strategies.
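
Resizing pools based on observed usage requires sampling each pool under load. The article does not describe how JSF, sirector, JimDB, or UMP expose their metrics, so the sketch below only shows what the standard java.util.concurrent.ThreadPoolExecutor offers. PoolStats and its method names are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class PoolStats {

    /** Fraction of the maximum pool size that is busy right now (approximate). */
    static double utilization(ThreadPoolExecutor pool) {
        return (double) pool.getActiveCount() / pool.getMaximumPoolSize();
    }

    /** One-line summary suitable for periodic logging during a load test. */
    static String describe(String name, ThreadPoolExecutor pool) {
        return String.format("%s active=%d size=%d max=%d largest=%d queued=%d",
                name, pool.getActiveCount(), pool.getPoolSize(), pool.getMaximumPoolSize(),
                pool.getLargestPoolSize(), pool.getQueue().size());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        CountDownLatch started = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);

        // Occupy two of the four threads until the latch is released.
        for (int i = 0; i < 2; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        started.await();
        System.out.println(describe("demo", pool));
        release.countDown();
        pool.shutdown();
    }
}
```

A pool whose largest size stays far below its maximum during peak load is a candidate for shrinking, while a persistently non-empty queue suggests the opposite.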

Stage 3 – Asynchronous Refactoring & Business Re‑architecture

Convert I/O, logging, and lazy‑load operations to asynchronous execution.

Refactor business logic, including upstream/downstream migration and graceful degradation.

Update recommendations and ranking lists asynchronously, merge price and promotion handling, and share a middle-office cache for main images.
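
Graceful degradation, mentioned in the second item, usually means returning a cheaper answer when a dependency is slow or failing. The article gives no implementation details, so the following is only a minimal sketch that works on JDK 8: the call runs on a pool, and a fallback value is returned on timeout or error. Degradation, callWithFallback, and the sample values are hypothetical names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class Degradation {

    /** Returns the call's result, or the fallback if it is too slow or fails. */
    static <T> T callWithFallback(Supplier<T> call, T fallback, long timeoutMillis, ExecutorService pool) {
        Callable<T> task = call::get;
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the slow call so it frees its thread
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the dependency failed; degrade instead of propagating
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        String fast = callWithFallback(() -> "live price", "cached price", 200, pool);

        String slow = callWithFallback(() -> {
            try {
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "late price";
        }, "cached price", 100, pool);

        System.out.println(fast + " / " + slow);
        pool.shutdownNow();
    }
}
```

Blocking on get keeps the sketch short; a fully asynchronous design would attach the fallback to the future instead of waiting on the calling thread.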

Tools Used

Both internal and open‑source tools were employed:

Internal: forcebot, mdc, ump, visual

Open-source/Linux: standard Linux performance utilities (e.g., top, perf) and JDK diagnostics (jps, jstat, jmap, jstack).

Thread‑Pool Specific Findings

During sirector thread-pool tuning, a CPU spike was observed at the start of load testing. The pool's core size was smaller than its maximum size, so the pool had to create many threads as load ramped up, and thread creation is expensive. Setting the core size equal to the maximum size, which makes the pool fixed-size, eliminated the spike.
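
With the JDK's ThreadPoolExecutor, a fixed-size pool is simply one whose core and maximum sizes are equal. The sketch below shows that configuration. Prestarting the threads and using a bounded queue are assumptions added for this example; the article only states that the size was fixed, and sirector's own configuration options are not documented there.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class FixedPoolFactory {

    /**
     * Creates a pool whose core size equals its maximum size, so it never has
     * to grow under load. Prestarting (an assumption of this sketch) creates
     * all threads up front instead of on the first requests.
     */
    static ThreadPoolExecutor newPrestartedFixedPool(int size, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                size, size,                 // core == max: fixed-size pool
                0L, TimeUnit.MILLISECONDS,  // keep-alive is irrelevant when core == max
                new ArrayBlockingQueue<>(queueCapacity));
        pool.prestartAllCoreThreads();
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPrestartedFixedPool(8, 100);
        System.out.println("threads alive before any task: " + pool.getPoolSize());
        pool.shutdown();
    }
}
```

Executors.newFixedThreadPool(n) gives the same core and maximum sizes but uses an unbounded queue and does not prestart threads.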

Flame‑Graph Insights

Flame graphs proved invaluable for pinpointing CPU hotspots down to individual lines of code. Two findings stand out: (1) Gson deserialization consumes significant CPU, and (2) exception handling incurs notable CPU overhead due to exception-table processing.
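
The article does not say what change followed the exception finding. One widely used way to make frequently thrown business exceptions cheaper is to skip stack-trace capture, sketched below. Note that this targets the cost of filling in the stack trace, which is related to but not the same as the exception-table processing named above, so treat it as an adjacent mitigation rather than the team's fix. CheapExceptions and SoldOutException are hypothetical names.

```java
final class CheapExceptions {

    /** A business exception that carries a message but no stack trace. */
    static final class SoldOutException extends RuntimeException {
        SoldOutException(String message) {
            // enableSuppression = false, writableStackTrace = false:
            // the JVM does not walk the stack when this exception is created.
            super(message, null, false, false);
        }
    }

    /** Hypothetical lookup that signals an expected business condition. */
    static int availableStock(int requested, int inStock) {
        if (requested > inStock) {
            throw new SoldOutException("requested " + requested + " but only " + inStock + " left");
        }
        return inStock - requested;
    }

    public static void main(String[] args) {
        try {
            availableStock(5, 2);
        } catch (SoldOutException e) {
            System.out.println(e.getMessage() + ", frames captured: " + e.getStackTrace().length);
        }
    }
}
```

The trade-off is that such exceptions are harder to debug, so this pattern fits expected conditions on hot paths rather than genuine errors. Where possible, returning a result value instead of throwing avoids the cost entirely.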

Conclusion

Performance tuning is an ongoing journey. As traffic grows and business evolves, new bottlenecks will emerge. Continuous learning of low‑level system behavior, systematic profiling, and iterative adjustments are essential to sustain high‑throughput, low‑latency service for JD.com’s product detail pages.
