
Investigation of Sudden Performance Degradation in JD.com Calendar Service: Spring MimeTypeUtils LRU Cache Bug and JDK ConcurrentLinkedQueue Issue

The article details a systematic investigation of a sudden performance drop in JD.com's calendar SOA service, revealing a Spring MimeTypeUtils LRU‑cache bug and a ConcurrentLinkedQueue removal bug in the JDK, and explains how upgrading Spring and applying JDK fixes restored service stability.

JD Retail Technology

To ensure stable front‑end service performance, JD.com Retail Platform built an online traffic recording and replay system for regular load‑testing. During a routine test on March 31, the calendar module’s SOA service showed CPU usage over 90%, QPS drop, and reduced availability.

Initial checks found no deployment changes or external anomalies, so the team first treated the issue as a one-off glitch. However, a manual replay on April 1 (April Fools' Day) produced a 100% failure rate, showing the problem was persistent and isolated to the load-testing machine.

Monitoring Investigation: Four service interfaces (A–D) were examined. Interfaces A and B were simple configuration reads and were ruled out. Interfaces C and D called upstream services, so upstream latency was suspected; however, UMP monitoring showed 100% availability and a TP999 of 236 ms, contradicting the observed HTTP failures.

The HTTP error code 524 (TCP handshake completed but no response) suggested a server‑side issue. The team suspected Tomcat or Spring MVC and used UMP‑Pfinder link tracing, which revealed that while the JSF call took only ~20 ms, Tomcat spent ~8 s processing the request.

JVM Analysis: GC, heap, and CPU metrics looked broadly normal (CPU around 75%), but the thread count spiked from fewer than 200 to more than 1,000. A jstack dump showed 1,043 threads in the WAITING state, most belonging to the http-nio-1601-exec-* pool.

Further inspection showed that these threads were blocked in org.springframework.util.MimeTypeUtils$ConcurrentLruCache.get, pointing to lock contention inside the cache.
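A thread-state breakdown like the one jstack produced can also be gathered in-process via the standard ThreadMXBean API. A minimal sketch (the class and method names here are my own, not from the original investigation):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateCounter {
    // Count live JVM threads grouped by their current Thread.State.
    public static Map<Thread.State, Integer> countByState() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
            if (info != null) { // a thread may terminate between the two calls
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Polled periodically inside the affected JVM, this would have shown the WAITING count climbing as request threads piled up behind the cache's lock.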

Spring Framework Bug: The LRU cache in MimeTypeUtils uses a ConcurrentLinkedQueue. Multiple GitHub issues reported severe performance degradation and CPU spikes caused by this cache. The bug was fixed in newer Spring versions by replacing the queue with a ConcurrentLinkedDeque.
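To see why the queue sits on the hot path, here is a simplified queue-backed LRU cache in the style of the old MimeTypeUtils$ConcurrentLruCache (a sketch with a coarse lock, not Spring's exact source; the class name is hypothetical). Every cache hit calls queue.remove(key) to refresh recency, and that is exactly the call that ran into the JDK bug:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

// Simplified queue-backed LRU cache (illustrative, not Spring's source).
public class QueueBackedLruCache<K, V> {
    private final int maxSize;
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    // Recency order: head = least recently used, tail = most recently used.
    private final ConcurrentLinkedQueue<K> queue = new ConcurrentLinkedQueue<>();
    private final Function<K, V> generator;

    public QueueBackedLruCache(int maxSize, Function<K, V> generator) {
        this.maxSize = maxSize;
        this.generator = generator;
    }

    public V get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            // Hit: move the key to the tail. This remove() on the queue is
            // the call that hit JDK-8137184 on affected JDK 8 builds.
            queue.remove(key);
            queue.add(key);
            return cached;
        }
        V value = generator.apply(key);
        synchronized (this) { // coarse lock for the sketch; Spring used a ReadWriteLock
            if (queue.size() == maxSize) {
                K leastUsed = queue.poll(); // evict the least recently used key
                if (leastUsed != null) {
                    cache.remove(leastUsed);
                }
            }
            queue.add(key);
            cache.put(key, value);
        }
        return value;
    }

    public int size() {
        return cache.size();
    }
}
```

Once the queue's internal node chain stops shrinking, every hit's queue.remove(key) must traverse an ever-growing list, which is how a cache lookup can burn CPU and serialize request threads.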

JDK Bug: The underlying problem stemmed from a known bug in ConcurrentLinkedQueue.remove(Object) (JDK-8137184): removing the last element fails to unlink its node, so the internal list grows indefinitely. The fix shipped in OpenJDK 8u102.

Relevant source snippet (original buggy method):

// ConcurrentLinkedQueue.remove(Object) before the JDK-8137184 fix
public boolean remove(Object o) {
    if (o == null) return false;
    Node<E> pred = null;
    for (Node<E> p = first(); p != null; p = succ(p)) {
        E item = p.item;
        if (item != null && o.equals(item) && p.casItem(item, null)) {
            Node<E> next = succ(p);
            // BUG: if p is the last node, next == null and the dead
            // node is never unlinked from the list.
            if (pred != null && next != null)
                pred.casNext(p, next);
            return true;
        }
        pred = p;
    }
    return false;
}
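The trigger is simply removing the tail element over and over, which is what the LRU cache's hit path does for a hot key. A minimal reproduction of the pattern follows; on a fixed JDK it runs cleanly, while on affected JDK 8 builds (before 8u102) each iteration left a dead node linked internally even though size() stayed at zero:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Repeatedly add and remove the same element: each remove() targets the
// last (tail) node, which is the exact case JDK-8137184 failed to unlink.
public class TailRemovePattern {
    public static void main(String[] args) {
        ConcurrentLinkedQueue<String> queue = new ConcurrentLinkedQueue<>();
        for (int i = 0; i < 1_000_000; i++) {
            queue.add("key");    // "key" becomes the tail element
            queue.remove("key"); // removing the tail: the buggy case
        }
        // Logically empty on every JDK; physically leaky on buggy builds.
        System.out.println("size = " + queue.size());
    }
}
```

This is why the problem surfaced only under sustained replay traffic: the leak needed millions of cache hits to grow the dead-node chain to a harmful length.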

The bug caused the LRU cache's queue to retain roughly 740,000 dead nodes, producing a memory leak and the thread-waiting spikes observed.

After upgrading Spring to a version that uses ConcurrentLinkedDeque and moving to a JDK build containing the fix, the service returned to normal. The post concludes with lessons on preserving problem snapshots, avoiding blind restarts, and leveraging tracing tools.

References include Spring Framework issue #24886, JDK bugs 8137184 and 8150780, PerfMa thread analysis tool, and Eclipse MAT for heap analysis.

Tags: debugging, Java, performance, concurrency, Spring, JDK
Written by JD Retail Technology, the official platform delivering insightful R&D news and a deep look into the lives and work of technologists.
