Scaling JD’s Seckill Product Pool: Architecture, GC Tuning & Caching
This article details how JD’s Seckill platform expanded its product pool by up to tenfold through architectural redesign, JVM garbage‑collection tuning, dual‑cache updates, local LRU caching, and Bloom filter integration, achieving significant performance and stability gains for large‑scale flash‑sale events.
Background
JD’s Seckill channel, the company’s largest marketing venue, has experienced rapid growth in both product count and user traffic. With product volume expected to grow 5‑10×, the existing architecture faced scalability challenges.
The system originally cached the entire product pool in memory using JIMDB (a Redis‑like distributed cache) and relied on ZooKeeper notifications for real‑time updates.
Problem Analysis
During major sales events, the product pool surged, causing the JVM heap to grow quickly. Minor GC could not reclaim the newly allocated space, leading to regular spikes in heap usage and frequent Full GC cycles that heavily impacted CPU and API latency.
Heap histogram analysis (using jmap -histo) revealed massive numbers of temporary String objects generated during full‑coverage updates, especially for category‑seckill items, pushing old‑generation memory to its limit.
Root causes identified:
Large objects (e.g., long strings) directly promoted to the old generation.
Objects that survived enough Minor GCs to reach the -XX:MaxTenuringThreshold age were promoted to the old generation.
Dynamic object‑age determination promoted objects at age 2 or lower, well before the configured threshold, because of Survivor space pressure.
These factors caused rapid old‑generation growth and frequent Full GC.
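The dynamic age rule can be illustrated with a small calculation. The sketch below follows, in simplified form, the way HotSpot is commonly described as choosing the effective tenuring threshold after a Minor GC: it sums surviving bytes by age, starting at age 1, and stops as soon as the running total exceeds the desired Survivor occupancy (Survivor capacity × -XX:TargetSurvivorRatio / 100). The class name, method name and sample sizes are hypothetical and are not taken from JD's system.

```java
// Simplified model of dynamic tenuring-threshold selection.
// Illustration only: names and numbers are assumptions, not JD's code.
final class TenuringThresholdSketch {
    // HotSpot tracks object ages 1..15, so the age table has 16 slots.
    private static final int TABLE_SIZE = 16;

    /**
     * @param bytesByAge          bytesByAge[a] = bytes of surviving objects with age a (index 0 unused)
     * @param survivorCapacity    capacity of one Survivor space, same unit as bytesByAge
     * @param targetSurvivorRatio value of -XX:TargetSurvivorRatio (percent, default 50)
     * @param maxTenuringThreshold value of -XX:MaxTenuringThreshold
     * @return age at or above which objects are promoted at the next Minor GC
     */
    static int computeThreshold(long[] bytesByAge, long survivorCapacity,
                                int targetSurvivorRatio, int maxTenuringThreshold) {
        long desired = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        int age = 1;
        while (age < TABLE_SIZE) {
            total += age < bytesByAge.length ? bytesByAge[age] : 0;
            if (total > desired) {
                break; // Survivor pressure: stop at this age
            }
            age++;
        }
        return Math.min(age, maxTenuringThreshold);
    }
}
```

For example, with a 100 MB Survivor space, a target ratio of 50, 30 MB of age-1 objects and 25 MB of age-2 objects, the running total passes 50 MB at age 2, so the effective threshold becomes 2 even if -XX:MaxTenuringThreshold is 15. This is how a burst of short-lived Strings can push still-young objects into the old generation.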
Optimization Solutions
1. Dual‑Cache Timed Hash Updates
Instead of full‑coverage updates, products are hashed by SKU into buckets. Updates are applied at bucket granularity, and a dual‑cache mechanism delays actual data refresh until a timed switch, reducing update frequency and memory churn.
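One way to picture bucket-granular updates with a timed switch is sketched below. This is a minimal illustration under stated assumptions, not JD's implementation: the class DualBucketCache, its method names, the String payload type and the modulo hashing are all hypothetical. Updated buckets are staged on the side and become visible to readers only when switchOver() runs, so readers always see a stable snapshot and unchanged buckets are reused rather than rebuilt.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: SKU-hashed buckets with a staged copy and a timed switch.
final class DualBucketCache {
    private final int bucketCount;
    // Snapshot currently served to readers: bucket index -> (SKU -> product payload).
    private final AtomicReference<Map<Integer, Map<Long, String>>> active =
            new AtomicReference<>(Collections.<Integer, Map<Long, String>>emptyMap());
    // Buckets that have been updated but not yet published.
    private final ConcurrentHashMap<Integer, Map<Long, String>> pending = new ConcurrentHashMap<>();

    DualBucketCache(int bucketCount) {
        this.bucketCount = bucketCount;
    }

    int bucketOf(long sku) {
        return Math.floorMod(Long.hashCode(sku), bucketCount);
    }

    // Stage new contents for one bucket; readers do not see them yet.
    void stageBucket(int bucket, Map<Long, String> contents) {
        pending.put(bucket, new HashMap<>(contents));
    }

    // Publish all staged buckets in one reference swap; untouched buckets are shared as-is.
    void switchOver() {
        if (pending.isEmpty()) {
            return;
        }
        Map<Integer, Map<Long, String>> next = new HashMap<>(active.get());
        for (Integer bucket : pending.keySet()) {
            Map<Long, String> staged = pending.remove(bucket);
            if (staged != null) {
                next.put(bucket, staged);
            }
        }
        active.set(Collections.unmodifiableMap(next));
    }

    // Run the switch on a fixed schedule (the period is an assumed tuning knob).
    void startTimedSwitch(ScheduledExecutorService scheduler, long periodSeconds) {
        scheduler.scheduleAtFixedRate(this::switchOver, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    String get(long sku) {
        Map<Long, String> bucket = active.get().get(bucketOf(sku));
        return bucket == null ? null : bucket.get(sku);
    }
}
```

Because only changed buckets are rebuilt and the swap happens at most once per period, a burst of product updates produces far fewer temporary objects than re-materializing the whole pool on every notification.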
2. Introduction of Local LRU Cache
The architecture was split into separate services for core channel functions and product tagging, allowing independent scaling. The tagging service now uses a combination of JIMDB full‑cache and a local LRU cache (Caffeine) to evict cold data, limiting in‑memory product count.
Caffeine was chosen over Guava for its superior read/write performance and its W‑TinyLFU algorithm, which offers higher hit rates with lower memory overhead.
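The read-through pattern behind a bounded local cache can be sketched with the JDK alone. Note that this stand-in uses a plain LRU built on LinkedHashMap, not Caffeine and not W‑TinyLFU; it only illustrates how a size cap evicts cold entries and how a miss falls back to the remote store. The class LocalLruCache and the loader function (standing in for a JIMDB lookup) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a local cache: plain LRU, NOT Caffeine's W-TinyLFU.
final class LocalLruCache<K, V> {
    private final Map<K, V> map;

    LocalLruCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict once the cap is exceeded
            }
        };
    }

    // Read-through lookup: on a miss, call the loader (e.g. a remote cache read) and keep the result.
    synchronized V get(K key, Function<K, V> loader) {
        V value = map.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                map.put(key, value);
            }
        }
        return value;
    }

    synchronized boolean isCached(K key) {
        return map.containsKey(key); // does not affect recency order
    }

    synchronized int entryCount() {
        return map.size();
    }
}
```

With a cap of two entries, reading SKUs 1, 2, 1 and then 3 evicts SKU 2, the least recently used one. A production cache such as Caffeine adds frequency-aware admission and better concurrency on top of this basic idea.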
3. Bloom Filter for Invalid SKU Requests
A Bloom filter replaces a full Set of valid SKUs to quickly reject non‑seckill SKU queries, preventing cache‑penetration attacks and reducing unnecessary JIMDB lookups.
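The memory saving comes from storing only a few bits per SKU instead of the SKU itself. A minimal sketch of the idea, using only the JDK, is shown below. The class SkuBloomFilter, its sizing parameters and its hash mixing are assumptions chosen for illustration and are not the filter JD deployed. A Bloom filter never reports a stored SKU as absent, but it can occasionally report an unknown SKU as present, so a positive answer still has to be confirmed against the cache.

```java
import java.util.BitSet;

// Hypothetical sketch of a Bloom filter over numeric SKU ids.
// Not thread-safe: build it fully, then publish it for read-only use.
final class SkuBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SkuBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    void add(long sku) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(sku, i));
        }
    }

    // false => definitely not a seckill SKU; true => possibly one (confirm in the cache).
    boolean mightContain(long sku) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(sku, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th bit position from two base hashes.
    private int index(long sku, int i) {
        long h1 = mix(sku);
        long h2 = mix(sku ^ 0x9E3779B97F4A7C15L) | 1L; // forced odd so the step is never zero
        return (int) Math.floorMod(h1 + i * h2, (long) numBits);
    }

    // 64-bit mixing function (SplitMix64-style finalizer).
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }
}
```

In a request path, a lookup would first call mightContain(sku) and return immediately on false, so requests for non‑seckill SKUs never reach JIMDB. The bit count and number of hash functions trade memory against the false-positive rate and would need to be sized for the real pool.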
Optimization Effects
After the upgrades, extensive load testing and gradual rollout confirmed:
Support for horizontal scaling of the product pool, enabling future growth.
Interface 99.9th‑percentile latency improved by ~90% during the 618 promotion, eliminating performance spikes.
Full GC frequency dropped dramatically, enhancing overall system stability.
Performance graphs (Figures 12‑13) illustrate the marked improvements in throughput and GC behavior.
Conclusion
The seckill product‑pool expansion project achieved its goals by redesigning update granularity, separating services, adopting a high‑performance local cache, and adding a Bloom filter, resulting in a scalable, stable architecture ready for future traffic spikes.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.