How to Boost Inventory Reservation Performance by 2300% with Redis Caching
This article presents a performance‑testing case study of an inventory‑reservation system. It covers the test scenarios, bottleneck analysis, the Redis‑based cache redesign, re‑test results, cache pre‑heat strategies, invalid‑call reduction, and operational best practices, which together raised sustainable TPS roughly 24‑fold (a 2300% increase) while keeping TP99 latency under 500 ms.
1. Introduction
Performance testing is essential for verifying a software system's business capacity and stability. Its findings also shape system capacity design and test strategy, informing decisions during architecture analysis, traffic analysis, load‑test execution, and tuning.
2. Hot Data Storage Model Load Test and Reflections
Through load testing, we located the performance bottlenecks of SKU inventory pre‑allocation under different storage models. After the data architecture was upgraded, SKU pre‑allocation throughput (TPS) increased by 2300%.
The tests also exposed the need for cache warm‑up: big‑data analysis was used to design data‑driven cache pre‑heat and retention strategies, and the work inspired more efficient ways to construct test data.
2.1 Load‑Test Scenario
Inventory pre‑allocation temporarily reserves SKU stock while an order is being accepted. The inventory center exposes a pre‑allocation API backed by three key applications: the inventory‑deduction logic, the cache layer, and task scheduling.
Two implementation models exist:
Department‑level pre‑allocation via Redis cache.
Batch pre‑allocation directly on the database.
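The Redis‑based model depends on reserving stock with an atomic check‑and‑decrement, so that concurrent requests for one hotspot SKU cannot oversell. In Redis this is often done with a Lua script run via EVAL, which executes atomically. The minimal sketch below assumes that approach and models the atomicity in‑process with a lock so it runs without a Redis server; the class and method names are hypothetical, not the inventory center's API.

```python
import threading


class InMemoryStockCache:
    """Hypothetical stand-in for a Redis-backed stock cache.

    The lock plays the role of Redis's atomic script execution: the
    availability check and the decrement happen as one step.
    """

    def __init__(self, stock: dict[str, int]) -> None:
        self._stock = dict(stock)
        self._lock = threading.Lock()

    def reserve(self, sku: str, qty: int) -> bool:
        """Reserve qty units of sku atomically; return False if stock is short."""
        if qty <= 0:
            raise ValueError("qty must be positive")
        with self._lock:
            available = self._stock.get(sku, 0)
            if available < qty:
                return False  # reject instead of going negative (no oversell)
            self._stock[sku] = available - qty
            return True

    def available(self, sku: str) -> int:
        with self._lock:
            return self._stock.get(sku, 0)


if __name__ == "__main__":
    cache = InMemoryStockCache({"SKU-1": 100})
    # 150 concurrent single-unit reservations against 100 units of stock.
    workers = [threading.Thread(target=cache.reserve, args=("SKU-1", 1)) for _ in range(150)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(cache.available("SKU-1"))  # prints 0
```

Exactly 100 of the 150 attempts succeed and the stock never goes negative, which is the property the Redis script would have to guarantee.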
During peak promotions, hotspot SKU requests surge, causing TP99 spikes and potential order‑acceptance timeouts.
2.2 First Load Test and Analysis
Goal: Determine the peak traffic the pre‑allocation API can sustain when hotspot SKU requests are served from the database (acceptance threshold TP99 ≤ 3000 ms), then verify that optimizations bring TP99 down to ≤ 500 ms.
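TP99 is the 99th‑percentile response time. The article does not say how its tooling computes it, so this sketch assumes the nearest‑rank definition and checks a batch of latency samples against a target such as 3000 ms or 500 ms.

```python
import math


def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_tp99_target(latencies_ms: list[float], limit_ms: float) -> bool:
    """True when the TP99 of the samples is within limit_ms."""
    return percentile(latencies_ms, 99) <= limit_ms
```

Other percentile definitions (for example, linear interpolation) give slightly different values on small samples, so results are only comparable when the same definition is used throughout.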
Method: Apply continuous load to a single hotspot SKU, starting at QPS = 10 and increasing by 10 at each step.
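The stepped method can be expressed as a loop that raises QPS until TP99 breaches the limit. `measure_tp99` is a hypothetical hook into whatever load tool drives the test, and `fake_db_backed` is a toy stub shaped after the reported findings (about 100 ms at QPS = 50, about 7000 ms at QPS = 60), not measured data.

```python
from typing import Callable, Optional


def step_ramp(
    measure_tp99: Callable[[int], float],
    tp99_limit_ms: float,
    start_qps: int = 10,
    step_qps: int = 10,
    max_qps: int = 10_000,
) -> tuple[Optional[int], list[tuple[int, float]]]:
    """Raise load step by step; return the last QPS within the limit and the history.

    measure_tp99 is a hypothetical hook that holds `qps` for one step and
    returns the observed TP99 in milliseconds.
    """
    history: list[tuple[int, float]] = []
    last_ok: Optional[int] = None
    qps = start_qps
    while qps <= max_qps:
        tp99 = measure_tp99(qps)
        history.append((qps, tp99))
        if tp99 > tp99_limit_ms:
            break  # knee found: stop applying pressure
        last_ok = qps
        qps += step_qps
    return last_ok, history


def fake_db_backed(qps: int) -> float:
    # Toy stub, not measured data: flat latency up to 50 QPS, then a cliff.
    return 100.0 if qps <= 50 else 7000.0
```

With the stub, `step_ramp(fake_db_backed, 3000)` stops at the 60 QPS step and reports 50 as the last passing load.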
Findings:
At QPS = 50, the system remains stable (TP99≈100 ms).
CPU/Memory usage stays low for both the pre‑allocation and deduction services.
Database CPU usage stays at or below 7.8%, with no slow SQL.
Increasing to QPS = 60 caused TP99 to jump to 7000 ms, indicating a bottleneck.
2.3 Optimization and Re‑Testing
The storage layer was redesigned to move batch pre‑allocation from the database to Redis, using Redis’s high throughput to absorb hotspot SKU traffic.
Consistency is maintained by asynchronously writing results back to the database after cache hits, and by reading from the database first on a cache miss or cache breakdown.
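A minimal sketch of the cache‑first flow, assuming plain dicts stand in for Redis and the database: reservations are served from the cache, a miss loads the value from the database first, and each successful reservation is queued for later write‑back. `flush_write_back` plays the role of the asynchronous worker; it is single‑threaded here for clarity, and all names are illustrative.

```python
import queue
from dataclasses import dataclass, field


@dataclass
class CachedInventory:
    """Cache-first reservation with DB fallback and queued write-back (sketch).

    `cache` stands in for Redis and `db` for the inventory table. Single-threaded
    for clarity; a real system would make `reserve` atomic in the cache.
    """

    db: dict[str, int]
    cache: dict[str, int] = field(default_factory=dict)
    pending: queue.Queue = field(default_factory=queue.Queue)

    def _load(self, sku: str) -> int:
        # Cache miss (or breakdown): read the authoritative value from the DB first.
        if sku not in self.cache:
            self.cache[sku] = self.db.get(sku, 0)
        return self.cache[sku]

    def reserve(self, sku: str, qty: int) -> bool:
        available = self._load(sku)
        if available < qty:
            return False
        self.cache[sku] = available - qty
        self.pending.put((sku, qty))  # persisted later, off the request path
        return True

    def flush_write_back(self) -> int:
        """Drain queued reservations into the DB; return how many were applied."""
        applied = 0
        while True:
            try:
                sku, qty = self.pending.get_nowait()
            except queue.Empty:
                return applied
            self.db[sku] = self.db.get(sku, 0) - qty
            applied += 1
```

Until the queue is flushed, the database lags the cache; that window is the price paid for keeping the database off the hot request path.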
Re‑test results:
Starting at QPS = 1100 and rising to TPS = 1200, TP99 stayed at about 130 ms; the system comfortably supports batch pre‑allocation at this load.
At TPS = 1300, TP99 rose to about 420 ms and the cache service's CPU usage exceeded 90%, so the test was stopped.
Compared with the database‑only mode, the cache‑based redesign meets the TP99 ≤ 500 ms target and raises TPS capacity by 2300% (from about 50 to 1200).
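The 2300% figure follows from the two sustainable operating points: roughly 50 QPS in database mode and 1200 TPS with the cache.

```latex
\text{gain} = \frac{\text{TPS}_{\text{cache}} - \text{TPS}_{\text{DB}}}{\text{TPS}_{\text{DB}}} \times 100\%
            = \frac{1200 - 50}{50} \times 100\% = 2300\%
\qquad \left(\frac{1200}{50} = 24\times\right)
```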
2.4 System Robustness Considerations
Caching every SKU wastes resources on short‑lived SKU categories, so a selective caching strategy is needed.
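One way to make caching selective, sketched under assumed inputs: keep only SKUs whose recent request volume is high and whose remaining sale window is long enough to justify a cache key, capped at a key budget. `SkuStats`, its fields, and the thresholds are hypothetical; real signals would come from the big‑data analysis the team used for pre‑heat planning.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkuStats:
    sku: str
    requests_last_hour: int   # hypothetical heat signal
    days_on_sale_left: float  # hypothetical lifecycle signal


def select_skus_to_cache(
    stats: list[SkuStats],
    min_requests: int,
    min_days_left: float,
    capacity: int,
) -> list[str]:
    """Pick the hottest SKUs worth caching, up to `capacity` keys."""
    eligible = [
        s for s in stats
        if s.requests_last_hour >= min_requests and s.days_on_sale_left >= min_days_left
    ]
    # Hottest first, so a tight key budget keeps the SKUs most likely to spike.
    eligible.sort(key=lambda s: s.requests_last_hour, reverse=True)
    return [s.sku for s in eligible[:capacity]]
```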
Cache warm‑up and retention are critical: during major promotions, cache keys expire after 7 days, which leads to cache misses. Lengthening key lifetimes, or dynamically extending the lifetimes of hot‑SKU keys, can improve hit rates.
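Dynamic extension of hot keys can be sketched as a sliding TTL: when a key that has been hit often is close to expiry, its TTL is reset. In Redis the reset would be an `EXPIRE` on the key; this toy version keeps expiry timestamps in a dict and takes the current time as a parameter so it is deterministic. The hotness threshold and refresh window are assumptions, not values from the article.

```python
from dataclasses import dataclass, field

SEVEN_DAYS = 7 * 24 * 3600  # seconds, matching the 7-day key expiry


@dataclass
class SlidingTtlCache:
    """Toy cache whose hot keys get their TTL extended shortly before expiry.

    Time is passed in explicitly (seconds) so behaviour is deterministic.
    In Redis, the extension step would be an EXPIRE on the key.
    """

    default_ttl: int = SEVEN_DAYS
    hot_threshold: int = 100         # assumed: hits since last (re)set that mark a key hot
    refresh_window: int = 24 * 3600  # assumed: extend only when less than this TTL remains
    _expiry: dict[str, float] = field(default_factory=dict, init=False)
    _hits: dict[str, int] = field(default_factory=dict, init=False)

    def put(self, key: str, now: float) -> None:
        self._expiry[key] = now + self.default_ttl
        self._hits[key] = 0

    def get(self, key: str, now: float) -> bool:
        """Return True on a hit; extend the TTL of hot keys that are about to expire."""
        expiry = self._expiry.get(key)
        if expiry is None or expiry <= now:
            self._expiry.pop(key, None)  # expired or absent: this is a cache miss
            self._hits.pop(key, None)
            return False
        self._hits[key] += 1
        if self._hits[key] >= self.hot_threshold and expiry - now < self.refresh_window:
            self._expiry[key] = now + self.default_ttl
            self._hits[key] = 0
        return True
```

Cold keys still expire on schedule, so the extension only spends memory on SKUs that are actually being read.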
2.5 Test‑Strategy Improvements
Live‑stream e‑commerce introduces bursty traffic with low SKU overlap, requiring more diverse SKU selection in load tests.
Automation of data preparation and request generation can accelerate complex scenario construction and improve testing efficiency.
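Automated test‑data construction for live‑stream traffic might look like the following sketch: each simulated stream gets its own disjoint SKU pool (low overlap) and emits its requests as one burst, with a fixed seed so runs are reproducible. The request fields (`stream`, `sku`, `qty`) are illustrative, not the real API payload.

```python
import random


def build_live_stream_requests(
    num_streams: int,
    skus_per_stream: int,
    burst_size: int,
    seed: int = 42,
) -> list[dict]:
    """Generate hypothetical pre-allocation requests that mimic live-stream traffic.

    Each stream has a disjoint SKU pool (no overlap across streams) and its
    requests arrive as one contiguous burst.
    """
    rng = random.Random(seed)  # seeded so the same data set can be replayed
    requests = []
    streams = list(range(num_streams))
    rng.shuffle(streams)  # bursts from different streams arrive in random order
    for stream in streams:
        pool = [f"SKU-{stream}-{i}" for i in range(skus_per_stream)]
        for _ in range(burst_size):
            requests.append({"stream": stream, "sku": rng.choice(pool), "qty": rng.randint(1, 3)})
    return requests
```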
3. Invalid Call Analysis, Identification, and Optimization
Traffic analysis revealed excessive calls to the order‑package‑detail query interface, especially from the “access‑return” service. Adjusting the caller logic and correcting an AB‑test environment alias cut those calls by 60%.
3.1 Background
The logistics system provides order and package details to external systems. Two major callers generate most of this traffic.
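Identifying the dominant callers is a simple aggregation over call logs. This sketch assumes a hypothetical log that records one caller ID per call to the package‑detail interface.

```python
from collections import Counter


def caller_shares(call_log: list[str]) -> list[tuple[str, float]]:
    """Return (caller, share of total calls) pairs, largest share first."""
    counts = Counter(call_log)
    total = sum(counts.values())
    return [(caller, n / total) for caller, n in counts.most_common()]
```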
3.2 Diagnosis and Optimization
We suppressed calls for orders that had not yet shipped and removed unnecessary queries based on the returned content. Alias errors in the AB‑test environment were also fixed.
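The shipment check can be a guard evaluated before the downstream call. The status values and the `Order` type here are hypothetical; a real caller would use the logistics system's own order states.

```python
from dataclasses import dataclass
from enum import Enum


class OrderStatus(Enum):  # hypothetical order states
    CREATED = "created"
    PICKING = "picking"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


SHIPPED_OR_LATER = {OrderStatus.SHIPPED, OrderStatus.DELIVERED}


@dataclass
class Order:
    order_id: str
    status: OrderStatus


def should_query_package_detail(order: Order) -> bool:
    """Skip the package-detail call when no package can exist yet."""
    return order.status in SHIPPED_OR_LATER
```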
3.3 Results
After optimization, total calls from the access‑return service dropped by about 60% (from 2.397 billion to 0.926 billion), and peak call volume fell by 64%.
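As a check on the reported totals, the exact reduction is slightly above the rounded 60%:

```latex
\frac{2.397 - 0.926}{2.397} = \frac{1.471}{2.397} \approx 0.614 \;\Rightarrow\; \text{about } 61.4\%
```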
3.4 Proactive Risk Identification
Early traffic analysis can uncover performance risks before load‑test execution, reducing mitigation cost.
3.5 Ops Review Practices
Continuous monitoring of traffic anomalies, coding standards for API calls, and regular audits of custom logic help prevent invalid queries.
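Traffic anomaly monitoring can start from a per‑caller budget of queries per order: callers far above it are candidates for invalid or premature calls. The budget value and the input dictionaries are assumptions for illustration.

```python
def flag_suspicious_callers(
    calls_by_caller: dict[str, int],
    orders_by_caller: dict[str, int],
    max_calls_per_order: float,
) -> list[str]:
    """Flag callers whose query volume per order exceeds an assumed budget."""
    flagged = []
    for caller, calls in calls_by_caller.items():
        if calls == 0:
            continue
        orders = orders_by_caller.get(caller, 0)
        # Calls with no associated orders are treated as maximally suspicious.
        ratio = calls / orders if orders > 0 else float("inf")
        if ratio > max_calls_per_order:
            flagged.append(caller)
    return sorted(flagged)
```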
3.6 Future Optimization Space
Further analysis of “I” and “P” scenarios reveals additional non‑standard processes that can be streamlined.
4. Conclusion
Performance testing is a key measure for strengthening system capabilities. By presenting typical cases and reflections, we explored ways to enhance system capacity and testing strategies, ensuring core links can stably handle peak business traffic and extreme scenarios.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
JD Cloud Developers
JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.