How a New Cache Design Boosted Logistics Pricing QPS Five‑Fold
This article reviews the performance challenges of the logistics pricing engine, explains why the 1.0 optimizations fell short, and describes the design and implementation of the 2.0 cache strategy: Tair plus local caches, key‑value modeling, pre‑heating, updates, and bottleneck mitigation. Together these achieved the required QPS.
1. Introduction
The logistics pricing module of the solution center must handle high‑throughput calculations for dozens of countries. Initial QPS requirements were 6,600 for pricing across the 20 core countries and 35,600 across all 220 countries, but the cluster reached only ~1,600 QPS.
2. Problems with the 1.0 Optimization
Version 1.0 reduced CPU usage, pre‑compiled rule‑engine expressions, avoided deep object copies, and optimized logging, raising QPS to ~2,300. However, nested‑loop queries in the pricing flow still triggered a large number of DB and Tair accesses per request, so QPS remained unstable.
3. Need for a New Cache Design
To meet the new QPS target, the cache model had to be redesigned. The principle remained “use cache, add machines”, but the design had to isolate data‑fetching from business logic and batch‑fetch high‑frequency data.
3.1 Redefining the Pricing Flow
The flow was split into a data‑retrieval layer (using Tair for large‑volume scheme/price data and local cache for stable configuration data) and a calculation layer.
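The split can be sketched as follows. This is a minimal illustration, not the original implementation: the names `PricingData`, `fetch_pricing_data`, and `calculate_price`, the plain dictionaries standing in for the caches, and the price rule itself are all assumptions. The only point is that the calculation layer receives its data up front and performs no I/O.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingData:
    """Everything the calculation layer needs, fetched up front."""
    config: dict  # stable configuration (served from local cache in the article)
    prices: dict  # scheme/price data (served from Tair in the article)


def fetch_pricing_data(local_cache: dict, remote_cache: dict, sku_ids: list) -> PricingData:
    """Data-retrieval layer: the only place that touches a cache."""
    config = {sku: local_cache[sku] for sku in sku_ids}
    prices = {sku: remote_cache[sku] for sku in sku_ids}
    return PricingData(config=config, prices=prices)


def calculate_price(data: PricingData, sku_id: str, weight_kg: float) -> float:
    """Calculation layer: a pure function. The pricing rule is illustrative only."""
    price = data.prices[sku_id]
    surcharge = data.config[sku_id].get("surcharge", 0.0)
    return price["base"] + price["per_kg"] * weight_kg + surcharge
```

Because `calculate_price` depends only on its arguments, it can be tested and profiled without any cache or database behind it.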
3.2 New Cache Architecture
Pre‑match rate tables: when a new rate line is enabled, a listener populates a cleaned product‑line rate table so that queries can fetch all price items in one step.
Reduce DB access: configuration data (sku, resource, spu) is served from local cache; large scheme/price data is served from Tair.
Reduce network overhead: batch Tair reads (prefixGets) are performed before the core loop.
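The effect of moving reads out of the core loop can be shown with a toy model. `CountingStore` below is a hypothetical in‑memory stand‑in that counts round trips; it is not the Tair client, and the tuple keys are an assumption made for brevity.

```python
class CountingStore:
    """In-memory stand-in for a remote cache that counts round trips."""

    def __init__(self, data):
        self._data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1  # one network call per key
        return self._data.get(key)

    def batch_get(self, keys):
        self.round_trips += 1  # one network call for the whole batch
        return {k: self._data.get(k) for k in keys}


def price_per_item_lookup(store, skus, countries):
    """Nested-loop style: one remote read per (sku, country) pair."""
    return {(s, c): store.get((s, c)) for s in skus for c in countries}


def price_with_prefetch(store, skus, countries):
    """Batch style: fetch every needed key once, then loop in memory."""
    keys = [(s, c) for s in skus for c in countries]
    fetched = store.batch_get(keys)
    return {k: fetched[k] for k in keys}
```

With 2 SKUs and 3 countries, the nested‑loop version makes 6 round trips and the prefetch version makes 1, while both return the same result.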
4. Tair Cache Model
Key design uses a main key (e.g., SPU) plus sub‑keys. For scheme queries the sub‑key is sku_id+version_id+destinationCountry. For price queries the sub‑key is sku_id+version_id+destinationCountry+warehouseCode. Because the sub‑keys are grouped under one main key, a single prefixGets call can retrieve many of them at once.
Values store either a simple scheme object or a JSON‑structured price object.
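A minimal sketch of this key model follows. The `|` separator, the helper names, and the `PrefixStore` class are assumptions for illustration; `PrefixStore` only imitates main‑key/sub‑key grouping in memory and does not reproduce the real Tair client's signatures.

```python
import json
from collections import defaultdict

SEP = "|"  # assumed separator; the article does not say how the fields are joined


def scheme_sub_key(sku_id, version_id, destination_country):
    return SEP.join([str(sku_id), str(version_id), destination_country])


def price_sub_key(sku_id, version_id, destination_country, warehouse_code):
    return SEP.join([str(sku_id), str(version_id), destination_country, warehouse_code])


class PrefixStore:
    """In-memory stand-in for a main-key/sub-key store (not the Tair client)."""

    def __init__(self):
        self._data = defaultdict(dict)  # main key -> {sub key: JSON string}

    def prefix_put(self, main_key, sub_key, value):
        self._data[main_key][sub_key] = json.dumps(value)

    def prefix_gets(self, main_key, sub_keys):
        """Return every requested sub-key found under one main key, in one call."""
        bucket = self._data.get(main_key, {})
        return {sk: json.loads(bucket[sk]) for sk in sub_keys if sk in bucket}
```

Sub‑keys that are missing are simply absent from the returned mapping, so the caller can tell which entries still need a fallback lookup.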
5. Local Cache Model
Configuration data is aggregated by type (sku, resource, spu) into a single cache entry, reducing key count and simplifying assembly.
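As an illustration of aggregating by type, the sketch below keeps one cache entry per configuration type, each holding a full id‑to‑record map. The class name, the `id` field, and the method names are hypothetical.

```python
class AggregatedLocalCache:
    """One cache entry per configuration type, each holding an id -> record map."""

    def __init__(self):
        self._entries = {}  # type name -> {record id: record}

    def load_type(self, type_name, records):
        # Build the new map first, then swap it in with a single assignment,
        # so readers never observe a half-built entry.
        self._entries[type_name] = {r["id"]: r for r in records}

    def get(self, type_name, record_id):
        return self._entries.get(type_name, {}).get(record_id)

    def key_count(self):
        return len(self._entries)
```

However many records are loaded, the number of cache keys equals the number of configuration types, which is what keeps the key count small.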
6. Cache Read/Write Mechanics
Pre‑heat: on service start, known keys are loaded into heap memory.
Update: a broadcast listener refreshes local cache entries in real time.
Dirty‑data handling: periodic tasks refresh the local cache. Tair keys include version_id, so a new version is written under new keys and reads for the current version do not return entries from an older one.
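The three local‑cache mechanics above can be sketched together. `LocalConfigCache`, its `loader` callable, and `versioned_key` are hypothetical names; the timer that would call `refresh_all` and the message bus that would call `on_broadcast` are left out.

```python
class LocalConfigCache:
    """Heap-resident cache with pre-heat, event-driven update, and full refresh."""

    def __init__(self, loader):
        self._loader = loader  # callable: key -> value from the source of truth
        self._heap = {}

    def preheat(self, known_keys):
        """Run once at service start so the first requests hit memory."""
        for key in known_keys:
            self._heap[key] = self._loader(key)

    def on_broadcast(self, key):
        """Listener callback: reload one key when a change message arrives."""
        self._heap[key] = self._loader(key)

    def refresh_all(self):
        """Periodic task body: reload every cached key to repair missed updates."""
        for key in list(self._heap):
            self._heap[key] = self._loader(key)

    def get(self, key):
        return self._heap.get(key)


def versioned_key(sku_id, version_id, country):
    """A new version_id yields a new key, so old entries are never read for it."""
    return f"{sku_id}|{version_id}|{country}"
```

The periodic refresh acts as a safety net: if a broadcast message is lost, the stale entry lives only until the next refresh cycle.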
7. Single‑Point Bottlenecks
Tair reached rate‑limit thresholds under heavy traffic. Mitigation strategies included capacity expansion, query reduction, hot‑key local caching, and switching to an RDB where appropriate.
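Hot‑key local caching can be sketched as a short‑lived in‑process layer in front of the remote read. The article does not say how hot keys were detected or how long they were kept, so the fixed TTL, the class name, and the injected `remote_get` callable are all assumptions.

```python
import time


class HotKeyCache:
    """Short-TTL in-process cache placed in front of a remote lookup."""

    def __init__(self, remote_get, ttl_seconds=1.0, clock=time.monotonic):
        self._remote_get = remote_get  # callable: key -> value (the remote read)
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so tests can control time
        self._local = {}               # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._local.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # served locally, no remote call
        value = self._remote_get(key)
        self._local[key] = (now + self._ttl, value)
        return value
```

Within one TTL window, repeated reads of the same key cost a single remote call, which lowers pressure on the rate limit at the price of data that may be up to one TTL old.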
8. Results
Combining local cache for configuration data with Tair for scheme/price data eliminated DB queries in the hot path. The system now supports >9,000 QPS across a 50‑node cluster, with each node handling ~180 QPS.
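The figures above can be checked with simple arithmetic. The sketch assumes that throughput scales linearly with node count and that the ~1,600 QPS baseline was measured on a comparable cluster; neither assumption is stated in the article, and `nodes_needed` is a hypothetical helper.

```python
import math

nodes = 50
qps_per_node = 180
baseline_qps = 1_600

cluster_qps = nodes * qps_per_node    # 9000
speedup = cluster_qps / baseline_qps  # 5.625, in line with the "five-fold" headline


def nodes_needed(target_qps, per_node_qps):
    """Nodes required for a target, assuming linear scaling (an assumption)."""
    return math.ceil(target_qps / per_node_qps)
```

Under the same assumptions, `nodes_needed(6600, 180)` gives 37 nodes for the core 20‑country target, which fits the "use cache, add machines" principle: the cache raises per‑node throughput, and node count covers the rest.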
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
