How We Cut Risk Engine Latency from 80 ms to 25 ms with Prefetch, Batching, Async, Compression and Bloom‑Filter Caching
Facing a traffic surge that pushed a risk‑control engine's response time beyond 250 ms, the team applied a series of systematic optimizations: feature prefetching, batch requests, asynchronous accumulator updates, multi‑level caching with Bloom filters, and a log‑compression redesign. As a result, latency dropped to 25 ms, CPU and memory usage fell by up to 90%, and storage costs were reduced by over a third.
Prefetch – Feature Pre‑computation
Prefetch trades time for time: data is loaded before the request arrives, so the request itself does less work. In the Gaia risk‑control engine a single request traverses scene factors → rules → decision, with feature fetching accounting for more than 70% of the 250 ms latency. To meet a new 100 ms SLA for e‑commerce traffic, the team introduced a near‑line prefetch layer that reads downstream features from SLB streams, caches them in Redis, and serves them directly to the engine.
Cache hit rates exceed 90% and the critical e‑commerce path sees response time drop from 80 ms to 25 ms.
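The read path of such a prefetch layer can be sketched as follows. This is a minimal illustration, not Gaia's implementation: `PrefetchCache` is a hypothetical in-memory stand-in for the Redis cache, `put` stands for the near-line consumer writing features as stream events arrive, and `fetch_downstream` is an assumed callback for the slow synchronous feature call.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PrefetchCache:
    """In-memory stand-in for the Redis prefetch cache (hypothetical)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, dict]] = {}

    def put(self, key: str, features: dict) -> None:
        # Called by the near-line consumer when a stream event arrives.
        self._store[key] = (self._clock() + self._ttl, features)

    def get(self, key: str) -> Optional[dict]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, features = entry
        if self._clock() >= expires_at:
            # Expired entries count as misses so stale features are not served.
            del self._store[key]
            return None
        return features


def get_features(key: str, cache: PrefetchCache,
                 fetch_downstream: Callable[[str], dict]) -> Tuple[dict, str]:
    """Serve prefetched features when present, else fall back downstream."""
    cached = cache.get(key)
    if cached is not None:
        return cached, "prefetch"
    return fetch_downstream(key), "downstream"
```

The TTL is the knob that bounds staleness: a shorter TTL lowers the hit rate but limits how old a served feature can be.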
Batch – Feature Batch Retrieval
Batching reduces I/O by merging many identical downstream calls into a single request. In Gaia each risk check may query dozens of blacklist services for the same subject (mid, buvid, ip, ua). By consolidating these into one batch request and using a local cache plus singleflight, downstream call volume fell by 69% while preserving rule semantics.
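A rough sketch of the batch, local cache, and singleflight combination is shown below. The article does not show the team's code, so every name here is an assumption: `batch_lookup` is a hypothetical downstream call that answers all blacklist names for one subject in a single request, and `SingleFlight` is a small Python re-creation of the idea behind Go's `singleflight` package (concurrent callers for the same key share one in-flight call).

```python
import threading
from typing import Callable, Dict, Iterable, List


class _Call:
    """State shared between the leader and the followers of one key."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result = None
        self.error = None


class SingleFlight:
    """Collapse concurrent calls for the same key into one execution."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._inflight: Dict[str, _Call] = {}

    def do(self, key: str, fn: Callable[[], dict]) -> dict:
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._inflight[key] = call
        if leader:
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        else:
            # Followers wait for the leader instead of calling downstream.
            call.done.wait()
        if call.error is not None:
            raise call.error
        return call.result


class BatchBlacklistClient:
    """One batch request per subject, fronted by a local cache."""

    def __init__(self, batch_lookup: Callable[[str, List[str]], dict]) -> None:
        self._batch_lookup = batch_lookup
        self._flight = SingleFlight()
        self._cache: Dict[str, dict] = {}  # a real cache would also expire entries
        self._cache_lock = threading.Lock()

    def check(self, subject: str, list_names: Iterable[str]) -> dict:
        names = sorted(set(list_names))  # de-duplicate repeated checks
        key = subject + "|" + ",".join(names)
        with self._cache_lock:
            hit = self._cache.get(key)
        if hit is not None:
            return hit
        result = self._flight.do(key, lambda: self._batch_lookup(subject, names))
        with self._cache_lock:
            self._cache[key] = result
        return result
```

Rule semantics are preserved because each rule still receives a per-list answer; only the transport changes from many calls to one.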
Async – Accumulator Factor Asynchronous Calculation
Accumulator factors (e.g., count(distinct), sum) are stored in Redis and traditionally require 2–3 round‑trips per request, exhausting Redis CPU under high traffic (spider‑type workloads dominate 1.5:1). The team switched to an asynchronous pipeline built on Bilibili’s Railgun event platform: writes are aggregated in memory, de‑duplicated, and flushed to Redis in bulk.
Results:
Redis QPS reduced >35% (see Fig 9).
TP99 latency shows a modest decline (Fig 10).
Rule recall remains statistically unchanged (Fig 11).
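The aggregate, de-duplicate, and bulk-flush step can be illustrated with a small buffer. This is a sketch under stated assumptions, not the Railgun pipeline itself: `AccumulatorBuffer` and its `sink` callback are hypothetical, and the choice of `INCRBY` for sums and `SADD` for distinct counts is only one plausible mapping onto Redis commands (the article does not say which data structures are used). In production the sink might be a Redis pipeline; here it simply receives the command list.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class AccumulatorBuffer:
    """Aggregate accumulator updates in memory, then flush them in bulk."""

    def __init__(self) -> None:
        self._sums: Dict[str, int] = defaultdict(int)
        self._distinct: Dict[str, Set[str]] = defaultdict(set)

    def add_sum(self, key: str, amount: int) -> None:
        # Many increments to one key collapse into a single pending delta.
        self._sums[key] += amount

    def add_distinct(self, key: str, member: str) -> None:
        # The set de-duplicates repeated members before they reach Redis.
        self._distinct[key].add(member)

    def flush(self, sink: Callable[[List[Tuple]], None]) -> int:
        """Send all pending updates as one batch; return the command count."""
        commands: List[Tuple] = []
        for key, total in self._sums.items():
            commands.append(("INCRBY", key, total))
        for key, members in self._distinct.items():
            commands.append(("SADD", key, *sorted(members)))
        self._sums.clear()
        self._distinct.clear()
        if commands:
            sink(commands)
        return len(commands)
```

For example, three `add_sum` calls and three `add_distinct` calls on the same two keys flush as two commands in one batch. The cost is eventual consistency: a rule evaluated before the flush sees slightly older counts, and a buffer shared across threads would also need a lock.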
Compression – Log Storage Optimization
Each risk request generates a log of ~11 KB (up to tens of KB). Storing raw JSON in the Taishan KV store consumes ~16 TB. The team evaluated encoding (JSON, protobuf, msgpack) and compression algorithms (gzip, xz, zstd) with and without dictionaries.
Encoding   Size   gzip   xz     zstd (no dict)
json       2255   1028   1092   1075
msgpack    1938   1088   1132   1119
zstd with a matching dictionary achieved the best per‑log compression ratio, but the overhead of dictionary training outweighed benefits for small logs. Batch compression (10–100 logs per batch) reduced overall storage by up to 60%.
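Why batching helps can be seen with a small experiment. The sketch below uses the standard-library `zlib` purely as a stand-in so it runs without third-party packages; the team's evaluation used gzip, xz, and zstd, and the synthetic log fields in `make_logs` are invented for illustration. Small logs share most of their structure, so compressing them together lets the compressor reuse repeated keys and values across records.

```python
import json
import zlib
from typing import List


def make_logs(count: int) -> List[bytes]:
    """Build synthetic, structurally similar JSON logs (fields are made up)."""
    logs = []
    for i in range(count):
        record = {
            "request_id": i,
            "scene": "order_create",
            "subject": {"mid": 1000 + i, "ip": f"10.0.0.{i % 250}"},
            "rules_hit": ["r_velocity", "r_blacklist"],
            "decision": "pass",
        }
        logs.append(json.dumps(record, separators=(",", ":")).encode("utf-8"))
    return logs


def per_log_size(logs: List[bytes]) -> int:
    """Total bytes when every log is compressed on its own."""
    return sum(len(zlib.compress(log)) for log in logs)


def pack_batch(logs: List[bytes]) -> bytes:
    """Compress a batch as one newline-delimited blob.

    json.dumps escapes newlines inside strings, so b"\\n" is a safe separator.
    """
    return zlib.compress(b"\n".join(logs))


def unpack_batch(blob: bytes) -> List[bytes]:
    data = zlib.decompress(blob)
    return data.split(b"\n") if data else []


def batch_size(logs: List[bytes]) -> int:
    """Total bytes when the whole batch is compressed together."""
    return len(pack_batch(logs))
```

Comparing `per_log_size(make_logs(50))` with `batch_size(make_logs(50))` shows the batch form coming out smaller. The trade-off is read amplification: fetching one log means decompressing its whole batch.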
Cache – Multi‑Level Cache + Bloom Filter
The blacklist service follows a classic read‑many/write‑few pattern. Initially a three‑tier Cache‑Aside stack (local → Redis → MySQL) handled <10 k QPS. When traffic surged past 100 k QPS, Redis CPU and memory peaked, and MySQL QPS hit 12 k.
Introducing a Bloom‑filter layer in front of the cache eliminated most cache‑penetration queries, that is, lookups for keys that exist in neither the cache nor the database. The filter is sharded (4 × Redis slots) to avoid hot keys, with a long TTL for positives and a short TTL for negatives. Construction and recovery are orchestrated via a state machine and Railgun scheduled tasks.
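The filtering idea can be sketched in a few lines. This is an in-memory illustration under assumptions, not the team's Redis-backed filter: the class names, the bit and hash counts, and the CRC32-based shard choice are all hypothetical. A Bloom filter never reports a stored item as absent, so a "no" answer can safely skip the cache and database, while a "maybe" still goes to the backend for confirmation.

```python
import hashlib
import zlib
from typing import Callable, List


class BloomFilter:
    """Plain Bloom filter using double hashing over a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self._num_bits = num_bits
        self._num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> List[int]:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the stride odd
        return [(h1 + i * h2) % self._num_bits for i in range(self._num_hashes)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self._bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: str) -> bool:
        return all(self._bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(item))


class ShardedBloomFilter:
    """Spread items over several filters so no single key takes all traffic."""

    def __init__(self, num_shards: int = 4, bits_per_shard: int = 1 << 16,
                 num_hashes: int = 4) -> None:
        self._shards = [BloomFilter(bits_per_shard, num_hashes)
                        for _ in range(num_shards)]

    def _shard_for(self, item: str) -> BloomFilter:
        return self._shards[zlib.crc32(item.encode("utf-8")) % len(self._shards)]

    def add(self, item: str) -> None:
        self._shard_for(item).add(item)

    def might_contain(self, item: str) -> bool:
        return self._shard_for(item).might_contain(item)


def is_blacklisted(item: str, bloom: ShardedBloomFilter,
                   lookup_backend: Callable[[str], bool]) -> bool:
    """Skip the cache/database entirely when the filter says 'definitely not'."""
    if not bloom.might_contain(item):
        return False
    # 'Maybe present': confirm against the backend to rule out false positives.
    return lookup_backend(item)
```

Because most subjects in a blacklist check are not listed, the early return handles the bulk of traffic, which is consistent with the drop in Redis and MySQL load reported after deployment.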
After deployment:
Metric                Before         After           Reduction
CPU usage (service)   50.5%          17.5%           65%
Redis memory          256 GB         50 GB           80%
Redis network I/O     174/187 Mbps   13.7/6.7 Mbps   ~95%
MySQL read QPS        12k            600             95%
Conclusion
Performance engineering involves trade‑offs: prefetch saves time at the risk of serving stale data; batching reduces I/O but adds complexity; async improves throughput but requires tolerating eventual consistency; compression saves storage but adds CPU overhead; Bloom filters cut unnecessary lookups but introduce false positives. As Brendan Gregg warns, avoid premature or excessive optimization: each technique must be justified by business‑level constraints.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
