How We Cut Risk Engine Latency from 80 ms to 25 ms with Prefetch, Batching, Async, Compression and Bloom‑Filter Caching

When a traffic surge pushed a risk-control engine's response time beyond 250 ms, the team applied five systematic optimizations: feature prefetching, batch requests, asynchronous accumulator updates, multi-level caching with Bloom filters, and a redesign of log compression. Latency dropped to 25 ms, CPU and memory usage fell by up to 90%, and storage costs fell by more than a third.

Architect

Prefetch – Feature Pre‑computation

Prefetch trades time for time: it spends time loading data ahead of the request in order to save time while the request is processed. In the Gaia risk-control engine a single request traverses scene factors → rules → decision, with feature fetching accounting for more than 70% of the 250 ms latency. To meet a new 100 ms SLA for e-commerce traffic, the team introduced a near-line prefetch layer that reads downstream features from SLB streams, caches them in Redis, and serves them directly to the engine.

Cache hit rates exceed 90% and the critical e‑commerce path sees response time drop from 80 ms to 25 ms.
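
The read path described above can be sketched roughly as follows. This is a minimal illustration, not Gaia's actual code: PrefetchCache is an in-memory stand-in for Redis, and the names FeatureReader, on_stream_event, and the event fields subject_id and features are all assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PrefetchCache:
    """In-memory stand-in for the Redis cache filled by the prefetch layer."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, dict]] = {}

    def put(self, key: str, features: dict, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._store[key] = (now + self.ttl_seconds, features)

    def get(self, key: str, now: Optional[float] = None) -> Optional[dict]:
        now = time.monotonic() if now is None else now
        entry = self._store.get(key)
        if entry is None or entry[0] <= now:
            return None  # missing or expired
        return entry[1]


def on_stream_event(cache: PrefetchCache, event: dict) -> None:
    """Near-line consumer (hypothetical): runs before the risk request arrives."""
    cache.put(event["subject_id"], event["features"])


class FeatureReader:
    """Engine-side read path: prefetched cache first, downstream call on a miss."""

    def __init__(self, cache: PrefetchCache,
                 fetch_downstream: Callable[[str], dict]) -> None:
        self.cache = cache
        self.fetch_downstream = fetch_downstream
        self.hits = 0
        self.misses = 0

    def get_features(self, subject_id: str) -> dict:
        cached = self.cache.get(subject_id)
        if cached is not None:
            self.hits += 1
            return cached
        self.misses += 1
        features = self.fetch_downstream(subject_id)  # slow path
        self.cache.put(subject_id, features)
        return features
```

The TTL is what bounds the stale-data risk that prefetching introduces: a shorter TTL gives fresher features at the price of a lower hit rate.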

[Figure: Traffic growth chart]

Batch – Feature Batch Retrieval

Batching reduces I/O by merging many identical downstream calls into a single request. In Gaia each risk check may query dozens of blacklist services for the same subject (mid, buvid, ip, ua). By consolidating these into one batch request and using a local cache plus singleflight, downstream call volume fell by 69% while preserving rule semantics.
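
A minimal sketch of the batching idea, under stated assumptions: BatchBlacklistClient, preload, and the batch_query callable are hypothetical names, and the downstream batch endpoint is simulated by a plain function. Singleflight (sharing one in-flight call among concurrent identical requests) is left out to keep the example short.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (blacklist_name, subject_value), e.g. ("bad_ip", "1.2.3.4")
Lookup = Tuple[str, str]


class BatchBlacklistClient:
    """Collects the lookups a request needs and resolves them in one call."""

    def __init__(self, batch_query: Callable[[List[Lookup]], Dict[Lookup, bool]]) -> None:
        self._batch_query = batch_query          # hypothetical batch endpoint
        self._local_cache: Dict[Lookup, bool] = {}
        self.downstream_calls = 0

    def preload(self, lookups: Iterable[Lookup]) -> None:
        # De-duplicate and skip anything the local cache already knows.
        missing = sorted({l for l in lookups if l not in self._local_cache})
        if not missing:
            return
        self.downstream_calls += 1
        result = self._batch_query(missing)
        for lookup in missing:
            self._local_cache[lookup] = result.get(lookup, False)

    def is_listed(self, blacklist: str, subject: str) -> bool:
        # Rules keep calling the same per-lookup interface, so their
        # semantics do not change; only the transport is batched.
        key = (blacklist, subject)
        if key not in self._local_cache:
            self.preload([key])
        return self._local_cache[key]
```

In this sketch the engine would call preload once per request with every lookup its rules might need, after which each is_listed call is answered from local memory.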

[Figure: Batch optimization flow]

Async – Accumulator Factor Asynchronous Calculation

Accumulator factors (e.g., count(distinct), sum) are stored in Redis and traditionally require 2–3 round trips per request, which exhausts Redis CPU under high traffic (spider-type workloads dominate at 1.5:1). The team switched to an asynchronous pipeline built on Bilibili's Railgun event platform: writes are aggregated in memory, de-duplicated, and flushed to Redis in bulk.

Results:

Redis QPS reduced >35% (see Fig 9).

TP99 latency shows a modest decline (Fig 10).

Rule recall remains statistically unchanged (Fig 11).
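
The aggregate, de-duplicate, and bulk-flush steps of the asynchronous pipeline might look roughly like this. It is a sketch only: Railgun's API is not shown in the article, so the event handler is a plain method, FakeRedis is an in-memory stand-in that merely counts round trips, and the key prefixes are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Set


class FakeRedis:
    """Stand-in for Redis that counts round trips (an assumption for the demo)."""

    def __init__(self) -> None:
        self.counters: Dict[str, float] = defaultdict(float)
        self.sets: Dict[str, Set[str]] = defaultdict(set)
        self.round_trips = 0

    def apply_bulk(self, sums: Dict[str, float], members: Dict[str, Set[str]]) -> None:
        self.round_trips += 1  # one pipelined write for the whole buffer
        for key, delta in sums.items():
            self.counters[key] += delta
        for key, values in members.items():
            self.sets[key] |= values


class AccumulatorBuffer:
    """Aggregates accumulator writes in memory and flushes them in bulk."""

    def __init__(self, store: FakeRedis, flush_every: int) -> None:
        self.store = store
        self.flush_every = flush_every
        self._sums: Dict[str, float] = defaultdict(float)
        self._members: Dict[str, Set[str]] = defaultdict(set)
        self._pending = 0

    def on_event(self, key: str, amount: float, distinct_value: str) -> None:
        self._sums[f"sum:{key}"] += amount                      # merge sums
        self._members[f"distinct:{key}"].add(distinct_value)    # de-duplicate
        self._pending += 1
        if self._pending >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        if self._pending == 0:
            return
        self.store.apply_bulk(dict(self._sums),
                              {k: set(v) for k, v in self._members.items()})
        self._sums.clear()
        self._members.clear()
        self._pending = 0
```

A production version would presumably also flush on a timer so that a quiet key is not held in memory indefinitely; the window between flushes is the eventual-consistency cost of this approach.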

[Figure: Redis call reduction]

Compression – Log Storage Optimization

Each risk request generates a log of ~11 KB (up to tens of KB). Storing raw JSON in the Taishan KV store consumes ~16 TB. The team evaluated encoding (JSON, protobuf, msgpack) and compression algorithms (gzip, xz, zstd) with and without dictionaries.

Encoding   Size   gzip   xz     zstd (no dict)
json       2255   1028   1092   1075
msgpack    1938   1088   1132   1119

zstd with a matching dictionary achieved the best per‑log compression ratio, but the overhead of dictionary training outweighed benefits for small logs. Batch compression (10–100 logs per batch) reduced overall storage by up to 60%.
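
To illustrate why batch compression helps, the sketch below compares compressing each log on its own with compressing logs in batches. It uses zlib from the Python standard library rather than zstd, and make_log produces a made-up log shape, so the absolute numbers say nothing about the team's results; only the direction of the effect carries over.

```python
import json
import zlib
from typing import List


def make_log(i: int) -> dict:
    """Hypothetical risk log; real logs are larger and differently shaped."""
    return {
        "request_id": f"req-{i:06d}",
        "scene": "ecommerce_order",
        "subject": {"mid": 1000 + i, "ip": f"10.0.{i % 256}.{(i * 7) % 256}"},
        "rules": [{"rule_id": r, "hit": (i + r) % 5 == 0} for r in range(20)],
        "decision": "pass" if i % 3 else "review",
    }


def encode(log: dict) -> bytes:
    # Compact JSON never contains a raw newline, so "\n" is a safe separator.
    return json.dumps(log, separators=(",", ":"), sort_keys=True).encode("utf-8")


def per_log_size(logs: List[dict]) -> int:
    """Total bytes when every log is compressed independently."""
    return sum(len(zlib.compress(encode(log))) for log in logs)


def compress_batch(logs: List[dict]) -> bytes:
    return zlib.compress(b"\n".join(encode(log) for log in logs))


def decode_batch(blob: bytes) -> List[dict]:
    return [json.loads(line) for line in zlib.decompress(blob).split(b"\n")]


def batched_size(logs: List[dict], batch: int) -> int:
    """Total bytes when logs are compressed `batch` at a time."""
    return sum(len(compress_batch(logs[start:start + batch]))
               for start in range(0, len(logs), batch))


if __name__ == "__main__":
    logs = [make_log(i) for i in range(100)]
    print("raw bytes:     ", sum(len(encode(log)) for log in logs))
    print("per-log zlib:  ", per_log_size(logs))
    print("batch-of-50:   ", batched_size(logs, 50))
```

Logs in a batch share field names and repeated values, so the compressor can reference earlier logs instead of re-encoding the same structure each time. The trade-off is that reading one log means decompressing its whole batch.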

[Figure: Compression benchmark]

Cache – Multi‑Level Cache + Bloom Filter

The blacklist service follows a classic read‑many/write‑few pattern. Initially a three‑tier Cache‑Aside stack (local → Redis → MySQL) handled <10 k QPS. When traffic surged past 100 k QPS, Redis CPU and memory peaked, and MySQL QPS hit 12 k.
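
For reference, a three-tier Cache-Aside read path can be sketched as below. The tiers are plain dicts and a set standing in for local memory, Redis, and MySQL, and the class and counter names are assumptions for illustration. Expiry and write-side invalidation are omitted.

```python
from typing import Dict, Set


class TieredBlacklistCache:
    """Cache-aside lookup: local memory, then Redis, then MySQL (stand-ins)."""

    def __init__(self, mysql_rows: Set[str]) -> None:
        self.local: Dict[str, bool] = {}
        self.redis: Dict[str, bool] = {}
        self.mysql_rows = mysql_rows
        self.redis_reads = 0
        self.mysql_reads = 0

    def is_listed(self, key: str) -> bool:
        if key in self.local:              # tier 1: process-local cache
            return self.local[key]
        self.redis_reads += 1
        if key in self.redis:              # tier 2: shared cache
            value = self.redis[key]
        else:
            self.mysql_reads += 1          # tier 3: source of truth
            value = key in self.mysql_rows
            self.redis[key] = value        # populate on the way back
        self.local[key] = value
        return value
```

Repeated lookups of the same key are absorbed by the caches, but every key seen for the first time still reaches the database, including keys that are not on any list. With a large and varied key space, that first-time traffic is what loads Redis and MySQL.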

Introducing a Bloom‑filter layer before the cache eliminated most cache‑penetration queries. The filter is sharded (4 × Redis slots) to avoid hot‑key hotspots, stored with a long TTL for positives and a short TTL for negatives. Construction and recovery are orchestrated via a state‑machine and Railgun scheduled tasks.
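
A sharded Bloom filter in front of the cache could be sketched like this. The bit-array size, hash count, hashing scheme, and shard selection by CRC32 are all assumptions for the example, and the shards live in process memory here rather than in Redis. TTL handling and the state-machine rebuild are not shown.

```python
import hashlib
import zlib
from typing import Dict, List, Set


class BloomFilter:
    """Plain Bloom filter over a byte array, using double hashing."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self._bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class ShardedBloomFilter:
    """Spreads keys over several filters so no single key becomes hot."""

    def __init__(self, shards: int = 4, bits_per_shard: int = 1 << 16,
                 num_hashes: int = 5) -> None:
        self._shards = [BloomFilter(bits_per_shard, num_hashes)
                        for _ in range(shards)]

    def _shard_for(self, key: str) -> BloomFilter:
        return self._shards[zlib.crc32(key.encode("utf-8")) % len(self._shards)]

    def add(self, key: str) -> None:
        self._shard_for(key).add(key)

    def might_contain(self, key: str) -> bool:
        return self._shard_for(key).might_contain(key)


def is_listed(key: str, bloom: ShardedBloomFilter,
              backend: Set[str], stats: Dict[str, int]) -> bool:
    if not bloom.might_contain(key):
        return False                 # definitely absent: skip cache and DB
    stats["backend_reads"] += 1      # possibly present: confirm downstream
    return key in backend
```

A Bloom filter never reports a stored key as absent, so a negative answer can safely short-circuit the lookup. A positive answer may be a false positive, which is why it is still confirmed against the cache or database.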

[Figure: Bloom filter multi-level cache]

After deployment:

Metric                Before         After           Reduction
CPU usage (service)   50.5%          17.5%           65%
Redis memory          256 GB         50 GB           80%
Redis network I/O     174/187 Mbps   13.7/6.7 Mbps   ~95%
MySQL read QPS        12k            600             95%

Conclusion

Performance engineering involves trade-offs: prefetch saves time at the risk of serving stale data; batching reduces I/O but adds complexity; async improves throughput but requires tolerating eventual consistency; compression saves storage but adds CPU overhead; Bloom filters cut unnecessary lookups but introduce false positives. As Brendan Gregg warns, avoid premature or excessive optimization: each technique must be justified by business-level constraints.
