Backend Development 15 min read

Design and Performance Optimization of LruCache in Meituan DSP System

Meituan’s DSP system boosted high‑QPS ad serving performance by layering an LRU cache in front of Redis, then adding time‑based eviction, sharding the cache into HashLruCache instances to cut lock contention, and employing a zero‑copy, reference‑counted design, ultimately cutting average latency to about 20 % of the original and similarly reducing 99.9th‑percentile delays.

Meituan Technology Team

Dec 20, 2018

Design and Performance Optimization of LruCache in Meituan DSP System

Background: The Meituan DSP (Demand‑Side Platform) system processes massive advertising traffic with high concurrency and low latency requirements (average response in the order of hundreds of milliseconds). To improve performance, Meituan introduced an LruCache (Least Recently Used Cache) with an eviction mechanism, combining it with a key‑value store (Redis) to keep frequently accessed ad data locally.

LruCache Overview: LruCache implements the classic LRU algorithm. Internally it maintains a doubly‑linked list ordered by recent usage and a hash map for O(1) key lookup. When the cache reaches its preset capacity, the least‑recently used entry (tail of the list) is evicted. Read operations move the accessed node to the head; write operations insert new nodes at the head and update the hash map. Thread safety is ensured by protecting all read/write sections with a lock.

Application in Meituan DSP: Advertising data is originally stored in Redis. Directly querying Redis for each request incurs high latency, especially as QPS grows. By adding LruCache in front of Redis, the system reduces average fetch time and keeps memory usage within safe limits. The cache size is tuned according to the host’s memory capacity.

Evolution Stages

1. Introducing LruCache

At low QPS, direct Redis access meets requirements, but latency rises sharply when QPS exceeds 250. Adding LruCache drops average latency by an order of magnitude, keeping it below 2 ms for QPS ≤ 500, and improves the 99.9th‑percentile (Top999) latency.

2. Time‑Based Eviction

To handle stale data when the underlying Redis entries change, a time‑based eviction policy was added. Each cache entry stores a timestamp; if the elapsed time exceeds a configured TTL, the entry is invalidated and refreshed from Redis. This approach limits full‑cache clean‑up overhead while keeping data freshness.

3. HashLruCache for High QPS

Lock contention became a bottleneck at higher QPS. HashLruCache shards the cache into N independent LruCache instances using a simple modulo hash on the key. This reduces the critical section size and allows parallel reads/writes. Experiments with 16 shards showed nearly a 50 % reduction in average latency and a three‑fold improvement in Top999 latency.

4. Zero‑Copy Mechanism

When cached objects are complex, copying them for each request becomes expensive. A zero‑copy design returns a pointer to the cached data instead of a full copy. To avoid use‑after‑free bugs, each cached object carries an atomic reference count: the count is incremented on pointer acquisition and decremented on release. When the count reaches zero and the object’s TTL has expired, it can be safely reclaimed. This change reduced average latency by over 60 % and Top999 latency by nearly 50 %.

Summary

Across all optimization stages—basic LruCache, time‑based eviction, HashLruCache sharding, and zero‑copy—the system achieved a reduction of average latency to roughly 20 % of the original baseline and a similar reduction in the 99.9th‑percentile latency. These techniques demonstrate how a simple cache structure can be progressively refined to meet the stringent performance demands of a high‑QPS advertising platform.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance tuning Zero Copy high QPS cache optimization LRUCache HashLruCache Meituan DSP

Written by

Meituan Technology Team

Over 10,000 engineers powering China’s leading lifestyle services e‑commerce platform. Supporting hundreds of millions of consumers, millions of merchants across 2,000+ industries. This is the public channel for the tech teams behind Meituan, Dianping, Meituan Waimai, Meituan Select, and related services.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.