
How to Store Billions of Keys in Redis: Cut Memory, Reduce Fragmentation, and Scale Real‑Time DMP

This article examines the challenges of storing massive DMP data in Redis, analyzes memory fragmentation, key‑size issues, and latency constraints, and presents practical strategies such as TTL eviction, bucket‑hashing, custom key compression, and fragmentation‑reduction techniques to enable scalable, real‑time querying.


1. Demand Background

The scenario involves a Data Management Platform (DMP) that must cache massive third‑party ID data, including media cookies, internal IDs (supperid), demographic tags, mobile IDs (IDFA, IMEI), blacklists, and IPs. While offline storage of billions of records on HDFS is straightforward, the DMP requires millisecond‑level real‑time queries. Because cookies are unstable and new ones are generated constantly, mapping data must be synchronized to the cache immediately to keep demographic tagging accurate, which makes cache storage extremely challenging.

2. Data to Store

Population tags consist of cookie, IMEI, IDFA linked to gender, age, and geo codes. Mapping data links media cookies to supperid. Example structures:

PC side ID:

<code>supperid => { age=>age_code, gender=>gender_code, geo=>geo_code }</code>

Device side ID:

<code>imei or idfa => { age=>age_code, gender=>gender_code, geo=>geo_code }</code>

PC data requires two key‑value forms (key=>value and key=>hashmap), while device data only needs the hashmap form.
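The two PC‑side forms can be sketched as plain Java structures. This is an illustrative sketch only; the class and method names are hypothetical, and in Redis the first form would be a plain SET and the second an HSET:

```java
import java.util.HashMap;
import java.util.Map;

public class PcIdForms {
    // Form 1: plain key => value, e.g. a media cookie mapped to a supperid.
    // In Redis: SET <mediaCookie> <supperid>
    public static String plainMapping(String mediaCookie, String supperid) {
        return supperid;
    }

    // Form 2: key => hashmap, e.g. a supperid mapped to demographic codes.
    // In Redis: HSET <supperid> age <ageCode> gender <genderCode> geo <geoCode>
    public static Map<String, String> tagHash(String ageCode, String genderCode, String geoCode) {
        Map<String, String> tags = new HashMap<>();
        tags.put("age", ageCode);
        tags.put("gender", genderCode);
        tags.put("geo", geoCode);
        return tags;
    }
}
```

Device‑side IDs (IMEI/IDFA) would use only the second form.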

3. Data Characteristics

Short keys and values: supperid is a 21‑digit number, IMEI is a lowercase MD5, IDFA is an uppercase MD5 with hyphens.

Scale: supperid reaches hundreds of billions, media mappings tens of billions, mobile IDs tens of billions.

Daily generation of billions of new mapping relationships.

Some stable cookies can be pre‑warmed, but most new IDs are unpredictable.

4. Technical Challenges

Variable key lengths cause memory fragmentation.

Heavy pointer usage inflates memory consumption up to 7×, a common issue for pure in‑memory storage.

Despite possible hot‑data prediction, a large proportion of IDs are newly generated each day.

Public‑network latency requirement (<60 ms) and overall response time (<100 ms) force all new mapping and demographic data to stay in memory.

Business rules require retaining data for at least 35 days.

Memory is expensive: holding billions of keys entirely in RAM requires aggressive optimization to keep hardware costs manageable.

5. Solutions

5.1 Eviction Strategy

Because new data continuously arrives, timely eviction of cold data is essential. The approach aggregates IDs in HBase, deduplicates them, and sets a TTL of 35 days. In Redis, keys are given a 35‑day expiration; each access renews the TTL, effectively keeping hot IDs while automatically discarding stale ones.
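The renewal rule above can be sketched as a tiny helper. This assumes the TTL is applied with a plain EXPIRE command re‑issued on every access; the constant and method names are hypothetical:

```java
public class EvictionPolicy {
    // Business rule from the article: retain data for at least 35 days.
    public static final long TTL_SECONDS = 35L * 24 * 60 * 60; // 3,024,000 s

    // Command issued on every write, and re-issued on every read,
    // so hot IDs keep their TTL renewed while cold IDs expire away.
    public static String renewCommand(String key) {
        return "EXPIRE " + key + " " + TTL_SECONDS;
    }
}
```

The net effect is an LRU‑like policy driven entirely by Redis expiration, with no explicit eviction job.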

5.2 Reducing Memory Inflation

Hash table size and key count determine collision rate and memory usage. To cut the number of Redis keys, we hash each original key with MD5 to obtain a fixed‑length bucket identifier (BucketId) and store the actual key‑value pair inside a hashmap under that bucket. If, on average, ten keys share one BucketId, the total number of Redis keys can be reduced by over 90 %.

Implementation details:

<code>import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public static byte[] getBucketId(byte[] key, Integer bit) throws NoSuchAlgorithmException {
    MessageDigest mdInst = MessageDigest.getInstance("MD5");
    mdInst.update(key);
    byte[] md = mdInst.digest();
    // Only 7 bits per byte are usable so the id stays printable ASCII.
    byte[] r = new byte[(bit - 1) / 7 + 1];
    // Mask for the partial byte when bit is not a multiple of 7.
    int a = (int) Math.pow(2, bit % 7) - 2;
    // Mask the digest byte BEFORE copying; masking r here would be
    // overwritten by the arraycopy below.
    md[r.length - 1] = (byte) (md[r.length - 1] & a);
    System.arraycopy(md, 0, r, 0, r.length);
    for (int i = 0; i < r.length; i++) {
        if (r[i] < 0) r[i] &= 127; // clear the sign bit of each byte
    }
    return r;
}
</code>

The <code>bit</code> parameter determines the bucket space size (powers of two). For a target of 1 billion buckets (≈2³⁰), each bucket can hold about ten KV pairs, reducing the effective key count to the order of 10⁸.
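Putting the bucket id to use, the write path becomes: hash the original key to a bucket, then store the original key as a field inside that bucket's hashmap. The sketch below mirrors that flow in a self‑contained form; the 4‑byte bucket width, hex encoding, and method names are illustrative choices, not values from the article:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class BucketStore {
    // Derive a bucket id from the leading bytes of the key's MD5,
    // keeping 7 usable bits per byte as in the article's scheme.
    public static String bucketId(String key, int bucketBytes) throws NoSuchAlgorithmException {
        byte[] md = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bucketBytes; i++) {
            sb.append(String.format("%02x", md[i] & 0x7f)); // mask the sign bit
        }
        return sb.toString();
    }

    // The actual pair lives inside the bucket's hashmap:
    // HSET <bucketId> <originalKey> <value>
    public static String hsetCommand(String key, String value, int bucketBytes)
            throws NoSuchAlgorithmException {
        return "HSET " + bucketId(key, bucketBytes) + " " + key + " " + value;
    }
}
```

Reads follow the same path: recompute the bucket id from the queried key, then HGET the original key out of that bucket.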

5.3 Reducing Fragmentation

Fragmentation arises from misaligned memory and deletions that leave gaps. By storing fixed‑length keys (e.g., truncating cookies or device IDs to the last six characters) and only three‑byte codes for age, gender, and geo, memory alignment improves and fragmentation drops. Additionally, restarting a Redis slave and forcing a failover can compact the master’s memory.
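The fixed‑length scheme described above can be sketched as follows; the six‑character field and the per‑dimension code widths are assumptions for illustration:

```java
public class FixedLength {
    // Keep only the last six characters of the id as the hash field,
    // so every field in a bucket has the same length.
    public static String field(String id) {
        return id.length() <= 6 ? id : id.substring(id.length() - 6);
    }

    // Pack age, gender, and geo into one fixed-width code string
    // instead of storing three separate named fields.
    public static String value(String ageCode, String genderCode, String geoCode) {
        return ageCode + genderCode + geoCode;
    }
}
```

With uniform field and value lengths, allocations fall into a few size classes, which is what lets the allocator pack them without gaps.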

For allocator optimization, tools like Google‑tcmalloc or Facebook‑jemalloc are recommended; they can significantly reduce fragmentation when values are small.

Source: juejin.cn/post/6956147115286822948

Tags: Memory Optimization, Redis, TTL, Hashing, DMP, Large Scale Storage, BucketId
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
