How to Store Billions of IDs in Redis Efficiently: Strategies for Massive DMP Caches
This article examines the challenges of storing and querying billions of DMP identifiers in Redis, analyzes data characteristics and memory fragmentation issues, and presents practical solutions such as eviction policies, bucket‑based key hashing, and fragmentation reduction techniques to achieve low‑latency, large‑scale caching.
The author discusses the problem of real‑time data warehousing for a DMP that must cache massive numbers of third‑party IDs (media cookies, own cookies, demographic tags, mobile IDs, blacklists, etc.) and provide millisecond‑level queries.
Storage Data Types
Population tags include cookie, IMEI, IDFA and their associated gender, age, and geo codes; mapping relationships link media cookies to a unified supperid. Example storage formats:
PC IDs:
media-id-media-cookie=>supperid
supperid=>{age=>age_code, gender=>gender_code, geo=>geo_code}
Device IDs:
imei or idfa=>{age=>age_code, gender=>gender_code, geo=>geo_code}
PC data requires two key‑value patterns (key=>value and key=>hashmap), while device data can be stored as a single key=>hashmap.
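The two lookup paths can be illustrated with a small in-memory sketch. It uses plain Java maps as a stand-in for Redis, and the class name, method names, and the "media-id" + "-" + "media-cookie" key layout are assumptions made for illustration, not the author's code.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the two Redis patterns; all names here are hypothetical.
class TagStoreSketch {
    // key=>value: "<media-id>-<media-cookie>" -> supperid
    private final Map<String, String> cookieToSupperid = new HashMap<>();
    // key=>hashmap: supperid, IMEI, or IDFA -> {age, gender, geo}
    private final Map<String, Map<String, String>> tags = new HashMap<>();

    void mapCookie(String mediaId, String mediaCookie, String supperid) {
        cookieToSupperid.put(mediaId + "-" + mediaCookie, supperid);
    }

    void putTags(String id, String age, String gender, String geo) {
        Map<String, String> t = new HashMap<>();
        t.put("age", age);
        t.put("gender", gender);
        t.put("geo", geo);
        tags.put(id, t);
    }

    // PC lookup: two hops (media cookie -> supperid -> tags). Returns null on a miss.
    Map<String, String> tagsForCookie(String mediaId, String mediaCookie) {
        String supperid = cookieToSupperid.get(mediaId + "-" + mediaCookie);
        return supperid == null ? null : tags.get(supperid);
    }

    // Device lookup: one hop (IMEI or IDFA -> tags). Returns null on a miss.
    Map<String, String> tagsForDevice(String deviceId) {
        return tags.get(deviceId);
    }
}
```

A PC query therefore costs two reads, while a device query costs one, which is why the mapping table and the tag table both have to stay in memory to meet the latency target.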
Data Characteristics
Short keys and short values: e.g., a 21‑digit supperid, an MD5‑hashed IMEI, and an uppercase, hyphenated hexadecimal IDFA.
Media cookies vary in length.
Full‑scale data: supperid reaches hundreds of billions, media mappings tens of billions, mobile IDs tens of billions.
Billions of new mapping relationships are generated daily.
Some stable cookies can be pre‑warmed for hot data.
Many newly generated cookies cannot be predicted, increasing cache pressure.
Technical Challenges
1) Variable key lengths cause memory fragmentation.
2) Extensive pointer usage leads to high memory expansion (≈7×).
3) Daily influx of new IDs makes hot‑data prediction difficult.
4) Service must respond within 100 ms over public networks, requiring all new mappings and tags to stay in memory.
5) Business rules demand data retention of at least 35 days.
6) Memory is expensive, so a storage scheme that scales to a hundred billion keys is essential.
Solutions
Eviction Strategy
Because new data continuously enters the store, timely eviction of cold data is crucial. Logs are aggregated in HBase with a 35‑day TTL, and each Redis key is written with a 35‑day expiration that is renewed whenever the key is accessed. IDs that stay in use remain cached, while IDs not seen for 35 days expire on their own.
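The refresh-on-access behaviour can be sketched with an in-memory map that tracks an expiry timestamp per key. This is only a model of the policy, not Redis client code: the class name is hypothetical, and the clock is passed in as a parameter so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Models a 35-day expiration that is renewed on every hit; names are hypothetical.
class SlidingTtlCacheSketch {
    static final long TTL_MILLIS = 35L * 24 * 60 * 60 * 1000; // 35 days

    private static final class Entry {
        final String value;
        long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> store = new HashMap<>();

    // A write starts a fresh 35-day window.
    void put(String key, String value, long nowMillis) {
        store.put(key, new Entry(value, nowMillis + TTL_MILLIS));
    }

    // A hit pushes the expiry out by another 35 days; an expired key is dropped.
    String get(String key, long nowMillis) {
        Entry e = store.get(key);
        if (e == null) return null;
        if (nowMillis >= e.expiresAt) {
            store.remove(key);
            return null;
        }
        e.expiresAt = nowMillis + TTL_MILLIS;
        return e.value;
    }
}
```

Against a real Redis instance the same effect would presumably come from setting an expiry on write and re-issuing EXPIRE after a successful read, at the cost of one extra command per hit.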
Reduce Expansion
To lower memory bloat, keys are hashed into fixed‑length bucket IDs using MD5. The bucket ID (called BucketId) serves as the Redis key, and the original key‑value pair is stored inside a hashmap under that bucket. By allowing many keys to collide within a bucket (e.g., 10 keys per bucket), the total number of Redis keys can be reduced by over 90%.
The implementation computes a suitable hash length (e.g., 33 bits for hundred‑billion keys) and uses the full ASCII range (0‑127) instead of hex strings, halving key length.
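    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public static byte[] getBucketId(byte[] key, int bit) throws NoSuchAlgorithmException {
        MessageDigest mdInst = MessageDigest.getInstance("MD5");
        mdInst.update(key);
        byte[] md = mdInst.digest();
        // Only 7 bits per byte are used, so every byte stays in the ASCII range 0-127.
        byte[] r = new byte[(bit - 1) / 7 + 1];
        System.arraycopy(md, 0, r, 0, r.length);
        // The last byte keeps only the leftover bits: bit % 7 of them,
        // or all 7 when bit is a multiple of 7.
        int rem = bit % 7;
        int lastMask = (rem == 0) ? 0x7F : (1 << rem) - 1;
        r[r.length - 1] &= lastMask;
        // Clear the sign bit of the remaining bytes.
        for (int i = 0; i < r.length - 1; i++) {
            r[i] &= 0x7F;
        }
        return r;
    }

The bit parameter sets the size of the bucket space, which is always a power of two (2^bit). At about 10 keys per bucket, 2^30 buckets (about 1.07 billion) cover roughly ten billion keys; a hundred‑billion‑key store at a similar density needs about 2^33 buckets (about 8.6 billion).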
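The relationship between total key count, keys per bucket, and the bit parameter can be checked with a little arithmetic. The sketch below is an assumption about how one might pick the value: it rounds to the nearest power of two, so the real average bucket size lands near the target rather than exactly on it. The class and method names are hypothetical.

```java
// Sizing helper for the bucket space; names and the rounding rule are illustrative assumptions.
class BucketSizingSketch {
    // Number of bits such that 2^bits is the power of two closest (on a log scale)
    // to totalKeys / keysPerBucket.
    static int bucketBits(long totalKeys, int keysPerBucket) {
        double buckets = (double) totalKeys / keysPerBucket;
        return (int) Math.round(Math.log(buckets) / Math.log(2));
    }

    // Length of the bucket ID in bytes when each byte carries 7 bits.
    static int bucketIdBytes(int bits) {
        return (bits - 1) / 7 + 1;
    }

    // Average number of original keys that share one bucket.
    static double avgKeysPerBucket(long totalKeys, int bits) {
        return (double) totalKeys / (1L << bits);
    }
}
```

With a target of 10 keys per bucket, ten billion keys give 30 bits (about 9.3 keys per bucket on average) and a hundred billion keys give 33 bits (about 11.6 keys per bucket). Both cases need a 5‑byte bucket ID, compared with 32 characters for a full hex‑encoded MD5.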
Reduce Fragmentation
Memory fragmentation is mitigated by using equal‑length, aligned keys. Inside each bucket, the hashmap key is not the full cookie or device ID but only its last six characters, which keeps every field the same length while leaving the probability of a collision within a bucket low. Each value stores only three bytes (age, gender, and geo codes). Additionally, restarting Redis slaves and then performing a forced failover can compact memory.
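Putting the bucket key, the six‑character field, and the three‑byte value together gives a layout like the one sketched below. Nested Java maps stand in for Redis keys and their hashes, the bucket ID is a simplified whole‑byte variant of the MD5 scheme, and every name is hypothetical. One byte per code is also an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// In-memory model of the bucketed layout; not Redis client code.
class BucketStoreSketch {
    // Outer map plays the role of Redis keys (bucket IDs); inner map is the hash under each bucket.
    private final Map<String, Map<String, byte[]>> buckets = new HashMap<>();
    private final int idBytes; // bucket ID length in bytes, at most 16 for MD5

    BucketStoreSketch(int idBytes) {
        this.idBytes = idBytes;
    }

    // Simplified bucket ID: the first idBytes bytes of the MD5 digest, each masked to 7 bits.
    String bucketId(String id) {
        try {
            byte[] md = MessageDigest.getInstance("MD5").digest(id.getBytes(StandardCharsets.UTF_8));
            byte[] r = new byte[idBytes];
            for (int i = 0; i < idBytes; i++) {
                r[i] = (byte) (md[i] & 0x7F);
            }
            return new String(r, StandardCharsets.US_ASCII);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Fixed-length hash field: the last six characters of the ID (the whole ID if shorter).
    static String field(String id) {
        return id.length() <= 6 ? id : id.substring(id.length() - 6);
    }

    void put(String id, byte age, byte gender, byte geo) {
        buckets.computeIfAbsent(bucketId(id), k -> new HashMap<>())
               .put(field(id), new byte[] {age, gender, geo});
    }

    // Returns the three code bytes, or null on a miss.
    byte[] get(String id) {
        Map<String, byte[]> bucket = buckets.get(bucketId(id));
        return bucket == null ? null : bucket.get(field(id));
    }

    // Number of top-level keys, i.e. what Redis would see as its key count.
    int redisKeyCount() {
        return buckets.size();
    }
}
```

The trade‑off is visible in `put`: two different IDs that land in the same bucket and share their last six characters would overwrite each other, which is the low‑probability collision the scheme accepts in exchange for short, uniform fields.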
For allocator-level fragmentation reduction, the author recommends Google tcmalloc or Facebook jemalloc, which can significantly lower memory consumption for small values.