Memory Optimization in Redis Using zipList and Hash Bucketing
This article explains how to dramatically reduce Redis memory consumption—by up to 90%—through converting long string keys to integers, leveraging zipList‑encoded hashes, and distributing millions of key‑value pairs across many hash buckets while maintaining query performance.
Redis, as the most popular NoSQL cache database, is chosen in most scenarios for its excellent performance and rich data structures.
Because Redis stores data purely in memory, large data volumes can consume a lot of memory; selecting appropriate data structures can reduce memory usage by 80%–99%.
In a DSP advertising system or massive‑user platform, a common requirement is to quickly map a unique identifier (e.g., MD5 of MAC address, UUID, or phone number) to a user ID. The data volume can reach tens of millions or billions, and the keys are long strings such as 32‑byte MD5 values.
In a simple test, inserting 10 million key‑value pairs (MD5 → numeric ID) occupies about 1.17 GB of Redis memory; scaling to 100 million pairs consumes roughly 8 GB.
When the same data set is stored using zipList‑encoded hashes, memory drops to about 123 MB, a reduction of roughly 90%. The article then analyzes Redis’s underlying storage mechanisms.
Redis strings are implemented with a custom "simple dynamic string" (SDS) type and have three internal encodings: int (for values that fit in a 64‑bit integer), embstr (for strings of 44 bytes or less, allocated in a single contiguous block), and raw (for longer strings; the underlying SDS buffer doubles on each expansion until it reaches 1 MB, after which it grows 1 MB at a time). In the example, the key is a 32‑byte string (stored as embstr ) and the value is a long integer (stored as int ). Converting the 32‑byte MD5 string to an 8‑byte integer (by hashing it down, at a small collision risk) reduces key storage by about three‑quarters—this is the first optimization point.
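As a minimal sketch of this first optimization (the function and key names here are illustrative, not from the article), a 32‑character hex MD5 can be folded down to an unsigned 64‑bit integer that Redis stores with its compact int encoding:

```python
import hashlib

def md5_to_int64(md5_hex: str) -> int:
    """Fold a 32-char hex MD5 digest into an unsigned 64-bit integer.

    Only 8 of the original 16 digest bytes are kept, so distinct MD5s
    can collide; the article accepts this small risk.
    """
    return int(md5_hex, 16) & 0xFFFFFFFFFFFFFFFF  # keep the low 64 bits

md5 = hashlib.md5(b"user-42").hexdigest()  # 32 hex chars = a 32-byte key
key = md5_to_int64(md5)                    # fits Redis's int encoding
```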
Redis hashes can be encoded as either hashTable (similar to Java’s HashMap) or zipList, a compact encoding that packs all field‑value pairs sequentially into a single contiguous byte array. A zipList uses far less memory than a hash table but requires a linear scan, and therefore more CPU, for lookups.
The rule for using zipList is: if the number of fields in a hash does not exceed 512 ( hash-max-ziplist-entries ) and each field’s value is ≤64 bytes ( hash-max-ziplist-value ), Redis will automatically use zipList encoding. This constitutes the second optimization point.
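Both thresholds are configurable in redis.conf (in Redis 7 and later the hash encoding is called listpack and the equivalent settings are hash-max-listpack-entries and hash-max-listpack-value):

```
# redis.conf — defaults that gate zipList encoding for hashes
hash-max-ziplist-entries 512   # max number of fields per hash
hash-max-ziplist-value 64      # max bytes per field value
```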
To transform the original 10 million key‑value pairs, the article proposes four steps:
Distribute the pairs into N buckets, each bucket being a Redis hash that holds no more than the default 512 elements (e.g., allocate 25,000–30,000 buckets for 10 million keys, about 330–400 entries per bucket).
Map each MD5 key to a bucket using a fast, balanced hash function such as CRC32; the bucket index is obtained by crc32(md5) % bucketCount .
For the inner field, apply a second hash (e.g., BKDR or Java hashCode ) to the MD5 string, producing a long‑integer field; using a different hash from the bucketing step makes the chance of two keys colliding on both hashes vanishingly small. The original numeric value (the user ID) is stored unchanged.
Load the data: the original md5 → id becomes bucketKey (hash) → field (hashed md5) → value (id) .
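The four steps above can be sketched as follows (function names, the key prefix, and the bucket count are my own choices; the article specifies CRC32 for bucket selection and a BKDR‑style hash for the inner field):

```python
import zlib

BUCKETS = 25_000  # ~400 entries per bucket for 10 million keys, under the 512 limit

def bucket_key(md5_hex: str) -> str:
    """Step 2: pick a bucket with CRC32, a fast and well-balanced hash."""
    return "u:%d" % (zlib.crc32(md5_hex.encode()) % BUCKETS)

def field_of(md5_hex: str) -> int:
    """Step 3: BKDR hash of the MD5 string, folded to an unsigned 64-bit int."""
    h = 0
    for ch in md5_hex:
        h = (h * 131 + ord(ch)) & 0xFFFFFFFFFFFFFFFF
    return h

# Step 4: the original `md5 -> id` pair becomes, e.g. with a redis-py client r:
#   r.hset(bucket_key(md5), field_of(md5), user_id)   # write
#   user_id = r.hget(bucket_key(md5), field_of(md5))  # read
```

Both hashes are deterministic, so any reader that knows the bucket count and the two hash functions can locate a key without any directory structure.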
Performance testing shows that querying 1 million entries using the original key‑value layout takes about 10.6–11.3 seconds, while the hash‑field layout adds less than 0.5 seconds, but memory usage drops from 1.1 GB to 120 MB—a near‑90% saving.
Key takeaways:
Massive key‑value pairs consume excessive memory; keeping keys uniform in length (e.g., using 8‑byte integers) helps reduce fragmentation.
Converting 32‑byte MD5 strings to 8‑byte integers dramatically cuts key size.
zipList encoding provides substantial memory savings with minimal performance impact, provided each hash contains ≤512 fields.
Compressing string or object values to byte[] (e.g., using Google Snappy) further reduces memory.
Avoid operations that cause repeated reallocations (such as APPEND or SETRANGE ); use SET to replace values directly.
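The compression takeaway can be illustrated with a small pack/unpack pair. The article suggests Google Snappy; since that is a third‑party library, the sketch below uses the stdlib zlib to show the same pattern (serialize, compress, SET the bytes; GET, decompress, deserialize):

```python
import json
import zlib

def pack(obj) -> bytes:
    """Serialize and compress a value before SET-ing it into Redis."""
    return zlib.compress(json.dumps(obj).encode())

def unpack(raw: bytes):
    """Reverse of pack: decompress, then deserialize."""
    return json.loads(zlib.decompress(raw))

profile = {"id": 42, "tags": ["dsp", "redis"] * 50}
blob = pack(profile)
assert unpack(blob) == profile
assert len(blob) < len(json.dumps(profile))  # repetitive payloads shrink well
```

Snappy trades a slightly worse ratio for much faster compression, which is usually the right trade for a cache; the surrounding code is identical apart from the compress/decompress calls.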
Drawbacks of this approach include the inability to set individual TTLs on hash fields and a small probability of hash collisions, which may be unacceptable for highly precise data requirements.
The author has implemented a CompressRedisTemplate that extends Spring Data Redis’s RedisTemplate to automatically convert key‑value pairs into hash storage.
Future work will explore more extreme scenarios, such as unique visitor counting, to demonstrate how Redis’s less‑common data structures can shrink memory usage from 20 GB to 5 MB.
For code examples, contact the author at [email protected].
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.