How Ctrip Optimized Hotel Cache Memory: From HashMap Overhead to Bitmap Encoding
This article examines Ctrip's hotel query service cache, detailing the selection of in‑memory data structures, analyzing Java object and HashMap memory overhead, and demonstrating how bitmap, run‑length, dictionary, and delta encodings dramatically reduce memory usage in real‑world scenarios.
Introduction
Ctrip's hotel query service is the core backend for the hotel BU, handling billions of dynamic data items and caching them in local memory and Redis to meet sub‑second update latency while ensuring eventual consistency. The article explores how the technical team chose and optimized in‑memory data structures to improve read efficiency and reduce memory consumption.
Memory Structure Selection
The team first identified two fundamental requirements for the cache: high‑performance reads (thousands of lookups per request) and support for high update frequency (price updates must propagate within seconds). Thread‑unsafe structures like HashMap were ruled out due to concurrency risks, leading to the selection of thread‑safe map implementations as the baseline.
Java Object Memory Model
A Java object consists of an object header, instance data, and alignment padding. The header stores a Mark Word (lock state, GC flags) and a Class Pointer; on a 64‑bit JVM with compressed oops, the header occupies 8 bytes. Instance fields are laid out tightly, and padding ensures the total size is a multiple of 8 bytes. For example, an object with an int and a byte field occupies 24 bytes.
HashMap Memory Overhead
Using HashMap<Integer,Integer> as an example, the bucket array ( table) stores references to Node objects. Each bucket array element adds 4 bytes (reference) plus the array header (8 bytes mark word + 4 bytes class pointer + 4 bytes length). A 32‑slot array therefore consumes 144 bytes. The bucket array is often oversized (capacity is the next power‑of‑two) to reduce collisions, leading to 55 %+ of total memory being structural overhead.
Each Node stores key, value, hash, and a next pointer, costing roughly 32 bytes. Storing 32 integer pairs thus requires about 1 KB for nodes alone, on top of the bucket array.
Packaging Type Overhead
Java generics force the use of wrapper classes. An Integer occupies 4 bytes for the primitive value plus 12 bytes for its object header, plus a 4‑byte reference in the array—totaling about 20 bytes per element, compared with 4 bytes for a plain int. This 16‑byte overhead per element becomes significant at large scales.
Alternative Structures
ConcurrentHashMap : Thread‑safe version of HashMap with finer‑grained locking.
SparseArray : Android’s memory‑efficient map for int keys.
Guava Cache : Google’s local cache with segment‑level locks.
fastutil : Primitive‑type collections that avoid boxing.
Data Encoding Compression
Beyond structure selection, the team applied several encoding techniques to further shrink memory.
1) Bitmap Encoding
Using BitSet to store boolean or enumerated fields reduces storage from 8 bytes per entry to a single bit. For eight keys, memory drops from 64 bytes to 1 byte.
2) Run‑Length Encoding (RLE)
Consecutive identical values are stored as a value plus a repeat count, e.g., a4b6 compresses a 10‑byte string to 4 bytes.
3) Dictionary Encoding
Repeated values are stored once in a dictionary; original entries hold a reference (pointer) to the dictionary entry. This is useful when many objects share identical fields.
4) Delta Encoding
For near‑continuous keys (e.g., dates), store the offset from a base value, turning a Date key into a small int offset, enabling array‑based storage.
Application Cases
1) Room Basic Information
Over a hundred GB of memory was used to cache billions of room records. By applying bitmap encoding to boolean/enum fields and dictionary encoding to highly duplicated room entities (99 % duplication), memory usage dropped to under 2 % of the original size, saving tens of gigabytes.
2) Daily Price Information
The daily price cache stores per‑room, per‑date price data, the largest and most critical cache. The optimization pipeline:
Dictionary‑encode distinct price values; store an index array referencing the price dictionary, with a sentinel null entry for missing dates.
Delta‑encode dates to offsets from the service start date, allowing an int[] to represent the date dimension.
Bitmap‑encode the price‑index array because the number of distinct prices per room is small (e.g., three), reducing each index to 2 bits.
Apply RLE to the tail of the series where prices repeat for far‑future dates.
These steps shrink a single price object from several hundred bytes to ~31 bytes, achieving roughly 60 % compression in production.
Conclusion
The study demonstrates that careful in‑memory data‑structure selection, combined with bitmap, dictionary, delta, and run‑length encodings, can reduce cache memory consumption by orders of magnitude while preserving sub‑second read latency. Such techniques enable large‑scale services to cache billions of records on modest hardware.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
