How Ctrip Optimized Hotel Cache Memory: From HashMap Overhead to Advanced Encoding
Facing billions of cached hotel records, Ctrip’s backend team analyzed Java object layout and HashMap memory costs, evaluated alternative structures, and applied bitmap, dictionary, run‑length and delta encodings to compress data, ultimately reducing cache size to under 2% of its original footprint.
Introduction
Ctrip’s hotel query service is the core backend for the hotel business unit, providing a unified API for dynamic hotel data. To meet strict response‑time requirements while caching billions of records in local memory and Redis, the team needed to choose and optimise the in‑memory data structures.
Memory Structure Selection
The service must support high‑frequency reads (thousands of accesses per request) and rapid, concurrent updates (price changes every second). Simple arrays and hash tables offer O(1) lookups, while tree‑based structures have higher complexity and are unsuitable for the scale.
Java Object Memory Model
A Java object consists of an object header, instance data, and alignment padding. The header stores a mark word and a class pointer (4 bytes on 32‑bit, 8 bytes on 64‑bit; compressed OOPs reduce the class pointer to 4 bytes). Arrays add an extra 4‑byte length field. Alignment pads the total size to a multiple of 8 bytes.
For example, a simple object with an int and a byte occupies 24 bytes on a 64‑bit JVM with compressed OOPs.
HashMap Overhead
Using HashMap<Integer,Integer> as a cache example, experiments on a 64‑bit machine with compressed OOPs showed that the internal structure always consumes more than 55 % of total memory, regardless of data volume. The hash‑bucket array stores a reference to each Node (key, value, hash, next). Its size is calculated as:
8 (mark word) + 4 (class pointer) + 4 (array length) + 4 × N (references) + paddingFor N = 32, the bucket array occupies 144 bytes. HashMap also expands its capacity to the next power of two, leading to significant unused slots.
Alternative Structures
To reduce memory while keeping concurrency, the team evaluated several third‑party collections:
ConcurrentHashMap – thread‑safe version of HashMap with finer‑grained locking.
SparseArray – Android’s memory‑efficient replacement for HashMap<Integer,E>.
Guava Cache – Google’s local cache library using segmented locks.
fastutil – Provides primitive‑type collections that avoid boxing overhead.
Benchmarks (shown in the original tables) demonstrated that fastutil’s primitive maps and SparseArray consume far less memory than standard HashMap for the same key count.
Data Encoding & Compression Techniques
Beyond structure selection, the team applied several encoding methods to further shrink the cache.
Bitmap (BitSet) Encoding
Bitmaps store boolean states as single bits. For dense integer keys, a BitSet reduces storage from 64 bytes (4 bytes key + 4 bytes boolean per entry) to 1 byte for eight entries.
Run‑Length Encoding (RLE)
RLE replaces consecutive identical values with a value‑count pair, e.g., a4b6 for aaaa bbbbbb, cutting the length from 10 bytes to 4 bytes.
Dictionary Encoding
Repeated strings are stored once in a dictionary; original entries keep only a reference (pointer) to the dictionary entry.
Delta (Difference) Encoding
For near‑continuous keys (e.g., dates), the key is stored as the offset from a base value, allowing the use of compact int[] arrays.
Case Studies
Room Basic Information
The cache holds billions of room‑type records, each with many fields. Two optimisations were applied:
Bitmap encoding for enumerable fields (booleans, enums). A field like RoomType with only five possible values was reduced from 16 bytes to 3 bits.
Dictionary encoding for duplicated room entities. Over 99 % of rooms share identical basic info; after serialising each entity to an MD5 hash, only the hash is stored in the primary map, with a secondary dictionary mapping hash → full entity.
These steps reduced the room‑type cache to less than 2 % of its original size, saving tens of gigabytes.
Daily Price Information
Daily price data is the largest and most critical cache. The optimisation pipeline:
Dictionary encode repeated price values; store an index to a price array, with a sentinel null at index 0.
Delta encode dates to offsets from the server start date, turning date keys into a dense int range.
Bitmap encode the price‑index array because the number of distinct prices per room is small (e.g., three prices → 2 bits per day).
RLE compress trailing repeated prices at the far future dates.
After compression, a single room‑day price object shrank from several hundred bytes to ~31 bytes, achieving roughly 60 % compression in production.
Conclusion
The article demonstrates that by analysing Java object layout, measuring HashMap overhead, selecting appropriate thread‑safe collections, and combining bitmap, dictionary, delta, and RLE encodings, massive in‑memory caches can be reduced to a fraction of their original size while preserving read‑write performance.
enum Season { Spring, Summer, Fall, Winter; }Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
