How to Compress Large Java int/long Arrays for Massive Memory Savings
This article explains how to reduce memory usage of massive Java int/long arrays by applying real‑time compression, eliminating redundancy, using indexed buckets, offset storage, and a series of low‑level optimizations that boost TPS from dozens to over a thousand while preserving random‑access capabilities.
Data Compression
When developing backend applications or middleware, large amounts of data are often kept in memory to speed up access. As the data volume grows, storing it in the heap becomes costly, so real‑time compression of integer arrays is introduced.
1. Removing Redundancy
Instead of always using the full 4‑byte or 8‑byte representation of int or long, we can drop unused high‑order bytes dynamically. By borrowing a few bits from the original value to indicate the number of bytes used, we can store most numbers in fewer bytes.
We represent the byte count using either extra bits of the original number or the highest bit of each byte, similar to Facebook’s Thrift compression. The goal is to compress long[] or int[] into a byte[] and later reconstruct the original numbers.
2. Offset Storage
After compression, we still need random‑access by index. Because compressed numbers have variable length, we cannot compute an address directly. We solve this by building an index:
Divide the data into adjustable buckets (e.g., 1 KB or 4 KB each).
Maintain a secondary array that stores the starting index of the first element in each bucket.
During lookup, first binary‑search the bucket, then linearly scan inside the bucket.
2. Access Optimizations
1. Variable‑Length Buckets
Switching from fixed‑size to variable‑length buckets allows us to compute the bucket index in O(1) using division and remainder, replacing the previous O(log n) binary search. This change raised batch‑interface TPS from ~72 to ~320.
2. Dedicated Iterator
For sequential batch reads we create a custom iterator that records the current bucket position and advances by the known length of each element, achieving O(1) per step and raising TPS to ~680.
3. Reduce Intermediate Data – Use Stack Directly
Instead of creating temporary ByteBuffer objects, we reference the original bucket data directly and pass it via the stack to the decompression routine, cutting overhead and improving performance by ~45% (TPS ≈ 910).
4. Move Heap Data to Stack
We eliminate heap‑allocated temporary arrays by operating on primitive long values directly, using bit‑shifts and masks to assemble the value. This removes the need for byte[] num and further boosts TPS by ~35% (≈ 1250).
longData = (longData & ~ (0xffL << (2 * 8))) | (0x02L << (2 * 8));5. Inline Short Functions
Small helper methods such as updateOffset() or Byte.toUnsignedInt() are inlined, removing call overhead and contributing to the final performance gains.
3. Performance Results
After all optimizations, the batch interface TPS increased from 72 to about 1380, roughly a 20‑fold improvement. Single‑call random access reaches ~260 M TPS, and sequential single‑call access reaches ~6500 M TPS, comfortably meeting production demands.
4. Optimization Summary
Primitive data structures are far faster than object‑based ones; low‑level access wins.
Heap allocations are slow and trigger frequent Young GC; prefer stack allocations.
Method calls add stack‑frame overhead; inlining simple logic yields measurable gains.
Minimize intermediate data and temporary variables.
Even tiny inefficiencies explode under massive call volumes; batch paths must be meticulously optimized.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
