Compressing User Tags and Models with Protostuff and Gzip
By serializing user feature data with Protostuff (a Java library built on the Protobuf format) and then compressing it with the JDK's built‑in Gzip before storing it in Redis, the author shrank a typical 70 KB per‑user payload to under 10 KB. The approach makes it practical to store billions of records, keeps the data readable from other languages, and lets the schema grow without breaking existing records.
Recently, while working on algorithm engineering, the author found that user‑related features (offline features, real‑time exposures, clicks, and so on) are large: storing a single user in Redis takes 50‑70 KB or more.
To reduce memory usage, the author explored serialization and compression tools. Having previously used Protobuf for game servers, they chose Protobuf‑based serialization (through Protostuff) and the built‑in JDK Gzip for compression, leading to the approach described in this article.
1. What is Protobuf?
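Protobuf (Protocol Buffers) is Google’s language‑neutral binary data interchange format. Implementations exist for Java, C#, C++, Go, and Python, with community ports for JavaScript, Lua, and other languages. The toolchain consists of a compiler that generates code from .proto definitions and a runtime library for each language.
Because the format is binary, messages are smaller and faster to parse than XML, and the encoding itself compacts some basic data types (for example, small integers are written as variable‑length varints). It is suitable for inter‑service communication, data exchange between heterogeneous environments, configuration files, and data storage.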
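To make the "basic data‑type compression" concrete, here is a minimal hand‑written sketch of the Protobuf varint scheme in Java. It is an illustration only, not code from the Protobuf or Protostuff libraries; the class name VarintSketch and its methods are hypothetical. It also shows why a payload made mostly of doubles gains little from this step: a double is always written as 8 fixed bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of any library.
final class VarintSketch {

    // Encode a value as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    static byte[] encodeVarint(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // A double is always written as 8 little-endian bytes, whatever its value.
    static byte[] encodeDouble(double value) {
        return ByteBuffer.allocate(Double.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putDouble(value)
                .array();
    }

    public static void main(String[] args) {
        System.out.println(encodeVarint(1).length);    // prints 1
        System.out.println(encodeVarint(300).length);  // prints 2
        System.out.println(encodeDouble(0.5).length);  // prints 8
    }
}
```

Small integers therefore shrink from 4 or 8 bytes to 1 or 2, while floating‑point features keep their full width and depend on the later Gzip step for any real reduction.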
2. What is Protostuff?
Protostuff is a Java serialization library built on top of Protobuf. It removes the need to write .proto files by hand: the library derives schemas at runtime from existing Java classes. Other languages can still read the data once a corresponding .proto definition is written for them.
3. Code Implementation
The author shows how user feature data is serialized with Protostuff, compressed with Gzip, and stored in Redis. (The original article includes several screenshots of the code.)
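Since the author's code is only available as screenshots, the following is a minimal sketch of the compress and decompress steps using only the JDK. It is not the author's implementation. The class name GzipSketch and the pack/unpack helpers are hypothetical: pack is a plain ByteBuffer stand‑in for the Protostuff serialization step, the feature values are synthetic, and the Redis write is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch class; the serializer here is a stand-in, not Protostuff.
final class GzipSketch {

    // Stand-in for the serialization step: 8 bytes per double.
    static byte[] pack(double[] features) {
        ByteBuffer buf = ByteBuffer.allocate(features.length * Double.BYTES);
        for (double f : features) {
            buf.putDouble(f);
        }
        return buf.array();
    }

    // Inverse of pack.
    static double[] unpack(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] features = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < features.length; i++) {
            features[i] = buf.getDouble();
        }
        return features;
    }

    // Compress with the JDK's built-in Gzip stream.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decompress back to the serialized bytes.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Synthetic, repetitive feature values (assumption for the demo).
        double[] features = new double[7892];
        for (int i = 0; i < features.length; i++) {
            features[i] = (i % 10) / 10.0;
        }
        byte[] serialized = pack(features);
        byte[] compressed = gzip(serialized);
        double[] restored = unpack(gunzip(compressed));
        System.out.println("serialized bytes: " + serialized.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("round trip ok: " + Arrays.equals(features, restored));
    }
}
```

In a real pipeline the bytes returned by gzip would be the value written to Redis, and gunzip would run on read before deserialization. How far the payload shrinks depends heavily on how repetitive the feature values are.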
4. Test Data Output
Original data size: 71343 bytes
After Protostuff serialization: 65280 bytes
After Gzip compression: 7403 bytes
Number of feature values: 7892 double values
Traditional serialization size: 110677 bytes
After Protostuff serialization: 71028 bytes
After Gzip compression: 796 bytes
After Gzip decompression: 71028 bytes
Feature count after deserialization: 7892 double values
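The reductions implied by these figures can be checked with a little arithmetic. The sketch below uses only the byte counts reported above; the class name CompressionRatio and its helper are hypothetical. It also checks one observation that the source does not state: 71028 equals 7892 × 9, which would be consistent with each double taking 8 bytes plus one tag byte, though that layout is an assumption.

```java
// Hypothetical helper class for checking the reported sizes.
final class CompressionRatio {

    // Percentage of the original size removed by going from before to after.
    static double savedPercent(long before, long after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // First set of figures: original -> Protostuff -> Gzip.
        System.out.println("71343 -> 7403 saves %: " + savedPercent(71343, 7403));
        System.out.println("65280 -> 7403 saves %: " + savedPercent(65280, 7403));

        // Second set: traditional serialization -> Protostuff -> Gzip.
        System.out.println("110677 -> 71028 saves %: " + savedPercent(110677, 71028));
        System.out.println("71028 -> 796 saves %: " + savedPercent(71028, 796));

        // 7892 doubles at 9 bytes each matches the reported Protostuff size.
        System.out.println("7892 * 9 = " + (7892 * 9));
    }
}
```

In both sets of figures Gzip accounts for most of the saving (roughly 89% and 99% of the serialized size), while serialization alone removes far less.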
5. Summary
Because Protostuff uses the Protobuf encoding, the data structure stored in Redis can keep growing: new fields can generally be added without breaking records that were written earlier. The data is also usable from other languages, which can read it by defining the same .proto schema. Gzip then compresses the serialized payload much further (to roughly a tenth of its size or less in the tests above), reducing memory consumption enough for a single Redis cluster to handle billions of user records.
vivo Internet Technology