Understanding Object Serialization: Principles, Frameworks, and Performance Optimizations
This article explains object serialization, compares generic text formats such as JSON and XML with binary approaches, discusses optimization principles and key performance metrics, and reviews major serialization frameworks (Protobuf, Thrift, Hessian, Kryo, Avro). It also covers TLV encoding, the varint algorithm, and practical pitfalls.
Serialization converts an object's state into a transmittable or storable format, enabling reconstruction via deserialization. In micro‑service and big‑data environments, high call frequencies and massive data volumes demand fast, compact binary serialization rather than verbose JSON/XML.
Background: Serialization is ubiquitous in RPC and data pipelines. Its performance is critical, but optimizations add complexity, so engineers must understand the underlying details.
What is Serialization?
Broad sense: turning objects into a storable representation.
Why JSON/XML are suboptimal: they embed field names in every record, inflating payload size, and encode all values as text, which makes parsing slow.
Narrow sense (binary serialization): encoding objects into platform‑independent binary streams, reducing size and parsing cost.
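As a concrete illustration of the broad definition, here is a minimal round trip using Java's built-in serialization (the class and field names are invented for this sketch). Note that this built-in format embeds class metadata in the stream, making it one of the verbose approaches the rest of the article argues against for high-volume RPC:

```java
import java.io.*;

// Minimal sketch: serialization turns an object's state into a byte
// stream; deserialization reconstructs an equivalent object from it.
public class RoundTrip {
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new User("alice", 30));
        User copy = (User) deserialize(bytes);
        System.out.println(copy.name + " " + copy.age); // prints "alice 30"
        // The stream also carries class metadata, so it is far larger
        // than the raw field data -- one reason compact binary
        // frameworks exist.
    }
}
```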
Binary Serialization
Analyzed from three angles: optimization principles, key metrics, and common frameworks.
1. Optimization Principles
Space: use numeric types directly, omit field descriptors, compress numbers (e.g., varint).
Time: avoid unnecessary copies, eliminate intermediate strings, decode fields sequentially.
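The space principle can be made concrete with a size comparison (the field name `userId` is invented for this sketch): a text format repeats the field descriptor and spells the number out as digits, while a binary format stores the value directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Size comparison: text encoding with a field descriptor vs. a raw
// binary encoding of the same integer value.
public class SizeDemo {
    // Generic text encoding: field name + number-as-string.
    static byte[] jsonEncode(int userId) {
        return ("{\"userId\":" + userId + "}").getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: the raw 4-byte value, no field name at all.
    static byte[] binaryEncode(int userId) {
        return ByteBuffer.allocate(4).putInt(userId).array();
    }

    public static void main(String[] args) {
        System.out.println(jsonEncode(123456789).length);   // 20
        System.out.println(binaryEncode(123456789).length); // 4
    }
}
```

A varint (covered below) would shrink small values further still, below even the fixed 4 bytes.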
2. Key Metrics
Speed – serialization/deserialization latency.
Size – length of the data packet.
Expressiveness – support for primitive and complex types.
Flexibility – need for IDL files or schema‑free operation.
Cross‑language – ability to work across Java, PHP, C#, etc.
3. Common Frameworks
Protobuf – Google’s lightweight, high‑performance solution.
Thrift – Apache’s RPC framework with its own serialization.
Hessian – Caucho's cross‑language binary protocol; the default serialization in Dubbo (hessian2).
Kryo – Small, fast Java serializer used in Spark, Storm (limited language support).
Avro – Apache’s schema‑based serializer, JSON‑style IDL, good for Hadoop ecosystems.
TLV Encoding
TLV (Tag‑Length‑Value) structures encode a type tag, a length, and the value, and nest naturally: a value can itself contain TLV records. Protobuf uses a TLV‑like layout: the tag is a varint combining the field number and wire type; varint and fixed‑size wire types omit the length and are followed directly by the value, while strings, bytes, and nested messages are length‑delimited.
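A hand-rolled sketch of the TLV idea (the one-byte tag and length layout here is invented and far simpler than Protobuf's varint tags). The length field is what lets a reader skip fields it does not understand, which is the property that makes TLV forward-compatible:

```java
import java.io.ByteArrayOutputStream;

// Minimal TLV: each field is written as [tag][length][value], so a
// reader can skip unknown tags by jumping `length` bytes ahead.
public class Tlv {
    static byte[] encode(int tag, byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);          // 1-byte tag: identifies the field
        out.write(value.length); // 1-byte length (values < 256 bytes here)
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Scan a buffer of concatenated TLV records for a given tag.
    static byte[] find(byte[] buf, int wantedTag) {
        int i = 0;
        while (i < buf.length) {
            int tag = buf[i] & 0xFF;
            int len = buf[i + 1] & 0xFF;
            if (tag == wantedTag) {
                byte[] v = new byte[len];
                System.arraycopy(buf, i + 2, v, 0, len);
                return v;
            }
            i += 2 + len; // skip a field without understanding it
        }
        return null;
    }
}
```

Because a value can itself be a concatenation of TLV records, the same two routines handle nested messages.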
Varint Algorithm
Varint encodes integers using a variable number of bytes: the most‑significant bit of each byte indicates continuation, and the remaining 7 bits store part of the value. It saves space for small numbers but expands large ones (a 64‑bit value can take up to 10 bytes), which is why fixed‑32/64 encodings exist as alternatives for large ints and longs.
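The scheme described above can be sketched in a few lines; this is a simplified version of what Protobuf-style libraries do, with the decoder assuming the buffer contains exactly one varint:

```java
import java.io.ByteArrayOutputStream;

// Varint: 7 payload bits per byte, least-significant group first,
// MSB set on every byte except the last (the continuation bit).
public class Varint {
    static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {               // more than 7 bits remain
            out.write((int) ((v & 0x7F) | 0x80)); // payload + continuation bit
            v >>>= 7;
        }
        out.write((int) v);                       // final byte, MSB clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long v = 0;
        int shift = 0;
        for (byte b : bytes) {
            v |= (long) (b & 0x7F) << shift;      // reassemble 7-bit groups
            shift += 7;
        }
        return v;
    }
}
```

For example, 300 fits in 2 bytes instead of 4 or 8, while -1 stored as a raw two's-complement long balloons to 10 bytes -- the expansion problem that fixed-width and ZigZag encodings address.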
Local Optimizations
Pack repeated fields: write the tag once followed by a length‑delimited run of values, instead of repeating the tag per element.
Omit default values (e.g., false booleans).
Use ZigZag encoding for signed numbers.
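ZigZag encoding makes small negative numbers varint-friendly by interleaving signed values onto the non-negative integers: 0 → 0, -1 → 1, 1 → 2, -2 → 3, and so on. The mapping is two bit operations each way (shown here for 64-bit values, as Protobuf does for its sint types):

```java
// ZigZag: map signed integers to unsigned so that values near zero,
// positive or negative, stay small before varint encoding.
public class ZigZag {
    static long encode(long n) {
        return (n << 1) ^ (n >> 63); // arithmetic shift smears the sign bit
    }

    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1); // undo the interleaving
    }
}
```

Without ZigZag, varint-encoding -1 directly costs 10 bytes; after ZigZag it becomes 1, which fits in a single byte.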
Pitfalls
Reserve field numbers for future fields and keep existing field numbering stable.
Avoid schema changes that break backward compatibility.
Handle circular references with object maps.
Choose property‑based serialization over getter/setter reflection for simplicity.
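The "object map" technique for circular references can be sketched as follows (the `Node` class and the textual output format are invented for illustration): the first time an object is seen it is assigned an id; later encounters emit a back-reference instead of recursing forever.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Track objects by identity (not equals) so a cyclic graph is
// serialized once, with cycles written as back-references.
public class RefTracker {
    static class Node { String name; Node next; Node(String n) { name = n; } }

    final Map<Object, Integer> seen = new IdentityHashMap<>();
    final StringBuilder out = new StringBuilder();

    void write(Node node) {
        if (node == null) { out.append("null;"); return; }
        Integer id = seen.get(node);
        if (id != null) {                  // already serialized: emit reference
            out.append("ref#").append(id).append(';');
            return;
        }
        seen.put(node, seen.size());       // register BEFORE recursing
        out.append(node.name).append(';');
        write(node.next);                  // safe even if the graph is cyclic
    }
}
```

Registering the object before recursing is the crucial detail: doing it afterwards would loop forever on a two-node cycle.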
Summary
The article covered the definition of serialization, its role in RPC and big‑data, compared generic and binary formats, introduced major frameworks, and discussed implementation details such as TLV, varint, and practical engineering considerations.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.