
Understanding Object Serialization: Principles, Frameworks, and Performance Optimizations

This article explains the concept of object serialization, compares generic formats like JSON/XML with binary approaches, discusses optimization principles, key performance metrics, and reviews major serialization frameworks such as Protobuf, Thrift, Hessian, Kryo, and Avro, while also covering TLV encoding, varint algorithms, and practical pitfalls.

58 Tech

Serialization converts an object's state into a transmittable or storable format, enabling reconstruction via deserialization. In micro‑service and big‑data environments, high call frequencies and massive data volumes demand fast, compact binary serialization rather than verbose JSON/XML.

Background: Serialization is ubiquitous in RPC and data pipelines. Optimizing it pays off at scale but adds complexity, so engineers need to understand how it works in detail.

What is Serialization?

Broad sense: turning objects into a storable representation.

Why JSON/XML are suboptimal: they repeat field names in every message, which inflates size, and they represent every value, numbers included, as text that must be parsed back into native types.

Narrow sense (binary serialization): encoding objects into platform‑independent binary streams, reducing size and parsing cost.
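
The size difference between the two senses can be made concrete with a small sketch. The record, its field names, and the fixed layout below are hypothetical and chosen only for illustration; the binary side assumes that writer and reader agree on the layout in advance, which is what lets it drop the field names.

```python
import json
import struct

# Hypothetical record used only for illustration.
record = {"user_id": 123456, "age": 30, "score": 98.5}

# Text encoding: field names and numbers are all written as characters.
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# Binary encoding: only the values are written, in an agreed order.
# "<IBd" = little-endian, unsigned 32-bit int, unsigned 8-bit int, 64-bit float.
LAYOUT = struct.Struct("<IBd")
binary_bytes = LAYOUT.pack(record["user_id"], record["age"], record["score"])


def decode(data: bytes) -> dict:
    """Rebuild the record from the fixed binary layout."""
    user_id, age, score = LAYOUT.unpack(data)
    return {"user_id": user_id, "age": age, "score": score}


if __name__ == "__main__":
    print(len(json_bytes), "bytes as JSON")
    print(len(binary_bytes), "bytes as fixed-layout binary")
    print(decode(binary_bytes))
```

The binary form here is 13 bytes (4 + 1 + 8) and needs no text parsing, at the cost of being unreadable without the layout.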

Binary Serialization

Binary serialization is examined here from three angles: optimization principles, key metrics, and common frameworks.

1. Optimization Principles

Space: use numeric types directly, omit field descriptors, compress numbers (e.g., varint).

Time: avoid unnecessary copies, eliminate intermediate strings, decode fields sequentially.

2. Key Metrics

Speed – serialization/deserialization latency.

Size – length of the data packet.

Expressiveness – support for primitive and complex types.

Flexibility – whether an IDL or schema file is required, or the framework can operate schema‑free.

Cross‑language – ability to work across Java, PHP, C#, etc.

3. Common Frameworks

Protobuf – Google’s lightweight, high‑performance solution.

Thrift – Apache’s RPC framework with its own serialization.

Hessian – a binary protocol from Caucho; its Hessian2 variant has long been Dubbo's default serialization.

Kryo – small, fast Java serializer used in Spark and Storm (limited language support).

Avro – Apache's schema‑based serializer, with schemas written in JSON; a good fit for Hadoop ecosystems.

TLV Encoding

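TLV (Tag‑Length‑Value) encodes each element as a tag identifying the field or type, a length, and the value itself. Because a value can itself contain further TLV elements, the structure nests naturally. Protobuf uses a variant of this scheme: each field starts with a varint key that combines the field number and a wire type, the length is omitted for varint and fixed‑size (32/64‑bit) values, and an explicit varint length is written only for length‑delimited fields such as strings, bytes, and nested messages.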
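
A minimal sketch of a generic TLV codec may help make the nesting concrete. This is not Protobuf's wire format: it assumes a simplified layout of a 1‑byte tag followed by a 2‑byte big‑endian length, and the tag constants are hypothetical.

```python
import struct

# Assumed header layout: 1-byte tag, 2-byte big-endian length.
HEADER = struct.Struct(">BH")


def tlv_encode(tag: int, value: bytes) -> bytes:
    """Prefix the value with its tag and length."""
    return HEADER.pack(tag, len(value)) + value


def tlv_decode(data: bytes) -> list:
    """Return the (tag, value) pairs found at the top level of data."""
    items = []
    pos = 0
    while pos < len(data):
        tag, length = HEADER.unpack_from(data, pos)
        pos += HEADER.size
        items.append((tag, data[pos:pos + length]))
        pos += length
    return items


# Hypothetical tags for illustration.
TAG_NAME, TAG_AGE, TAG_USER = 1, 2, 3

# A nested element: the value of TAG_USER is itself a sequence of TLV elements.
inner = tlv_encode(TAG_NAME, b"Ann") + tlv_encode(TAG_AGE, bytes([30]))
outer = tlv_encode(TAG_USER, inner)

if __name__ == "__main__":
    for tag, value in tlv_decode(outer):
        print(tag, tlv_decode(value))
```

A decoder that does not recognize a tag can still skip the element, because the length tells it how many bytes to jump over.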

Varint Algorithm

Varint encodes an integer in a variable number of bytes: each byte carries 7 bits of the value, and its most‑significant bit signals whether another byte follows. Small numbers shrink to one or two bytes, but large values grow: a 32‑bit value of 2^28 or more takes 5 bytes, and a full 64‑bit value takes 10. For fields that are usually large, fixed 32‑bit or 64‑bit encodings are the better choice.
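
The algorithm can be sketched in a few lines. The version below follows the Protobuf convention of emitting the least‑significant 7‑bit group first; the function names are illustrative and not taken from any library.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer, least-significant 7-bit group first."""
    if value < 0:
        raise ValueError("varint encodes non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the continuation bit
        else:
            out.append(low7)         # last byte: continuation bit stays clear
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


if __name__ == "__main__":
    for n in (1, 300, 2**28, 2**64 - 1):
        print(n, encode_varint(n).hex(), len(encode_varint(n)), "bytes")
```

With these helpers, a Protobuf field with number 1 and varint value 150 comes out as the three bytes 08 96 01: the key `(1 << 3) | 0` followed by the varint for 150.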

Local Optimizations

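Merge consecutive tags for repeated fields: write the tag and total length once, followed by all the values (Protobuf calls this packed encoding).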

Omit default values (e.g., false booleans).

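Use ZigZag encoding for signed numbers, so that values of small magnitude, negative or positive, map to small unsigned integers that varint can store compactly.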
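
The ZigZag item deserves a short illustration, because a plain varint handles negatives badly: -1 stored as a 64‑bit two's‑complement value has all bits set and needs 10 bytes. The sketch below shows the usual mapping (0 → 0, -1 → 1, 1 → 2, -2 → 3, ...); the function names and the default 64‑bit width are assumptions made for this example.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed integer to an unsigned one, interleaving negatives and positives."""
    # n >> (bits - 1) is 0 for non-negative n and -1 (all ones) for negative n.
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Invert zigzag_encode."""
    return (z >> 1) ^ -(z & 1)


if __name__ == "__main__":
    for n in (0, -1, 1, -2, 2, -64, 63):
        print(n, "->", zigzag_encode(n))
```

After this mapping, any value between -64 and 63 fits in a single varint byte.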

Pitfalls

Reserve space for future fields, and keep the order (or tag numbers) of existing fields stable once a format is in use.

Avoid schema changes that break backward compatibility.

Handle circular references with object maps.

Prefer field‑based (property) serialization over calling getters and setters through reflection; it is simpler.
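
The object‑map approach to circular references can be sketched as follows. The `Node` class and the record layout are hypothetical; the point is only that each object is assigned an index the first time it is seen, and any later reference to it is written as that index instead of being serialized again.

```python
class Node:
    """Hypothetical object type with a reference that may form a cycle."""

    def __init__(self, name):
        self.name = name
        self.next = None


def serialize(root):
    """Flatten a possibly cyclic chain of nodes into a list of records."""
    seen = {}      # object map: id(obj) -> index in records
    records = []

    def visit(node):
        if node is None:
            return None
        key = id(node)
        if key in seen:
            return seen[key]        # already written: emit a back-reference
        index = len(records)
        seen[key] = index
        records.append(None)        # reserve the slot before following references
        records[index] = {"name": node.name, "next": visit(node.next)}
        return index

    visit(root)
    return records


def deserialize(records):
    """Rebuild the nodes first, then wire up references by index."""
    nodes = [Node(r["name"]) for r in records]
    for node, r in zip(nodes, records):
        if r["next"] is not None:
            node.next = nodes[r["next"]]
    return nodes[0] if nodes else None


if __name__ == "__main__":
    a, b = Node("a"), Node("b")
    a.next, b.next = b, a           # a -> b -> a forms a cycle
    print(serialize(a))
```

Without the map, the serializer would follow `a -> b -> a -> ...` forever. The recursive walk is kept for brevity; a production serializer would more likely use an explicit stack.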

Summary

The article covered the definition of serialization, its role in RPC and big‑data, compared generic and binary formats, introduced major frameworks, and discussed implementation details such as TLV, varint, and practical engineering considerations.


Written by

58 Tech

Official tech channel of 58, a platform for tech innovation, sharing, and communication.
