Understanding Object Serialization: Principles, Frameworks, and Performance Optimizations
This article explains object serialization, compares generic text formats such as JSON and XML with binary approaches, discusses optimization principles and key performance metrics, and reviews major serialization frameworks (Protobuf, Thrift, Hessian, Kryo, Avro). It also covers TLV encoding, the varint algorithm, and practical pitfalls.
Serialization converts an object's state into a transmittable or storable format, enabling reconstruction via deserialization. In micro‑service and big‑data environments, high call frequencies and massive data volumes demand fast, compact binary serialization rather than verbose JSON/XML.
Background: Serialization is ubiquitous in RPC and data pipelines. Its performance is critical, but optimizations add complexity, so engineers must understand the underlying details.
What is Serialization?
Broad sense: turning objects into a storable representation.
Why JSON/XML are suboptimal: they embed field names in every record, inflating payload size, and encode all values as text, which makes parsing slow.
Narrow sense (binary serialization): encoding objects into platform‑independent binary streams, reducing size and parsing cost.
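As a concrete illustration of the broad definition, here is a minimal round trip using Java's built-in serialization (the class and field names are invented for this sketch). Note that this built-in format embeds class metadata in the stream, making it one of the verbose approaches the rest of the article argues against for high-volume RPC:

```java
import java.io.*;

// Minimal sketch: serialization turns an object's state into a byte
// stream; deserialization reconstructs an equivalent object from it.
public class RoundTrip {
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new User("alice", 30));
        User copy = (User) deserialize(bytes);
        System.out.println(copy.name + " " + copy.age); // prints "alice 30"
        // The stream also carries class metadata, so it is far larger
        // than the raw field data -- one reason compact binary
        // frameworks exist.
    }
}
```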
Binary Serialization
Analyzed from three angles: optimization principles, key metrics, and common frameworks.
1. Optimization Principles
Space: use numeric types directly, omit field descriptors, compress numbers (e.g., varint).
Time: avoid unnecessary copies, eliminate intermediate strings, decode fields sequentially.
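The space principle can be made concrete with a size comparison (the field name `userId` is invented for this sketch): a text format repeats the field descriptor and spells the number out as digits, while a binary format stores the value directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Size comparison: text encoding with a field descriptor vs. a raw
// binary encoding of the same integer value.
public class SizeDemo {
    // Generic text encoding: field name + number-as-string.
    static byte[] jsonEncode(int userId) {
        return ("{\"userId\":" + userId + "}").getBytes(StandardCharsets.UTF_8);
    }

    // Binary encoding: the raw 4-byte value, no field name at all.
    static byte[] binaryEncode(int userId) {
        return ByteBuffer.allocate(4).putInt(userId).array();
    }

    public static void main(String[] args) {
        System.out.println(jsonEncode(123456789).length);   // 20
        System.out.println(binaryEncode(123456789).length); // 4
    }
}
```

A varint (covered below) would shrink small values further still, below even the fixed 4 bytes.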
2. Key Metrics
Speed – serialization/deserialization latency.
Size – length of the data packet.
Expressiveness – support for primitive and complex types.
Flexibility – need for IDL files or schema‑free operation.
Cross‑language – ability to work across Java, PHP, C#, etc.
3. Common Frameworks
Protobuf – Google’s lightweight, high‑performance solution.
Thrift – Apache’s RPC framework with its own serialization.
Hessian – Caucho's cross‑language binary protocol; the default serialization in Dubbo (hessian2).
Kryo – Small, fast Java serializer used in Spark, Storm (limited language support).
Avro – Apache’s schema‑based serializer, JSON‑style IDL, good for Hadoop ecosystems.
TLV Encoding
TLV (Tag‑Length‑Value) structures encode a type tag, a length, and the value, and nest naturally: a value can itself contain TLV records. Protobuf uses a TLV‑like layout: the tag is a varint combining the field number and wire type; varint and fixed‑size wire types omit the length and are followed directly by the value, while strings, bytes, and nested messages are length‑delimited.
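A hand-rolled sketch of the TLV idea (the one-byte tag and length layout here is invented and far simpler than Protobuf's varint tags). The length field is what lets a reader skip fields it does not understand, which is the property that makes TLV forward-compatible:

```java
import java.io.ByteArrayOutputStream;

// Minimal TLV: each field is written as [tag][length][value], so a
// reader can skip unknown tags by jumping `length` bytes ahead.
public class Tlv {
    static byte[] encode(int tag, byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(tag);          // 1-byte tag: identifies the field
        out.write(value.length); // 1-byte length (values < 256 bytes here)
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Scan a buffer of concatenated TLV records for a given tag.
    static byte[] find(byte[] buf, int wantedTag) {
        int i = 0;
        while (i < buf.length) {
            int tag = buf[i] & 0xFF;
            int len = buf[i + 1] & 0xFF;
            if (tag == wantedTag) {
                byte[] v = new byte[len];
                System.arraycopy(buf, i + 2, v, 0, len);
                return v;
            }
            i += 2 + len; // skip a field without understanding it
        }
        return null;
    }
}
```

Because a value can itself be a concatenation of TLV records, the same two routines handle nested messages.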
Varint Algorithm
Varint encodes integers using a variable number of bytes: the most‑significant bit of each byte indicates continuation, and the remaining 7 bits store part of the value. It saves space for small numbers but expands large ones (a 64‑bit value can take up to 10 bytes), which is why fixed‑32/64 encodings exist as alternatives for large ints and longs.
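The scheme described above can be sketched in a few lines; this is a simplified version of what Protobuf-style libraries do, with the decoder assuming the buffer contains exactly one varint:

```java
import java.io.ByteArrayOutputStream;

// Varint: 7 payload bits per byte, least-significant group first,
// MSB set on every byte except the last (the continuation bit).
public class Varint {
    static byte[] encode(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {               // more than 7 bits remain
            out.write((int) ((v & 0x7F) | 0x80)); // payload + continuation bit
            v >>>= 7;
        }
        out.write((int) v);                       // final byte, MSB clear
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long v = 0;
        int shift = 0;
        for (byte b : bytes) {
            v |= (long) (b & 0x7F) << shift;      // reassemble 7-bit groups
            shift += 7;
        }
        return v;
    }
}
```

For example, 300 fits in 2 bytes instead of 4 or 8, while -1 stored as a raw two's-complement long balloons to 10 bytes -- the expansion problem that fixed-width and ZigZag encodings address.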
Local Optimizations
Pack repeated fields: write the tag once followed by a length‑delimited run of values, instead of repeating the tag per element.
Omit default values (e.g., false booleans).
Use ZigZag encoding for signed numbers.
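ZigZag encoding makes small negative numbers varint-friendly by interleaving signed values onto the non-negative integers: 0 → 0, -1 → 1, 1 → 2, -2 → 3, and so on. The mapping is two bit operations each way (shown here for 64-bit values, as Protobuf does for its sint types):

```java
// ZigZag: map signed integers to unsigned so that values near zero,
// positive or negative, stay small before varint encoding.
public class ZigZag {
    static long encode(long n) {
        return (n << 1) ^ (n >> 63); // arithmetic shift smears the sign bit
    }

    static long decode(long z) {
        return (z >>> 1) ^ -(z & 1); // undo the interleaving
    }
}
```

Without ZigZag, varint-encoding -1 directly costs 10 bytes; after ZigZag it becomes 1, which fits in a single byte.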
Pitfalls
Reserve field numbers for future fields and keep existing field numbering stable.
Avoid schema changes that break backward compatibility.
Handle circular references with object maps.
Choose property‑based serialization over getter/setter reflection for simplicity.
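The "object map" technique for circular references can be sketched as follows (the `Node` class and the textual output format are invented for illustration): the first time an object is seen it is assigned an id; later encounters emit a back-reference instead of recursing forever.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Track objects by identity (not equals) so a cyclic graph is
// serialized once, with cycles written as back-references.
public class RefTracker {
    static class Node { String name; Node next; Node(String n) { name = n; } }

    final Map<Object, Integer> seen = new IdentityHashMap<>();
    final StringBuilder out = new StringBuilder();

    void write(Node node) {
        if (node == null) { out.append("null;"); return; }
        Integer id = seen.get(node);
        if (id != null) {                  // already serialized: emit reference
            out.append("ref#").append(id).append(';');
            return;
        }
        seen.put(node, seen.size());       // register BEFORE recursing
        out.append(node.name).append(';');
        write(node.next);                  // safe even if the graph is cyclic
    }
}
```

Registering the object before recursing is the crucial detail: doing it afterwards would loop forever on a two-node cycle.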
Summary
The article covered the definition of serialization, its role in RPC and big‑data, compared generic and binary formats, introduced major frameworks, and discussed implementation details such as TLV, varint, and practical engineering considerations.
58 Tech
Official tech channel of 58, a platform for tech innovation, sharing, and communication.