Choosing and Optimizing Serialization for High‑Performance Messaging
The article explains why serialization is essential for inter‑process communication, compares common formats like JSON, Protobuf, Kryo, and custom binary schemes, outlines selection criteria such as readability, complexity, speed and density, and provides code examples and interview‑style Q&A for high‑performance messaging systems.
Why Serialization Matters
When processes communicate over a network, they exchange binary streams. Programming languages and network frameworks expose APIs that send and receive bytes, but the data we need to transmit is usually structured—commands, text, or messages represented as objects. Converting these objects to a byte stream (serialization) and back (deserialization) is therefore essential.
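As a minimal sketch of that round trip, the following uses only the JDK's DataOutputStream and DataInputStream. The Command class and its two fields are hypothetical, invented here for illustration; they do not come from any particular framework.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical message type, used only for this illustration.
final class Command {
    final String action;
    final int priority;

    Command(String action, int priority) {
        this.action = action;
        this.priority = priority;
    }

    // Serialization: object -> bytes that could be written to a socket.
    static byte[] serialize(Command c) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(c.action);    // 2-byte length prefix + modified UTF-8 bytes
            out.writeInt(c.priority);  // 4 bytes, big-endian
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: bytes -> object, reading fields in the order they were written.
    static Command deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String action = in.readUTF();
            int priority = in.readInt();
            return new Command(action, priority);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling Command.deserialize(Command.serialize(cmd)) returns an object with the same field values. Both sides must agree on the field order, which is the essence of any serialization protocol.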
Common Uses of Serialization
Beyond network transmission, serialization is used to persist objects to files. In large‑scale data scenarios, objects are serialized to disk to free memory and later deserialized, ensuring data durability and reducing memory pressure.
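One way to sketch this disk round trip is with Java's built-in object serialization. The DiskSpill and Order names below are assumptions made for illustration, and built-in serialization is only one of several possible choices here (it is Java-only and should not be used on untrusted input).

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class DiskSpill {
    // Hypothetical type; Serializable opts in to Java's built-in serialization.
    static final class Order implements Serializable {
        private static final long serialVersionUID = 1L;
        final long id;
        final String item;

        Order(long id, String item) {
            this.id = id;
            this.item = item;
        }
    }

    // Write the object to a file so the in-memory copy can be released.
    static void save(Order order, Path file) {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(order);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the object back when it is needed again.
    static Order load(Path file) {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Order) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Convenience helper: save to a temporary file, load it back, then clean up.
    static Order roundTrip(Order order) {
        try {
            Path file = Files.createTempFile("order", ".bin");
            try {
                save(order, file);
                return load(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```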
Choosing a Serialization Technique
Many serialization options exist. Simple approaches convert an object to a string and then to bytes, which works but is inefficient. Popular built‑in or open‑source solutions include:
Google Protobuf, Kryo, Hessian
Text‑based formats such as JSON and XML
Custom private implementations
Selection criteria typically consider:
Readability of the serialized data
Implementation complexity
Serialization / deserialization speed
Information density (smaller byte size)
No single format excels in all dimensions; trade‑offs must be balanced based on business needs.
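The density trade-off can be made concrete with a small sketch that encodes the same number two ways: as decimal text and as a fixed-width binary value. The DensityDemo class and the sample timestamp are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DensityDemo {
    // Naive approach: number -> decimal string -> UTF-8 bytes (one byte per digit).
    static byte[] asText(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Binary approach: always 8 bytes, regardless of magnitude.
    static byte[] asBinary(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        long timestamp = 1700000000000L; // hypothetical millisecond timestamp
        System.out.println("text:   " + asText(timestamp).length + " bytes");   // 13 bytes
        System.out.println("binary: " + asBinary(timestamp).length + " bytes"); // 8 bytes
    }
}
```

The result depends on the data: a small value such as 7 takes one byte as text but still eight bytes in this fixed-width binary form, which is why density has to be judged against the values a system actually sends.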
Readability vs. Density
JSON / XML: highest readability, lowest density.
Kryo / Hessian: binary, good performance, moderate density.
Practical Recommendation
For most business systems (e‑commerce, social apps) where performance requirements are moderate, JSON is recommended because it is easy to use and human‑readable, despite higher CPU and storage costs.
Example: Serializing a User object with JSON.
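User:
name: "zhangsan"
age: 23
married: true
Resulting JSON string (in this example every value is encoded as a string):
{"name":"zhangsan","age":"23","married":"true"}
Code to serialize in Java. JsonConvert.SerializeObject is a Json.NET-style name that stands in here for whichever JSON library the project uses:
// requires java.nio.charset.StandardCharsets
byte[] serializedUser = JsonConvert.SerializeObject(user).getBytes(StandardCharsets.UTF_8);
If JSON performance is insufficient, binary serializers such as Kryo can be used with similar implementation effort but better speed and smaller payloads.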
Example: Kryo serialization of the same User object.
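// requires com.esotericsoftware.kryo.Kryo, com.esotericsoftware.kryo.io.Output, java.io.FileOutputStream
Kryo kryo = new Kryo();
kryo.register(User.class);       // register the class before serializing it
Output output = new Output(new FileOutputStream("file.bin"));
kryo.writeObject(output, user);  // write the object's fields in Kryo's binary format
output.close();                  // flush buffered bytes and close the file
Performance‑Focused Custom Serialization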
Message‑queue (MQ) systems often require higher throughput than generic serializers provide, prompting custom binary formats. By fixing field order and omitting field names, payload size can be dramatically reduced.
Custom binary representation of the User object (illustrative):
03 | 08 7a 68 61 6e 67 73 61 6e | 17 | 01
User | z h a n g s a n | 23 | true
Explanation:
First byte 03 identifies the object type (User).
Next byte 08 stores the length of the name, followed by the 8‑byte name "zhangsan".
Age is stored as a single byte 17 (hex for 23).
Marital status uses one byte: 01 for married, 00 for single.
This custom format uses 12 bytes versus 47 bytes for the JSON representation, yielding faster transmission but at the cost of readability and increased implementation complexity.
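The layout described above can be sketched in Java as follows. UserCodec, its nested User class, and the one-byte limits on name length and age are assumptions made for this illustration; a production MQ format would also need versioning and bounds checks on input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class UserCodec {
    static final byte TYPE_USER = 0x03; // type tag from the layout above

    // Hypothetical value class mirroring the User object in the example.
    static final class User {
        final String name;
        final int age;
        final boolean married;

        User(String name, int age, boolean married) {
            this.name = name;
            this.age = age;
            this.married = married;
        }
    }

    // Fixed field order, no field names: type | name length | name | age | married.
    static byte[] encode(User u) {
        byte[] name = u.name.getBytes(StandardCharsets.UTF_8);
        if (name.length > 255 || u.age < 0 || u.age > 255) {
            throw new IllegalArgumentException("value does not fit in one byte");
        }
        ByteBuffer buf = ByteBuffer.allocate(4 + name.length);
        buf.put(TYPE_USER);
        buf.put((byte) name.length);
        buf.put(name);
        buf.put((byte) u.age);
        buf.put((byte) (u.married ? 1 : 0));
        return buf.array();
    }

    // Reads the fields back in exactly the order encode() wrote them.
    static User decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        if (buf.get() != TYPE_USER) {
            throw new IllegalArgumentException("not a User payload");
        }
        byte[] name = new byte[buf.get() & 0xFF]; // mask to treat the byte as unsigned
        buf.get(name);
        int age = buf.get() & 0xFF;
        boolean married = buf.get() != 0;
        return new User(new String(name, StandardCharsets.UTF_8), age, married);
    }

    public static void main(String[] args) {
        byte[] binary = encode(new User("zhangsan", 23, true));
        String json = "{\"name\":\"zhangsan\",\"age\":\"23\",\"married\":\"true\"}";
        System.out.println("custom: " + binary.length + " bytes"); // 12 bytes
        System.out.println("json:   " + json.getBytes(StandardCharsets.UTF_8).length + " bytes"); // 47 bytes
    }
}
```

Because the decoder relies entirely on position, adding or reordering a field breaks every existing reader. That brittleness is part of the implementation cost mentioned above.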
Summary
Inter‑process communication requires converting structured objects to binary data via serialization. When selecting a serializer, balance readability, implementation effort, speed, and payload size. In most cases, a high‑performance generic binary serializer (e.g., Kryo) or JSON suffices; custom binary formats should be reserved for scenarios with extreme performance or bandwidth constraints.
Interview Quick‑Q&A
Why not transmit raw in‑memory binary data directly?
In‑memory representations are language‑specific (e.g., Java vs. PHP) and contain pointers and layout details that other languages cannot interpret. Serialization defines a language‑agnostic protocol, enabling cross‑language communication and persistent storage.
Key challenges of raw binary transmission:
Network byte order vs. host byte order (endianness) must be handled.
Platform differences: primitive sizes, struct alignment and padding, and the CPU architecture's endianness all affect compatibility.
Pointer and reference handling: objects may reference other objects via memory addresses, which are meaningless on another machine.
Addressing these issues essentially leads to building a custom serialization framework.
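The byte-order issue in particular is easy to demonstrate. The sketch below, with a hypothetical ByteOrderDemo class, writes an int in big-endian (network) order and then reads the same four bytes under both interpretations.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ByteOrderDemo {
    // Encode an int as 4 bytes in the given byte order.
    static byte[] encode(int value, ByteOrder order) {
        return ByteBuffer.allocate(Integer.BYTES).order(order).putInt(value).array();
    }

    // Decode 4 bytes as an int, assuming the given byte order.
    static int decode(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] wire = encode(23, ByteOrder.BIG_ENDIAN);                // bytes: 00 00 00 17
        System.out.println(decode(wire, ByteOrder.BIG_ENDIAN));       // prints 23
        System.out.println(decode(wire, ByteOrder.LITTLE_ENDIAN));    // prints 385875968
    }
}
```

A receiver that assumes the wrong order silently reads a different number rather than failing, which is why a wire format has to state its byte order explicitly.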
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.
