
Serialization and Deserialization: Concepts, Protocols, and Selection Guidelines

The article explains serialization and deserialization fundamentals, compares key protocols (XML/SOAP, JSON, Thrift, Protobuf, Avro) across readability, performance, extensibility and security, presents benchmark results, and offers practical guidelines for choosing the most suitable format for various distributed system scenarios.

Meituan Technology Team

Serialization and deserialization are everyday concerns for engineers, yet they are easy to overlook because they usually sit inside frameworks or get bundled with related concepts such as encryption and persistence. Choosing the right serialization protocol is a crucial step in system design or refactoring, especially for large-scale distributed systems. A suitable protocol can improve a system's generality, robustness, security, performance, and debuggability.

The article is organized into five parts:

Definition of serialization/deserialization and their place in communication protocols.

Characteristics of serialization protocols from a user’s perspective.

Typical components involved in serialization and a comparison with database access components.

Analysis of several popular serialization protocols (XML, JSON, Thrift, Protobuf, Avro) with examples.

Benchmark results and practical selection advice.

1. Definitions and Related Concepts

In the OSI model, the Presentation Layer is responsible for converting application objects to a continuous binary stream (serialization) and vice‑versa (deserialization). In the TCP/IP stack, this functionality belongs to the application layer. The article defines:

Serialization: converting a data structure or object into a binary stream.

Deserialization: converting the binary stream back into a data structure or object.

It also explains the differences between data structures, objects, and binary streams across languages such as Java and C++.
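
As a rough illustration of the two definitions above, the following sketch hand-codes a round trip for a two-field record using only the Java standard library. The class name `UserCodec` and the field layout are assumptions made for this example; the article does not prescribe them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper: a hand-written binary codec for one record.
class UserCodec {
    // Serialization: object fields -> binary stream.
    static byte[] serialize(int userid, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(userid);  // 4 bytes, big-endian
            out.writeUTF(name);    // 2-byte length prefix + modified UTF-8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Deserialization: binary stream -> fields, read in the order they were written.
    static String deserialize(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int userid = in.readInt();
            String name = in.readUTF();
            return userid + ":" + name;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both sides must agree on field order and types, which is exactly the contract that the protocols discussed later make explicit.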

2. Serialization Protocol Characteristics

The article discusses several dimensions:

Generality: cross‑platform/language support and popularity.

Robustness: maturity of the protocol and fairness across languages/platforms.

Debuggability/Readability: human‑readable output (e.g., XML, JSON) eases verification.

Performance: time and space overhead (verbosity).

Extensibility/Compatibility: ability to add fields without breaking existing services.

Security/Access Restrictions: ability to cross firewalls, which often admit only HTTP/HTTPS traffic.
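
The extensibility dimension can be sketched with a toy tag-length-value layout, in which a reader skips any field tag it does not recognize. This is an illustrative format invented for this example (the `TlvDemo` name and the one-byte tag and length are assumptions), not the wire format of any protocol covered here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy tag-length-value codec, invented for illustration only.
class TlvDemo {
    // Appends one field as [tag][length][UTF-8 bytes]; values are assumed shorter than 128 bytes.
    static void writeField(ByteArrayOutputStream out, int tag, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        out.write(tag);
        out.write(bytes.length);
        out.write(bytes, 0, bytes.length);
    }

    // Walks every field but keeps only the tags this reader knows about.
    static Map<Integer, String> read(byte[] data, Set<Integer> knownTags) {
        Map<Integer, String> fields = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length) {
            int tag = data[pos++];
            int length = data[pos++];
            if (knownTags.contains(tag)) {
                fields.put(tag, new String(data, pos, length, StandardCharsets.UTF_8));
            }
            pos += length;  // unknown fields are skipped, not rejected
        }
        return fields;
    }
}
```

A newer writer can add a field with tag 3, and an older reader that only knows tags 1 and 2 still decodes its own fields, because the length prefix tells it how far to skip.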

3. Serialization Components

Typical components include:

IDL (Interface Description Language) files that describe the data contract.

IDL compiler that generates language‑specific stubs/skeletons.

Stub/Skeleton libraries that perform the actual (de)serialization.

Client/Server application code that uses the generated classes.

Underlying transport and network layers.

The article provides a visual analogy between serialization components and database access components.

4. Common Serialization Protocols

XML & SOAP

XML is a self‑describing, human‑readable format but verbose. SOAP builds on XML for structured message exchange, using WSDL as its IDL.

Example WSDL fragment:

<xsd:complexType name='Address'>
  <xsd:attribute name='city' type='xsd:string' />
  <xsd:attribute name='postcode' type='xsd:string' />
  <xsd:attribute name='street' type='xsd:string' />
</xsd:complexType>
<xsd:complexType name='UserInfo'>
  <xsd:sequence>
    <xsd:element name='address' type='tns:Address'/>
    <xsd:element name='address1' type='tns:Address'/>
  </xsd:sequence>
  <xsd:attribute name='userid' type='xsd:int' />
  <xsd:attribute name='name' type='xsd:string' />
</xsd:complexType>

Typical use cases: low‑volume, latency‑tolerant inter‑company communication where human readability is valuable.
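
To make the readability point concrete, here is a minimal sketch that deserializes a document shaped like the `UserInfo` type above using the DOM parser that ships with the JDK. The root element name `UserInfo` and the class name `XmlUserReader` are assumptions for this example; a real SOAP stack would generate this code from the WSDL.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical reader for a UserInfo-shaped XML document.
class XmlUserReader {
    // Returns "<name>@<city>" taken from the root attributes and the first <address> child.
    static String nameAndCity(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element user = doc.getDocumentElement();
            Element address = (Element) user.getElementsByTagName("address").item(0);
            return user.getAttribute("name") + "@" + address.getAttribute("city");
        } catch (Exception e) {
            throw new IllegalStateException("malformed UserInfo document", e);
        }
    }
}
```

The input is plain text that can be inspected in any editor, at the cost of repeating every element and attribute name in every message.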

JSON

JSON originates from JavaScript’s associative arrays and offers a compact, human‑readable format. It is widely used in web browsers and Ajax.

Java example used in the article:

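import java.util.List;

// One postal address; serialized as a nested JSON object.
class Address {
    private String city;
    private String postcode;
    private String street;
}

// Top-level record; the address list becomes a JSON array of Address objects.
public class UserInfo {
    private Integer userid;
    private String name;
    private List<Address> address;
}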

Typical use cases: web front‑ends, mobile apps, scenarios requiring fast iteration and high compatibility.
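
The JDK has no built-in JSON encoder, so the sketch below builds the text by hand purely to show what the serialized form of a user record might look like. The `JsonSketch` class and its method names are hypothetical, and production code would normally rely on an established JSON library instead.

```java
import java.util.List;

// Hand-rolled JSON writer, for illustration only.
class JsonSketch {
    // Escapes only backslash and double quote; a complete encoder must also handle control characters.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    static String addressToJson(String city, String postcode, String street) {
        return "{\"city\":" + quote(city)
                + ",\"postcode\":" + quote(postcode)
                + ",\"street\":" + quote(street) + "}";
    }

    // addressJson holds already-encoded address objects.
    static String userToJson(int userid, String name, List<String> addressJson) {
        return "{\"userid\":" + userid
                + ",\"name\":" + quote(name)
                + ",\"address\":[" + String.join(",", addressJson) + "]}";
    }
}
```

Field names travel with every message, which keeps the format self-describing and easy to evolve but larger than the binary encodings.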

Thrift

Thrift is a high‑performance RPC framework that includes its own binary serialization format. It provides IDL and code generation for many languages.

Thrift IDL example:

struct Address {
  1: required string city;
  2: optional string postcode;
  3: optional string street;
}

struct UserInfo {
  1: required i32 userid;
  2: required string name;
  3: optional list<Address> address;
}

Typical use cases: internal high‑performance RPC services where binary compactness matters.
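
For a feel of what Thrift's binary serialization puts on the wire, the sketch below hand-encodes a struct with an `i32` field 1 and a `string` field 2, following the layout of Thrift's binary protocol as I understand it: a one-byte type code, a two-byte field id, then the value, with a zero byte ending the struct. The class name `ThriftBinarySketch` is hypothetical, and real services would use the generated code and the Thrift library rather than anything like this.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hand-written approximation of a Thrift binary-protocol struct, for illustration only.
class ThriftBinarySketch {
    static final int TYPE_STOP = 0;
    static final int TYPE_I32 = 8;
    static final int TYPE_STRING = 11;

    static byte[] encodeUser(int userid, String name) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            // Field 1: i32, written as 4 big-endian bytes.
            out.writeByte(TYPE_I32);
            out.writeShort(1);
            out.writeInt(userid);
            // Field 2: string, written as a 4-byte length followed by UTF-8 bytes.
            byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
            out.writeByte(TYPE_STRING);
            out.writeShort(2);
            out.writeInt(utf8.length);
            out.write(utf8);
            // End-of-struct marker.
            out.writeByte(TYPE_STOP);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

Field ids, not field names, are transmitted, which is why the numeric ids in the IDL must stay stable across versions.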

Protobuf

Protobuf provides a compact binary format with an explicit IDL and compiler.

Protobuf IDL example:

syntax = "proto2";

message Address {
  required string city = 1;
  optional string postcode = 2;
  optional string street = 3;
}

message UserInfo {
  required int32 userid = 1;
  required string name = 2;
  repeated Address address = 3;
}

Typical use cases: high‑throughput internal services, data persistence where size matters.
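
Much of Protobuf's compactness comes from its wire format: each field is prefixed with a key of `(field_number << 3) | wire_type`, and integers are written as base-128 varints. The sketch below hand-encodes a message with an `int32` field 1 and a `string` field 2 to show the idea. The class name `ProtoWireSketch` and the choice of field types are assumptions for this example, and real code would use the classes generated by the Protobuf compiler.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written approximation of the Protobuf wire format, for illustration only.
class ProtoWireSketch {
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    static byte[] encodeUser(int userid, String name) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Field 1 (varint): key byte, then the value.
        writeVarint(out, (1 << 3) | WIRE_VARINT);
        writeVarint(out, userid);
        // Field 2 (length-delimited): key byte, byte length, then UTF-8 bytes.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (2 << 3) | WIRE_LENGTH_DELIMITED);
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }
}
```

With `userid = 150` and `name = "alice"` the whole message is ten bytes: `08 96 01` for the integer field and `12 05` plus five characters for the string.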

Avro

Avro, part of the Apache Hadoop ecosystem, supports both JSON and binary encodings and includes a self‑describing schema.

Avro protocol definition example (JSON format), with the record schemas listed under "types":

{
  "protocol" : "Userservice",
  "namespace" : "org.apache.avro.ipc.specific",
  "version" : "1.0.5",
  "types" : [ {
    "type" : "record",
    "name" : "Address",
    "fields" : [ {"name":"city","type":"string"}, {"name":"postcode","type":"string"}, {"name":"street","type":"string"} ]
  }, {
    "type" : "record",
    "name" : "UserInfo",
    "fields" : [ {"name":"name","type":"string"}, {"name":"userid","type":"int"}, {"name":"address","type":{"type":"array","items":"Address"}, "default":[] } ]
  } ],
  "messages" : { }
}

Typical use cases: Hadoop‑based data pipelines, scenarios requiring schema evolution.
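
Avro's binary encoding carries no field names or tags at all: a record is simply its field values written in schema order, so the reader needs the writer's schema to decode it. The sketch below hand-encodes an `Address` record of three strings, using zig-zag varints for lengths as the Avro specification describes. The class name `AvroBinarySketch` is hypothetical, and real code would use the Avro library with the schema above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hand-written approximation of Avro's binary encoding, for illustration only.
class AvroBinarySketch {
    // Avro int/long: zig-zag mapping (small magnitudes stay small), then base-128 varint.
    static void writeLong(ByteArrayOutputStream out, long n) {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro string: byte length as a long, followed by the UTF-8 bytes.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeLong(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    // Record: field values concatenated in schema order, with no tags or names.
    static byte[] encodeAddress(String city, String postcode, String street) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, city);
        writeString(out, postcode);
        writeString(out, street);
        return out.toByteArray();
    }
}
```

Because nothing in the bytes identifies a field, schema evolution in Avro is resolved by comparing the writer's and reader's schemas rather than by skipping unknown tags.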

5. Benchmark and Selection Advice

The article presents benchmark data (sourced from https://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking) showing parsing speed and serialized size for XML, JSON, Thrift, Protobuf, and Avro. Key conclusions:

XML (XStream) performs poorly in both speed and size.

Thrift is slightly worse than Protobuf in space‑time overhead.

Protobuf and Avro excel in both dimensions.

Selection recommendations include:

For inter‑company calls that can tolerate latencies above roughly 100 ms, XML‑based SOAP is acceptable.

Web/Ajax and mobile communication should favor JSON.

When the debugging environment is poor, human‑readable formats (JSON or XML) make problems much easier to diagnose.

High‑performance, compact needs point to Protobuf, Thrift, or Avro.

For petabyte‑scale persistence, Protobuf or Avro are preferred; Avro integrates better with Hadoop.

Dynamic‑language environments benefit from Avro.

Static‑language environments often prefer Protobuf.

For a full RPC solution, Thrift is a solid choice.

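If the serialized data must travel over different transport‑layer protocols, Protobuf is advantageous because it defines only the encoding and is not tied to a particular transport.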

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
