Exploring Map Data Serialization Techniques at Gaode
Gaode’s map platform must serialize massive volumes of geographic data while balancing bandwidth, extensibility, and decoding speed. This article compares the open‑source protobuf and FlatBuffers libraries with a bespoke binary format that adds chapter‑based storage, variable attributes, and aggressive compression, and concludes that the best choice depends on whether size or speed is the priority.
Gaode’s platform consists of an upper layer (navigation, traffic, walking, rendering, offline packages, etc.) that provides basic map display, search, navigation, and traffic services to hundreds of millions of users, and a lower layer (data collection and production) that continuously gathers the latest geographic data.
The data collected by the lower layer must be processed and serialized into binary form before being delivered to the upper‑layer services. Likewise, the upper‑layer services exchange data among themselves, which also requires binary serialization.
Map data includes roads, POIs, water bodies, green spaces, buildings, etc., and can be abstracted into three geometric types: points, lines, and polygons. Further abstraction yields two categories: geometric data and attribute data.
This article shares the Gaode technical team’s exploration and practice in the field of map data serialization.
Key Factors for Serialization
Data Volume: For client apps, traffic consumption is critical. Gaode requires fresh data (e.g., newly opened highways) to be available instantly, which forces frequent cache invalidation and real‑time server requests. Therefore, map data must be as compact as possible to reduce bandwidth and user data usage. Offline packages are also large (often gigabytes), impacting download speed and device storage.
Extensibility: App developers cannot force users to update the client, yet the product must evolve continuously, so the data format must support both forward and backward compatibility. Data exchanged between services should likewise remain compatible to avoid tight coupling and complex cross‑team coordination.
Encoding/Decoding Efficiency: Binary decoding speed is a crucial metric for both services and clients.
Choosing a Serialization Scheme
3.1 Advantages and Drawbacks of Open‑Source Serialization Libraries
The lowest‑cost option is to adopt open‑source libraries such as Protocol Buffers (protobuf) and FlatBuffers. These libraries provide a data description language that generates encoders (to serialize in‑memory data to binary) and decoders (to reconstruct in‑memory data from binary).
Benefits: no need to design a custom schema or implement codecs; the team can focus on business logic. Drawbacks: limited ability to tailor the format to the specific characteristics of map data, which may lead to sub‑optimal data size.
Both protobuf and FlatBuffers support schema evolution and maintain forward/backward compatibility. FlatBuffers achieves this via its table structure, while protobuf uses the optional keyword.
Data volume and codec efficiency are often trade‑offs. Protobuf reduces size by using varint for integers and zig‑zag encoding for negatives. FlatBuffers prioritises decoding speed through memory‑mapping, zero‑copy, and random‑access techniques.
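To make these two integer tricks concrete, here is a minimal Python sketch of zig‑zag and varint encoding as protobuf defines them (the function names are ours, not part of any library):

```python
def zigzag(n: int) -> int:
    """Zig-zag: map signed ints to unsigned so small magnitudes stay small.
    0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ..."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def varint(u: int) -> bytes:
    """Varint: 7 payload bits per byte; the high bit flags continuation."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)  # more bytes follow
        else:
            out.append(b)
            return bytes(out)
```

With this scheme a small signed delta like −3 costs a single byte (`varint(zigzag(-3))` is one byte long), whereas a fixed 4‑ or 8‑byte integer field would not shrink at all.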
3.2 Custom Serialization Specification
Because generic libraries cannot meet the stringent size requirements of nationwide map data, Gaode designed a custom serialization specification that retains the advantages of open‑source solutions while further reducing data size and improving decoding speed.
The custom design introduces two extensible patterns: chapter‑based storage and variable attributes. These enable arbitrary data extensions, guarantee compatibility, and allow the decoder to skip irrelevant sections, boosting efficiency.
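The article does not publish the byte layout of chapter‑based storage, but the idea can be sketched with a hypothetical tag‑plus‑length framing in Python: a decoder reads each chapter header and uses the length to skip chapters it does not recognize or need, which is what makes the pattern both extensible and fast.

```python
import struct

def write_chapters(chapters: dict) -> bytes:
    """Illustrative layout: 2-byte tag, 4-byte length, then the payload."""
    out = bytearray()
    for tag, payload in chapters.items():
        out += struct.pack("<HI", tag, len(payload))
        out += payload
    return bytes(out)

def read_chapter(buf: bytes, wanted_tag: int):
    """Scan headers, skipping chapters that are irrelevant or unknown."""
    pos = 0
    while pos < len(buf):
        tag, length = struct.unpack_from("<HI", buf, pos)
        pos += 6
        if tag == wanted_tag:
            return buf[pos:pos + length]
        pos += length  # jump over this chapter without parsing it
    return None
```

Because unknown tags are skipped by length, an old decoder can safely read data that contains chapters added later.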
Compression techniques employed include:
Varint encoding for integers.
Zig‑zag encoding for signed numbers.
Storing doubles as scaled integers.
Geometric simplification at different map scales (e.g., Douglas‑Peucker, Li‑Openshaw).
Curve fitting for geometry (e.g., Bézier curves, clothoids).
Delta encoding for sequential coordinates.
Bit‑level packing instead of byte‑aligned storage.
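Several of these techniques compose naturally. The Python sketch below (the scale factor and function names are illustrative, not Gaode’s actual format) combines scaled integers, delta encoding, zig‑zag, and varint to pack a coordinate sequence far more tightly than raw doubles:

```python
def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def varint(u: int) -> bytes:
    """7 payload bits per byte; the high bit flags continuation."""
    out = bytearray()
    while True:
        b = u & 0x7F
        u >>= 7
        if u:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

def encode_polyline(coords, scale=10**6):
    """Scale doubles to ints, then delta + zig-zag + varint each axis."""
    out = bytearray()
    prev_x = prev_y = 0
    for lon, lat in coords:
        x, y = round(lon * scale), round(lat * scale)
        out += varint(zigzag(x - prev_x))
        out += varint(zigzag(y - prev_y))
        prev_x, prev_y = x, y
    return bytes(out)
```

Only the first point pays the cost of its absolute position; each subsequent point costs only as many bytes as its small delta requires, versus 16 bytes per point for a raw pair of doubles.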
General‑purpose lossless compressors (dictionary‑based LZ series and statistical Huffman/arithmetic coding) are also supported selectively for attribute data, as geometry data is already heavily compressed by the custom methods.
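A small sketch of this selective approach in Python, using the standard‑library zlib (an LZ‑family compressor) on a made‑up repetitive attribute blob; real attribute data with recurring names and category strings compresses similarly well:

```python
import zlib

# Hypothetical attribute payload: repeated names/categories are
# exactly the redundancy that dictionary-based LZ coders exploit.
attrs = b"highway;G6;Jingzang Expressway;toll;" * 100
packed = zlib.compress(attrs, level=9)

# The round trip is lossless.
restored = zlib.decompress(packed)
```

Geometry that has already been delta‑coded and bit‑packed, by contrast, leaves little redundancy for a general‑purpose compressor to find, which is why it is applied selectively.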
Comparison
The following table summarizes the trade‑offs among protobuf, FlatBuffers, and the custom specification:
|  | protobuf | FlatBuffers | Custom spec |
| --- | --- | --- | --- |
| Extensibility | Supported | Supported | Supported |
| Data size | Medium | High | Low |
| Decoding efficiency | Medium | High | Low |
| Random access | Not supported | Supported | Supported |
| Zero-copy | Not supported | Supported | Not supported |
Conclusion
All three serialization approaches have their strengths. In terms of extensibility, they are comparable. If decoding speed is paramount, FlatBuffers excels; if minimizing data size is the priority, the custom specification is superior. The optimal choice depends on the specific business requirements.
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.