
Exploring Map Data Serialization Techniques at Gaode

Gaode’s map platform serializes massive volumes of geographic data, weighing bandwidth, extensibility, and decoding speed. This article compares the open‑source protobuf and FlatBuffers libraries with a bespoke binary format that adds chapter‑based storage, variable attributes, and aggressive compression, and shows how the best choice depends on whether size or speed matters most.

Amap Tech

Gaode’s platform consists of an upper layer (navigation, traffic, walking, rendering, offline packages, etc.) that provides basic map display, search, navigation, and traffic services to hundreds of millions of users, and a lower layer (data collection and production) that continuously gathers the latest geographic data.

The data collected by the lower layer must be processed and serialized into binary form before being delivered to the upper‑layer services. Likewise, the upper‑layer services exchange data among themselves, which also requires binary serialization.

Map data includes roads, POIs, water bodies, green spaces, buildings, etc., and can be abstracted into three geometric types: points, lines, and polygons. Further abstraction yields two categories: geometric data and attribute data.

This article shares the Gaode technical team’s exploration and practice in the field of map data serialization.

Key Factors for Serialization

Data Volume: For client apps, traffic consumption is critical. Gaode requires fresh data (e.g., newly opened highways) to be available instantly, which forces frequent cache invalidation and real‑time server requests. Therefore, map data must be as compact as possible to reduce bandwidth and user data usage. Offline packages are also large (often gigabytes), impacting download speed and device storage.

Extensibility: Client apps cannot control user‑initiated updates yet must evolve continuously. The data format must support forward and backward compatibility. Inter‑service data exchange should also be compatible to avoid tight coupling and complex coordination.

Encoding/Decoding Efficiency: Binary decoding speed is a crucial metric for both services and clients.

Choosing a Serialization Scheme

3.1 Advantages and Drawbacks of Open‑Source Serialization Libraries

The lowest‑cost option is to adopt open‑source libraries such as Protocol Buffers (protobuf) and FlatBuffers. These libraries provide a data description language that generates encoders (to serialize in‑memory data to binary) and decoders (to reconstruct in‑memory data from binary).

Benefits: no need to design a custom schema or implement codecs; the team can focus on business logic. Drawbacks: limited ability to tailor the format to the specific characteristics of map data, which may lead to sub‑optimal data size.

Both protobuf and FlatBuffers support schema evolution and maintain forward/backward compatibility. FlatBuffers achieves this via its table structure, while protobuf uses the optional keyword.
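As a sketch of how protobuf expresses this, the hypothetical message below marks every field optional; decoders ignore unknown tag numbers, so fields can be added in later versions without breaking older clients (the message and field names are illustrative, not Gaode's actual schema):

```proto
// Hypothetical POI message; all identifiers here are illustrative.
syntax = "proto2";

message Poi {
  optional uint64 id   = 1;
  optional string name = 2;
  optional sint32 lon  = 3;  // sint32 selects zig-zag + varint encoding
  optional sint32 lat  = 4;
  // Later versions add fields with fresh tag numbers; tags are never
  // reused, so old decoders simply skip bytes they do not recognize.
}
```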

Data volume and codec efficiency are often trade‑offs. Protobuf reduces size by using varint encoding for integers and zig‑zag encoding for negative numbers. FlatBuffers prioritizes decoding speed through memory‑mapping, zero‑copy, and random‑access techniques.
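A minimal sketch of those two integer tricks (illustrative Python, not protobuf's own implementation):

```python
def zigzag(n: int) -> int:
    # Interleave signed values as small unsigned ones:
    # 0, -1, 1, -2, 2 ...  ->  0, 1, 2, 3, 4 ...
    # (assumes |n| fits in 64 bits; Python's arithmetic shift carries the sign)
    return (n << 1) ^ (n >> 63)

def encode_varint(n: int) -> bytes:
    # 7 payload bits per byte, least-significant group first;
    # a set high bit means "more bytes follow".
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(buf: bytes) -> int:
    n = shift = 0
    for byte in buf:
        n |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    return n
```

Small magnitudes stay short: 300 encodes to two bytes (0xAC 0x02), and zig‑zag maps −1 to 1 so it fits in a single varint byte instead of a full‑width negative.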

3.2 Custom Serialization Specification

Because generic libraries cannot meet the stringent size requirements of nationwide map data, Gaode designed a custom serialization specification that retains the advantages of open‑source solutions while further reducing data size and improving decoding speed.

The custom design introduces two extensible patterns: chapter‑based storage and variable attributes. These enable arbitrary data extensions, guarantee compatibility, and allow the decoder to skip irrelevant sections, boosting efficiency.
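The chapter idea can be sketched as a simple length‑prefixed framing (the 2‑byte id / 4‑byte length layout below is an assumption for illustration, not Gaode's actual wire format):

```python
import struct

def write_chapters(chapters: dict) -> bytes:
    # Each chapter: [uint16 id][uint32 payload length][payload].
    buf = bytearray()
    for cid, payload in chapters.items():
        buf += struct.pack("<HI", cid, len(payload)) + payload
    return bytes(buf)

def read_known_chapters(data: bytes, known_ids: set) -> dict:
    # Decode only the chapters we understand; skip the rest by their
    # length, which is what keeps old decoders compatible with new
    # chapter types added later.
    out, pos = {}, 0
    while pos < len(data):
        cid, length = struct.unpack_from("<HI", data, pos)
        pos += 6
        if cid in known_ids:
            out[cid] = data[pos:pos + length]
        pos += length
    return out
```

An old reader asked for chapters {1, 2} will silently step over a chapter id it has never seen, so new chapter types cost existing clients nothing.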

Compression techniques employed include:

Varint encoding for integers.

Zig‑zag encoding for signed numbers.

Storing doubles as scaled integers.

Geometric simplification at different map scales (e.g., Douglas‑Peucker, Li‑Openshaw).

Curve fitting for geometry (e.g., Bézier, CLOTHOID).

Delta encoding for sequential coordinates.

Bit‑level packing instead of byte‑aligned storage.
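Several of these techniques compose naturally on a polyline. The sketch below scales doubles to integers, delta‑encodes consecutive points, zig‑zags the deltas, and varint‑packs the result (the 1e5 scale factor is an assumption, similar to common polyline codecs; this is not Gaode's actual format):

```python
SCALE = 100_000  # assumed precision: 5 decimal places (roughly 1 m)

def zigzag(n):   return (n << 1) ^ (n >> 63)
def unzigzag(n): return (n >> 1) ^ -(n & 1)

def encode_polyline(points):
    # points: list of (lon, lat) doubles -> compact bytes
    out, px, py = bytearray(), 0, 0
    for lon, lat in points:
        x, y = round(lon * SCALE), round(lat * SCALE)
        for d in (x - px, y - py):          # deltas stay small on dense lines
            d = zigzag(d)
            while True:                     # varint-pack each delta
                b = d & 0x7F
                d >>= 7
                out.append(b | 0x80 if d else b)
                if not d:
                    break
        px, py = x, y
    return bytes(out)

def decode_polyline(data):
    vals, n, shift = [], 0, 0
    for b in data:                          # unpack the varint stream
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            vals.append(unzigzag(n)); n = shift = 0
    pts, x, y = [], 0, 0
    for i in range(0, len(vals), 2):        # undo the deltas
        x += vals[i]; y += vals[i + 1]
        pts.append((x / SCALE, y / SCALE))
    return pts
```

After the first point, each vertex typically costs only a byte or two per axis, versus 16 bytes for a raw pair of doubles.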
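The geometric simplification step can be illustrated with a minimal Douglas‑Peucker sketch; in practice the tolerance would be tied to the map scale (this is the textbook algorithm, not Gaode's production code):

```python
import math

def _perp_dist(p, a, b):
    # Perpendicular distance from point p to the line through a and b;
    # falls back to point distance when a == b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

def douglas_peucker(points, tol):
    # Keep endpoints; drop interior vertices whose distance to the chord
    # is within tol, recursing around the farthest vertex otherwise.
    if len(points) < 3:
        return list(points)
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tol:
        return [points[0], points[-1]]
    left = douglas_peucker(points[:idx + 1], tol)
    right = douglas_peucker(points[idx:], tol)
    return left[:-1] + right  # avoid duplicating the split vertex
```

At a coarse zoom level a large tolerance collapses near‑straight runs to their endpoints, which is where the size savings come from.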

General‑purpose lossless compressors (dictionary‑based LZ series and statistical Huffman/arithmetic coding) are also supported selectively for attribute data, as geometry data is already heavily compressed by the custom methods.

Comparison

The following table summarizes the trade‑offs among protobuf, FlatBuffers, and the custom specification:

| | protobuf | FlatBuffers | custom spec |
| --- | --- | --- | --- |
| Extensibility | Supported | Supported | Supported |
| Data size | Medium | Large | Small |
| Decoding efficiency | Medium | High | Low |
| Random access | Not supported | Supported | Supported |
| Zero‑copy | Not supported | Supported | Not supported |

Conclusion

All three serialization approaches have their strengths. In terms of extensibility, they are comparable. If decoding speed is paramount, FlatBuffers excels; if minimizing data size is the priority, the custom specification is superior. The optimal choice depends on the specific business requirements.
