How Baidu Maps Re‑engineered Its Indexing Unit for Scalable Data Storage

This article details Baidu Maps' technical team’s refactoring of the indexing (build) unit, outlining existing bottlenecks, design challenges, and a new decoupled architecture that separates storage, incremental updates, and full‑index construction using distributed table storage and message‑driven pipelines to improve scalability and reliability.

Baidu Maps Tech Team

In this article, the Baidu Maps team describes how it refactored and optimized the indexing (build) unit: the problems it faced, the design thinking behind the solutions, and how the new architecture closes design gaps that previously forced the team to guess at trade‑offs.

The overall cloud storage and retrieval architecture was introduced in a previous article; the red‑outlined part of the diagram below highlights the focus of this refactor.

Problem

The existing build unit had become the main bottleneck of the update pipeline. It stored incremental forward indexes locally and relied on a single‑threaded message sequence store to guarantee that no update was pushed twice. Because of the single‑threaded design, messages backed up once traffic reached roughly 100 QPS, and scaling out required costly sharding across multiple services, which made rollbacks complex. The full forward index had also grown large: rebuilding it took more than a day, generating the inverted index took about eight hours, and data safety was poor.
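To make the backlog problem concrete, here is a minimal back‑of‑the‑envelope sketch. It assumes a simple model in which a single worker drains messages at a fixed rate; only the 100 QPS saturation point comes from the article, while the peak rate and duration are hypothetical.

```python
def backlog_after(arrival_qps: float, service_qps: float, seconds: float) -> float:
    """Messages left in the queue after `seconds` with one worker draining at `service_qps`."""
    # The queue only grows while messages arrive faster than they are drained.
    return max(0.0, arrival_qps - service_qps) * seconds


# Hypothetical burst: 150 msg/s for ten minutes against a worker that
# saturates at about 100 msg/s (the figure quoted in the article).
burst_backlog = backlog_after(arrival_qps=150, service_qps=100, seconds=600)
print(burst_backlog)
```

Under this model a single consumer has no way to catch up during a sustained burst, which is why the only remedy in the old design was sharding the whole service.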

Design Goals

Decouple the storage access layer from routing logic so it only forwards messages without caring about target machines.

Separate business logic from data storage, allowing independent scaling of compute and storage resources.

Split the build unit into two services: one handling incremental updates, the other periodically constructing full inverted indexes.

Adopt a high‑performance distributed storage system (Table) with strong write, scan, and multi‑version capabilities, eliminating the need for a separate incremental forward store.

Separate BS incremental data from full inverted data, allowing independent partitioning and eliminating unnecessary coupling.
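The last goal, independent partitioning, can be illustrated with a small sketch. The article does not describe the actual partitioning scheme; the CRC32 hash, the partition counts, and the `route` helper below are all assumptions chosen for illustration.

```python
import zlib

# Hypothetical partition counts. Either one can change without touching the other.
INCREMENTAL_PARTITIONS = 4
FULL_INDEX_PARTITIONS = 16


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (CRC32, picked for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(doc_id: str) -> dict:
    """Return the partition of each data set that holds `doc_id`."""
    return {
        "incremental": partition_for(doc_id, INCREMENTAL_PARTITIONS),
        "full": partition_for(doc_id, FULL_INDEX_PARTITIONS),
    }


placement = route("poi:123")  # "poi:123" is a made-up document ID
```

Because each data set has its own partition count, resizing the full inverted index does not force a reshuffle of the incremental data, and vice versa.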

New Architecture

The redesigned system uses Baidu’s internal distributed Table storage as the foundation for the forward index. Table provides high write throughput, versioned data for eventual consistency, fast scan performance, and easy horizontal scaling without business‑side involvement.
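The role of versioned data can be sketched with a toy in‑memory table. This is not Baidu's Table API, which the article does not show; the class name, method names, and integer versions are hypothetical stand‑ins.

```python
from collections import defaultdict
from typing import Dict, Iterator, Optional, Tuple


class VersionedTable:
    """Toy in-memory stand-in for a multi-version table (not the real Table API)."""

    def __init__(self) -> None:
        # row key -> {version -> value}
        self._rows: Dict[str, Dict[int, str]] = defaultdict(dict)

    def put(self, key: str, version: int, value: str) -> None:
        """Store a value under an explicit version; older versions are kept."""
        self._rows[key][version] = value

    def get_latest(self, key: str) -> Optional[Tuple[int, str]]:
        """Return (version, value) for the highest version, or None if the key is absent."""
        versions = self._rows.get(key)
        if not versions:
            return None
        newest = max(versions)
        return newest, versions[newest]

    def scan(self) -> Iterator[Tuple[str, int, str]]:
        """Yield the latest version of every row in key order."""
        for key in sorted(self._rows):
            version, value = self.get_latest(key)
            yield key, version, value


table = VersionedTable()
table.put("poi:1", version=2, value="Cafe (renamed)")
table.put("poi:1", version=1, value="Cafe")  # an older write that arrives late
latest = table.get_latest("poi:1")
```

Here the late write with version 1 does not overwrite version 2, so readers converge on the newest value regardless of arrival order.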

index_realtime_manager processes incremental updates, writes them to Table, and forwards them to the Redis‑based inverted index via a message system that guarantees ordering and consistency, removing any dependency on the AC router.
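The incremental path can be sketched as follows. This assumes the message system delivers in order and may redeliver, and that each message carries a sequence number; the `UpdateMessage` fields, the dict standing in for Table, and the list standing in for the message system are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class UpdateMessage:
    seq: int      # hypothetical monotonically increasing sequence number
    doc_id: str
    text: str


def process_update(forward_store: Dict[str, str],
                   queue: List[UpdateMessage],
                   msg: UpdateMessage) -> None:
    """Persist the update first, then publish it for downstream consumers."""
    forward_store[msg.doc_id] = msg.text  # stand-in for the write to Table
    queue.append(msg)                     # stand-in for the message system


class RealtimeConsumer:
    """Applies messages in sequence order and ignores redelivered ones."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.docs: Dict[str, str] = {}

    def apply(self, msg: UpdateMessage) -> bool:
        if msg.seq <= self.last_seq:
            return False  # duplicate or stale delivery, safe to skip
        self.docs[msg.doc_id] = msg.text
        self.last_seq = msg.seq
        return True


forward_store: Dict[str, str] = {}
queue: List[UpdateMessage] = []
process_update(forward_store, queue, UpdateMessage(1, "poi:1", "Cafe"))
process_update(forward_store, queue, UpdateMessage(2, "poi:1", "Cafe (renamed)"))

consumer = RealtimeConsumer()
# Redeliver the first message at the end to show that duplicates are dropped.
applied = [consumer.apply(m) for m in queue + queue[:1]]
```

Since the producer only appends to a queue, it needs no knowledge of which machines consume the updates, which is the property that removes the router dependency.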

index_inverter periodically scans Table to build the full inverted index and pushes it to the search side. Both services can be scaled independently.
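A full build of this kind reduces to scanning the forward index and grouping document IDs by term. The sketch below assumes rows of `(doc_id, text)` and whitespace tokenization; the real tokenizer, row schema, and output format are not described in the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_inverted_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Build term -> sorted list of doc IDs from scanned forward-index rows."""
    postings: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in rows:
        # Whitespace tokenization is a simplification for illustration.
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in postings.items()}


# Hypothetical rows, as they might come back from a full table scan.
rows = [("poi:1", "Blue Cafe"), ("poi:2", "Blue Hotel")]
inverted = build_inverted_index(rows)
```

Because the build reads only from shared storage, it can run on its own schedule and its own machines without interfering with the incremental path.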

Conclusion

The new design makes incremental update scaling straightforward for both services and storage, and simplifies full‑index partitioning, resulting in a more extensible, reliable, and maintainable indexing pipeline for Baidu Maps.
