How Baidu’s LBS Cloud Storage Revolutionizes Massive Geospatial Data Retrieval
Baidu’s LBS Cloud Storage and Retrieval platform gives developers a fully managed, high‑performance service for massive geospatial data. It offers free large‑scale storage, GeoHash‑based spatial queries, near‑real‑time updates, and strong user isolation. This article walks through the architectural optimizations that improved its latency, availability, and scalability.
System Features
Free massive storage space, supporting tens of millions of records per table.
Efficient geospatial search using the GeoHash algorithm, handling tens of thousands of QPS.
High real‑time performance: data updates propagate to the search side within seconds.
High availability: four nines (99.99%) for storage and five nines (99.999%) for search.
Flexibility: users can define custom columns and attributes and choose which fields participate in search.
Data safety: security mechanisms, three‑copy replication, and strict user isolation via access keys (AK).
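To make the GeoHash feature above concrete, here is a minimal sketch of standard GeoHash encoding. It illustrates the public algorithm only and is not the platform's implementation; the function name `geohash_encode` and its `precision` parameter are assumptions made for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 8) -> str:
    """Encode a latitude/longitude pair as a GeoHash string (illustrative sketch)."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))
```

Each extra character narrows the cell, so points whose hashes share a longer prefix lie in the same smaller cell. That property is what lets a spatial query be served as a prefix lookup on an ordinary index.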
Initial Architecture and Evolution
The platform consists of several core modules:
Control Service: handles authentication, traffic control, and quota control; all external LBS cloud services pass through it.
Storage Access Layer: parses and forwards storage requests, optionally publishing updates to the search side.
Search Access Layer: parses search requests, converts stored column attributes into AS‑compatible queries, and forwards them to the backend search cluster.
AS (Advanced Search Unit): receives requests, runs DA analysis on them, and forwards them to the basic search cluster.
DA (Data Analyzer): parses queries, including tokenization and where/what analysis.
AC (Access Controller): routes search and incremental update messages to the appropriate basic search unit.
Build Cluster: periodically merges full and incremental indexes and pushes them to the basic search cluster.
Cloud Analysis: provides reports analyzing user search behavior.
Cloud Display: visualizes search and analysis data.
Problem 1 – Index Isolation
The original design mixed all users' indexes in a single inverted list. As the user count grew, performance degraded, producing long‑tail latency and missed results.
Goal: Fully isolate users with independent index ranges and improve performance.
Solution: Redesign the full‑index structure to support ordered intervals that may contain duplicate keys, and introduce a secondary index (Table.meta) that records the start position and length of each user’s index range.
In the new design, each table (A, B, C, and so on) occupies its own index range. This reduced base search latency from 12.7 ms to 7 ms and cut the share of requests slower than 100 ms from 2.82 % to 1.58 %.
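The idea of a per‑table range located through a secondary index can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual index format: the posting layout `(table_id, term, doc_id)`, the `meta` dictionary standing in for Table.meta, and the function names are all hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Posting = Tuple[str, str, int]  # hypothetical layout: (table_id, term, doc_id)


def build_index(postings: List[Posting]) -> Tuple[List[Posting], Dict[str, Tuple[int, int]]]:
    """Sort postings so each table is contiguous and record (start, length) per table."""
    ordered = sorted(postings)
    meta: Dict[str, Tuple[int, int]] = {}  # stand-in for Table.meta
    for pos, (table_id, _term, _doc) in enumerate(ordered):
        if table_id not in meta:
            meta[table_id] = (pos, 1)
        else:
            start, length = meta[table_id]
            meta[table_id] = (start, length + 1)
    return ordered, meta


def search(ordered: List[Posting], meta: Dict[str, Tuple[int, int]],
           table_id: str, term: str) -> List[int]:
    """Look up a term while touching only the requesting table's range."""
    if table_id not in meta:
        return []
    start, length = meta[table_id]
    end = start + length
    # Binary search restricted to [start, end); duplicate keys sit next to each other.
    pos = bisect_left(ordered, (table_id, term), start, end)
    doc_ids = []
    while pos < end and ordered[pos][:2] == (table_id, term):
        doc_ids.append(ordered[pos][2])
        pos += 1
    return doc_ids


ordered, meta = build_index([
    ("B", "cafe", 7), ("A", "cafe", 3), ("A", "bank", 2),
    ("A", "cafe", 1), ("B", "park", 9),
])
print(meta)
print(search(ordered, meta, "A", "cafe"))
```

Because a lookup never scans outside `[start, start + length)`, one table's query cost does not grow with the data of other users in this sketch.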
Problem 2 – Access Layer Bottleneck
The initial access layer was implemented in PHP and was ten times slower than the C++ driver because of multiple calls to the storage cluster.
Solution: Rewrite the access layer in C++ and switch from short‑lived connections to long‑lived (persistent) ones, which brought latency down to a few tens of milliseconds.
Problem 3 – Summary Retrieval Latency
Fetching summary details after obtaining the document IDs took more than 20 ms.
Goal: Reduce summary retrieval latency to 10 ms.
Solution: Introduce a Redis cache for hot tables, asynchronously load cache misses into the cache via message queues, and merge three user‑related tables into two cached tables to save one request per lookup.
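The read path with asynchronous cache fill described above might look roughly like the sketch below. It is self‑contained and uses stand‑ins only: a plain dict plays the role of Redis, `queue.Queue` plays the role of the message queue, and the class `SummaryReader` and its method names are hypothetical.

```python
import queue


class SummaryReader:
    """Illustrative read path: serve hot tables from cache, fill misses asynchronously."""

    def __init__(self, storage, hot_tables):
        self.storage = storage            # stand-in for the storage cluster
        self.hot_tables = set(hot_tables)
        self.cache = {}                   # stand-in for Redis
        self.fill_queue = queue.Queue()   # stand-in for the message queue

    def get_summary(self, table, doc_id):
        key = (table, doc_id)
        if table in self.hot_tables:
            cached = self.cache.get(key)
            if cached is not None:
                return cached             # cache hit: no storage round trip
            # Miss: schedule a cache fill instead of writing the cache inline.
            self.fill_queue.put(key)
        return self.storage[key]

    def drain_fill_queue(self):
        """Consumer side: copy queued keys from storage into the cache."""
        filled = 0
        while True:
            try:
                key = self.fill_queue.get_nowait()
            except queue.Empty:
                return filled
            if key in self.storage:
                self.cache[key] = self.storage[key]
                filled += 1


reader = SummaryReader({("poi", 1): "Cafe A"}, hot_tables=["poi"])
first = reader.get_summary("poi", 1)     # miss, served from storage
filled = reader.drain_fill_queue()       # background consumer fills the cache
second = reader.get_summary("poi", 1)    # hit, served from cache
print(first, filled, second)
```

Keeping the fill off the request path means a miss costs no more than the original storage read, while later reads of the same key are served from the cache.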
Problem 4 – Batch Operations Overload
Version 2 added asynchronous batch operations (upload, delete, update). This broke quota protection and let task queues grow very large, putting the build cluster under stress.
Solution: Enforce per‑user batch quotas, discard excess tasks, and apply traffic shaping to protect real‑time updates.
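A per‑user quota with discarding and a simple release cap can be sketched as below. The class `BatchAdmission`, the quota value, and the per‑tick release limit are assumptions chosen for illustration; the source does not specify the actual limits or scheduling policy.

```python
from collections import defaultdict, deque


class BatchAdmission:
    """Illustrative admission control for asynchronous batch tasks."""

    def __init__(self, per_user_quota):
        self.per_user_quota = per_user_quota
        self.pending = defaultdict(int)   # queued task count per user
        self.tasks = deque()

    def submit(self, user, task):
        """Queue the task, or discard it if the user is already at quota."""
        if self.pending[user] >= self.per_user_quota:
            return False                  # excess task is dropped
        self.pending[user] += 1
        self.tasks.append((user, task))
        return True

    def next_batch(self, max_tasks):
        """Traffic shaping: release at most max_tasks per scheduling tick."""
        batch = []
        while self.tasks and len(batch) < max_tasks:
            user, task = self.tasks.popleft()
            self.pending[user] -= 1
            batch.append((user, task))
        return batch


admission = BatchAdmission(per_user_quota=2)
results = [admission.submit("alice", f"upload-{i}") for i in range(3)]
accepted_bob = admission.submit("bob", "delete-1")
batch = admission.next_batch(max_tasks=2)
print(results, accepted_bob, batch)
```

Capping how many batch tasks are released per tick leaves headroom in the build cluster for real‑time updates, which is the protection the solution aims for.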
Summary
The optimizations described here (index isolation, the C++ rewrite, caching, and quota management) significantly improved the latency, availability, and scalability of the LBS cloud storage and retrieval system. They also lay a foundation for future improvements to real‑time updates and for richer storage capabilities.
Baidu Maps Tech Team
