How Baidu Maps Powers Its Open Platform with Big Data Architecture

This article explains how Baidu Maps’ open platform handles massive daily location data through real‑time and offline pipelines, Hadoop‑based offline computing, stream processing, and query engines built on MySQL, Redis, and Apache Kylin, while outlining future big‑data enhancements.

Baidu Maps Tech Team

Feb 3, 2016

As interest in "big data" grows, Baidu Maps uses the data it gathers every day, hundreds of billions of location points and client accesses, to continuously improve navigation, routing, and platform services.

Data Exchange

Data collection is split by timeliness into real‑time and offline (T+1, meaning data becomes available the day after it is generated). Logs and database (DB) data are the common sources. Log collection reuses internal components similar to Flume and Kafka, and a custom Flume+Kafka pipeline serves fast‑growing services.
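The split between the real‑time and T+1 paths can be illustrated with a small sketch. It is not the platform's collector: the `LogCollector` class, the in‑memory queue standing in for a Kafka topic, and the choice to send every record down both paths are assumptions made for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timezone


class LogCollector:
    """Toy collector (hypothetical): feeds a real-time queue and per-day offline batches."""

    def __init__(self):
        self.realtime_queue = deque()             # stand-in for a Kafka topic
        self.offline_batches = defaultdict(list)  # "YYYY-MM-DD" -> records for the T+1 load

    def collect(self, record):
        """Route one log record, a dict with a Unix timestamp under "ts"."""
        self.realtime_queue.append(record)
        day = datetime.fromtimestamp(record["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
        self.offline_batches[day].append(record)

    @staticmethod
    def batch_ready(day, today):
        """A day's batch is complete only once the calendar has moved past that day."""
        return day < today  # ISO dates compare correctly as strings
```

A real‑time consumer would drain `realtime_queue` continuously, while the offline loader would pick up a day's batch only after `batch_ready` returns true for it.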

DB data is captured by streaming changes (for example, by parsing the binlog) to downstream queues. The platform currently uses both a company‑wide databus and the open‑source Canal, and plans to consolidate the architecture.
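The change‑streaming idea can be sketched as follows. The event shape (`table`, `op`, `key`, `row`) is hypothetical and is not Canal's or the databus's actual message format; the standard‑library `Queue` stands in for the downstream queue.

```python
import json
from queue import Queue


def to_message(event):
    """Serialize a row-change event (hypothetical shape) for a downstream queue."""
    return json.dumps(
        {"table": event["table"], "op": event["op"],
         "key": event["key"], "row": event.get("row")},
        sort_keys=True,
    )


def apply_message(replica, message):
    """Apply one change message to an in-memory replica: {table: {key: row}}."""
    change = json.loads(message)
    table = replica.setdefault(change["table"], {})
    if change["op"] == "delete":
        table.pop(change["key"], None)
    else:  # "insert" and "update" both overwrite the row
        table[change["key"]] = change["row"]


def replay(events):
    """Push events through a queue in order and return the resulting replica."""
    queue = Queue()
    for event in events:
        queue.put(to_message(event))
    replica = {}
    while not queue.empty():
        apply_message(replica, queue.get())
    return replica
```

Because changes are applied in the order they were written, a consumer that replays the queue ends up with the same rows as the source table.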

Offline Computing

Data exchange delivers data to a Hadoop cluster, where multi‑dimensional, multi‑level computation and storage build the open platform's data warehouse. The warehouse design draws on experience from the banking and telecom industries to create a stable, evolvable data model; the details are omitted here.

Key components include a task scheduler handling tens of thousands of daily jobs, a resource manager optimizing task execution, a metadata system serving as the meta‑layer for all services, and a quality‑monitoring system that automatically detects and resolves data anomalies.
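One core duty of such a scheduler is to run jobs only after their upstream jobs have finished. A minimal sketch using Python's standard `graphlib` module follows; the job names and the dependency graph are hypothetical, and the real scheduler's design is not described in the source.

```python
from graphlib import TopologicalSorter

# Hypothetical daily jobs: each job maps to the set of jobs it depends on.
DEPENDENCIES = {
    "load_logs": set(),
    "load_db_snapshot": set(),
    "build_daily_facts": {"load_logs", "load_db_snapshot"},
    "build_reports": {"build_daily_facts"},
}


def run_order(dependencies):
    """Return the jobs in an order where every job follows all of its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())
```

`TopologicalSorter` raises `graphlib.CycleError` when the graph contains a cycle, which is a convenient early check for misconfigured job dependencies.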

Storage separates hot data for daily computation from cold data retained permanently on a dedicated cold‑backup cluster.
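A hot/cold split of this kind is often driven by partition age. The sketch below assumes date‑partitioned data and a hypothetical `hot_days` threshold; the source does not say which criterion the platform actually uses.

```python
from datetime import date, timedelta


def split_hot_cold(partition_days, today, hot_days=30):
    """Split partition dates into (hot, cold) using an age cutoff (assumed policy)."""
    cutoff = today - timedelta(days=hot_days)
    hot = [day for day in partition_days if day > cutoff]    # stays on the compute cluster
    cold = [day for day in partition_days if day <= cutoff]  # moves to the cold-backup cluster
    return hot, cold
```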

Real‑Time Computing

Real‑time scenarios such as the map “heat‑map” product analyze billions of positioning requests as they arrive. The architecture resembles Storm's, with importer, task processor, and exporter modules performing complex stream processing.
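The importer, task processor, and exporter stages can be pictured as a chain of generators. This is only an illustrative sketch: the "lng,lat" line format, the grid `cell_size`, and the per‑cell counting are assumptions, not the platform's actual heat‑map logic.

```python
import math
from collections import Counter


def importer(raw_lines):
    """Parse raw "lng,lat" lines (assumed format) into float coordinate pairs."""
    for line in raw_lines:
        lng, lat = line.split(",")
        yield float(lng), float(lat)


def task_processor(points, cell_size=0.01):
    """Map each point to a grid cell and emit that cell's running count."""
    counts = Counter()
    for lng, lat in points:
        cell = (math.floor(lng / cell_size), math.floor(lat / cell_size))
        counts[cell] += 1
        yield cell, counts[cell]


def exporter(updates):
    """Keep the latest count per cell, as a heat-map layer might consume it."""
    latest = {}
    for cell, count in updates:
        latest[cell] = count
    return latest
```

Chaining the stages as `exporter(task_processor(importer(lines)))` processes one record at a time, so a count can be updated as soon as a request arrives rather than after a batch completes.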

Query Engine

After processing, data is exposed to users through several mechanisms. Report data is small, so it is stored in MySQL, whose SQL interface is familiar to developers. User‑facing queries use Redis for high‑performance key‑value access. Internal analytics rely on a multi‑dimensional engine built on Apache Kylin.
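Kylin answers multi‑dimensional queries quickly by precomputing aggregates for combinations of dimensions. The sketch below shows that idea in plain Python, not Kylin's API: the `build_cube` and `query` helpers, the dimension names, and the `requests` measure are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute the sum of `measure` for every combination of dimension values."""
    cube = defaultdict(int)
    for row in rows:
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return cube


def query(cube, dimensions, filters):
    """Answer an equality-filter query with a single lookup in the cube."""
    dims = tuple(d for d in dimensions if d in filters)  # same order as in build_cube
    return cube.get((dims, tuple(filters[d] for d in dims)), 0)
```

The trade‑off is more storage and build time in exchange for lookups whose cost does not depend on the number of raw rows, which suits repeated internal analysis over the same dimensions.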

Future work includes developing an MPP‑based ad‑hoc query capability integrated with metadata to offer more flexible analysis.

Data Products

The big‑data stack supports products such as reporting platforms, multi‑dimensional analysis platforms, heat‑maps, and B‑side web applications, typically built on a LAMP architecture. An open‑source visualization library (mapv) is also provided.

Conclusion

After more than three years, Baidu Maps’ open platform has formed a relatively complete big‑data technology stack and team. The platform continues to evolve with business needs, and the team remains committed to learning and openness.
