
Platformization of POI Deep Information Integration at Amap: Design and Implementation

Amap transformed its fragmented POI deep-information pipelines into a unified platform that automates data acquisition, parsing, dimension alignment, specification mapping, and lifecycle management across billions of records. The platform enables product managers to integrate, debug, and scale diverse content-provider feeds with real-time, end-to-end control.

Amap Tech

This article presents the thinking and practice behind the platformization of POI (Point of Interest) deep‑information integration at Amap, describing how the system evolved from isolated monolithic applications to a unified platform that handles data acquisition, parsing, dimension alignment, specification mapping, and lifecycle management.

Background: POI data includes not only basic attributes such as name and coordinates but also rich deep information such as images, reviews, and business hours. As the number of content providers (CPs) grew, each CP originally had its own independent application, storage, and technology stack, leading to maintenance overhead and scalability bottlenecks.

Rapid growth in task volume (over 120 new tasks per quarter) and data scale (daily processing reaching billions of records) prompted the deep‑information team to explore a platform approach.

Platformization Practice: Unlike typical data-ingestion systems that remain stateless, the Amap platform adopts a more integrated design in which the platform itself understands and processes the data, handling parsing, dimension alignment, specification mapping, and lifecycle maintenance.

The platform must also empower product managers (PMs) to perform end‑to‑end data integration, analysis, and debugging with a real‑time, WYSIWYG experience.

Key Challenges:

Data scale is highly uneven: some CPs provide billions of records while others provide a single record, and individual record sizes range from a few bytes to several megabytes.

Business scenarios are diverse, involving HTTP, OSS, ODPS, MetaQ, and other sources, each with different schemas and matching rules.

Mapping and cleaning logic is complex because deep information uses loosely structured JSON with nested objects and arrays, requiring support for hundreds of different specifications.

Platform Architecture: The platform consists of four core modules – Foundation, Conversion, Push, and Task – which together cover the entire deep-information ingestion workflow.

Foundation Module: Manages CP, industry, specification, and permission metadata in a unified online system.

Conversion Module: Handles data acquisition, dimension alignment, and specification mapping.

Push Module: Sends transformed specification data to downstream services.

Task Module: Manages task types, backlog strategies, and data differencing.

Conversion Engine Design: To avoid the complexity of external DAG-based engines, the team built a custom engine inspired by the directed-graph model of PDI (Pentaho Data Integration). The engine executes steps in memory, supports asynchronous parallel execution with back-pressure, and keeps all data transfer within the same process.
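The core idea of such an engine can be sketched in a few lines: each step runs in its own thread, and bounded in-process queues between steps provide back-pressure. This is a minimal illustration of the directed-graph execution model, not Amap's actual engine; all names here are invented for the sketch.

```python
# Minimal sketch of an in-process step pipeline with back-pressure.
# Each step is a thread; bounded queues block a fast producer when the
# downstream consumer is slow, keeping memory usage flat.
import queue
import threading

SENTINEL = object()  # marks end of the row stream


def run_step(fn, inbox, outbox):
    """Consume rows from inbox, apply fn, emit results to outbox."""
    while True:
        row = inbox.get()
        if row is SENTINEL:
            outbox.put(SENTINEL)
            return
        outbox.put(fn(row))


def run_pipeline(rows, *fns, capacity=100):
    """Chain steps with bounded queues; all data stays in one process."""
    queues = [queue.Queue(maxsize=capacity) for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=run_step, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()

    def feed():
        # Feeding from a separate thread avoids deadlock when the
        # input is larger than the total queue capacity.
        for row in rows:
            queues[0].put(row)
        queues[0].put(SENTINEL)

    feeder = threading.Thread(target=feed)
    feeder.start()

    out = []
    while (row := queues[-1].get()) is not SENTINEL:
        out.append(row)
    feeder.join()
    for t in threads:
        t.join()
    return out


result = run_pipeline(range(5), lambda x: x * 2, lambda x: x + 1)
# result -> [1, 3, 5, 7, 9]
```

Because each step has exactly one worker thread and the queues are FIFO, row order is preserved end to end; a production engine would add per-step parallelism and error handling on top of this shape.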

Data Acquisition: Supports multiple sources (HTTP, OSS, ODPS, MTOP, MetaQ, Push) and composite workflows (e.g., download from OSS, parse, then call an HTTP service). Existing solutions (Blink, Stream) were insufficient, leading to a bespoke design.
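A composite workflow of this kind is just a chain of acquisition steps. The sketch below strings together a listing fetch, a parse step, and an enrichment call; `fetch_listing`, `parse_record`, and `enrich` are stand-ins for illustration, not real OSS/HTTP client APIs or Amap interfaces.

```python
# Hypothetical composite acquisition workflow:
# fetch object listing -> parse each record -> enrich via a service call.
import json


def fetch_listing():
    # In the real platform this would stream objects from OSS;
    # here we yield raw JSON lines for illustration.
    yield '{"poi_id": "1", "images": 3}'
    yield '{"poi_id": "2", "images": 5}'


def parse_record(line):
    """Parse one raw JSON line into a record dict."""
    return json.loads(line)


def enrich(record):
    """Stand-in for the follow-up HTTP call in the composite workflow."""
    record["enriched"] = True
    return record


records = [enrich(parse_record(line)) for line in fetch_listing()]
# records[0] -> {"poi_id": "1", "images": 3, "enriched": True}
```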

Dimension Alignment: Aligns heterogeneous data sources to a common POI dimension without relying on external databases, using in-memory and local-disk processing for operations such as flattening and merge-join.
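The merge-join at the heart of this alignment needs no external database: if both feeds are sorted by the join key, a single linear pass aligns them. The field names and the one-to-one join below are simplifying assumptions for the sketch.

```python
# Minimal merge-join on a shared POI key; both inputs must be sorted
# by that key. Emits a merged row for each matching key in one pass.
def merge_join(left, right, key="poi_id"):
    it_l, it_r = iter(left), iter(right)
    l = next(it_l, None)
    r = next(it_r, None)
    out = []
    while l is not None and r is not None:
        if l[key] < r[key]:
            l = next(it_l, None)      # left key has no partner yet
        elif l[key] > r[key]:
            r = next(it_r, None)      # right key has no partner yet
        else:
            out.append({**l, **r})    # keys match: merge the rows
            l = next(it_l, None)
            r = next(it_r, None)
    return out


base = [{"poi_id": "A", "name": "Cafe"}, {"poi_id": "B", "name": "Gym"}]
deep = [{"poi_id": "B", "hours": "9-21"}, {"poi_id": "C", "hours": "24h"}]
aligned = merge_join(base, deep)
# aligned -> [{"poi_id": "B", "name": "Gym", "hours": "9-21"}]
```

Since the pass is sequential, inputs larger than memory can be streamed from sorted local-disk spills, which matches the in-memory/local-disk processing described above.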

Specification Mapping & Cleaning: Introduces a dual-container RowSchema (main data + data tray) to separate raw step results from transformation parameters. Supports forward mapping with extended JSONPath expressions and reverse cleaning with layered strategies.
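As a rough illustration of the forward-mapping idea: a mapping spec pairs each target field with a path expression evaluated against the loosely structured CP JSON. The simplified dotted-path syntax and spec format below are assumptions for the sketch, not the platform's extended JSONPath dialect.

```python
# Forward mapping sketch: resolve path expressions against nested
# dicts/lists to populate a target specification.
def get_path(obj, path):
    """Resolve a dotted path like 'shop.hours[0].open'; returns None
    when any segment is missing rather than raising."""
    for part in path.split("."):
        if "[" in part:
            name, idx = part[:-1].split("[")
            obj = obj.get(name, []) if isinstance(obj, dict) else None
            obj = obj[int(idx)] if obj and int(idx) < len(obj) else None
        else:
            obj = obj.get(part) if isinstance(obj, dict) else None
        if obj is None:
            return None
    return obj


def map_to_spec(raw, spec):
    """spec maps each target field to a source path expression."""
    return {field: get_path(raw, path) for field, path in spec.items()}


raw = {"shop": {"hours": [{"open": "09:00", "close": "21:00"}]}}
spec = {"open_time": "shop.hours[0].open",
        "close_time": "shop.hours[0].close"}
mapped = map_to_spec(raw, spec)
# mapped -> {"open_time": "09:00", "close_time": "21:00"}
```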

Lifecycle Management: Provides batch and stream processing models, with strategies for batch expiration, time-based expiration, and conditional offline handling. Custom task scheduling, alerting, and storage designs were implemented to meet the platform's specific needs.
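A batch-expiration strategy can be reduced to a key diff: when a CP delivers a new full batch, any record previously online but absent from the batch is taken offline. The function and field names below are assumptions for illustration, not Amap's implementation.

```python
# Sketch of batch expiration: diff the previously online set against
# the latest full batch to decide what to upsert and what to offline.
def expire_by_batch(current_online, new_batch):
    """Returns (to_upsert, to_offline) keyed on poi_id."""
    new_keys = {rec["poi_id"] for rec in new_batch}
    to_offline = [rec for rec in current_online
                  if rec["poi_id"] not in new_keys]
    return list(new_batch), to_offline


online = [{"poi_id": "A"}, {"poi_id": "B"}]
batch = [{"poi_id": "B"}, {"poi_id": "C"}]
upserts, offline = expire_by_batch(online, batch)
# offline -> [{"poi_id": "A"}]
```

Time-based and conditional strategies follow the same shape, with the offline predicate checking a timestamp or a business condition instead of batch membership.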

Conclusion & Outlook: The platform has enabled rapid scaling of deep-information services with minimal human effort, supporting Amap's expansion into various life-service domains. Future work includes full-link debugging, fine-grained operations, handling non-standard data, and a more flexible business orchestration platform.

Tags: backend · Big Data · platform architecture · data integration · POI · Conversion Engine
Written by

Amap Tech

Official Amap technology account showcasing all of Amap's technical innovations.
