
Visual Technology for Automated POI Name Generation: STR, Text Detection, and Naming Practices

Amap’s visual‑technology pipeline automatically generates and updates POI names by crowdsourcing street‑level images, applying deep‑learning scene‑text recognition, dual‑branch classification of text attributes, and a BERT‑plus‑graph‑attention model that selects and orders recognized text, achieving about 95% naming accuracy.

Amap Tech

This is the third article in Amap’s "Spring Recruitment" column. It summarizes a talk by Haozhihui, Head of the Amap Visual Technology Center, on applying visual technology to automated POI (Point of Interest) name generation.

Amap maintains more than 70 million POIs. New POIs are continuously added while some become obsolete. To keep the POI database up‑to‑date, Amap uses a crowdsourced image collection workflow: field staff walk along streets, capture continuous images of storefronts, and upload the images together with GPS coordinates.

The collected images are processed to extract POI content and location. The pipeline consists of three key computer‑vision tasks: natural scene text recognition (STR), text‑attribute determination and structuring, and automatic name generation.
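To make the three-stage structure concrete, here is a minimal sketch of how the stages chain together. The stage functions below are toy rule-based stand-ins for the deep-learning models described later, and all names (`TextLine`, `process_image`, etc.) are hypothetical, not Amap's actual interfaces:

```python
from dataclasses import dataclass

@dataclass
class TextLine:
    text: str
    box: tuple        # (x, y, w, h) in image coordinates, top-left origin
    attribute: str = ""

def recognize_scene_text(image):
    """Stage 1 stub: a real system runs detection + recognition models.
    Here the 'image' is already a list of (text, box) pairs."""
    return [TextLine(t, b) for t, b in image]

def classify_attributes(lines):
    """Stage 2 stub: toy rules standing in for the attribute classifier."""
    for ln in lines:
        if any(c.isdigit() for c in ln.text):
            ln.attribute = "contact"            # phone numbers etc.
        elif ln.text.lower() in {"open", "sale"}:
            ln.attribute = "noise"              # non-name signboard text
        else:
            ln.attribute = "name"
    return lines

def generate_poi_name(lines):
    """Stage 3 stub: keep name lines, order them left-to-right by box x."""
    names = [ln for ln in lines if ln.attribute == "name"]
    names.sort(key=lambda ln: ln.box[0])
    return " ".join(ln.text for ln in names)

def process_image(image):
    return generate_poi_name(classify_attributes(recognize_scene_text(image)))
```

Each stage only needs the previous stage's output, which is what lets Amap train and improve the three models independently.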

Natural Scene Text Recognition (STR) – STR aims to read text that appears in real‑world scenes such as shop signs and road signs. Traditional STR (pre‑2012) relied on OCR‑style pipelines: image preprocessing, binarization, region extraction with MSER, and statistical classifiers. Since 2012, deep‑learning models have dominated the field, using architectures such as TextBoxes++ and Mask R‑CNN for text‑line detection, and LSTM/CTC or attention‑based sequence models for character decoding. Amap’s current STR system combines a text‑line detection module (instance‑segmentation models such as Mask R‑CNN) with two parallel recognition branches: a single‑character detector/recognizer and an end‑to‑end sequence recognizer. The two branches are fused to improve robustness, especially on ambiguous characters.
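The LSTM/CTC recognizers mentioned above decode per-time-step character scores with the standard CTC rule: take the best label at each step, collapse consecutive repeats, then drop the blank symbol. A minimal sketch, with a toy alphabet and scores in place of real model output:

```python
import numpy as np

BLANK = 0                                       # CTC blank class
ALPHABET = {1: "C", 2: "A", 3: "F", 4: "E"}     # toy label set

def ctc_greedy_decode(logits: np.ndarray) -> str:
    """logits: (T, num_classes) per-time-step scores; class 0 is blank."""
    best = logits.argmax(axis=1)                # best label per time step
    decoded = []
    prev = BLANK
    for label in best:
        if label != BLANK and label != prev:    # collapse repeats, skip blanks
            decoded.append(ALPHABET[label])
        prev = label
    return "".join(decoded)
```

For example, the label sequence `C C <blank> A F F E E` over eight time steps decodes to "CAFE": the repeated `C`, `F`, and `E` collapse, and the blank lets CTC represent genuinely doubled letters when needed.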

Text Attribute Determination and Structured Processing – After detection and recognition, each text line is classified as either a POI name or noise. A dual‑channel CNN (image + text) is used to filter out non‑POI text. Further classification splits POI text into attributes such as main name, branch name, business scope, and contact information. Semantic segmentation of signboards is also employed to isolate individual signs, ensuring that the main name is uniquely identified.
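The dual-channel idea can be sketched as follows: each branch produces a feature vector, the vectors are concatenated, and a linear head classifies the line as POI name or noise. The feature extractors and weights below are random stand-ins for the trained CNN and text encoder, not Amap's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def image_branch(crop: np.ndarray) -> np.ndarray:
    """Stand-in for the image CNN: global average pooling over the crop."""
    return crop.mean(axis=(0, 1))               # (channels,)

def text_branch(text: str, dim: int = 8) -> np.ndarray:
    """Stand-in for the text encoder: hashed bag-of-characters."""
    v = np.zeros(dim)
    for ch in text:
        v[hash(ch) % dim] += 1.0
    return v

W = rng.normal(size=(2, 3 + 8))                 # 2 classes: name / noise
b = np.zeros(2)

def classify_line(crop: np.ndarray, text: str) -> np.ndarray:
    """Fuse both channels and return class probabilities."""
    fused = np.concatenate([image_branch(crop), text_branch(text)])
    logits = W @ fused + b
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()                          # [P(name), P(noise)]
```

The point of the fusion is that visual cues (sign layout, font size) and textual cues (wording typical of business scopes or phone numbers) each catch noise the other channel misses.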

Name Generation – With recognized and filtered text lines, the system must generate a final POI name. This is treated as a joint classification‑regression problem. A BERT‑based model predicts which text lines should be selected and their ordering. To incorporate visual context, a Graph Attention Network encodes the spatial relationships of bounding boxes, and its output is concatenated with BERT embeddings. Recent experiments also integrate VL‑BERT, achieving name‑generation accuracy around 95%.
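The fusion step described above can be sketched with a single graph-attention layer (in the style of Veličković et al.) over the text lines' bounding-box graph, whose output is concatenated with each line's text embedding before the selection/ordering head. All weights and embeddings below are random placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8                                   # toy embedding size

def gat_layer(H, A, W, a):
    """One single-head graph-attention layer.
    H: (N, D) node features, A: (N, N) adjacency (1 = boxes are neighbors)."""
    Z = H @ W                                        # project node features
    N = Z.shape[0]
    E = np.full((N, N), -1e9)                        # mask non-neighbors
    for i in range(N):
        for j in range(N):
            if A[i, j]:
                E[i, j] = a @ np.concatenate([Z[i], Z[j]])
    alpha = np.exp(E - E.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)        # softmax over neighbors
    return np.tanh(alpha @ Z)

# Toy inputs: 3 recognized lines; boxes that are spatially close are connected.
H_text = rng.normal(size=(3, D))        # stand-in for BERT line embeddings
A = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]])
W = rng.normal(size=(D, D))
a = rng.normal(size=2 * D)

H_spatial = gat_layer(H_text, A, W, a)
fused = np.concatenate([H_text, H_spatial], axis=1)  # (3, 2*D) head input
```

The adjacency matrix is where the visual context enters: attention flows only between bounding boxes that are spatial neighbors, so the head sees both what each line says and where it sits on the signboard.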

The article also notes challenges such as the sheer size of the Chinese character inventory (far beyond the 3,000–5,000 characters in common use) and the need for hard‑case mining and synthetic data generation to improve model robustness.
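Hard-case mining, in its simplest form, scores the training pool with the current model and keeps the examples the model gets most wrong for the next training round. A minimal sketch with toy loss values in place of real model output:

```python
import numpy as np

def mine_hard_cases(losses: np.ndarray, top_frac: float = 0.25) -> np.ndarray:
    """Return indices of the hardest (highest-loss) examples, hardest first."""
    k = max(1, int(len(losses) * top_frac))
    return np.argsort(losses)[::-1][:k]

# Toy per-example losses from scoring a pool of 8 training images.
losses = np.array([0.1, 2.3, 0.05, 1.7, 0.4, 0.2, 3.1, 0.3])
hard = mine_hard_cases(losses)          # -> indices [6, 1]
```

The selected examples (often rare characters or unusual sign layouts) are then over-sampled or used to guide synthetic data generation.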

Overall, the talk provides a comprehensive view of how Amap applies modern computer‑vision and deep‑learning techniques—STR, instance segmentation, dual‑branch recognition, attribute classification, and multimodal name‑generation models—to automate the creation and maintenance of its massive POI database.

Tags: computer vision, deep learning, OCR, POI, Amap, name generation, STR
Written by Amap Tech, the official Amap technology account showcasing Amap's technical innovations.