
POI Category Tagging: Multi‑Label Classification, Feature Engineering and Model Design

The system treats POI category tagging as a multi‑label classification problem. It engineers textual and non‑textual features, mines click‑log and external samples with active learning, and deploys hierarchical and per‑tag deep textCNN models with feature fusion. The result is an accuracy gain of over 5%, a ten‑fold speedup, and markedly higher precision and coverage, which together improve map‑search relevance.

Amap Tech

Aug 27, 2019


POI (Point of Interest) refers to entities such as buildings, shops, bus stops, lakes, roads, etc., that appear on a map. In map search, POIs are the retrieval objects, analogous to webpages in web search, and selecting a POI displays a floating balloon.

Category tags summarize POI attributes along a category dimension (e.g., a Watson’s store is tagged as "cosmetics", the mall it belongs to is tagged as "shopping mall"). Tags enrich front‑end information display and support search‑side category recall, which cannot be satisfied by simple text matching or synonym expansion.

The tag system is built on three principles: (1) real user query expressions, (2) objective real‑world category distribution and product‑manager insights, and (3) hierarchical relationships among tags. A multi‑branch tree is constructed for each top‑level category (e.g., shopping).
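The hierarchical relationship among tags can be pictured as a parent map over one multi‑branch tree. The sketch below is only an illustration: the tag names, the `PARENT` map, and the helpers `ancestors` and `expand_with_ancestors` are hypothetical, and the rule that a leaf tag implies its ancestor tags is an assumption rather than something the article states.

```python
# Hypothetical fragment of one tag tree; names are illustrative only.
PARENT = {
    "shopping": None,              # top-level category (root of the tree)
    "shopping mall": "shopping",
    "cosmetics": "shopping",
    "supermarket": "shopping",
    "convenience store": "supermarket",
}

def ancestors(tag):
    """Return the chain of ancestors from the tag's parent up to the root."""
    chain = []
    parent = PARENT[tag]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain

def expand_with_ancestors(tags):
    """Assumption: a POI carrying a tag also carries every ancestor of that tag."""
    expanded = set(tags)
    for tag in tags:
        expanded.update(ancestors(tag))
    return expanded

print(ancestors("convenience store"))
print(sorted(expand_with_ancestors({"cosmetics"})))
```

A parent map keeps the structure easy to edit by hand and makes upward lookups cheap, which is the direction needed when a query for a broad category should also recall POIs tagged with a narrower one.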

Tagging POIs is essentially a multi‑label classification problem, not a single‑label one. Challenges include:

Multi‑label nature (a POI may belong to several tags simultaneously).

Textual issues: short POI names, low‑frequency expressions, ambiguous names.

Comprehensive issues: domain‑specific nuances (e.g., distinguishing nightclubs from bars, high‑frequency vs. low‑frequency brands).

High precision and coverage are required because low‑quality tags lead to wrong recall or missed results. Over 20 top‑level categories and thousands of sub‑tags demand efficient, high‑recall methods.

Technical solution overview : The pipeline consists of feature engineering, sample engineering, classification models, and multi‑path fusion.

Feature engineering : Use generic textual features (POI name, typecode, source category, brand) and high‑frequency proprietary features. Textual features address the main tagging problem, while non‑textual features (typecode, source category) are incorporated via a wide‑&‑deep style.
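To make the wide‑&‑deep idea concrete, the following minimal sketch concatenates a text representation with one‑hot non‑textual features. The typecode vocabulary, the `fuse` helper, and the three‑dimensional stand‑in for the text vector are all hypothetical; the article does not describe the actual encoding.

```python
# Hypothetical typecode vocabulary; real typecode values are not given in the article.
TYPECODES = ["060100", "061000", "050500"]

def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]

def fuse(text_vector, typecode):
    """Concatenate the deep text representation with the wide categorical features."""
    return list(text_vector) + one_hot(typecode, TYPECODES)

text_vector = [0.12, -0.40, 0.88]   # stand-in for a pooled text-model output
fused = fuse(text_vector, "061000")
print(fused)
```

The fused vector would then feed the final classification layer, so categorical signals such as typecode can influence the prediction even when the POI name is short or ambiguous.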

Sample engineering: Because manually producing the millions of tag labels required is infeasible, click‑log data and external resources are leveraged. Click logs provide massive, user‑intent‑rich samples but are noisy and biased toward high‑frequency queries. External resources supplement low‑frequency coverage. A two‑stage active‑learning workflow cleans and iteratively refines samples.
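One common way to drive an active‑learning cleaning loop is uncertainty sampling: the samples the current model is least sure about are sent for manual review first. The article does not say which selection strategy is used, so the sketch below is an assumption; `select_for_review`, the scores, and the 0.5 pivot are hypothetical.

```python
def select_for_review(scored_samples, budget):
    """Pick the `budget` samples whose model score is closest to 0.5.

    scored_samples: list of (sample_id, score), where score is the current
    model's estimated probability that the mined label is correct.
    """
    by_uncertainty = sorted(scored_samples, key=lambda item: abs(item[1] - 0.5))
    return [sample_id for sample_id, _ in by_uncertainty[:budget]]

# Hypothetical scores for mined (tag, POI) samples.
scored = [("poi_a", 0.97), ("poi_b", 0.52), ("poi_c", 0.08), ("poi_d", 0.41)]
to_review = select_for_review(scored, budget=2)
print(to_review)
```

After review, the corrected labels would be added back to the training set and the model retrained, which is the iterative refinement the workflow describes.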

Click‑sample mining : Transform the required tag → POI samples into tag → query → POI by defining seed queries for each tag and retrieving corresponding clicks. Query generalization expands high‑frequency seed queries to low‑frequency synonyms using methods such as word2vec embeddings, synonym dictionaries, session context, and recommendation techniques (SimRank, DeepWalk).

Click data: query -> POI
Required samples: tag -> POI
Solution: tag -> query -> POI
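The tag → query → POI transformation amounts to a join between seed queries and the click log. The sketch below shows that join under stated assumptions: the seed queries, the click‑log rows, the `mine_samples` helper, and the `min_clicks` noise threshold are hypothetical and not taken from the article.

```python
from collections import defaultdict

# Hypothetical seed queries per tag (in practice expanded by query generalization).
SEED_QUERIES = {
    "cosmetics": {"cosmetics", "makeup store"},
    "shopping mall": {"shopping mall", "mall"},
}

# Hypothetical click log rows: (query, clicked POI id, click count).
CLICK_LOG = [
    ("makeup store", "poi_1", 40),
    ("mall", "poi_2", 75),
    ("mall", "poi_1", 2),
    ("coffee", "poi_3", 30),
]

def mine_samples(seed_queries, click_log, min_clicks=5):
    """Join tag -> query with query -> POI to get candidate tag -> POI samples."""
    query_to_tags = defaultdict(set)
    for tag, queries in seed_queries.items():
        for query in queries:
            query_to_tags[query].add(tag)

    clicks = defaultdict(int)
    for query, poi, count in click_log:
        for tag in query_to_tags.get(query, ()):
            clicks[(tag, poi)] += count

    # Drop weakly supported pairs; the threshold is an assumed noise filter.
    return {pair for pair, total in clicks.items() if total >= min_clicks}

samples = mine_samples(SEED_QUERIES, CLICK_LOG)
print(sorted(samples))
```

Pairs that survive the threshold are still only candidates; they remain noisy and skewed toward popular queries, which is why a cleaning stage follows.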

Model design :

Initial hierarchical multi‑class classifiers built per non‑leaf node.

Later, per‑tag binary classifiers (one‑vs‑rest) to handle multi‑label scenarios and reduce conflict.

Unified deep model based on textCNN, modified to (a) concatenate non‑textual features, (b) replace softmax with multiple sigmoid outputs, (c) compress multi‑label targets into a vector, and (d) adopt a weighted loss that accounts for class imbalance, with weight = (h + c_k) / c_k.
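The output side of the unified model (multi‑hot targets, one sigmoid per tag, and a weighted loss) can be sketched without any deep‑learning framework. This is an illustration under assumptions, not the authors' implementation: the tag list and counts are hypothetical, c_k is read as the number of samples for tag k, h is read as a constant, and the weight is applied to the whole per‑tag loss term.

```python
import math

TAGS = ["cosmetics", "shopping mall", "supermarket"]  # hypothetical tag set

def multi_hot(poi_tags):
    """Compress a POI's tag set into a fixed-length 0/1 target vector."""
    return [1.0 if tag in poi_tags else 0.0 for tag in TAGS]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tag_weights(sample_counts, h):
    """weight_k = (h + c_k) / c_k, so rarer tags receive larger weights."""
    return [(h + c) / c for c in sample_counts]

def weighted_bce(logits, targets, weights, eps=1e-12):
    """Sum of per-tag binary cross-entropy terms, each scaled by its tag weight."""
    loss = 0.0
    for z, y, w in zip(logits, targets, weights):
        p = sigmoid(z)  # independent probability per tag, unlike softmax
        loss += -w * (y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))
    return loss

targets = multi_hot({"cosmetics", "supermarket"})
weights = tag_weights(sample_counts=[1000, 5000, 50], h=100)
loss = weighted_bce([2.0, -3.0, 0.5], targets, weights)
print(targets, weights, loss)
```

Because each sigmoid is independent, a POI can score high on several tags at once, which is what lets a single model replace a collection of per‑tag binary classifiers.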

Model improvements (textCNN with feature fusion) yielded an accuracy gain of more than 5%, a 10× efficiency increase over rule‑based methods, and a substantial improvement on long‑tail cases.

Evaluation : Tag quality is measured by precision and coverage; online A/B tests show that the new tag system raises category‑search relevance to 94% usage, delivering noticeable search quality improvements.
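A minimal sketch of the two quality metrics follows. The article does not define them formally, so this assumes precision is the share of predicted (POI, tag) pairs that are correct and coverage is the share of reference pairs that were recovered; the data and the `precision_and_coverage` helper are hypothetical.

```python
def precision_and_coverage(predicted, gold):
    """predicted, gold: sets of (poi_id, tag) pairs."""
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(gold) if gold else 0.0
    return precision, coverage

# Hypothetical reference labels and model predictions.
gold = {("poi_1", "cosmetics"), ("poi_2", "shopping mall"), ("poi_3", "bar")}
predicted = {
    ("poi_1", "cosmetics"),
    ("poi_2", "shopping mall"),
    ("poi_3", "nightclub"),
    ("poi_4", "bar"),
}

precision, coverage = precision_and_coverage(predicted, gold)
print(precision, coverage)
```

Low precision corresponds to wrong recall in search, and low coverage corresponds to missed results, so both numbers need to be tracked together.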

Summary & future work : The current system solves the majority of POI tagging using generic features and deep models. Remaining challenges include leveraging review/description attention mechanisms, image‑based category cues, external knowledge graphs, and crowdsourced labeling to further improve low‑frequency and ambiguous cases.
