Construction of a Virtual Category‑Tag System for 58 Local Services Using Machine Learning

This article describes the end‑to‑end design and implementation of a virtual category‑tag framework for 58 local services, detailing data preparation, tag selection via semantic similarity models, tag mounting, synonym normalization, experimental comparisons of CDSSM, MatchPyramid, BERT, RoBERTa and other techniques, and outlines future improvements.

58 Tech

Jul 5, 2021

The 58 local services platform divides businesses into multi‑level categories (e.g., automotive, training, housekeeping) and introduces virtual categories as supplements to help users filter specific services such as "house repair".

The overall workflow has two major parts: generating virtual categories from third‑level categories, and mounting relevant tags onto them. The construction process is carried out in three steps: tag selection, virtual‑category tag mounting, and synonym normalization.

In the data‑preparation stage, a tag lexicon is built from existing merchant posts and hot‑search terms, while hierarchical category data are taken from the platform's existing taxonomy.

For tag selection, the problem is cast as semantic similarity between a virtual category and candidate tags. Several text‑matching models are evaluated: CDSSM, MatchPyramid, BERT‑Chinese, and RoBERTa‑Chinese. RoBERTa yields the highest recall in the experiments, so for each virtual category the 2000 tags with the highest cosine similarity are retained as candidates.
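The top‑k selection step can be sketched as follows. This is a minimal illustration, not the article's implementation: the function names `cosine` and `select_candidates` are hypothetical, and the three‑dimensional vectors are invented stand‑ins for the sentence embeddings a model such as RoBERTa would produce.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_candidates(category_vec, tag_vecs, k=2000):
    """Return the k tags most similar to the category, most similar first."""
    scored = ((cosine(category_vec, vec), tag) for tag, vec in tag_vecs.items())
    return [tag for _, tag in heapq.nlargest(k, scored)]


# Toy embeddings, invented for illustration only.
tag_vecs = {
    "roof leak repair": [0.9, 0.1, 0.0],
    "wall repainting": [0.7, 0.3, 0.1],
    "piano lessons": [0.0, 0.1, 0.9],
}
house_repair_vec = [1.0, 0.2, 0.0]

top_tags = select_candidates(house_repair_vec, tag_vecs, k=2)
print(top_tags)
```

With k=2 the unrelated tag is dropped; in the article's setting k would be 2000 and the lexicon far larger, so a vectorized or approximate nearest‑neighbor search would likely replace the pure‑Python loop.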

During tag mounting, cosine similarity is computed between each candidate tag and the virtual category, as well as the category's higher‑level categories. Each tag is then mounted onto the virtual category for which its relevance score is highest. Hot‑search terms are added to increase tag diversity.
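A rough sketch of the mounting step is shown here. The article does not say how the similarity to the virtual category and the similarity to its higher‑level categories are combined, so the weighted sum and the `parent_weight` value are assumptions, as are the function names, the toy scores, and the way hot‑search terms are appended.

```python
def relevance(sim_to_category, sim_to_parent, parent_weight=0.3):
    """Assumed relevance score: weighted sum of the two cosine similarities."""
    return (1 - parent_weight) * sim_to_category + parent_weight * sim_to_parent


def mount_tags(tag_scores, hot_terms=None, parent_weight=0.3):
    """Mount each tag on its best-scoring virtual category, then add hot-search terms.

    tag_scores: {tag: {virtual_category: (sim_to_category, sim_to_parent)}}
    hot_terms:  {virtual_category: [hot-search term, ...]} (optional)
    """
    mounted = {}
    for tag, per_category in tag_scores.items():
        best = max(
            per_category,
            key=lambda cat: relevance(*per_category[cat], parent_weight=parent_weight),
        )
        mounted.setdefault(best, []).append(tag)
    for category, terms in (hot_terms or {}).items():
        tags = mounted.setdefault(category, [])
        # Append only terms that are not already mounted on this category.
        tags.extend(term for term in terms if term not in tags)
    return mounted


# Precomputed toy similarities, invented for illustration only.
tag_scores = {
    "roof leak repair": {"house repair": (0.95, 0.80), "appliance repair": (0.60, 0.75)},
    "fridge not cooling": {"house repair": (0.40, 0.70), "appliance repair": (0.90, 0.72)},
}
hot_terms = {"house repair": ["waterproofing", "roof leak repair"]}

mounted = mount_tags(tag_scores, hot_terms)
print(mounted)
```

Because every tag goes to exactly one category (the argmax), a tag never appears under two virtual categories through the similarity step; only the hot‑search supplement can introduce overlap.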

Synonym normalization addresses the large number of synonymous tags (e.g., "furniture repair" vs. "repair furniture"). A synonym dictionary is generated and duplicates are removed using a binary‑classification model. The candidate models BiLSTM+Attention, HAHNN, and RoBERTa are compared, along with the data‑augmentation methods EDA and UDA. RoBERTa + Linear achieves the best recall and accuracy and is used to build the final synonym dictionary.
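One way to turn pairwise synonym predictions into a dictionary is to merge predicted pairs with a union‑find structure, sketched below. This is an assumption about how the dictionary could be assembled, not the article's stated method. The `is_synonym` function is a hypothetical word‑order‑insensitive stand‑in for the trained classifier, and keeping the earliest‑listed tag as the canonical form is an arbitrary choice made for the example.

```python
from itertools import combinations


def is_synonym(a, b):
    """Stand-in for the trained binary classifier: same words in any order."""
    return sorted(a.split()) == sorted(b.split())


def build_synonym_dict(tags, predict=is_synonym):
    """Map every tag to a canonical tag by merging pairs predicted synonymous.

    Assumes the tags in the input list are unique.
    """
    parent = {tag: tag for tag in tags}
    order = {tag: i for i, tag in enumerate(tags)}

    def find(tag):
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]  # path halving
            tag = parent[tag]
        return tag

    for a, b in combinations(tags, 2):
        if predict(a, b):
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the earlier-listed tag as the canonical root.
                keep, drop = sorted((root_a, root_b), key=order.__getitem__)
                parent[drop] = keep
    return {tag: find(tag) for tag in tags}


tags = ["furniture repair", "repair furniture", "sofa cleaning", "cleaning sofa", "lock change"]
synonyms = build_synonym_dict(tags)
deduplicated = sorted(set(synonyms.values()), key=tags.index)
print(synonyms)
print(deduplicated)
```

Scoring all pairs is quadratic in the number of tags, so on a large lexicon the pairs would presumably be pre‑filtered (for example, by embedding similarity) before calling the classifier.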

The final virtual‑category‑tag mapping, illustrated in the paper’s figures, provides each virtual category with a curated set of tags, enriched by hot‑search terms and hierarchical category information, thereby improving user experience, post retrieval, and conversion rates.

In conclusion, the system standardizes the construction pipeline for virtual‑category tags. Future work includes expanding the tag lexicon, refining synonym handling, and exploring contrastive learning methods such as SimCSE to further improve model performance.


