
Construction of a Virtual Category‑Tag System for 58 Local Services Using Machine Learning

This article describes the end‑to‑end design and implementation of a virtual category‑tag framework for 58 local services. It covers data preparation, tag selection via semantic similarity models, tag mounting, synonym normalization, and experimental comparisons of CDSSM, MatchPyramid, BERT, RoBERTa and other techniques, and it outlines future improvements.

58 Tech

The 58 local services platform divides businesses into multi‑level categories (e.g., automotive, training, housekeeping) and introduces virtual categories as supplements to help users filter specific services such as "house repair".

The overall workflow consists of two major parts: generating virtual categories based on third‑level categories and mounting relevant tags onto them. The process is broken down into three steps: tag selection, virtual‑category tag mounting, and synonym normalization.

In the data‑preparation stage, a tag lexicon is built from existing merchant posts and hot‑search terms, while hierarchical category data are taken from the platform's existing taxonomy.
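As a minimal sketch of the lexicon step, the snippet below merges tags taken from merchant posts with hot‑search terms and removes duplicates. The function name `build_tag_lexicon`, the whitespace and lowercase normalization, and the sample strings are assumptions made for illustration; the article does not specify how the lexicon is actually cleaned.

```python
def build_tag_lexicon(post_tags, hot_search_terms):
    """Merge two tag sources into one de-duplicated, sorted lexicon.

    Normalization (collapse whitespace, lowercase) is an assumption;
    the original pipeline's cleaning rules are not described.
    """
    lexicon = set()
    for term in list(post_tags) + list(hot_search_terms):
        normalized = " ".join(term.split()).lower()
        if normalized:  # skip empty strings
            lexicon.add(normalized)
    return sorted(lexicon)


# Hypothetical inputs standing in for real merchant posts and search logs.
post_tags = ["Furniture repair", "roof leak repair "]
hot_search_terms = ["furniture  repair", "door lock change"]
lexicon = build_tag_lexicon(post_tags, hot_search_terms)
```

Here the two spellings of "furniture repair" collapse into a single lexicon entry.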

For tag selection, the problem is cast as semantic similarity between a virtual category and candidate tags. Various text‑matching models—including CDSSM, MatchPyramid, BERT‑Chinese, and RoBERTa‑Chinese—are evaluated. Experiments show that RoBERTa yields the highest recall, so the top 2000 tags (by cosine similarity) are retained as candidates for each virtual category.
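The candidate‑selection step can be sketched as a top‑k ranking by cosine similarity. This is an illustration under stated assumptions: the toy three‑dimensional vectors stand in for sentence embeddings that a model such as RoBERTa would produce, and `select_candidates` is a hypothetical helper, not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_u * norm_v)


def select_candidates(category_vec, tag_vecs, top_k=2000):
    """Return the top_k (tag, score) pairs ranked by similarity to the category."""
    scored = [(tag, cosine(category_vec, vec)) for tag, vec in tag_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors; a real system would use encoder outputs instead.
category_vec = [1.0, 0.0, 0.0]  # e.g. the virtual category "house repair"
tag_vecs = {
    "roof leak repair": [0.9, 0.1, 0.0],
    "wall repainting": [0.7, 0.7, 0.0],
    "piano lessons": [0.0, 0.0, 1.0],
}
candidates = select_candidates(category_vec, tag_vecs, top_k=2)
```

With `top_k=2`, the unrelated tag "piano lessons" is dropped; the article's setting of 2000 is the default here.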

During tag mounting, the cosine similarity between each candidate tag and each virtual category (as well as that category's higher‑level categories) is computed, and each tag is mounted on the virtual category for which its relevance score is highest. Hot‑search terms are then added to increase tag diversity.
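One way to picture the mounting rule is an argmax over a blended score. The sketch below assumes the similarities are already computed and combines the virtual‑category score with the higher‑level‑category score through a linear weight. The `parent_weight` value, the function name `mount_tags`, and all sample scores are hypothetical; the article does not say how the two similarities are combined.

```python
def mount_tags(scores, parent_scores, parent_of, parent_weight=0.3):
    """Mount each tag on the virtual category with the highest blended score.

    scores[tag][category]      -> similarity of tag to a virtual category
    parent_scores[tag][parent] -> similarity of tag to a higher-level category
    parent_of[category]        -> higher-level category of a virtual category
    The linear blend is an assumption made for this sketch.
    """
    mounted = {}
    for tag, per_category in scores.items():
        best_category, best_score = None, float("-inf")
        for category, sim in per_category.items():
            parent_sim = parent_scores.get(tag, {}).get(parent_of[category], 0.0)
            blended = (1.0 - parent_weight) * sim + parent_weight * parent_sim
            if blended > best_score:
                best_category, best_score = category, blended
        if best_category is not None:  # tags with no scores stay unmounted
            mounted.setdefault(best_category, []).append(tag)
    return mounted


# Made-up similarity values for illustration only.
scores = {
    "roof leak repair": {"house repair": 0.92, "appliance repair": 0.55},
    "fridge repair": {"house repair": 0.50, "appliance repair": 0.90},
}
parent_of = {"house repair": "home services", "appliance repair": "home services"}
parent_scores = {
    "roof leak repair": {"home services": 0.6},
    "fridge repair": {"home services": 0.6},
}
mounting = mount_tags(scores, parent_scores, parent_of)
```

Each tag ends up under exactly one virtual category, which keeps the tag lists of neighboring categories from overlapping.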

Synonym normalization addresses the large number of synonymous tags (e.g., "furniture repair" vs. "repair furniture"). A synonym dictionary is generated, and duplicate removal is performed using a binary‑classification model. Model candidates such as BiLSTM+Attention, HAHNN, RoBERTa, and data‑augmentation methods (EDA, UDA) are compared; RoBERTa + Linear achieves the best recall and accuracy, and is used to build the final synonym dictionary.
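Once a binary classifier can judge whether two tags are synonymous, the pairwise decisions still have to be merged into a dictionary. The sketch below does this with a union‑find structure. The classifier is replaced by a toy word‑order rule, and picking the shortest (then alphabetically first) tag as the canonical form is an assumption; neither comes from the article.

```python
def build_synonym_dict(tags, is_synonym):
    """Group tags judged synonymous and map each tag to one canonical form."""
    parent = {tag: tag for tag in tags}

    def find(tag):
        # Follow parent links to the group root, compressing the path.
        while parent[tag] != tag:
            parent[tag] = parent[parent[tag]]
            tag = parent[tag]
        return tag

    # Pairwise comparison is quadratic; acceptable for a small illustration.
    for i, a in enumerate(tags):
        for b in tags[i + 1:]:
            if is_synonym(a, b):
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    parent[root_b] = root_a

    groups = {}
    for tag in tags:
        groups.setdefault(find(tag), []).append(tag)

    synonym_dict = {}
    for members in groups.values():
        # Assumed rule: shortest tag wins, ties broken alphabetically.
        canonical = min(members, key=lambda t: (len(t), t))
        for member in members:
            synonym_dict[member] = canonical
    return synonym_dict


def toy_is_synonym(a, b):
    """Stand-in for the trained classifier: same words in any order."""
    return sorted(a.split()) == sorted(b.split())


tags = ["furniture repair", "repair furniture", "roof leak repair"]
synonyms = build_synonym_dict(tags, toy_is_synonym)
```

In practice `toy_is_synonym` would be swapped for the model's prediction, and only pairs likely to be synonymous would be scored, to avoid comparing every pair in the lexicon.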

The final virtual‑category‑tag mapping, illustrated in the original article's figures, provides each virtual category with a curated set of tags, enriched by hot‑search terms and hierarchical category information, with the aim of improving user experience, post retrieval, and conversion rates.

In conclusion, the system standardizes the construction pipeline for virtual category tags. Future work includes expanding the tag lexicon, refining synonym handling, and exploring contrastive learning methods such as SimCSE to further boost model performance.

Tags: machine learning, tagging, BERT, text matching, synonym normalization, virtual categories