Query Intent Recognition in Enterprise Search: Knowledge‑Enhanced and Pretrained Model Approaches
This article explains how Alibaba's enterprise search system tackles query intent recognition by combining knowledge‑enhanced techniques, short‑text classification, and pretrained language models such as StructBERT together with prompt learning, and it shares two real‑world case studies, experimental results, and future research directions.
Background
Enterprise digitalization relies on AI, big data, and cloud computing to transform business and management processes. Alibaba’s internal search platform aggregates content from dozens of sites (DingTalk docs, Yuque, ATA, etc.) and serves over 140 QPS. A unified search engine is needed to avoid information silos and improve relevance.
The search architecture includes a Query Processing (QP) service deployed on the DII platform, which performs tokenization, spelling correction, term weighting, query expansion, and intent recognition before the Ha3 engine performs recall and ranking.
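The QP stages above can be sketched as a simple sequential pipeline. This is a minimal illustration, not the production service: every function below is a toy stand‑in (the real system uses a domain tokenizer, learned term weighting, and a BERT‑based intent classifier), and all names are assumptions.

```python
def tokenize(query: str) -> list[str]:
    """Stand-in tokenizer; production Chinese queries need word segmentation."""
    return query.lower().split()

def correct_spelling(tokens: list[str]) -> list[str]:
    """Toy dictionary lookup; real correction uses noisy-channel or seq2seq models."""
    fixes = {"serach": "search"}
    return [fixes.get(t, t) for t in tokens]

def weight_terms(tokens: list[str]) -> dict[str, float]:
    """Uniform weights as a placeholder for IDF or learned term weighting."""
    return {t: 1.0 / len(tokens) for t in tokens}

def expand_query(tokens: list[str]) -> list[str]:
    """Placeholder synonym expansion."""
    synonyms = {"doc": ["document"]}
    expanded = list(tokens)
    for t in tokens:
        expanded += synonyms.get(t, [])
    return expanded

def recognize_intent(tokens: list[str]) -> str:
    """Placeholder rule; the article's system uses a trained classifier here."""
    return "faq" if "how" in tokens else "navigational"

def process_query(query: str) -> dict:
    """Run the QP stages in order, producing the payload handed to the Ha3 engine."""
    tokens = correct_spelling(tokenize(query))
    return {
        "tokens": tokens,
        "weights": weight_terms(tokens),
        "expanded": expand_query(tokens),
        "intent": recognize_intent(tokens),
    }
```

The point of the sketch is the stage ordering: correction happens before weighting and expansion, and intent recognition consumes the cleaned tokens so downstream recall and ranking can be conditioned on the predicted intent.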
Work Sharing – Case 1: Internal Assistant (内外小蜜)
The assistant uses Alibaba’s DAMO‑Lab Cloud‑XiaoMi QA engine, supporting FAQ, multi‑turn task‑oriented, and knowledge‑graph QA. Intent recognition classifies short queries (most under 10 tokens) into business lines using knowledge‑enhanced short‑text classification.
Knowledge enhancement leverages >6,000 internal knowledge cards and similar historical queries. A dual‑tower Sentence‑BERT (initialized with StructBERT) encodes queries and knowledge cards; contrastive learning (InfoNCE) aligns positive pairs while pushing apart negatives.
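The InfoNCE objective used to train the dual‑tower encoder can be written down compactly. The sketch below assumes in‑batch negatives (query *i*'s matching knowledge card is positive, every other card in the batch is negative) and operates on precomputed embeddings with NumPy; the temperature value is illustrative, not the paper's setting.

```python
import numpy as np

def info_nce_loss(query_emb: np.ndarray, card_emb: np.ndarray,
                  temperature: float = 0.05) -> float:
    """InfoNCE over a batch of (query, knowledge-card) embedding pairs.

    Row i of each matrix is a positive pair; the remaining B-1 cards act
    as in-batch negatives. Lower loss = positives pulled together,
    negatives pushed apart.
    """
    # L2-normalize so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    c = card_emb / np.linalg.norm(card_emb, axis=1, keepdims=True)
    logits = (q @ c.T) / temperature               # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the diagonal (the true pairs) as targets.
    return float(-log_probs.diagonal().mean())
```

In training, both towers share the StructBERT‑initialized encoder and this loss is minimized over mini‑batches; at serving time only the two embeddings and their cosine similarity are needed.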
Two attention mechanisms (Query‑to‑Entity and Entity Self‑Attention) refine entity representations, and a fusion of original and similar query embeddings improves focus on central words. The final concatenated vector passes through dense layers for classification, outperforming standard BERT fine‑tuning.
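The two attention mechanisms and the fusion step can be sketched at the tensor level. This is a simplified, single‑head illustration under my own naming; the production model's exact parameterization (projection matrices, fusion weights, head counts) is not specified in the article, so the `alpha` blend and unprojected dot‑product attention below are assumptions.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def query_to_entity_attention(query_vec: np.ndarray,
                              entity_mat: np.ndarray) -> np.ndarray:
    """Query-to-Entity attention: weight each retrieved entity (knowledge-card)
    embedding by its similarity to the query, then pool into one summary vector."""
    scores = softmax(entity_mat @ query_vec)   # (E,) relevance of each entity
    return scores @ entity_mat                 # (d,) query-aware entity summary

def entity_self_attention(entity_mat: np.ndarray) -> np.ndarray:
    """Entity Self-Attention: entities attend to each other, so outliers that
    disagree with their neighbors are down-weighted in the refined representations."""
    attn = softmax(entity_mat @ entity_mat.T, axis=-1)  # (E, E)
    return attn @ entity_mat                            # (E, d) refined entities

def fuse(query_vec: np.ndarray, similar_query_vec: np.ndarray,
         entity_vec: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend the original query with a similar historical query (sharpening the
    central words), then concatenate the entity summary for the classifier head."""
    fused_query = alpha * query_vec + (1 - alpha) * similar_query_vec
    return np.concatenate([fused_query, entity_vec])
```

The concatenated vector from `fuse` is what the dense classification layers would consume in this sketch.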
Work Sharing – Case 2: Industry Search (Procurement Mall, 采购商城)
Category prediction for product search is treated as a few‑shot text classification problem. Prompt‑learning converts classification into a masked language modeling task, enabling zero‑shot and ten‑shot performance with BERT‑base models. Self‑learning iteratively adds high‑confidence pseudo‑labels to enlarge the training set, achieving up to 82% accuracy on the full dataset and >90% in production after post‑processing.
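The two ideas in this paragraph, a cloze‑style prompt for the MLM and a confidence‑thresholded self‑learning loop, can be sketched together. The template wording, verbalizer, threshold, and the `score` callback below are all hypothetical stand‑ins: in the real system `score` would be the prompted BERT model returning its top category word and probability at the `[MASK]` position.

```python
def build_prompt(query: str) -> str:
    """Wrap a product query in a cloze template so a masked language model
    can fill the category slot. Template wording is illustrative only."""
    return f"Query: {query}. This product belongs to the [MASK] category."

def self_train(labeled, unlabeled, score, threshold=0.9, rounds=3):
    """Self-learning loop: in each round, queries the model is confident
    about are moved into the training set as pseudo-labels.

    labeled   -- list of (query, label) gold pairs
    unlabeled -- list of query strings
    score     -- callable query -> (label, confidence); stands in for the
                 prompted MLM's prediction at the [MASK] position
    """
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        still_unlabeled = []
        for query in pool:
            label, confidence = score(query)
            if confidence >= threshold:
                train.append((query, label))   # accept as pseudo-label
            else:
                still_unlabeled.append(query)  # retry after retraining
        pool = still_unlabeled
        if not pool:
            break
        # In practice the model would be retrained on `train` here,
        # raising confidence on the remaining pool before the next round.
    return train, pool
```

Each accepted pseudo‑label enlarges the few‑shot training set, which is what lets the reported accuracy climb as iterations proceed.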
Summary and Reflections
Key challenges include insufficient domain knowledge in short queries and scarce labeled data for specialized enterprise domains. Solutions involve internal knowledge‑card augmentation, few‑shot prompt‑learning, and potentially training enterprise‑specific large language models on internal data (e.g., ATA articles, contracts, code).
Future work also considers ensuring factual correctness of generative models by incorporating reinforcement‑learning style feedback and knowledge‑graph grounding before answer generation.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.