Inside Youzan’s Query Parser: Architecture, Plugins, and Real‑World Impact

This article explains the role of Youzan’s Query Parser (QP) in search, walks through its overall and layered architecture, details each algorithmic plugin—from preprocessing to synonym handling—and shows concrete code examples and results that improve search relevance across multiple retail scenarios.

Youzan Coder

Introduction

The article introduces Youzan’s Search Platform and explains how the Query Parser (QP) fits into the end‑to‑end search request flow, from API entry to A/B testing of QP‑driven optimizations.

Role of QP

In natural‑language processing, the Query Parser (QP) performs query understanding: it analyses a query at the lexical, syntactic, and semantic levels and turns raw user input into a structured representation. It supports search queries, FAQ questions, reading‑comprehension queries, and conversational inputs, and exposes a unified interface that decouples business logic from algorithmic processing.

For example, when a user types “衣服” (clothes), QP gives the clothing category a higher weight so that clothing items rank ahead of accessories, which improves the search experience.

Overall Design

The overall design diagram shows the QP request flow and configuration flow. When a search request reaches QP, the platform extracts the scene tag from the request body and loads the corresponding QP configuration, which includes search‑term markers, plugin lists, and DSL rewrite scripts.

QP then executes plugins sequentially according to the configuration, optionally applying human‑intervention settings and hyper‑parameters to the results. After plugin execution, QP rewrites the search DSL (e.g., inserting corrected terms or applying category weighting).
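The request flow described above can be sketched in a few lines. This is a minimal illustration, not Youzan's implementation: the names `SceneConfig`, `PLUGINS`, `CONFIGS`, and `parse`, the two toy plugins, and the shape of the request and DSL dictionaries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A plugin reads the shared context and returns the fields it wants to add or update.
Plugin = Callable[[dict], dict]


@dataclass
class SceneConfig:
    plugins: List[str]                     # plugin names, in execution order
    rewrite: Callable[[dict, dict], dict]  # (dsl, context) -> rewritten dsl


def lowercase_plugin(ctx: dict) -> dict:
    return {"query": ctx["query"].lower()}


def token_plugin(ctx: dict) -> dict:
    return {"tokens": ctx["query"].split()}


# Hypothetical plugin registry and per-scene configuration.
PLUGINS: Dict[str, Plugin] = {"lowercase": lowercase_plugin, "tokens": token_plugin}


def match_rewrite(dsl: dict, ctx: dict) -> dict:
    # Put the processed search term back into the DSL.
    return {**dsl, "query": {"match": {"title": ctx["query"]}}}


CONFIGS = {"demo_scene": SceneConfig(plugins=["lowercase", "tokens"], rewrite=match_rewrite)}


def parse(request: dict) -> dict:
    config = CONFIGS[request["scene"]]          # 1. load the configuration for the scene tag
    ctx = {"query": request["query"]}
    for name in config.plugins:                 # 2. run the plugins sequentially
        ctx.update(PLUGINS[name](ctx))
    return config.rewrite(request["dsl"], ctx)  # 3. rewrite the search DSL
```

Calling `parse({"scene": "demo_scene", "query": "Red SHOES", "dsl": {"size": 10}})` returns the original DSL with a `match` clause on the lowercased query. Keeping the plugin list and the rewrite step in configuration is what lets a new scene reuse existing plugins without code changes.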

Layered Design

The layered diagram visualises QP’s internal structure from top to bottom. The main layers are:

controller layer: entry point for query rewriting; performs request pre‑processing.

service layer: fetches the rewrite configuration for the scene, extracts search terms from the DSL, and invokes the appropriate plugins.

plugin layer: runs algorithmic plugins, calls the corresponding handler, and handles success or failure.

handler layer: contains the concrete algorithm implementations and may depend on external services such as Milvus.

intervener layer: applies manual interventions to handler results.

processor layer: executes rewrite plugins according to the QP configuration to modify the DSL.
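As a rough illustration of how the plugin, handler, and intervener layers could relate, the sketch below wraps a toy handler in a plugin that absorbs failures and then lets a manual override take precedence. All class names are hypothetical, and returning `None` on failure is an assumption; the source only says that the plugin layer handles success or failure.

```python
from typing import Dict, Optional


class LookupHandler:
    """Handler layer: the concrete algorithm (here, a toy table lookup)."""

    def __init__(self, table: Dict[str, str]):
        self.table = table

    def handle(self, term: str) -> str:
        return self.table[term]  # raises KeyError for unknown terms


class Intervener:
    """Intervener layer: a manual override wins over the handler result."""

    def __init__(self, overrides: Dict[str, str]):
        self.overrides = overrides

    def apply(self, term: str, result: Optional[str]) -> Optional[str]:
        return self.overrides.get(term, result)


class Plugin:
    """Plugin layer: calls the handler and decides what a failure means."""

    def __init__(self, handler: LookupHandler, intervener: Intervener):
        self.handler = handler
        self.intervener = intervener

    def run(self, term: str) -> Optional[str]:
        try:
            result = self.handler.handle(term)
        except Exception:
            result = None  # assumption: a failed handler yields "no result"
        return self.intervener.apply(term, result)
```

With `Plugin(LookupHandler({"a": "b"}), Intervener({"x": "y"}))`, `run("a")` returns the handler result, `run("x")` returns the manual override even though the handler failed, and `run("zzz")` returns `None`.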

Algorithm Plugin Design

1. Preprocess Plugin

The preprocess plugin normalises the query according to configuration rules:

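Remove special symbols such as spaces, quotation marks, and backslashes (the example below also drops the trailing semicolon).

Convert uppercase to lowercase and full‑width characters to half‑width.

Split the query into a list of tokens: keep each consecutive run of English letters and digits together as one token, and split all other text into individual characters.

Truncate the list to its first 50 tokens.

Join the list back into a single string.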

Input: "史蒂夫新款\时尚套装夏修身圆领百搭钩花DWF镂空雪纺两件套套裙;"
Output: "史蒂夫新款时尚套装夏修身圆领百搭钩花dwf镂空雪纺两件套套裙"
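The five steps above can be sketched as follows. This is a minimal version under stated assumptions: the exact symbol set in `SPECIAL` is a guess based on the example, the function name `preprocess` is illustrative, and `unicodedata.normalize("NFKC", ...)` is used for width folding even though it also applies other compatibility normalisations.

```python
import re
import unicodedata

# Assumed symbol set: whitespace, quotation marks, backslashes, semicolons.
SPECIAL = re.compile(r'[\s"“”\\;;]')
# A run of ASCII letters/digits is one token; any other character stands alone.
TOKEN = re.compile(r"[a-z0-9]+|.", re.DOTALL)


def preprocess(query: str, max_tokens: int = 50) -> str:
    text = SPECIAL.sub("", query)                      # 1. remove special symbols
    text = unicodedata.normalize("NFKC", text).lower() # 2. full-width -> half-width, lowercase
    tokens = TOKEN.findall(text)                       # 3. split into tokens
    tokens = tokens[:max_tokens]                       # 4. keep the first 50 tokens
    return "".join(tokens)                             # 5. join into one string
```

Run on the input shown above, `preprocess` returns the output shown above. Because a letter/digit run counts as a single token, `preprocess("A1b2 你好", max_tokens=2)` keeps `a1b2` and one Chinese character.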

2. Correction Plugin

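This plugin detects misspelled terms and returns the corrected query. It combines a BERT‑based model, distilled to reduce latency, with a tri‑gram language model that ranks candidates generated from homophones. In the example below, 牛黄 (niúhuáng, bezoar) is a near‑homophone of 硫磺 (liúhuáng, sulfur), so the query is corrected to "Shanghai sulfur soap".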

Input: [上海牛黄皂]
Output: [上海硫磺皂]
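The candidate‑ranking half of this plugin can be illustrated with a toy character tri‑gram model. The sketch does not reproduce the BERT‑based model, and everything in it is an assumption made for the example: the confusion set `CONFUSIONS`, the four‑line `CORPUS`, and the add‑one smoothing.

```python
import math
from collections import Counter

# Hypothetical confusion set: a term mapped to similar-sounding alternatives.
CONFUSIONS = {"牛黄": ["硫磺"]}

# Toy corpus standing in for real product titles.
CORPUS = ["上海硫磺皂", "硫磺皂洗脸", "上海硫磺皂正品", "牛黄解毒片"]


def trigrams(text):
    padded = "^^" + text + "$"  # start and end markers
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


TRI = Counter(g for line in CORPUS for g in trigrams(line))
BI = Counter(g[:2] for line in CORPUS for g in trigrams(line))
VOCAB = len({ch for line in CORPUS for ch in line}) + 1  # +1 for the end marker


def log_prob(text):
    # Add-one smoothed tri-gram log-probability of the whole string.
    return sum(math.log((TRI[g] + 1) / (BI[g[:2]] + VOCAB)) for g in trigrams(text))


def candidates(query):
    # The original query always competes with its homophone variants.
    out = {query}
    for term, alternatives in CONFUSIONS.items():
        if term in query:
            out.update(query.replace(term, alt) for alt in alternatives)
    return out


def correct(query):
    return max(candidates(query), key=log_prob)
```

With this toy corpus, `correct("上海牛黄皂")` returns `上海硫磺皂`, while `correct("牛黄解毒片")` is left unchanged because the original string scores higher than its variant. Letting the original compete with its candidates is one simple way to avoid over‑correcting valid queries.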

3. Tokenizer Plugin (Fine‑grained Tokenisation)

Based on a customised Jieba implementation, this plugin builds a frequency‑based dictionary from product titles, industry data, and open‑source corpora. It limits the maximum length of matched tokens (default 2) to control granularity.

Input: [雪地靴女2020年新款皮毛一体冬季加绒加厚防滑东北厚底保暖棉鞋子]
Output: [雪地 靴 女 2020 年 新款 皮毛 一体 冬季 加绒 加厚 防滑 东北 厚底 保暖 棉 鞋子]
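To show how a maximum token length controls granularity, here is a small frequency‑based segmenter that picks the highest‑scoring split by dynamic programming. It is a simplified stand‑in, not Youzan's customised Jieba: the dictionary `FREQ` and its counts are invented, unknown characters get a count of 1, and runs of letters or digits are kept whole.

```python
import math
import re

# Hypothetical frequency dictionary; real counts would come from product titles.
FREQ = {"雪地": 50, "雪地靴": 70, "靴": 20, "女": 90, "新款": 80, "皮毛": 30, "一体": 40}
TOTAL = sum(FREQ.values())
ALNUM = re.compile(r"[A-Za-z0-9]+")


def segment_chinese(text, max_len=2):
    # best[i] = (score, start) for the best segmentation of text[:i]
    best = [(0.0, 0)] + [(-math.inf, 0)] * len(text)
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if len(word) > 1 and word not in FREQ:
                continue  # multi-character tokens must be dictionary words
            score = best[start][0] + math.log(FREQ.get(word, 1) / TOTAL)
            if score > best[end][0]:
                best[end] = (score, start)
    tokens, end = [], len(text)
    while end > 0:  # walk the back-pointers from the end
        start = best[end][1]
        tokens.append(text[start:end])
        end = start
    return tokens[::-1]


def tokenize(text, max_len=2):
    tokens, pos = [], 0
    for m in ALNUM.finditer(text):  # letter/digit runs stay in one piece
        tokens += segment_chinese(text[pos:m.start()], max_len)
        tokens.append(m.group())
        pos = m.end()
    return tokens + segment_chinese(text[pos:], max_len)
```

With the default `max_len=2`, `tokenize("雪地靴女2020年新款")` yields `雪地 / 靴 / 女 / 2020 / 年 / 新款`, matching the start of the example output. Raising `max_len` to 3 lets the dictionary word 雪地靴 match as a single, coarser token.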

4. Semantic Segmentation Plugin

On top of fine‑grained tokenisation, this plugin builds a semantic tree to merge highly related adjacent tokens. In the example, “雪地” (snowfield) and “靴” (boot) are merged into “雪地靴” (snow boots) because their semantic similarity is high.

Input: [雪地 靴 女 2020 年 新款 皮毛 一体 冬季 加绒 加厚 防滑 东北 厚底 保暖 棉 鞋子]
Output: [雪地靴 女 2020年 新款 皮毛一体 冬季 加绒加厚 防滑 东北 厚底 保暖 棉鞋子]
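One way to picture the merging step is greedy bottom‑up merging of adjacent tokens, where each merge would form an internal node of a binary tree over the tokens. The source does not describe how its semantic tree is built, so this is only a sketch: the `COHESION` scores, the threshold, and the greedy strategy are all assumptions.

```python
# Hypothetical relatedness scores (0..1) for adjacent token pairs.
COHESION = {
    ("2020", "年"): 0.95,
    ("雪地", "靴"): 0.92,
    ("皮毛", "一体"): 0.88,
    ("靴", "女"): 0.30,
}


def cohesion(left, right):
    return COHESION.get((left, right), 0.0)


def merge_tokens(tokens, threshold=0.8):
    tokens = list(tokens)
    while len(tokens) > 1:
        scores = [cohesion(a, b) for a, b in zip(tokens, tokens[1:])]
        i = max(range(len(scores)), key=scores.__getitem__)
        if scores[i] < threshold:
            break  # no remaining pair is related enough to merge
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]  # merge the best pair
    return tokens
```

For the tokens `雪地, 靴, 女, 2020, 年, 新款, 皮毛, 一体`, this returns `雪地靴, 女, 2020年, 新款, 皮毛一体`, which matches the first part of the example output. In a real system the scores would presumably come from a learned similarity model rather than a lookup table.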

5. Entity Tagging Plugin

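The tagging plugin recognises product‑related entities such as product words, product modifiers, and brands, so that the identified product terms can be boosted in ranking. In the example below, “汽车” (car) and “脚垫” (floor mat) are tagged 产品修饰词 (product modifier), and “刷子” (brush) is tagged 产品词 (product word).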

Input: ["汽车","脚垫","刷子"]
Output: [{"word":"汽车","tag":"产品修饰词"},{"word":"脚垫","tag":"产品修饰词"},{"word":"刷子","tag":"产品词"}]
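The tags become useful once they are turned into query weights. The sketch below assumes an Elasticsearch‑style DSL, which the source does not confirm, and uses a made‑up boost table `TAG_BOOST` and a made‑up `title` field; only the tag names come from the example output.

```python
# Hypothetical boost per tag: product words count more than modifiers.
TAG_BOOST = {"产品词": 3.0, "产品修饰词": 1.0}


def build_should_clauses(tagged, field="title"):
    # One match clause per tagged term, weighted by its tag.
    return [
        {"match": {field: {"query": item["word"], "boost": TAG_BOOST.get(item["tag"], 1.0)}}}
        for item in tagged
    ]
```

Applied to the tagging output above, the clause for 刷子 carries a boost of 3.0 and the other two clauses a boost of 1.0, so documents whose titles match the product word score higher than those matching only the modifiers.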

6. Category Prediction Plugin

Based on contrastive learning, this plugin predicts the most relevant category hierarchy for a query, enabling category‑weighting in the final ranking.

Input: 牛奶绒
Output: {"categoryId":"101000010001","categoryName":"被套","categoryChainList":["家居建材","床上用品","被套"],"parentCategoryId":"10100001","level":3,"hasChildren":true,"percent":0.9010684490203857}

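Category weighting has a visible effect on relevance. Without it, a query for “牛奶绒” (milk velvet, a fleece fabric) returns dairy‑related items because of the word 牛奶 (milk); with it, the query returns bedding such as “牛奶绒床单” (milk‑velvet bed sheets), as the user expects.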
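How a prediction like the one above might feed into the DSL rewrite can be sketched as follows. The field names `categoryId` and `percent` come from the example output; the Elasticsearch‑style query shape, the `category_id` document field, the confidence threshold, and the boost scaling are assumptions.

```python
def apply_category_weight(dsl, prediction, min_confidence=0.5, max_boost=5.0, field="category_id"):
    # Leave the query untouched when the model is not confident enough.
    if prediction["percent"] < min_confidence:
        return dsl
    boost = round(max_boost * prediction["percent"], 2)
    clause = {"term": {field: {"value": prediction["categoryId"], "boost": boost}}}
    # The original query must still match; the category only adds score.
    return {**dsl, "query": {"bool": {"must": [dsl["query"]], "should": [clause]}}}
```

Putting the category in a `should` clause raises the score of items in the predicted category without filtering out everything else, which keeps recall intact when the prediction is wrong.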

7. Synonym Plugin

This plugin uses a synonym dictionary built offline to replace product terms with their synonyms, for example converting “衬衣” to “衬衫” (both mean "shirt").

Input: [衬衣]
Output: [衬衫]
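At its simplest, the replacement is a dictionary lookup per token. In this sketch the single entry in `SYNONYMS` is taken from the example; the function name and the token‑list interface are assumptions.

```python
# Synonym table; in practice this would be loaded from the offline dictionary.
SYNONYMS = {"衬衣": "衬衫"}


def replace_synonyms(tokens):
    # Tokens without an entry pass through unchanged.
    return [SYNONYMS.get(token, token) for token in tokens]
```

For example, `replace_synonyms(["白色", "衬衣"])` returns `["白色", "衬衫"]`.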

Conclusion and Outlook

The article walks through QP’s overall architecture, layered design, and plugin set. After more than a year of iteration, QP can be integrated into a new scenario within hours and already serves new retail, micro‑mall, 精选 ("Selected"), and distributor‑market search, where it has significantly improved relevance. Future work will add more algorithmic plugins and visual configuration tools to make adoption easier for business teams.

