How Adaptive Semantic IDs Enable Precise and Generalizable Generative Retrieval

The article introduces the SA²CRQ framework, which adaptively allocates semantic ID length and transfers residual knowledge to resolve head‑item ID collisions and tail‑item generalization gaps in large‑scale e‑commerce generative retrieval, achieving stable gains on both industrial and public datasets.

JD Retail Technology

Background and Challenge

In e‑commerce search, user intent is becoming increasingly fine‑grained, which exposes the limits of traditional recall systems in expressive power, maintenance cost, and long‑tail coverage. Generative retrieval generates candidate SKUs directly from the user context, but raw SKU IDs carry no semantic information, which makes modeling inefficient and limits generalization. Semantic IDs (SIDs) address this by discretizing continuous vector representations into structured token sequences, enabling more effective learning and better recall.

Technical Challenges

Search recall requires precise query‑item semantic matching. All SKUs that share a single SID should be consistently relevant to the same queries; otherwise “semantic bucket noise” degrades precision, especially as queries narrow from broad to fine‑grained (e.g., "phone" → "Apple iPhone 17 Pro Max 2TB"). SID design must therefore (1) increase the separability of head‑item embeddings, especially for items with subtle differences, and (2) approximate a one‑to‑one SID‑to‑SKU mapping to reduce relevance inconsistency.

Related Work

Existing SID methods focus on quantization, collision avoidance, multimodal signals, alignment with model training, long‑tail/generalization, interpretability, and inference. Notable techniques include DSI (hierarchical clustering), TIGER (RQ‑VAE), OneRec (RQ‑Kmeans), OneSearch (OPQ for residual attributes), DOS (orthogonal rotation), Qarm‑v2 (FSQ), and various collision‑mitigation strategies (random tokens, balanced K‑means, sinkhorn constraints, entropy regularization). Multimodal fusion approaches such as MME‑SID, MMQ, and BBQRec also exist. Most treat SID length as fixed and do not jointly address head‑item separability and tail‑item generalization.

Motivation

E‑commerce data exhibit a severe long‑tail distribution: a few head SKUs dominate interactions, while many tail SKUs have sparse signals. This causes (1) head‑SKU ID collisions due to dense clustering, reducing recall precision, and (2) tail‑SKU isolation, limiting learnable semantic structure and cold‑start performance.

Overall Framework (SA²CRQ)

SA²CRQ (Anchored Curriculum with Sequential Adaptive Quantization) consists of two modules:

ACRQ (Anchored Curriculum Residual Quantization)

Head Training allocates more clustering centers to head SKUs based on click logs, increasing SID discriminability and reducing collisions.

Tail Training freezes the head codebook as semantic anchors and learns an additional trainable codebook for tail SKUs, allowing them to align with head semantics while gaining dedicated representation space.
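The two ACRQ stages can be sketched for a single quantization layer as follows. This is only one possible reading of the description above, not the authors' implementation: the helper names (sq_dist, fit_codebook), the codebook sizes, the synthetic embeddings, and the use of plain Lloyd's k‑means are all assumptions made for illustration.

```python
import numpy as np

def sq_dist(x, codebook):
    """Squared Euclidean distance from every row of x to every codebook row."""
    return ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

def fit_codebook(x, k_new, frozen=None, iters=20, seed=0):
    """Lloyd's k-means that learns k_new centroids.

    Rows of `frozen` take part in the assignment step but are never updated,
    so they act as fixed anchors (hypothetical helper, not from the paper).
    """
    rng = np.random.default_rng(seed)
    new = x[rng.choice(len(x), size=k_new, replace=False)].copy()
    if frozen is None:
        frozen = np.empty((0, x.shape[1]))
    for _ in range(iters):
        codebook = np.vstack([frozen, new])
        assign = sq_dist(x, codebook).argmin(axis=1)
        for j in range(k_new):
            members = x[assign == len(frozen) + j]
            if len(members):  # leave a centroid untouched if its cluster is empty
                new[j] = members.mean(axis=0)
    return np.vstack([frozen, new])

# Synthetic stand-ins for head and tail SKU embeddings (assumed data).
rng = np.random.default_rng(42)
head_emb = rng.normal(size=(200, 16))
tail_emb = rng.normal(loc=0.5, size=(60, 16))

# Stage 1 (head training): the codebook is learned from head SKUs only,
# and head SKUs get the larger share of centroids.
head_codebook = fit_codebook(head_emb, k_new=8)

# Stage 2 (tail training): head centroids are frozen anchors; only the
# 4 extra tail centroids are trainable.
full_codebook = fit_codebook(tail_emb, k_new=4, frozen=head_codebook, seed=1)

# A tail SKU may land on a head anchor or on a dedicated tail centroid.
tail_codes = sq_dist(tail_emb, full_codebook).argmin(axis=1)
print(np.bincount(tail_codes, minlength=len(full_codebook)))
```

In this sketch, a tail SKU that is close to a head anchor reuses that anchor's code, while tail SKUs that no anchor explains well pull the trainable centroids toward themselves.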

SARQ (Sequential Adaptive Quantization) dynamically determines SID length per SKU by measuring path entropy. Head SKUs receive longer SIDs for fine‑grained distinction; tail SKUs stop early, producing shorter, more generalizable SIDs that share semantic paths.
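The early‑stopping idea can be illustrated with a small sketch. The article does not define path entropy, so the definition used here is an assumption: the Shannon entropy of the click shares of all SKUs that currently share a SID prefix. The function names, the threshold, and the toy data are likewise hypothetical.

```python
import math
from collections import defaultdict

def path_entropy(clicks):
    """Shannon entropy (nats) of click shares among SKUs sharing one prefix.

    ASSUMPTION: this is an illustrative stand-in for the paper's path entropy.
    """
    total = sum(clicks)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in clicks if c > 0)

def adaptive_sids(full_codes, clicks, threshold=0.5):
    """Truncate each SKU's full-length code path to an adaptive length."""
    max_len = max(len(codes) for codes in full_codes.values())
    sids, active = {}, list(full_codes)
    for length in range(1, max_len + 1):
        groups = defaultdict(list)
        for sku in active:
            groups[full_codes[sku][:length]].append(sku)
        active = []
        for prefix, members in groups.items():
            h = path_entropy([clicks.get(sku, 0) for sku in members])
            if length == max_len or h <= threshold:
                # Low entropy: little competition under this prefix, stop here.
                for sku in members:
                    sids[sku] = prefix
            else:
                # High entropy: several popular SKUs collide, extend the path.
                active.extend(members)
    return sids

# Toy full-length code paths and click counts (assumed data).
full_codes = {
    "head_a": (0, 1, 2),
    "head_b": (0, 1, 3),
    "head_c": (0, 2, 0),
    "tail_x": (1, 0, 0),
    "tail_y": (1, 0, 1),
}
clicks = {"head_a": 900, "head_b": 800, "head_c": 700, "tail_x": 1, "tail_y": 0}

sids = adaptive_sids(full_codes, clicks)
for sku, sid in sorted(sids.items()):
    print(sku, sid)
```

With these toy inputs the two tail SKUs stop after one token and share a SID, while the colliding head SKUs are extended until they separate.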

Embedding Construction

The embedding stage compresses SKU titles, categories, brands, and attributes into vectors. Textual features dominate; image features are optional and showed limited return on investment in early experiments. Pure semantic embedding models (e.g., BGE‑M3, Qwen‑Embedding) outperform multimodal or collaborative K‑NN models in search scenarios because their embeddings are more discriminative.

Construction Methods

Common hierarchical residual quantization techniques include:

RQ‑Kmeans: multi‑round clustering of residuals.

RQ‑VAE: combines residual quantization with variational auto‑encoders.

RQ‑OPQ: adds OPQ on residuals for finer detail.

RQ‑FSQ: applies FSQ on the final layer to reduce code collisions.

SA²CRQ builds on these by anchoring head codebooks and adaptively truncating SID length for tails.
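To make the first technique in the list above concrete, here is a minimal RQ‑Kmeans sketch: each layer clusters the residual left by the previous layer, and the sequence of chosen centroid indices forms the SID. The layer count, codebook size, random embeddings, and function names are illustrative assumptions rather than settings from the article.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = ((x[:, None] - centroids[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(0)
    return centroids

def rq_kmeans(emb, num_layers=3, codebook_size=8):
    """Residual quantization: cluster, subtract the centroid, repeat."""
    residual = emb.copy()
    codebooks, codes = [], []
    for layer in range(num_layers):
        codebook = kmeans(residual, codebook_size, seed=layer)
        idx = ((residual[:, None] - codebook[None]) ** 2).sum(-1).argmin(1)
        codebooks.append(codebook)
        codes.append(idx)
        # What this layer could not explain is passed to the next layer.
        residual = residual - codebook[idx]
    return codebooks, np.stack(codes, axis=1), residual

rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 16))  # stand-in for SKU embeddings
codebooks, sids, residual = rq_kmeans(emb)

print("SID of first SKU:", tuple(sids[0]))
print("relative residual norm:", np.linalg.norm(residual) / np.linalg.norm(emb))
```

Each row of sids is a fixed‑length SID here; the adaptive truncation described for SA²CRQ would operate on top of paths like these.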

Evaluation Metrics

Independent Coding Rate (ICR): proportion of SIDs mapping to a single SKU.

Codebook Utilization Rate (CUR): usage ratio of each codebook layer.

Statistical distribution: mean, median, quantiles of SKU‑per‑SID counts.

Same‑product rate and relevance inconsistency rate based on annotated same‑product pairs and query‑item relevance scores.

Entropy and Gini coefficient per layer to assess uniformity.
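Several of the distribution metrics above can be computed directly from a SKU‑to‑SID mapping. The sketch below uses common textbook definitions, which may differ in detail from the paper's: ICR as the share of distinct SIDs holding exactly one SKU, CUR as the share of codes in a layer that are used at least once, entropy in bits, and a Gini coefficient that counts unused codes as zeros. The toy mapping and function names are assumptions.

```python
import math
import statistics
from collections import Counter

def independent_coding_rate(sku_to_sid):
    """Share of distinct SIDs that map to exactly one SKU."""
    per_sid = Counter(sku_to_sid.values())
    return sum(1 for n in per_sid.values() if n == 1) / len(per_sid)

def layer_counts(sku_to_sid, layer, codebook_size):
    """How often each code of one layer is used (shorter SIDs are skipped)."""
    counts = [0] * codebook_size
    for sid in sku_to_sid.values():
        if layer < len(sid):
            counts[sid[layer]] += 1
    return counts

def utilization(counts):
    """Codebook Utilization Rate: fraction of codes used at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)

def entropy_bits(counts):
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini coefficient of code usage: 0 is perfectly uniform."""
    xs, n, total = sorted(counts), len(counts), sum(counts)
    if total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n

# Toy mapping with a codebook of size 4 per layer (assumed data).
sku_to_sid = {"a": (0, 1), "b": (0, 1), "c": (0, 2), "d": (1, 0)}
per_sid = list(Counter(sku_to_sid.values()).values())

icr = independent_coding_rate(sku_to_sid)
cur = [utilization(layer_counts(sku_to_sid, l, 4)) for l in range(2)]
gini0 = gini(layer_counts(sku_to_sid, 0, 4))
print("ICR:", icr, "CUR per layer:", cur)
print("SKUs per SID mean/median:", statistics.mean(per_sid), statistics.median(per_sid))
print("layer-0 entropy (bits):", entropy_bits(layer_counts(sku_to_sid, 0, 4)), "Gini:", gini0)
```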

Experimental Results

On JD’s large‑scale industrial search dataset, SA²CRQ outperformed all compared baselines, including TIGER. SKU recall@K improved noticeably, the hallucination rate decreased, and online A/B tests showed +0.13% UCVR and +0.42% UV value. The deployed 1.7B model runs at 30 QPS with a 99th‑percentile latency of ~50 ms on a single NVIDIA RTX 5090, confirming production feasibility.

Downstream Task Integration

Semantic IDs were fed into generative retrieval models, evaluating SKU Recall, SID Recall, and SKU MRR. Improvements were observed across all metrics, demonstrating that higher‑quality SIDs directly translate to higher recall quality.
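The three downstream metrics can be sketched with their standard definitions; the article does not spell out its exact formulas, so treat the following as an assumption. SID Recall is computed here by mapping both predictions and ground truth to SIDs before comparing, and all names and data are hypothetical.

```python
def recall_at_k(predicted, relevant, k):
    """Fraction of relevant items that appear in the top-k predictions."""
    if not relevant:
        return 0.0
    return len(set(predicted[:k]) & set(relevant)) / len(set(relevant))

def reciprocal_rank(predicted, relevant):
    """1 / rank of the first relevant prediction, or 0 if none is found."""
    for rank, item in enumerate(predicted, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

def to_sids(skus, sku_to_sid):
    """Map SKUs to SIDs, dropping duplicates while keeping order."""
    return list(dict.fromkeys(sku_to_sid[sku] for sku in skus))

# Toy query: ranked model output and clicked ground truth (assumed data).
sku_to_sid = {"sku1": (0, 1), "sku2": (0, 1), "sku3": (2, 0), "sku7": (1, 1)}
predicted = ["sku3", "sku1", "sku7"]
relevant = ["sku1", "sku2"]

sku_recall = recall_at_k(predicted, relevant, k=3)
sid_recall = recall_at_k(to_sids(predicted, sku_to_sid),
                         to_sids(relevant, sku_to_sid), k=3)
sku_rr = reciprocal_rank(predicted, set(relevant))
print(sku_recall, sid_recall, sku_rr)
```

Averaging the reciprocal rank over all evaluation queries gives SKU MRR. Because several SKUs can share one SID, SID Recall is typically at least as high as SKU Recall for the same predictions.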

Conclusion

The study emphasizes downstream‑driven evaluation while using upstream distribution analysis as auxiliary insight. Recall‑oriented metrics remain primary, supplemented by distribution indicators (ICR, average SKU‑per‑SID, same‑product rate, relevance inconsistency, entropy, Gini). SA²CRQ effectively balances head‑item discriminability and tail‑item generalization, offering a practical solution for large‑scale generative retrieval in e‑commerce.

Reference

Paper: https://arxiv.org/abs/2602.23978 (SIGIR 2026) – "Towards Efficient and Generalizable Retrieval: Adaptive Semantic Quantization and Residual Knowledge Transfer"
