Zero‑Cost Upgrade: OneSearch‑V2 Launches Generative Search, Boosting Buyers and Orders
OneSearch‑V2 introduces a zero‑cost generative search upgrade built on latent‑reasoning‑enhanced self‑distillation, thought‑augmented query understanding, and behavior‑feedback preference alignment. Offline, it lifts HitRate@10 by up to 2.68%; online, CTR rises roughly 4% while buyer and order counts each rise roughly 2%.
Background
Generative retrieval in e‑commerce suffers from insufficient complex query understanding, weak personalized intent reasoning, and reward‑model over‑fitting. OneSearch‑V1 reduced inference cost and improved high‑frequency queries but still left a large portion of traffic (≈⅓) with low conversion.
OneSearch‑V2 Core Design
The paper proposes a Latent‑Reasoning‑Enhanced Self‑Distillation (LR‑ESD) framework composed of three modules:
Thought‑augmented Query Understanding: a keyword‑based chain‑of‑thought (CoT), generated by Qwen3‑32B, that extracts intent, category, attributes, and topics, providing dense semantic signals without increasing token length.
Reasoning‑internalized Self‑Distillation: an asymmetric teacher‑student distillation in which the teacher receives the full keyword‑augmented input and the student receives only the raw query; a KL loss forces the student to encode the reasoning into its weights, eliminating extra inference cost.
Behavior‑Feedback Preference Alignment: replaces the independent reward model with a composite reward built from relevance, calibrated post‑click CTR, and click/order signals. Token‑Position Marginal Advantage GRPO (TPMA‑GRPO) distributes credit along the hierarchical SID sequence.
Encoding Experiments
OneSearch‑V2 retains the V1 KHQE+RQ‑OPQ encoding. Extensive offline tests compare single‑modal text encoding, multimodal unified encoding, and multimodal separate‑then‑concat encoding on ~5 M <query, item> pairs (titles + two main images). Results show single‑modal methods outperform multimodal (e.g., small‑scale bge‑base beats large Qwen3‑VL) because cross‑modal noise dilutes key attributes. The KHQE scheme achieves the best trade‑off between accuracy and real‑time latency.
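The separate‑then‑concat scheme tested above can be sketched as follows. This is a minimal illustration with toy vectors standing in for real encoder outputs (e.g. bge‑base for text); the pooling of the two main images and the modality weights are assumptions, not the paper's configuration.

```python
import numpy as np

def l2_normalize(v):
    """Normalize along the last axis so dot products are cosine similarities."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-12)

def separate_then_concat(text_emb, image_embs, w_text=0.7, w_image=0.3):
    """Encode text and images with separate encoders, then concatenate.
    Weights and mean-pooling over the two main images are illustrative."""
    text = w_text * l2_normalize(text_emb)
    image = w_image * l2_normalize(image_embs.mean(axis=0))
    return l2_normalize(np.concatenate([text, image]))

# Toy <query, item> pair: 8-dim text vectors, two 8-dim image vectors each.
rng = np.random.default_rng(0)
q = separate_then_concat(rng.standard_normal(8), rng.standard_normal((2, 8)))
item = separate_then_concat(rng.standard_normal(8), rng.standard_normal((2, 8)))
score = float(q @ item)  # cosine similarity used for retrieval ranking
```

Because both sides are unit‑normalized, the retrieval score stays in [-1, 1]; the cross‑modal noise the paper observes would enter through the image component of this concatenation.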
Thought‑augmented Query Understanding Pipeline
The three‑step pipeline processes each query:
Query analysis: intent detection, hierarchical category matching, attribute extraction, and topic recommendation.
Keyword extraction: generate concise keyword CoT, enforce intent‑category‑attribute consistency, deduplicate synonyms, and rank by item popularity.
Preference calibration: use user profile and recent interactions to filter or augment the keyword set; during training, inject the currently viewed item as a strong signal.
Keyword‑driven CoT generation runs asynchronously and its results are cached for identical queries, yielding zero latency overhead.
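The caching behavior that makes CoT generation zero‑latency at serving time can be sketched as below. The generator function here is a hypothetical stand‑in for the asynchronous Qwen3‑32B pipeline; only the cache‑on‑identical‑query pattern is from the source.

```python
from functools import lru_cache

def generate_keyword_cot(query: str) -> list[str]:
    """Hypothetical placeholder for the offline keyword-CoT generator."""
    parts = query.split()
    return [f"intent:{parts[0]}", f"category:{parts[-1]}"]

@lru_cache(maxsize=100_000)
def cached_keyword_cot(query: str) -> tuple[str, ...]:
    """Identical queries hit the cache, so serving adds no CoT latency."""
    return tuple(generate_keyword_cot(query))

kws = cached_keyword_cot("red running shoes")       # first call: generate
kws_again = cached_keyword_cot("red running shoes")  # repeat: served from cache
```

In production the cache would be a shared store populated asynchronously, but the contract is the same: repeated queries never wait on generation.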
Reasoning‑internalized Self‑Distillation Details
The teacher computes logits on query + keyword‑CoT while the student computes logits on query only. KL divergence aligns the student distribution to the teacher’s, and the loss combines standard cross‑entropy with the distillation term. Regularization includes:
R‑Drop: two forward passes with different dropout masks, minimizing their KL divergence.
FGM: adversarial perturbation of the embedding layer to improve input robustness.
Focal loss: mitigates long‑tail class imbalance in the SID vocabulary.
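The distillation objective with focal loss can be sketched as follows. The mixing weight `alpha` and focusing parameter `gamma` are illustrative assumptions; the paper's exact loss weighting is not stated in this summary.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def focal_ce(probs, target, gamma=2.0, eps=1e-12):
    """Focal loss: down-weights easy (high-probability) SID tokens,
    mitigating long-tail imbalance in the SID vocabulary."""
    p_t = probs[target]
    return float(-((1.0 - p_t) ** gamma) * np.log(p_t + eps))

def distill_loss(student_logits, teacher_logits, target, alpha=0.5):
    p_teacher = softmax(teacher_logits)  # teacher saw query + keyword CoT
    p_student = softmax(student_logits)  # student saw the raw query only
    return alpha * kl_div(p_teacher, p_student) + (1 - alpha) * focal_ce(p_student, target)
```

The KL term pulls the keyword‑free student toward the keyword‑informed teacher, so at inference only the cheap student path runs.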
Ablation shows the self‑distilled student outperforms both the teacher (despite lacking keywords) and all alternative distillation strategies (special‑token, CODI‑style hidden‑state alignment, EMA teacher, joint training).
Behavior‑Feedback Preference Alignment
The composite reward R combines:
Relevance reward (four‑level scoring).
Posterior conversion reward (calibrated CTR).
Click and order reward (direct user actions).
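One way such a composite reward could be combined is sketched below; the weights, the four‑level relevance normalization, and the behavior capping are assumptions for illustration, not the paper's tuned values.

```python
def composite_reward(relevance_level, calibrated_ctr, clicked, ordered,
                     w_rel=0.4, w_ctr=0.3, w_beh=0.3):
    """Blend relevance, calibrated post-click CTR, and click/order signals.
    Weights are illustrative; the paper tunes this balance."""
    rel = relevance_level / 3.0          # four-level score 0..3, normalized
    behavior = 0.5 * clicked + 1.0 * ordered
    return w_rel * rel + w_ctr * calibrated_ctr + w_beh * min(behavior, 1.0)
```

Tying the reward to logged behavior rather than a standalone reward model is what avoids the reward‑model over‑fitting noted in the Background.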
TPMA‑GRPO decomposes sequence‑level advantage into position‑level marginal contributions, applying a prefix gate that only propagates gradients when the prefix matches the ground‑truth, thus respecting the hierarchical nature of SID generation.
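The prefix gate can be sketched as follows. The equal split of the sequence‑level advantage across positions is an illustrative simplification; TPMA‑GRPO computes marginal per‑position contributions.

```python
def prefix_gated_advantages(generated_sids, target_sids, advantage):
    """Distribute a sequence-level advantage over SID positions, but let
    signal through at a position only while the preceding prefix matches
    the ground truth: once a hierarchical SID level is wrong, credit for
    deeper levels is meaningless and is gated to zero."""
    gates = []
    prefix_ok = True
    for gen, tgt in zip(generated_sids, target_sids):
        gates.append(1.0 if prefix_ok else 0.0)
        prefix_ok = prefix_ok and (gen == tgt)
    per_pos = advantage / max(len(generated_sids), 1)  # equal split (assumption)
    return [per_pos * g for g in gates]
```

For example, if the second SID level is wrong, the third level receives no gradient even when it happens to match.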
Offline Evaluation
Using 30k logged PVs (30k clicks, 7,229 orders), models are evaluated with HitRate@10 and MRR@10. Incremental additions yield:
CoT tasks: Order HR@10 +0.48%.
Self‑distillation: Order HR@10 +1.17%, Click HR@10 +1.67%.
R‑Drop: MRR@10 +0.0028.
FGM: Order HR@10 reaches 0.2180, Click HR@10 reaches 0.2422.
Focal loss: Order HR@10 reaches 0.2214, Click HR@10 reaches 0.2471.
TPMA‑GRPO: the final OneSearch‑V2 reaches Order HR@10 0.2314 and Click HR@10 0.2568, a 2.68% average HR@10 gain over the baseline.
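The two evaluation metrics are standard and can be stated compactly; how the positive set is built from logged clicks and orders is an assumption of this sketch.

```python
def hitrate_at_k(ranked_ids, positives, k=10):
    """1.0 if any positive item appears in the top-k ranking, else 0.0."""
    return float(any(i in positives for i in ranked_ids[:k]))

def mrr_at_k(ranked_ids, positives, k=10):
    """Reciprocal rank of the first positive item within the top k."""
    for rank, item in enumerate(ranked_ids[:k], start=1):
        if item in positives:
            return 1.0 / rank
    return 0.0
```

Click HR@10 and Order HR@10 differ only in whether the positive set holds clicked or ordered items for that PV.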
Online A/B Test
OneSearch‑V2 vs V1 shows statistically significant lifts (p < 0.05): CTR +3.98 %, page‑click +1.17 %, conversion +2.90 %, buyer count +2.07 %, order volume +2.11 %.
Deeper Analyses
Performance gains are consistent across user groups, query‑frequency buckets, and item hotness, with especially large improvements for low‑activity users and cold‑start items. The CTR curve changes from a “reverse‑U” in V1 to a “U” in V2, indicating better handling of both head and tail queries.
Industry‑level CTR uplift averages 3.98 % across categories; categories with ambiguous titles (apparel, shoes, cosmetics, hardware) see the largest gains.
CoT keyword coverage continuously rises in production, ensuring stable self‑distillation updates.
Trade‑off experiments confirm that higher relevance reward alone can reduce order conversion, while the full TPMA‑GRPO balance improves final order metrics.
During the 3.18 global shopping festival, increasing the relevance reward for emerging merchants boosted their rankings and demonstrated the system’s ability to adapt objectives in real time.
Future Directions
Planned work includes beyond‑log training for long‑tail queries, unified multimodal SID encoding for videos/live‑streams, and online learning mechanisms for agentic search systems.