How OnePiece Brings Context Engineering and Implicit Reasoning to Industrial Ranking
This article details the OnePiece framework, which brings context engineering, anchor item sequences, and progressive implicit reasoning to generative recommendation. On Shopee Search, these techniques deliver significant offline and online gains in ranking quality, personalization, and computational efficiency.
Background
Generative recommendation (GR) has progressed rapidly, but most research focuses on training large base models. The transfer of large‑language‑model (LLM) inference capabilities—such as chain‑of‑thought (CoT) reasoning and context engineering—to industrial recommendation pipelines remains under‑explored. The paper OnePiece: Bringing Context Engineering and Reasoning to Industrial Cascade Ranking System demonstrates how these techniques can improve a large‑scale e‑commerce search scenario (Shopee Search).
Context Engineering for Recommendation
Directly applying CoT prompts to recommendation is infeasible because user decision logic varies and mixing natural‑language reasoning with item‑ID sequences creates heterogeneous inputs. OnePiece adopts test‑time few‑shot learning by prepending anchor item sequences (high‑frequency click or purchase patterns collected across many users) as domain‑expert examples. The resulting prompt consists of four parts:
Interaction History (IH): the target user’s raw behavior sequence (e.g., item IDs of recent clicks/purchases).
Preference Anchors (PA): engineered anchor sequences that encode typical high‑frequency patterns for the current query.
Situational Descriptor (SD): special tokens (e.g., USER, QUERY) that convey session‑level context.
Candidate Item Set (CIS): the set of items visible to the ranking model during inference.
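The four-segment prompt above can be sketched as a simple concatenation of token segments. The helper name and the toy IDs below are illustrative assumptions, not the paper's actual API:

```python
# Hedged sketch of assembling a OnePiece-style input sequence.
# All token IDs and helper names are illustrative, not the paper's API.

def build_prompt(interaction_history, preference_anchors,
                 situational_tokens, candidate_items):
    """Concatenate the four segments into one token sequence:
    [IH] + [PA] + [SD] + [CIS]."""
    return (list(interaction_history)
            + list(preference_anchors)
            + list(situational_tokens)
            + list(candidate_items))

# Toy example: item IDs as ints, special tokens as strings.
ih = [101, 205, 333]          # user's recent clicks/purchases
pa = [777, 888]               # high-frequency anchor items for this query
sd = ["USER_42", "QUERY_7"]   # situational descriptor tokens
cis = [901, 902, 903]         # candidates visible to the ranker

prompt = build_prompt(ih, pa, sd, cis)
```

In a real pipeline each segment would be embedded before entering the Transformer; the point here is only the fixed segment ordering.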
The full input format is illustrated in the diagram below.
Implicit Reasoning in GR
LLMs achieve strong reasoning by generating explicit step‑by‑step text chains. In recommendation, inputs are item‑ID sequences without natural language, making explicit chains impractical. OnePiece leverages implicit reasoning, where the model performs multiple latent computation steps during the forward pass and emits only the final prediction token. The internal multi‑hop reasoning is hidden in latent space, allowing the model to “think” without exposing a textual chain.
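The latent-step loop can be illustrated with a toy sketch: each step re-encodes the sequence, appends the resulting hidden vector as a new "reasoning token", and only the final state feeds the prediction head. The tiny linear encoder below stands in for the real Transformer; all shapes and names are assumptions:

```python
import numpy as np

# Illustrative sketch of implicit (latent) reasoning: the model runs K extra
# forward steps, each appending a latent vector to the sequence, and only
# the final step's state is used for prediction. The toy "encoder" below
# stands in for the real Transformer; shapes and names are assumptions.

rng = np.random.default_rng(0)
d = 8                                        # hidden size
W = rng.normal(size=(d, d)) / np.sqrt(d)     # toy encoder weights

def encode(seq):
    """One forward pass: pool the sequence and project it
    (a stand-in for a Transformer layer). Returns one latent vector."""
    return np.tanh(seq.mean(axis=0) @ W)

def implicit_reasoning(tokens, k_steps=3):
    seq = tokens
    for _ in range(k_steps):            # latent steps, no text emitted
        latent = encode(seq)
        seq = np.vstack([seq, latent])  # append latent token, re-encode
    return seq[-1]                      # only the final latent is exposed

tokens = rng.normal(size=(5, d))        # embedded input prompt
final_state = implicit_reasoning(tokens, k_steps=3)
```

Because the loop happens entirely in hidden-state space, inference cost grows only by K short forward steps, not by generating a textual chain.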
Progressive Supervision of Latent Reasoning
Implicit reasoning lacks explicit process supervision, which can cause the model to converge to sub‑optimal internal solutions. Inspired by GNOLR’s progressive multi‑task learning, OnePiece inserts shallow‑supervision anchors at each intermediate reasoning step. The supervision hierarchy guides the model from simple to high‑order reasoning (simple → intermediate → complex), improving reasoning organization and stability without exposing the latent chain.
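One way to picture the simple-to-complex supervision hierarchy: each latent step gets its own target, ordered from easy to hard (e.g., click, then add-to-cart, then order), so intermediate states are anchored instead of only the final token. The loss form and the step-to-task mapping below are illustrative assumptions:

```python
import numpy as np

# Hedged sketch of progressive supervision: reasoning step k is trained
# against label k, with tasks ordered simple -> complex. The per-step
# binary cross-entropy and the click/cart/order mapping are assumptions.

def bce(logit, label):
    """Binary cross-entropy for a single logit/label pair."""
    p = 1.0 / (1.0 + np.exp(-logit))
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

def progressive_loss(step_logits, step_labels):
    """step_logits[k]: prediction from reasoning step k;
    step_labels[k]: label for that step's task (simple -> complex)."""
    return sum(bce(l, y) for l, y in zip(step_logits, step_labels))

# Three reasoning steps supervised by click, add-to-cart, and order labels.
logits = [2.0, 0.5, -1.0]
labels = [1.0, 1.0, 0.0]    # clicked and carted, but did not order
loss = progressive_loss(logits, labels)
```

Supervising every step this way gives the optimizer a gradient signal at each latent hop, which is what stabilizes the internal chain relative to final-token-only training.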
Model Architecture Details
Input embeddings combine item ID vectors with side‑information embeddings (e.g., category, price) via a linear adaptor before the Transformer encoder.
Block‑wise reasoning partitions the forward pass into multiple reasoning blocks, increasing the information bandwidth of each latent step.
Bidirectional attention is applied within the pre‑fill portion of the sequence, enabling richer token‑level interaction without the latency of auto‑regressive decoding.
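The hybrid attention pattern from the last bullet can be made concrete with a toy mask: bidirectional within the pre-fill prompt, causal over the appended reasoning tokens. Sizes and the helper name are toy assumptions, not the paper's configuration:

```python
import numpy as np

# Illustrative attention mask: bidirectional over the pre-fill prompt,
# causal over appended reasoning tokens. True = attention allowed.
# Sizes and the function name are toy assumptions.

def onepiece_mask(prefill_len, n_reason):
    n = prefill_len + n_reason
    mask = np.zeros((n, n), dtype=bool)
    # Pre-fill tokens attend to every pre-fill token (bidirectional).
    mask[:prefill_len, :prefill_len] = True
    # Each reasoning token attends to the pre-fill and to earlier
    # reasoning tokens, but never to later ones (causal).
    for i in range(prefill_len, n):
        mask[i, :i + 1] = True
    return mask

m = onepiece_mask(prefill_len=4, n_reason=2)
```

Making the pre-fill bidirectional lets every prompt token see the whole context in a single pass, while the causal tail preserves the step-by-step nature of the latent reasoning.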
Experimental Evaluation
Offline Ablation
Experiments were conducted on Shopee’s industrial DLRM baseline. Adding Preference Anchors (PA) consistently improved both HSTU and ReaRec variants. OnePiece further gained from block‑wise reasoning and progressive supervision. Key findings:
Block‑wise reasoning increased ranking accuracy by expanding the model’s latent reasoning capacity.
Progressive supervision yielded more stable outputs and better convergence than supervising only the final token.
Incorporating side‑information via the linear adaptor significantly boosted performance, confirming the benefit of multi‑modal token representations.
Bidirectional attention in the pre‑fill stage improved context aggregation, leading to higher offline metrics.
Offline results are visualized in the figure below.
Online A/B Tests
Two orthogonal online experiments were run in the Shopee Search pipeline:
In the recall stage, OnePiece replaced the DeepU2I recall model, achieving a 1.08% increase in GMV per user.
In the prerank stage, OnePiece replaced the DLRM prerank model, delivering a 1.12% GMV-per-user uplift and a 2.9% increase in ad revenue.
Future Directions
OnePiece 1.0 demonstrates that prompt optimization and latent reasoning can endow a recommendation model with LLM‑like instruction‑following behavior without natural‑language inputs. The next step, OnePiece 2.0, aims to build a General Recommender Model that unifies multiple recall strategies across scenarios within a single architecture, further reducing feature‑engineering effort and enabling prompt‑driven adaptation.
Relevant resources:
arXiv preprint: https://arxiv.org/pdf/2509.18091
Hugging Face paper page: https://huggingface.co/papers/2509.18091