Inside 1688’s Inference‑Based Recommendation System: Architecture, Challenges, and Future Directions

This article details how Alibaba 1688 tackles the “information cocoon” problem by deploying large‑model inference‑based recommendation, describing its three‑layer architecture, multi‑stage user demand analysis, long‑cycle behavior compression, prompt engineering, trend mining, near‑line serving, and future enhancements.

DataFunSummit

Background

In e‑commerce recommendation, the “information cocoon” effect causes users to see increasingly homogeneous items, limiting discovery of new or trending products. 1688, a B2B wholesale platform, needs stronger discovery for buyers seeking new items, trending goods, and business opportunities.

Definition of Discovery

Discovery is viewed from two angles: the supply side (exposing new, trending, or new‑seller items) and the user side (broadening the breadth of categories a user interacts with, including generated semantic IDs).

B‑vs‑C User Discovery Differences

C‑end users mainly seek personal items and benefit from cross‑category recommendations (e.g., matching tops with bottoms). B‑end users focus on wholesale procurement, requiring discovery of new items within their main categories as well as cross‑category combinations to support diverse business needs.

Advantages of Inference‑Based Recommendation

Leverages world knowledge from large models to expand demand reasoning.

Retrieves external information via RAG, improving business‑opportunity recommendations.

Provides traceable, explainable recommendation chains, enhancing transparency and trust.

Overall Architecture

The system consists of three core layers:

Infrastructure Layer: aggregates long‑ and short‑term user behavior (clicks, collects, add‑to‑carts, inquiries, searches) over the past year to capture stable B‑user business preferences.

Inference Layer: generates explicit user‑demand and trend queries, converting behavior and market trends into recommendation‑ready queries.

Application Layer: handles the recommendation pipeline, including vector retrieval and downstream ranking controls.

Challenges and Solution Ideas

The team faced several obstacles:

High model‑selection cost: rapid iteration of open‑source models (DeepSeek, Qwen) required extensive quality comparison.

High inference cost: fine‑tuning instructions for business‑need understanding demanded many versions, increasing resource consumption.

Discovery‑evaluation difficulty: the subjective quality of inferred demand queries and sparse online feedback left offline metrics (category width, PVR) misaligned with business goals such as GMV and 7‑day conversion.

Large inference‑resource demand: long user histories caused token overflow, straining online resources.

Key solutions include:

1. Long‑Cycle Behavior Compression Agent

To mitigate token length and information drift, a compression agent aggregates eight weeks of user search queries and short‑title product captions into a five‑tuple <week, category, amount, quantity, decision‑factor>. This reduces token count while preserving essential signals.

#250504_卡片_22_65_3_魔童闹海:22|炽影包:7|五元包:5
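As an illustrative sketch of this aggregation (the event schema and field names below are assumptions, not 1688's actual log format), weekly behaviors can be rolled up into such five‑tuples:

```python
from collections import defaultdict

def compress_weekly_behavior(events):
    """Aggregate raw behavior events into five-tuples:
    (week, category, amount, quantity, decision_factors).
    `events` is a list of dicts with keys week, category,
    amount, quantity, factor -- a hypothetical schema."""
    buckets = defaultdict(
        lambda: {"amount": 0, "quantity": 0, "factors": defaultdict(int)}
    )
    for e in events:
        b = buckets[(e["week"], e["category"])]
        b["amount"] += e["amount"]
        b["quantity"] += e["quantity"]
        b["factors"][e["factor"]] += 1  # count occurrences of each decision factor
    tuples = []
    for (week, cat), b in sorted(buckets.items()):
        # Serialize factors as "factor:count|factor:count", most frequent first,
        # mirroring the compressed record format shown above.
        factors = "|".join(
            f"{k}:{v}"
            for k, v in sorted(b["factors"].items(), key=lambda kv: -kv[1])
        )
        tuples.append((week, cat, b["amount"], b["quantity"], factors))
    return tuples
```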

2. Multimodal Short Title

Using Qwen‑VL, the system generates concise multimodal titles that embed visual cues (style, material) into text, enabling downstream text‑based inference.

3. Prompt Engineering (PE) Example

The agent is defined as a data‑summary expert. Input includes formatted fields such as time period, search terms, click duration, add‑to‑cart titles, and purchase GMV. The prompt extracts decision factors (e.g., "显瘦气质套装") and outputs a structured JSON‑like string.
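A minimal sketch of how such a prompt might be assembled (the template wording and field formats are assumptions; the production prompt is not public):

```python
PROMPT_TEMPLATE = """You are a data-summary expert for a B2B wholesale platform.
Summarize the user's behavior into decision factors.

Time period: {period}
Search terms: {searches}
Click durations: {clicks}
Add-to-cart titles: {cart_titles}
Purchase GMV: {gmv}

Return strict JSON: {{"decision_factors": [...], "summary": "..."}}"""

def build_prompt(period, searches, clicks, cart_titles, gmv):
    # Hypothetical field formatting: pipe-joined lists, durations in seconds.
    return PROMPT_TEMPLATE.format(
        period=period,
        searches="|".join(searches),
        clicks="|".join(f"{title}:{seconds}s" for title, seconds in clicks),
        cart_titles="|".join(cart_titles),
        gmv=gmv,
    )
```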

4. User Business‑Portrait Agent

After compression, a second model produces a business portrait covering main category analysis (e.g., women’s wear 70%, accessories 15%), vertical B‑buyer analysis, drop‑shipper type, core intent, and a multi‑dimensional buyer profile (objective identity, customer group, operating strategy, purchase motive).
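The portrait's fields can be sketched as a structured record (a hypothetical schema mirroring the dimensions named above, not the system's actual data model):

```python
from dataclasses import dataclass, field

@dataclass
class BuyerPortrait:
    # Hypothetical schema for the business-portrait output.
    main_categories: dict      # e.g. {"women's wear": 0.70, "accessories": 0.15}
    vertical_b_buyer: bool     # vertical B-buyer analysis
    drop_shipper_type: str     # drop-shipper classification
    core_intent: str           # inferred core purchasing intent
    profile: dict = field(default_factory=dict)  # identity, customer group,
                                                 # operating strategy, purchase motive
```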

5. CFV (Category‑Factor‑Value) Preference Analysis

CFV captures core decision factors per category, distinguishing long‑term, recent, and potential preferences (e.g., long‑term "法式:9|碎花:8|黑色:7"). This informs precise demand queries.
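A sketch of parsing and blending such factor‑weight strings (the 0.5/0.3/0.2 blend weights are an assumption; the article does not specify how the three horizons are combined):

```python
def parse_cfv(cfv_str):
    """Parse a factor-weight string like '法式:9|碎花:8|黑色:7'
    into a dict of decision factor -> weight."""
    out = {}
    for part in cfv_str.split("|"):
        factor, _, weight = part.rpartition(":")  # split at the last colon
        out[factor] = int(weight)
    return out

def merge_preferences(long_term, recent, potential, weights=(0.5, 0.3, 0.2)):
    """Blend long-term, recent, and potential preference dicts into one
    score per factor. The weighting scheme is hypothetical."""
    merged = {}
    for prefs, w in zip((long_term, recent, potential), weights):
        for factor, score in prefs.items():
            merged[factor] = merged.get(factor, 0.0) + w * score
    return merged
```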

6. Trend Content Mining

RAG retrieves market trend data from sources like Xiaohongshu, Worthbuy, Google, and Taobao. The system filters for relevance, extracts trend selling points, and normalizes queries via embedding similarity clustering.
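One way to implement the normalization step is greedy clustering over query embeddings (a sketch; the similarity threshold and the embedding function are assumptions):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def normalize_queries(queries, embed, threshold=0.9):
    """Greedy single-pass clustering: each query joins the first cluster
    whose canonical query is at least `threshold` similar, otherwise it
    starts a new cluster. `embed` maps a query string to a vector
    (any text-embedding model would do)."""
    clusters = []  # list of (canonical_query, vector, members)
    for q in queries:
        v = embed(q)
        for canon, cv, members in clusters:
            if cosine(v, cv) >= threshold:
                members.append(q)
                break
        else:
            clusters.append((q, v, [q]))
    return {canon: members for canon, _, members in clusters}
```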

7. Query Generation and Optimization

Hallucination reduction: a dedicated hallucination‑evaluation agent and reward function cut the hallucination rate by more than 80%.

User relevance: a knowledge‑graph (KG) path‑based U‑Q model aligns generated queries with online user feedback.

Diversity: ensures generated queries cover varied attributes and categories.

Format stability: supervised fine‑tuning (SFT) constrains model output to strict JSON for seamless pipeline integration.
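The format-stability requirement can also be enforced defensively at serving time; here is a sketch of a strict‑JSON guard with an optional regeneration retry (the output key names are hypothetical):

```python
import json

REQUIRED_KEYS = {"queries", "reasoning"}  # hypothetical output schema

def parse_model_output(text, max_attempts=2, regenerate=None):
    """Validate strict-JSON model output; optionally call `regenerate()`
    for a fresh completion when the output is malformed or incomplete."""
    for _ in range(max_attempts):
        try:
            obj = json.loads(text)
            if REQUIRED_KEYS.issubset(obj):
                return obj  # well-formed JSON with all required keys
        except json.JSONDecodeError:
            pass  # malformed output; fall through to a retry
        if regenerate is None:
            break
        text = regenerate()
    raise ValueError("model output is not valid strict JSON")
```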

8. Model Distillation

Qwen‑3‑4B was distilled to a smaller model, halving average response time while preserving effectiveness, using cleaned online click data as training material.
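A sketch of how the distillation training set might be assembled from cleaned click logs (the log schema and teacher interface are assumptions, not the team's actual pipeline):

```python
def build_distillation_set(click_logs, teacher_generate, min_clicks=1):
    """Build (prompt, teacher_output) training pairs from cleaned online
    click data. `teacher_generate` stands in for the large teacher model."""
    pairs = []
    seen = set()
    for log in click_logs:
        if log["clicks"] < min_clicks:  # drop queries users ignored
            continue
        prompt = log["prompt"]
        if prompt in seen:              # de-duplicate prompts
            continue
        seen.add(prompt)
        pairs.append({"prompt": prompt, "completion": teacher_generate(prompt)})
    return pairs
```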

Near‑Line Link Design

To meet first‑guess (homepage recommendation) response‑time limits, resource constraints, and timeliness requirements, a near‑line pipeline augments the offline flow.

Trigger Design

Global‑behavior trigger: activates when a user shows activity in private domains (search, transactions) even if recommendation‑scene signals are sparse.

Whole‑site behavior window: a real‑time Blink stream triggers inference after a configurable number of product‑detail views (e.g., five views).
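The view-count trigger can be sketched as a per-user sliding window (the window length and the reset-after-firing behavior are assumptions):

```python
from collections import deque
import time

class ViewCountTrigger:
    """Fire inference once a user accumulates `n_views` product-detail
    views inside a sliding time window. A sketch of the stream trigger
    described above; the parameters are assumptions."""

    def __init__(self, n_views=5, window_seconds=1800):
        self.n_views = n_views
        self.window = window_seconds
        self.views = {}  # user_id -> deque of view timestamps

    def on_view(self, user_id, ts=None):
        ts = time.time() if ts is None else ts
        dq = self.views.setdefault(user_id, deque())
        dq.append(ts)
        while dq and ts - dq[0] > self.window:
            dq.popleft()  # expire views outside the window
        if len(dq) >= self.n_views:
            dq.clear()    # reset after firing
            return True   # caller kicks off near-line inference
        return False
```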

Functional Modules

User offline‑online data collection (long‑term preferences, click/collect/add‑to‑cart/order, RAG‑derived trend knowledge).

Two‑stage personalized recall: U2Q (User‑to‑Query) generation and Q2Vec embedding, followed by Q2I (Query‑to‑Item) retrieval with both non‑personalized high‑relevance recall and dynamic UQ2I weighting.

Unified coarse‑ and fine‑ranking scoring to ensure recommendation quality.
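The two-stage recall can be sketched end to end (all interfaces here, `u2q`, `q2vec`, the ANN index, and `uq_weight`, are hypothetical stand-ins for the production components):

```python
def two_stage_recall(user, u2q, q2vec, ann_index, top_k=50, uq_weight=None):
    """Sketch of U2Q -> Q2Vec -> Q2I recall. `u2q` generates demand
    queries from the user profile, `q2vec` embeds each query, and
    `ann_index.search(vec, top_k)` returns (item, score) pairs.
    `uq_weight` optionally re-weights items per query, standing in
    for the dynamic UQ2I step."""
    scored = {}
    for query in u2q(user):
        vec = q2vec(query)
        w = uq_weight(user, query) if uq_weight else 1.0
        for item, score in ann_index.search(vec, top_k):
            # Keep each item's best weighted score across all queries.
            scored[item] = max(scored.get(item, 0.0), w * score)
    return sorted(scored, key=scored.get, reverse=True)[:top_k]
```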

Optimized average pipeline latency is now 7‑10 seconds.

Future Outlook

Refine discovery metrics using generative semantic IDs for finer B‑user monitoring.

Incorporate B‑specific signals (image search, inquiry) to improve demand inference.

Strengthen multimodal reasoning to capture visual fashion cues missing from pure text.

Deploy Agentic RAG that autonomously selects external sources (Xiaohongshu, Google, Taobao) based on long‑term business needs.

Adopt generative semantic‑ID‑based query recall for higher precision and diversity.

Through comprehensive user demand analysis, multi‑source trend mining, and optimized near‑line serving, 1688’s inference‑based recommendation markedly improves discovery and business‑opportunity recommendation, and will continue evolving with multimodal and Agentic RAG advances.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
