From Traffic Links to Task Management: 1688’s Agentic AI Evolution

The article details how 1688 transformed its platform from a traditional intent‑matching traffic hub into an Agentic AI system that understands business tasks, outlining a three‑step implementation of knowledge, trajectory and environment redesign, dual‑track evolution, novel evaluation methods, and the emerging role of product managers as evaluation engineers.

DataFunSummit

Paradigm Shift: From "People Find Functions" to "AI Understands Business"

For over two decades, 1688’s product logic was fixed: users had to learn and hop between discrete functions such as search, price comparison, product details, and chat. This fragmented experience stemmed from simple intent matching – the system gave users what it thought they needed.

The "Source Procurement" project aims to overturn this by adopting an Agentic paradigm where interaction becomes a dialogue between humans and AI. The AI autonomously understands tasks, decomposes steps, schedules capabilities, and guides users through the entire business workflow, turning the platform into an AI‑centric system rather than just a chat window.

1688’s Complexity: Not Shopping, Doing Business

Unlike consumer e‑commerce, 1688 supports full B2B business chains that involve budgeting, supplier selection, price comparison, historical quote review, and final proposal – often a collaborative effort. To support this, the team built an Agent Loop consisting of Memory, Skill, Tool Calling, and continuous environment interaction. From an algorithmic view, every task is a causal problem: context is the "cause" and model output is the "effect"; optimizing the cause yields better results.
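The loop described above can be illustrated with a minimal sketch. It is not 1688's implementation: the tool names, the `AgentState` class, and the rule-based `plan_next_step` (a stand-in for the large model) are all hypothetical, and Skills are left out for brevity. The point is only that each observation is written back into memory, so the context (the "cause") of the next decision keeps improving.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; names and behaviour are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_suppliers": lambda q: f"3 suppliers found for '{q}'",
    "compare_prices": lambda q: f"lowest quote for '{q}' is from supplier B",
}


@dataclass
class AgentState:
    task: str
    memory: List[str] = field(default_factory=list)  # accumulated context
    done: bool = False


def plan_next_step(state: AgentState) -> Tuple[str, str]:
    """Stand-in for the large model: choose the next tool from the context so far."""
    already_called = {entry.split(":")[0] for entry in state.memory}
    for tool in ("search_suppliers", "compare_prices"):
        if tool not in already_called:
            return tool, state.task
    return "finish", ""


def run_agent(task: str, max_steps: int = 5) -> AgentState:
    state = AgentState(task=task)
    for _ in range(max_steps):
        tool, arg = plan_next_step(state)
        if tool == "finish":
            state.done = True
            break
        observation = TOOLS[tool](arg)                 # tool call = environment interaction
        state.memory.append(f"{tool}: {observation}")  # observation becomes new context
    return state


state = run_agent("LED desk lamps")
for entry in state.memory:
    print(entry)
```

In a real system the planner would be a model call and the tools would be platform interfaces; the control flow is the part this sketch is meant to show.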

Three‑Step Implementation: Knowledge, Trajectory, Environment

1. Equip the large model with industry knowledge and experience – use Retrieval‑Augmented Generation (RAG) for generic knowledge and inject tacit buyer expertise.

2. Synthesize realistic business trajectories – generate large amounts of decision‑trajectory data to train and build Agent chains.

3. Environment transformation – adapt all interfaces for AI, ensuring the model can execute tasks rather than merely generate correct‑looking text. This involves multimodal retrieval, natural‑language retrieval, and AIGC image generation.
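As an illustration of the first step, the sketch below shows the general shape of retrieval-augmented prompting: retrieve the most relevant knowledge snippet and place it in the model's context. It is a toy under stated assumptions. The knowledge snippets are invented examples, and the word-overlap scoring stands in for whatever retriever the team actually uses, which the talk does not specify.

```python
from typing import List, Set

# Hypothetical knowledge snippets standing in for buyer expertise.
KNOWLEDGE_BASE: List[str] = [
    "For apparel orders, confirm minimum order quantity before comparing unit price.",
    "For electronics, check certification documents before shortlisting a supplier.",
    "Seasonal goods should be ordered well ahead of the peak sales period.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, docs: List[str], top_k: int = 1) -> List[str]:
    """Rank snippets by word overlap with the query (toy scoring, not a real retriever)."""
    query_tokens = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(query_tokens & tokenize(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Inject the retrieved snippets into the context ahead of the request."""
    context = "\n".join(f"- {snippet}" for snippet in retrieve(query, docs))
    return f"Known buyer experience:\n{context}\n\nBuyer request: {query}"


query = "Which supplier should I shortlist for electronics?"
prompt = build_prompt(query, KNOWLEDGE_BASE)
print(prompt)
```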

Key Migration: From Intent Routing to Environment Connectivity

Traditional AI acted as an intent router, dispatching each request to a specific API. In the Agentic era, the approach shifts to connecting the model to the whole environment and training it with RLHF/RLVR and Agent RL. The more the AI can do, the wider the environment boundary becomes; the better the model, the more stable its execution.

Dual‑Track Evolution: Separation, Evaluation, and Self‑Learning

The system separates logical reasoning (the "CPU") from the business procedures it executes (Skills). The large model functions as a CPU that understands the task, plans, selects tools, and carries out the steps, while Skills act as plug-in business methods containing industry know-how, rules, and SOPs.

Two quantitative anchors guide optimization:

Loading accuracy – whether the correct Skill is loaded.

Execution alignment – whether the model follows the Skill’s prescribed execution.

This enables a dual‑track iteration: algorithm teams focus on loading and execution, while engineering teams refine the Agent architecture and continuously evolve Skills.
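The two anchors can be made concrete with a small sketch. The talk does not give exact formulas, so the definitions here are assumptions: loading accuracy is the share of episodes in which the loaded Skill matches the expected one, and execution alignment is the share of prescribed steps found, in order, among the executed steps. The `EpisodeLog` structure and the step names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EpisodeLog:
    expected_skill: str
    loaded_skill: str
    prescribed_steps: List[str]
    executed_steps: List[str]


def loading_accuracy(logs: List[EpisodeLog]) -> float:
    """Share of episodes where the correct Skill was loaded."""
    hits = sum(1 for log in logs if log.loaded_skill == log.expected_skill)
    return hits / len(logs)


def execution_alignment(log: EpisodeLog) -> float:
    """Share of prescribed steps matched greedily, in order, in the executed steps."""
    position = 0
    matched = 0
    for step in log.prescribed_steps:
        try:
            index = log.executed_steps.index(step, position)
        except ValueError:
            continue  # prescribed step was skipped or done out of order
        matched += 1
        position = index + 1
    return matched / len(log.prescribed_steps)


sop = ["clarify_need", "search", "compare", "recommend"]
logs = [
    EpisodeLog("sourcing", "sourcing", sop, ["clarify_need", "search", "compare", "recommend"]),
    EpisodeLog("sourcing", "product_search", sop, ["search", "recommend"]),
]
print(loading_accuracy(logs))
print([execution_alignment(log) for log in logs])
```

Keeping the two numbers separate is what makes the split of responsibilities possible: a low loading accuracy points at Skill selection, while a low alignment score points at how the model follows a Skill once it is loaded.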

Skill Hub and Self‑Improving System

Inspired by the Self‑Improve Agent concept, the team built a self‑evolving Skill Hub that aggregates sourcing, opportunity analysis, product search, and deep research modules. Each Skill package includes a task flow (SKILL.md), reference docs, and automation scripts.

Evolution occurs asynchronously on two levels:

Skill updates: a SubAgent analyzes execution trajectories, refines the recurring patterns, and writes them back to SKILL.md, so that workflow optimizations become deterministic.

Memory updates: user preferences are extracted and merged into MEMORY.md for long-term preference learning.
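The memory-update path can be sketched as a parse, merge, and render cycle. The source only names the file MEMORY.md; the `- key: value` line format, the function names, and the "newest value wins" merge rule below are assumptions made for illustration.

```python
from typing import Dict


def parse_memory(text: str) -> Dict[str, str]:
    """Read '- key: value' lines from a MEMORY.md-style document (assumed format)."""
    prefs: Dict[str, str] = {}
    for line in text.splitlines():
        if line.startswith("- ") and ":" in line:
            key, value = line[2:].split(":", 1)
            prefs[key.strip()] = value.strip()
    return prefs


def merge_preferences(existing: Dict[str, str], extracted: Dict[str, str]) -> Dict[str, str]:
    """Merge newly extracted preferences; newer values overwrite older ones."""
    merged = dict(existing)
    merged.update(extracted)
    return merged


def render_memory(prefs: Dict[str, str]) -> str:
    """Write preferences back out in the same assumed format, sorted by key."""
    lines = ["# MEMORY"] + [f"- {key}: {value}" for key, value in sorted(prefs.items())]
    return "\n".join(lines) + "\n"


old_memory = "# MEMORY\n- preferred_region: Zhejiang\n- max_unit_price: 20\n"
extracted = {"max_unit_price": "18", "packaging": "gift box"}  # e.g. from the latest session

new_memory = render_memory(merge_preferences(parse_memory(old_memory), extracted))
print(new_memory)
```

Because this runs asynchronously, a production version would also need to handle concurrent writers and conflicting preferences, which this sketch ignores.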

Counter‑Intuitive Evaluation: Vibe Coding Eval

Instead of relying on unstable model-based scoring, the team lets AI generate evaluation code for each Skill. With Vibe Coding, Eval code covering Skill triggering, context loading, SOP coverage, and output compliance can be created quickly. When bad cases are identified, the AI produces new scoring code within ten seconds. The resulting reward logic is then used for reinforcement learning.
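A generated Eval for one Skill might look roughly like the following. This is a hand-written sketch of the idea rather than actual generated code: the dictionary layout of `run` and `skill`, the field names, and the equal-weight average used as the reward are all assumptions.

```python
from typing import Dict


def evaluate_run(run: Dict, skill: Dict) -> Dict[str, float]:
    """Score one agent run against a Skill on the four dimensions named in the talk."""
    sop_hits = sum(1 for step in skill["sop_steps"] if step in run["executed_steps"])
    scores = {
        # Was the intended Skill triggered?
        "skill_trigger": float(run["loaded_skill"] == skill["name"]),
        # Were all required context files loaded?
        "context_loading": float(
            all(doc in run["loaded_context"] for doc in skill["required_context"])
        ),
        # What share of the SOP steps was executed?
        "sop_coverage": sop_hits / len(skill["sop_steps"]),
        # Does the output contain every required field?
        "output_compliance": float(
            all(key in run["output"] for key in skill["required_output_fields"])
        ),
    }
    reward = sum(scores.values()) / len(scores)  # equal weights: an assumption
    scores["reward"] = reward
    return scores


skill = {
    "name": "sourcing",
    "required_context": ["SKILL.md"],
    "sop_steps": ["clarify_need", "search", "compare", "recommend"],
    "required_output_fields": ["supplier", "price"],
}
run = {
    "loaded_skill": "sourcing",
    "loaded_context": ["SKILL.md", "MEMORY.md"],
    "executed_steps": ["clarify_need", "search"],
    "output": {"supplier": "B"},
}
scores = evaluate_run(run, skill)
print(scores)
```

Checks of this kind are deterministic, so the same run always receives the same score, which is the property that model-based scoring lacks.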

Agentic RL Training Stability Practices

The team prefers on‑policy strategies, using a single‑epoch PPO update (ppo_epochs=1) and unbiased importance sampling to address inference inconsistency. Besides average reward, they monitor the minimum reward per trajectory to gauge the lower bound of performance and ensure stable task completion rates.
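The two monitoring ideas can be sketched in a few lines. The reading here is an assumption, since the talk gives no formulas: the importance weight is taken as the ratio between the probability the training policy assigns to a sampled token and the probability the rollout (inference) engine assigned to it, and the "minimum reward" is taken as the lowest trajectory-level reward in a batch. Function names and numbers are illustrative.

```python
import math
from typing import Dict, List


def importance_weights(train_logprobs: List[float], rollout_logprobs: List[float]) -> List[float]:
    """Per-token ratio pi_train / pi_rollout, computed from log-probabilities.

    A ratio of 1.0 means the trainer and the inference engine agree on that token.
    """
    return [math.exp(t - r) for t, r in zip(train_logprobs, rollout_logprobs)]


def reward_stats(trajectory_rewards: List[float]) -> Dict[str, float]:
    """Track the batch minimum alongside the mean to expose the worst trajectory."""
    return {
        "mean": sum(trajectory_rewards) / len(trajectory_rewards),
        "min": min(trajectory_rewards),
    }


# Illustrative numbers only.
weights = importance_weights([-1.0, -2.0], [-1.0, -2.5])
stats = reward_stats([1.0, 0.5, 0.0])
print(weights)
print(stats)
```

A batch whose mean looks healthy while its minimum stays near zero signals that some tasks still fail outright, which the mean alone would hide.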

New Role for Product Managers: "Evaluation Engineer"

Product managers now focus on three pillars: evaluation‑driven quality, Skill‑based business leverage, and closed‑loop evolution. They define what constitutes "good" performance, translate it into quantifiable evaluation standards, and drive continuous improvement.

The team identified users unfamiliar with AI (the "fourth-quadrant users") as the core target. Because the first three interactions are critical for retention, user onboarding is a top operational priority.

Summary and Outlook

1688 AI’s goal is not a prettier UI but to become the "OS of e‑commerce business"—handling understanding, planning, execution, explanation, and continuous evolution. The platform moves from "traffic connection" to "task hosting", achieving a true goal‑closed loop where AI completes the entire business chain from user intent to result.

This end‑to‑end reconstruction—from low‑level architecture to product thinking—offers a valuable reference for any team exploring native Agentic AI pathways.

Q&A

Q: How are platform Skills and user‑defined Skills handled when they conflict?

A: The platform provides generic, safe, and system‑level capabilities (the "big currency"). User‑defined Skills are prioritized only in user‑to‑user conflicts; platform Skills never conflict with user Skills because users cannot redefine internal system connections, only their own industry know‑how.

Example: If the platform defines a price‑comparison rule A and the user defines rule B, the user’s rule wins.

That concludes this talk. Thank you all.
