How Baidu Transformed E‑Commerce Risk Control with Multimodal AI Agents

This article details Baidu's e-commerce risk-control overhaul. It describes how a multimodal large model, a rule engine, and a knowledge base work together to replace traditional manual and rule-based reviews, achieving full machine coverage, instant feedback, higher accuracy, and a better experience for merchants and users.

Baidu Tech Salon

Nov 5, 2025

Introduction

This article examines Baidu's e-commerce risk-control scenarios. It addresses the weaknesses of traditional machine review, namely weak multimodal detection, trouble with ambiguous semantics, and a poor review experience, and proposes an AI-driven transformation based on a MultiAgent paradigm.

Background and Problems

Baidu Preferred, a Baidu e-commerce brand, faces a three-way challenge of safety, efficiency, and experience. The traditional workflow (merchant submission → rule-based machine review → human review) suffers from human-review bottlenecks, weak machine coverage, long latency, and vague rejection reasons.

Platform risk: inaccurate or delayed risk control can lead to regulatory penalties and loss of trust.

Merchant pain: reviews take 2 to 4 hours (up to a day in some cases), and rejections come with vague feedback such as "content violation".

User concern: exposure to unreliable or low‑quality information.

The goal is to rebuild the review system with large models to achieve full machine coverage, instant feedback, and high explainability.

Technical Solution

The proposed “large model + rules + knowledge base” collaborative agent consists of four layers:

Input Layer: aggregates multimodal data (text, images, structured data) from merchant submissions.

Feature Extraction Layer: combines rule-based extraction, LLM text understanding, and multimodal model image analysis to capture comprehensive product features.

Risk Decision Layer: lets deterministic rules handle clear logic, the knowledge base provide external information (brand authorizations, category trees, qualification records), and the LLM synthesize all signals for final judgment.

Output Layer: returns precise pass/fail decisions, natural-language rejection reasons, and actionable remediation suggestions.

The workflow achieves “full‑machine coverage + instant feedback + dynamic calibration”.
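The four layers can be sketched as a small pipeline. This is a minimal illustration, not Baidu's implementation: the names `Submission`, `Decision`, `AUTHORIZED`, `HIGH_RISK_CATEGORIES`, and `llm_judge` are hypothetical, the vision model is replaced by a precomputed list of image labels, and the LLM step is a placeholder that simply passes findings through.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Submission:                       # Input layer: text + image signals + structured data
    title: str
    category: str
    brand: str
    image_labels: list[str] = field(default_factory=list)  # stand-in for vision-model output

@dataclass
class Decision:                         # Output layer: verdict, reasons, remediation
    passed: bool
    risks: list[str]
    reasons: list[str]
    suggestions: list[str]

# Hypothetical knowledge base and rule configuration (illustrative values only).
AUTHORIZED = {"nike": {"m-001"}}        # brand -> merchants holding an authorization
HIGH_RISK_CATEGORIES = {"health products", "medical devices"}

def extract_features(sub: Submission, merchant_id: str) -> dict:
    """Feature extraction layer: normalize fields and summarize image evidence."""
    return {
        "merchant_id": merchant_id,
        "brand": sub.brand.lower(),
        "category": sub.category.lower(),
        "has_certificate": "qualification_certificate" in sub.image_labels,
    }

def rule_check(f: dict) -> list[dict]:
    """Deterministic rules handle the clear-cut logic."""
    if f["category"] in HIGH_RISK_CATEGORIES and not f["has_certificate"]:
        return [{"risk": "qualification_missing",
                 "reason": "This category requires a qualification certificate, but none was found.",
                 "suggestion": "Upload a clear, complete qualification certificate."}]
    return []

def knowledge_base_check(f: dict) -> list[dict]:
    """Knowledge base supplies external facts such as brand authorizations."""
    holders = AUTHORIZED.get(f["brand"])
    if holders is not None and f["merchant_id"] not in holders:
        return [{"risk": "brand_unauthorized",
                 "reason": "No authorization on record for this brand.",
                 "suggestion": "Provide a valid brand authorization letter."}]
    return []

def llm_judge(features: dict, findings: list[dict]) -> list[dict]:
    """Placeholder for the LLM synthesis step; a real system would call a model here."""
    return findings

def review(sub: Submission, merchant_id: str) -> Decision:
    """Risk decision layer: combine rules, knowledge base, and the (stubbed) LLM."""
    f = extract_features(sub, merchant_id)
    findings = llm_judge(f, rule_check(f) + knowledge_base_check(f))
    return Decision(
        passed=not findings,
        risks=[x["risk"] for x in findings],
        reasons=[x["reason"] for x in findings],
        suggestions=[x["suggestion"] for x in findings],
    )
```

Calling `review(Submission("Vitamin C tablets", "Health Products", "Acme"), "m-002")` in this sketch returns a failed decision with a reason and a remediation suggestion, which mirrors the kind of explainable output the Output Layer is described as producing.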

Key Highlights

Standardization of Review Criteria: human-review standards were quantified, consolidating 700+ risk points into 24 core groups that cover more than 95% of online violations.

Multimodal Model Integration: a large language model works together with a multimodal vision model and a knowledge base to deliver "multimodal understanding + precise judgment".
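One way to picture the consolidation of risk points into core groups is a lookup table plus a coverage check over observed violations. The sketch below is illustrative only: the risk-point and group names are invented, and the real taxonomy of 700+ points and 24 groups is not given in the article.

```python
from collections import Counter

# Hypothetical mapping from fine-grained risk points to core groups.
RISK_POINT_TO_GROUP = {
    "missing_import_license": "qualification_missing",
    "expired_medical_device_cert": "qualification_missing",
    "lookalike_logo": "brand_authorization",
    "no_authorization_letter": "brand_authorization",
    "wrong_leaf_category": "category_misplacement",
}

def group_coverage(violations):
    """Return (share of violations covered by a core group, per-group counts)."""
    counts = Counter()
    covered = 0
    for point in violations:
        group = RISK_POINT_TO_GROUP.get(point)
        if group is not None:
            counts[group] += 1
            covered += 1
    ratio = covered / len(violations) if violations else 0.0
    return ratio, counts
```

Running `group_coverage` over a sample of labeled violations gives the kind of coverage figure the article quotes, and any unmapped risk point shows up as a gap to review.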

Case Studies

1. Qualification‑Missing Risk

For high-risk categories (e.g., health products, medical devices), merchants must provide clear, complete qualifications. Traditional rules only match keywords such as "import" and therefore miss many violations. The new solution fine-tunes a domain-specific model through high-quality sample collection, prompt engineering, data augmentation, and full-parameter or instruction tuning, reaching near-human accuracy.
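As a rough illustration of the sample-collection step, labeled review cases could be serialized into instruction-tuning records. The record layout (`instruction` / `input` / `output`), the prompt wording, and the field names below are assumptions chosen for the example; the article does not specify the training format.

```python
import json

# Hypothetical task prompt shared by all records.
INSTRUCTION = (
    "Decide whether the listing is missing a required qualification. "
    "Answer with a label and a one-sentence reason."
)

def to_record(title, category, certificate_text, label, reason):
    """Turn one labeled review case into an instruction-tuning record."""
    return {
        "instruction": INSTRUCTION,
        "input": (
            f"Category: {category}\n"
            f"Title: {title}\n"
            f"Certificate text: {certificate_text or '(none)'}"
        ),
        "output": json.dumps({"label": label, "reason": reason}, ensure_ascii=False),
    }

def to_jsonl(samples):
    """Serialize a list of labeled cases (dicts) as one JSON object per line."""
    return "\n".join(
        json.dumps(to_record(**sample), ensure_ascii=False) for sample in samples
    )
```

Keeping the reason alongside the label in the target text is one way to train a model to explain its rejection, which fits the article's emphasis on explainable feedback.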

2. Brand‑Authorization Risk

This check detects counterfeit logos (e.g., "Nlke" imitating "Nike") and verifies brand authorizations through a knowledge-base lookup. The pipeline combines multimodal logo detection, LLM-based text-image similarity, and knowledge-base queries.
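The text side of this check can be approximated with character folding and string similarity. The sketch below is a simplified stand-in, not the multimodal pipeline itself: it assumes the logo text has already been read from the image, and `KNOWN_BRANDS`, `AUTHORIZATIONS`, `CONFUSABLES`, and the 0.8 threshold are hypothetical values.

```python
from difflib import SequenceMatcher

# Hypothetical knowledge base (illustrative values only).
KNOWN_BRANDS = {"nike", "adidas"}
AUTHORIZATIONS = {("m-001", "nike")}    # (merchant id, brand) pairs on record

# Characters often swapped in to imitate another glyph (illustrative, not exhaustive).
CONFUSABLES = str.maketrans({"l": "i", "1": "i", "0": "o", "5": "s"})

def canonical(text):
    """Lowercase and fold look-alike characters onto one representative."""
    return text.lower().translate(CONFUSABLES)

def check_brand(merchant_id, logo_text, threshold=0.8):
    """Return (verdict, matched_brand) for text extracted from a logo."""
    raw = logo_text.lower()
    if raw in KNOWN_BRANDS:
        # Exact brand match: the knowledge base decides whether it is allowed.
        verdict = "pass" if (merchant_id, raw) in AUTHORIZATIONS else "unauthorized"
        return verdict, raw
    folded = canonical(logo_text)
    for brand in sorted(KNOWN_BRANDS):
        # Near match after folding suggests an imitation of a known brand.
        if SequenceMatcher(None, folded, canonical(brand)).ratio() >= threshold:
            return "lookalike", brand
    return "pass", None
```

Here `check_brand("m-002", "Nlke")` is flagged as a look-alike of "nike" because folding "l" to "i" makes the two strings identical, while an exact "Nike" from an unauthorized merchant is caught by the authorization lookup instead.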

3. Category‑Misplacement Risk

Accurate category selection is crucial for traffic and compliance. Traditional methods rely on title matching, which is easy to evade. The solution evolved from basic similarity-based recall to a two-stage recall-rank architecture, with upgraded similarity models (e.g., bge‑large‑emb) and targeted recall paths for high-risk items.
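A two-stage recall-rank flow can be sketched with toy vectors. This is an assumption-laden illustration: a bag-of-words `Counter` stands in for a real embedding model, the second stage reuses the same similarity with extra attributes rather than a stronger ranking model, and the `CATEGORIES` table is invented.

```python
import math
from collections import Counter

# Hypothetical category descriptions (illustrative only).
CATEGORIES = {
    "running shoes": "running shoes sneakers sport footwear",
    "dietary supplements": "vitamin supplement capsule tablet health",
    "medical devices": "blood pressure monitor thermometer medical device",
}

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recall(title, k=2):
    """Stage 1: cheap similarity search returning up to k candidate categories."""
    q = embed(title)
    scored = [(cosine(q, embed(desc)), name) for name, desc in CATEGORIES.items()]
    return [name for score, name in sorted(scored, reverse=True)[:k] if score > 0]

def rank(title, attributes, candidates):
    """Stage 2: rescore the recalled candidates with structured attributes added."""
    q = embed(title + " " + " ".join(attributes))
    return max(candidates, key=lambda name: cosine(q, embed(CATEGORIES[name])), default=None)

def is_misplaced(title, attributes, selected):
    """Flag a listing whose predicted category differs from the merchant's choice."""
    predicted = rank(title, attributes, recall(title))
    return predicted is not None and predicted != selected
```

Splitting the work this way keeps the first stage cheap enough to scan the whole category tree, while the second stage only has to score a handful of candidates and can therefore afford a heavier model.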

Results and Impact

The deployment delivered “three rises and three drops”:

Increases: machine‑review coverage, review speed, merchant satisfaction.

Decreases: human‑review volume, appeal rate, user complaint rate.

Key lessons: multimodal fusion is necessary (80% of e-commerce data is multimodal); explainability should be prioritized, since clear LLM-generated reasons enable one-click remediation; and iteration should run as a closed loop driven by monitoring and appeal data.
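The closed-loop idea can be illustrated by tracking how often appeals overturn machine rejections in each risk group. This is a hypothetical sketch: the input format, the `max_rate` and `min_appeals` thresholds, and the function names are assumptions, not details from the article.

```python
from collections import defaultdict

def overturn_rates(appeals):
    """appeals: iterable of (risk_group, was_overturned) pairs -> {group: (rate, total)}."""
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for group, was_overturned in appeals:
        totals[group] += 1
        overturned[group] += int(was_overturned)
    return {g: (overturned[g] / totals[g], totals[g]) for g in totals}

def groups_needing_calibration(appeals, max_rate=0.2, min_appeals=3):
    """Risk groups with enough appeals and an overturn rate above max_rate."""
    return sorted(
        group
        for group, (rate, total) in overturn_rates(appeals).items()
        if total >= min_appeals and rate > max_rate
    )
```

Groups flagged this way would be candidates for rule adjustments or new training samples, which then feed back into the next review cycle.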

Conclusion and Outlook

Large models act as assistants that free reviewers from repetitive tasks, allowing them to focus on setting standards and forecasting risk. Future work will push the boundaries of AI-based risk control and provide model deployment solutions that transfer quickly to new scenarios.
