AI-Driven Content Risk Control: System Evolution and Optimization at Alibaba
Alimama's AI‑driven content risk platform has evolved from simple rule‑matching to a data‑centric, serverless architecture that integrates large‑model acceleration, decision‑tree compilation, high‑throughput vector retrieval and elastic word‑matching, delivering sub‑100 ms text and sub‑1 s image moderation while remaining stable during peak promotional traffic.
Business Background and Challenges
Content is a key marketing carrier; risky ads can harm platform reputation. Alimama aims to intercept risky real‑time ad changes and to quickly locate and clean up billions of stored ads.
The AI wave has introduced new AI‑generated creative tools, bringing new risk characteristics and raising the requirements on response speed, compute resources and cost.
Challenges
Variety, volatility and fast spread of risky content.
Higher demands on detection capability, latency, resources and cost.
Capability challenges (peak load after major promotions), efficiency challenges (real‑time interception of AI‑generated content), cost challenges (heavy compute for image, video and live‑stream moderation), and quality challenges (large‑scale evaluation).
AI‑Driven Risk Engine Evolution
Since 2013, Alimama's content risk system has evolved through three stages:
Stage 1 – Rule‑based
Simple keyword, blacklist/whitelist, and basic attribute rules; limited against variant risks.
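Stage‑1 style checks of this kind can be sketched in a few lines. The names below (`check_ad`, the keyword and account lists) are illustrative, not Alimama's actual API:

```python
# Minimal sketch of Stage-1 rule matching: blacklist/whitelist lookups
# plus naive keyword hits, combined into a single verdict.
BLACKLIST = {"seller_9527"}                   # hypothetical banned accounts
WHITELIST = {"seller_0001"}                   # trusted accounts skip keywords
RISK_KEYWORDS = ["free money", "guaranteed cure"]

def check_ad(seller_id: str, text: str) -> str:
    """Return 'block', 'pass', or 'review' for a single ad."""
    if seller_id in BLACKLIST:
        return "block"
    if seller_id in WHITELIST:
        return "pass"
    lowered = text.lower()
    if any(kw in lowered for kw in RISK_KEYWORDS):
        return "review"
    return "pass"
```

The weakness the article notes is visible here: a trivially obfuscated variant ("fr.ee mon.ey") slips past the substring match entirely.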
Stage 2 – Model‑assisted
Introduce algorithmic models; still rule‑driven, but manual threshold tuning becomes costly.
Stage 3 – Data + Algorithm
Data‑driven risk control with models guided by domain experts; supports custom business needs and emergency handling.
System Construction Layers
Stage 1 – Simple pipeline
Word matching, rules, blacklist/whitelist via synchronous calls.
Stage 2 – DAG‑based async
Model and retrieval services added; asynchronous calls via the MetaQ message queue; a DAG with more than 1,000 nodes becomes a performance bottleneck.
Stage 3 – Serverless split
Separate DataFlow (sample building) and ControlFlow (sample consumption) with a gateway; DAG‑based concurrent scheduling of downstream services.
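The DAG‑based concurrent scheduling described for ControlFlow can be sketched with `asyncio`: each downstream service runs as soon as all of its dependencies finish. The graph and node names here are hypothetical stand‑ins for real services:

```python
import asyncio

async def run_dag(graph, tasks):
    """graph: node -> set of dependency nodes; tasks: node -> async fn.

    Each node's task receives a dict of its dependencies' results and
    starts the moment the last dependency completes, so independent
    branches of the DAG run concurrently.
    """
    results = {}
    events = {n: asyncio.Event() for n in graph}

    async def run_node(node):
        for dep in graph[node]:
            await events[dep].wait()          # block until each dep is done
        results[node] = await tasks[node]({d: results[d] for d in graph[node]})
        events[node].set()                    # unblock downstream nodes

    await asyncio.gather(*(run_node(n) for n in graph))
    return results
```

For example, hypothetical `ocr` and `nlp` nodes with no dependencies run in parallel, and a `fuse` node that depends on both runs only after they complete.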
Large‑Model Acceleration
Re‑engineered model service; adopted BLIP, CLIP, and XGB for risk filtering. Chose CUDA‑Graph based Kangaroo‑Engine for inference acceleration.
Kangaroo‑Engine Features
Multiple captures for dynamic shapes.
GPU memory reuse across graphs.
Shape‑bucket padding to limit graph count.
Achieves ~2× RT reduction for BLIP on A10.
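The shape‑bucket idea above can be sketched simply: dynamic sequence lengths are rounded up to a small fixed set of buckets, so the engine only needs one captured CUDA graph per bucket rather than one per distinct input shape. The bucket sizes below are illustrative:

```python
# Sketch of shape-bucket padding: bound the number of captured graphs
# by padding every input up to the nearest of a few fixed lengths.
BUCKETS = [16, 32, 64, 128]                   # assumed bucket sizes

def bucket_for(seq_len: int) -> int:
    """Smallest bucket that fits seq_len; the largest bucket otherwise."""
    for b in BUCKETS:
        if seq_len <= b:
            return b
    return BUCKETS[-1]

def pad_to_bucket(tokens: list, pad_id: int = 0) -> list:
    """Pad a token sequence up to its bucket length with pad_id."""
    target = bucket_for(len(tokens))
    return tokens + [pad_id] * (target - len(tokens))
```

The trade‑off is wasted compute on padding versus graph count: fewer buckets mean fewer captures but more padded tokens per request.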
Device‑Specific Optimizations
On the P100, parallel CUDA streams raise QPS by ~25%; on the T4, TensorRT 8.6 delivers a 5× latency reduction for ViT blocks.
Traditional ML Model Acceleration
Adopt Treelite for decision‑tree inference; compiles trees to .so for branch prediction, delivering order‑of‑magnitude speedup over XGBoost.
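The intuition behind compiling trees (as Treelite does when emitting a native .so) can be shown with a toy Python analogue: instead of a generic loop chasing node records, the tree is unrolled into nested if/else branches that the compiler and branch predictor can optimize. The tree below is a stand‑in, not a real XGBoost model:

```python
# Toy decision tree as nested dicts: internal nodes split on a feature,
# leaves carry the prediction.
TREE = {"feat": 0, "thr": 0.5,
        "left": {"leaf": -1.0},
        "right": {"feat": 1, "thr": 2.0,
                  "left": {"leaf": 0.3}, "right": {"leaf": 1.2}}}

def predict_interpreted(node, x):
    """Generic traversal: follow pointers until a leaf is reached."""
    while "leaf" not in node:
        node = node["left"] if x[node["feat"]] < node["thr"] else node["right"]
    return node["leaf"]

def compile_tree(node, indent="    "):
    """Emit the tree as straight-line if/else source code."""
    if "leaf" in node:
        return f"{indent}return {node['leaf']}\n"
    return (f"{indent}if x[{node['feat']}] < {node['thr']}:\n"
            + compile_tree(node["left"], indent + "    ")
            + f"{indent}else:\n"
            + compile_tree(node["right"], indent + "    "))

src = "def predict_compiled(x):\n" + compile_tree(TREE)
exec(src, globals())                # defines predict_compiled
```

Treelite performs the same transformation but emits C compiled to a shared library, which is where the order‑of‑magnitude speedup over generic XGBoost traversal comes from.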
Hundred‑Billion‑Scale Retrieval Service
Unified online/offline engine based on Dolphin VectorDB; supports real‑time updates, high QPS, and consistent results.
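As a toy illustration of the primitive such an engine serves, here is exact top‑k retrieval by cosine similarity over an in‑memory index; a production engine like the Dolphin VectorDB described here would use approximate indexes and real‑time update paths instead:

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(index, query, k=2):
    """index: id -> embedding. Returns [(id, score)] best-first."""
    return heapq.nlargest(k, ((i, cosine(query, v)) for i, v in index.items()),
                          key=lambda t: t[1])
```

Because the index is a plain dict, real‑time updates are just key writes; the hard part the article's unified engine solves is doing this consistently at hundred‑billion scale and high QPS.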
Full‑Elastic Sensitive‑Word Matching Service
Switch to Wu‑Manber algorithm for lower memory and incremental updates; integrate with vector retrieval for unified indexing.
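A simplified Wu‑Manber matcher (block size B=2, considering only each pattern's first m characters, where m is the shortest pattern length) shows why it suits this service: one shift table covers the whole sensitive‑word list, and adding a word only touches a few table entries. This is a didactic sketch, not the production implementation:

```python
B = 2  # block size

def build_tables(patterns):
    """SHIFT table plus a bucket of patterns keyed by their m-th-position block."""
    m = min(len(p) for p in patterns)         # scan window = shortest pattern
    shift, bucket = {}, {}
    for p in patterns:
        for i in range(m - B + 1):
            block = p[i:i + B]
            # A block ending at position i+B allows a shift of m-(i+B).
            shift[block] = min(shift.get(block, m - B + 1), m - B - i)
        bucket.setdefault(p[m - B:m], []).append(p)
    return m, shift, bucket

def search(text, patterns):
    """Return [(position, pattern)] for every match of any pattern."""
    m, shift, bucket = build_tables(patterns)
    hits, pos = [], 0
    while pos + m <= len(text):
        block = text[pos + m - B:pos + m]     # block at the window's end
        s = shift.get(block, m - B + 1)       # unseen block: maximal shift
        if s == 0:                            # candidate window: verify
            for p in bucket.get(block, []):
                if text.startswith(p, pos):
                    hits.append((pos, p))
            s = 1
        pos += s
    return hits
```

Unlike an Aho–Corasick automaton, which must be rebuilt or carefully patched when the word list changes, the shift and bucket tables here support cheap incremental updates and stay compact in memory, matching the motivations given above.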
Cloud‑Native DevOps Management
Unified operation platform for model, retrieval, and word services; plugin‑style release, logical resource pools, and automated scaling.
Business Support and Future Outlook
Supports AI‑generated content moderation with sub‑100 ms text and sub‑1 s image latency; stable during major promotions; plans for further performance, elasticity, and unified online‑near‑offline services.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.