
Weibo Multimodal Content Understanding Service Architecture and GPU Heterogeneous Cluster Solutions

This article details Weibo's multimodal content understanding platform, covering its massive data challenges, heterogeneous model support, standardized pipelines, platformization, workflow architecture, GPU heterogeneous cluster management, resource scheduling, performance optimization, and full‑stack monitoring to achieve stable, low‑latency AI services at scale.

DataFunSummit

Weibo currently supports over 40 business clusters and more than 120 deep learning models for multimodal content understanding, handling a peak of 20,000 QPS while processing hundreds of millions of posts, images, videos, and audio clips daily.

Background: Weibo has 500 million monthly active users and generates around 80 million posts per day, creating a massive multimodal data workload that requires robust AI services.

Challenges:

Data scale: 70‑80 million posts and 40‑50 million images daily, requiring support for 100+ models across 40+ business clusters.

Heterogeneity: Diverse frameworks (TensorFlow, PyTorch, Caffe, Paddle) and algorithms for image, text, and audio processing.

Standardization: A C++‑based RPC framework separates model logic from the service engine; Python models are bridged into it via a custom DAG pipeline, and Thrift RPC powers the backend.

Platformization: Real‑time requirements demand seconds‑level latency, high stability, and automated scaling.
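The DAG pipeline that bridges Python models into the serving engine can be sketched as a minimal example. The `Stage`/`Pipeline` names and the preprocess → model → postprocess chain below are illustrative assumptions, not Weibo's actual API:

```python
from typing import Any, Callable, Dict, List

class Stage:
    """One node in the inference DAG: a named callable plus its upstream dependencies."""
    def __init__(self, name: str, fn: Callable[..., Any], deps: List[str] = None):
        self.name, self.fn, self.deps = name, fn, deps or []

class Pipeline:
    """Executes stages in dependency order, feeding each stage its upstream outputs."""
    def __init__(self, stages: List[Stage]):
        self.stages = {s.name: s for s in stages}

    def run(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        results = dict(inputs)
        resolved = set(inputs)
        pending = list(self.stages.values())
        while pending:
            # A stage is ready once all of its dependencies have produced output.
            ready = [s for s in pending if all(d in resolved for d in s.deps)]
            if not ready:
                raise ValueError("cycle or missing dependency in DAG")
            for s in ready:
                results[s.name] = s.fn(*(results[d] for d in s.deps))
                resolved.add(s.name)
                pending.remove(s)
        return results

# Toy chain: preprocess -> model -> postprocess
pipe = Pipeline([
    Stage("preprocess", lambda text: text.lower(), deps=["input"]),
    Stage("model", lambda x: {"label": "spam" if "win" in x else "ok"}, deps=["preprocess"]),
    Stage("postprocess", lambda pred: pred["label"], deps=["model"]),
])
print(pipe.run({"input": "WIN a prize"})["postprocess"])  # -> spam
```

In the real system the stage bodies would wrap framework-specific model calls (TensorFlow, PyTorch, etc.), which is what lets one standardized engine front heterogeneous frameworks.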

Application Scenarios: Analysis (feature extraction), Retrieval (embedding‑based search, deduplication), and Description (image/video captioning, machine translation).
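The retrieval and deduplication scenario boils down to comparing embedding vectors. A minimal sketch (greedy near-duplicate filtering by cosine similarity; the 0.95 threshold is an illustrative assumption, and production systems would use an approximate-nearest-neighbor index rather than pairwise scans):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def dedup(embeddings, threshold=0.95):
    """Greedy near-duplicate removal: keep an item only if it is not
    too similar to any already-kept item."""
    kept = []
    for idx, emb in embeddings:
        if all(cosine_similarity(emb, k) < threshold for _, k in kept):
            kept.append((idx, emb))
    return [idx for idx, _ in kept]

items = [("a", [1.0, 0.0]), ("b", [0.99, 0.14]), ("c", [0.0, 1.0])]
print(dedup(items))  # "b" is a near-duplicate of "a" and is dropped
```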

Architecture Workflow:

Sample generation → Model building → Model evaluation → Model registration → Model serving.

Data collection feeds both online inference and offline training pipelines.

Trained models are stored in a model repository and deployed via a Flink‑based real‑time inference service, producing feature vectors for downstream recommendation, ranking, moderation, and de‑duplication.
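The model-repository step in the workflow above can be sketched as a toy versioned registry; the class and the version-resolution rule (highest version wins) are assumptions for illustration, not Weibo's actual registry semantics:

```python
class ModelRepository:
    """Minimal model registry: stores versioned model artifacts and
    resolves the latest registered version for serving."""
    def __init__(self):
        self._models = {}  # model name -> {version: artifact path}

    def register(self, name, version, artifact_path):
        self._models.setdefault(name, {})[version] = artifact_path

    def resolve(self, name):
        """Return the artifact path of the highest registered version."""
        versions = self._models[name]
        return versions[max(versions)]

repo = ModelRepository()
repo.register("image_tagger", 1, "s3://models/image_tagger/v1")
repo.register("image_tagger", 2, "s3://models/image_tagger/v2")
print(repo.resolve("image_tagger"))  # -> s3://models/image_tagger/v2
```

The serving layer would call `resolve` at deployment time, so promoting a newly evaluated model is just another `register` call.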

GPU Heterogeneous Cluster:

Supports GPUs ranging from legacy models (>5 years old) to modern V100S, requiring careful resource isolation across CPU, memory, disk, network, GPU memory, and compute.

Manual tagging of machines proved inflexible; Kubernetes with cGPU/GPUManager now provides fine‑grained slicing of GPU memory (in 256 MiB units) and compute (in 1/100‑of‑a‑card increments).
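Under this slicing model, a workload requests a fraction of a card through extended resource limits. A sketch of such a pod spec as a Python dict (the `tencent.com/vcuda-*` resource names follow the open-source GPUManager convention; treat the exact keys and units as assumptions):

```python
# Sketch of a Kubernetes pod spec requesting a GPU slice.
# With GPUManager-style resources, "vcuda-core" is in units of 1/100
# of a GPU and "vcuda-memory" in units of 256 MiB.
def gpu_slice_pod(name, image, core_percent, mem_mib):
    assert mem_mib % 256 == 0, "GPU memory is sliced in 256 MiB units"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {
                    "tencent.com/vcuda-core": core_percent,      # 30 -> 30% of one GPU
                    "tencent.com/vcuda-memory": mem_mib // 256,  # 2048 MiB -> 8 units
                }},
            }],
        },
    }

pod = gpu_slice_pod("tagger", "registry.example/tagger:latest", 30, 2048)
print(pod["spec"]["containers"][0]["resources"]["limits"])
```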

Resource scheduling considers six dimensions (CPU, memory, disk, network, GPU memory, compute) to avoid bottlenecks in data loading and network transfer.
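Six-dimensional scheduling can be illustrated with a toy placement function: filter out nodes that lack headroom in any dimension, then prefer the node whose tightest dimension has the most slack, so no single resource becomes the bottleneck. The scoring rule and the sample capacities are illustrative assumptions:

```python
DIMENSIONS = ("cpu", "memory", "disk", "network", "gpu_memory", "gpu_compute")

def fits(node_free, request):
    """True if the node has headroom in every one of the six dimensions."""
    return all(node_free[d] >= request.get(d, 0) for d in DIMENSIONS)

def pick_node(nodes, request):
    """Among feasible nodes, pick the one whose most-constrained dimension
    retains the largest fraction of its total capacity after placement."""
    feasible = [n for n in nodes if fits(n["free"], request)]
    if not feasible:
        return None
    def min_slack(n):
        return min((n["free"][d] - request.get(d, 0)) / n["total"][d] for d in DIMENSIONS)
    return max(feasible, key=min_slack)["name"]

nodes = [
    {"name": "v100-1",  # busy node: CPU nearly exhausted
     "total": {"cpu": 64, "memory": 256, "disk": 2000, "network": 10, "gpu_memory": 32, "gpu_compute": 100},
     "free":  {"cpu": 8,  "memory": 64,  "disk": 500,  "network": 2,  "gpu_memory": 16, "gpu_compute": 50}},
    {"name": "t4-1",    # lightly loaded node
     "total": {"cpu": 32, "memory": 128, "disk": 1000, "network": 10, "gpu_memory": 16, "gpu_compute": 100},
     "free":  {"cpu": 24, "memory": 96,  "disk": 800,  "network": 8,  "gpu_memory": 12, "gpu_compute": 80}},
]
request = {"cpu": 4, "memory": 16, "gpu_memory": 8, "gpu_compute": 25}
print(pick_node(nodes, request))  # -> t4-1 (v100-1 would be CPU-starved)
```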

Model acceleration is achieved via a custom distributed training framework (roughly 9× the training efficiency of open‑source baselines) and optimized inference engines.

Full‑stack monitoring tracks GPU utilization, memory I/O, temperature, power, CPU load, TCP connections, process health, latency, and QPS per node.
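A full-stack monitor like this ultimately reduces to range checks over the collected metrics. A minimal per-node health check (the threshold values are illustrative assumptions, not Weibo's alerting policy):

```python
# Allowed (low, high) range per monitored metric; values are illustrative.
THRESHOLDS = {
    "gpu_util_pct":   (0, 98),   # sustained 100% often means a queued backlog
    "gpu_temp_c":     (0, 85),
    "gpu_power_w":    (0, 300),
    "cpu_load":       (0, 0.9),
    "p99_latency_ms": (0, 200),
}

def check_node(metrics):
    """Return the list of metrics that fall outside their allowed range."""
    alerts = []
    for key, (lo, hi) in THRESHOLDS.items():
        value = metrics.get(key)
        if value is not None and not (lo <= value <= hi):
            alerts.append(key)
    return alerts

print(check_node({"gpu_util_pct": 99, "gpu_temp_c": 70, "p99_latency_ms": 350}))
# -> ['gpu_util_pct', 'p99_latency_ms']
```

In practice the GPU-side values would come from NVML (e.g. via `pynvml`) and the service-side values from the RPC layer's latency/QPS counters.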

Performance Optimizations:

Distributed training on 4‑node, 16‑GPU setups yields up to a 9.37× speedup on V100 GPUs over the baseline.
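To put the 9.37× figure in context: scaling efficiency is just achieved speedup divided by the ideal linear speedup. Note the interpretation below assumes a single-GPU baseline, which the talk does not state explicitly (it compares against an open-source framework):

```python
def scaling_efficiency(speedup, n_gpus):
    """Achieved speedup relative to ideal linear speedup on n_gpus devices."""
    return speedup / n_gpus

# If 9.37x were measured against a single-GPU baseline (an assumption),
# the 16-GPU run would sit at ~59% of ideal linear scaling:
print(round(scaling_efficiency(9.37, 16), 3))  # -> 0.586
```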

Online inference benchmarks show V100 outperforming T4 by roughly 2× due to higher memory bandwidth and sustained power.

Model‑level inference acceleration provides a further 20‑40% speedup, varying by model.

In summary, Weibo's multimodal AI platform integrates data collection, model training, and real‑time inference across a heterogeneous GPU fleet, delivering stable, low‑latency services that feed feature engineering pipelines and close the loop back to end‑users.

Tags: multimodal AI, distributed training, model serving, real-time inference, GPU cluster, Weibo
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
