Tagged articles
1 article
Meituan Technology Team
Feb 9, 2023 · Backend Development

Efficient Deployment Architecture for Visual Inference Services: GPU Utilization Optimization

Meituan Visual's engineering team addressed the low GPU utilization common in online inference services by splitting the model structure and deploying the parts as microservices. This raised GPU utilization from 40% to 100% and more than tripled QPS. The team then generalized the approach to other GPU-based services.

GPU · Microservices · Performance Optimization
21 min read