Feb 9, 2023 · Backend Development

Efficient Deployment Architecture for Visual Inference Services: GPU Utilization Optimization

Meituan Visual's engineering team addressed low GPU utilization, a common bottleneck in online inference services, by splitting model structures and deploying the resulting parts as microservices. This raised GPU utilization from 40% to 100% and more than tripled QPS. The team then generalized the approach to other GPU-based services.

GPU · Microservices · Performance Optimization

