Alimama Tech
Feb 12, 2025 · Artificial Intelligence
HighService: A High‑Performance Pythonic AI Service Framework for Model Inference and Global Resource Scheduling
HighService, Alibaba’s Pythonic AI service framework, accelerates large‑model inference and maximizes GPU utilization by running CPU and GPU work in separate processes and by offering out‑of‑the‑box quantization, parallelism, and caching. Its master‑worker scheduler dynamically reallocates idle GPUs across clusters, keeping online latency low while boosting offline throughput for diffusion and LLM workloads.
AI Service · High Performance · Python