Tagged articles
2 articles
Alibaba Cloud Infrastructure
Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premises and cloud GPU resources for large language model (LLM) inference. It handles tidal traffic patterns, reduces costs, and maintains high availability through elastic scaling and custom scheduling policies.

ACK@Edge · Auto Scaling · KServe
13 min read
Didi Tech
Aug 17, 2019 · Artificial Intelligence

Didi’s Elastic Inference Service & IFX Engine: Achieving World‑Class AI Inference

Didi’s Elastic Inference Service (EIS) and its IFX AI acceleration engine provide a distributed, cost‑effective inference platform that automatically scales resources based on QPS and latency requirements and supports major deep‑learning frameworks. The platform excels in public‑cloud, private‑cloud, IoT, and edge scenarios, and it achieved top‑ranked DAWNBench latency and cost scores on ImageNet with P4 GPUs.

AI inference · Cloud AI · Deep Learning
7 min read