Didi’s Elastic Inference Service & IFX Engine: Achieving World‑Class AI Inference
Didi’s Elastic Inference Service (EIS) and its IFX AI acceleration engine form a distributed, cost‑effective inference platform. The platform scales resources automatically to meet QPS and latency targets, supports the major deep‑learning frameworks, and targets public‑cloud, private‑cloud, IoT and edge deployments. On the DAWNBench ImageNet inference benchmark it ranked first for both latency and cost using Tesla P4 GPUs.
Elastic Inference Service (EIS)
EIS is a distributed AI inference platform that automatically provisions compute resources based on the target queries‑per‑second (QPS) and response‑time (RT) requirements of a service. It integrates load‑balancing, elastic scaling, automatic disaster recovery, and security isolation to deliver cost‑effective online inference.
Key capabilities
Dynamic resource allocation: monitors QPS/RT and scales GPU/CPU instances up or down in real time.
Full‑stack optimization: from model serving to low‑level instruction generation.
Multi‑tenant isolation and fault tolerance: automatic failover and health‑check mechanisms.
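The dynamic resource allocation described above can be illustrated with a minimal sketch. This is not EIS's actual algorithm, which the source does not describe. The `ScalingPolicy` fields, the `desired_instances` function, the 20 % headroom and the "add one instance when RT exceeds the target" rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical per-service scaling policy (all fields are assumptions)."""
    target_rt_ms: float        # response-time objective for the service
    qps_per_instance: float    # QPS one instance sustains within that objective
    min_instances: int = 1
    max_instances: int = 64
    headroom: float = 0.2      # spare capacity kept for traffic bursts


def desired_instances(policy: ScalingPolicy, observed_qps: float,
                      observed_rt_ms: float, current: int) -> int:
    """Return the instance count a simple QPS/RT-driven scaler would request."""
    # Capacity rule: enough instances to serve the observed QPS plus headroom.
    needed = math.ceil(observed_qps * (1 + policy.headroom) / policy.qps_per_instance)
    # Latency rule: if RT misses the objective, grow by at least one instance
    # even when the QPS-based estimate says the current fleet is sufficient.
    if observed_rt_ms > policy.target_rt_ms:
        needed = max(needed, current + 1)
    # Clamp to the configured bounds.
    return max(policy.min_instances, min(policy.max_instances, needed))


policy = ScalingPolicy(target_rt_ms=50.0, qps_per_instance=200.0)
print(desired_instances(policy, observed_qps=900.0, observed_rt_ms=30.0, current=4))  # 6
```

A production scaler would typically also smooth the metrics over a time window and apply a cool‑down between scale‑down steps to avoid oscillation. Both are omitted here for brevity.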
Intelligent Acceleration Engine (IFX)
IFX is the inference acceleration core of EIS. It provides low‑latency, high‑throughput execution of neural networks on heterogeneous hardware such as NVIDIA GPUs and ARM‑based accelerators.
Supported frameworks and deployment workflow
One‑click model import from TensorFlow, PyTorch, Caffe, Darknet and other major frameworks.
Automatic conversion to an optimized runtime representation that can run on the target device.
Unified deployment API that registers the model as a service, creates a serving endpoint, and handles request routing.
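The import, register and route steps above can be sketched with a toy in‑memory registry. The source does not document the real IFX or EIS API, so every name here (`ModelRegistry`, `deploy`, `route`, the endpoint path format) is hypothetical, and the conversion step is represented only by a metadata record rather than any real optimization.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Frameworks named in the text; the real service may accept others.
SUPPORTED_FRAMEWORKS = {"tensorflow", "pytorch", "caffe", "darknet"}

Predictor = Callable[[List[float]], List[float]]


@dataclass
class OptimizedModel:
    """Stand-in for the optimized runtime representation (hypothetical)."""
    name: str
    source_framework: str
    target_device: str
    predict: Predictor


@dataclass
class ModelRegistry:
    """Toy registry: registers models as services and routes requests."""
    endpoints: Dict[str, OptimizedModel] = field(default_factory=dict)

    def deploy(self, name: str, framework: str, device: str,
               predict: Predictor) -> str:
        framework = framework.lower()
        if framework not in SUPPORTED_FRAMEWORKS:
            raise ValueError(f"unsupported framework: {framework}")
        # "Conversion" is only recorded here; no real optimization happens.
        model = OptimizedModel(name, framework, device, predict)
        endpoint = f"/models/{name}/predict"  # hypothetical path scheme
        self.endpoints[endpoint] = model
        return endpoint

    def route(self, endpoint: str, payload: List[float]) -> List[float]:
        model = self.endpoints.get(endpoint)
        if model is None:
            raise KeyError(f"no service registered at {endpoint}")
        return model.predict(payload)


registry = ModelRegistry()
url = registry.deploy("doubler", "PyTorch", "tesla-p4",
                      lambda xs: [2 * x for x in xs])
print(url, registry.route(url, [1.0, 2.5]))
```

The point of the sketch is the shape of the workflow: one call takes a model from a supported framework to a routable endpoint, and callers then address the endpoint rather than the model.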
Hardware targets
NVIDIA GPUs (e.g., Tesla P4, V100) with CUDA kernels tuned for batch size and precision.
ARM CPUs and AI accelerators via OpenCL/Vulkan back‑ends.
DAWNBench Evaluation
On Stanford’s DAWNBench ImageNet inference benchmark (entries must reach a top‑5 accuracy of 93 % or higher), IFX running on a Tesla P4 GPU achieved the best reported results:
Inference latency: 1.5439 ms per image, 21 % faster than the runner‑up.
Inference cost: $0.003 per 10 000 image classifications on Didi Cloud GPU instances, compared with $0.008 for the second‑place entry.
These figures demonstrate that IFX delivers both speed and cost efficiency for large‑scale image classification workloads.
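To put the reported numbers in more familiar units, the short calculation below derives a per‑second throughput and a cost per million images. It uses only the figures quoted above. The throughput assumes images are processed strictly one after another at the reported latency, which is a simplification, and the function names are illustrative.

```python
LATENCY_MS = 1.5439              # reported IFX latency per image
IFX_COST_PER_10K = 0.003         # USD per 10,000 images (IFX)
RUNNER_UP_COST_PER_10K = 0.008   # USD per 10,000 images (second place)


def throughput_images_per_second(latency_ms: float) -> float:
    """Sequential throughput implied by a per-image latency."""
    return 1000.0 / latency_ms


def cost_per_million(cost_per_10k: float) -> float:
    """Scale a cost per 10,000 images to a cost per 1,000,000 images."""
    return cost_per_10k * 100.0


def relative_saving(cost: float, baseline: float) -> float:
    """Fraction by which `cost` is lower than `baseline`."""
    return 1.0 - cost / baseline


print(f"throughput: {throughput_images_per_second(LATENCY_MS):.1f} images/s")
print(f"IFX cost per million images: ${cost_per_million(IFX_COST_PER_10K):.2f}")
print(f"runner-up cost per million images: ${cost_per_million(RUNNER_UP_COST_PER_10K):.2f}")
print(f"saving vs runner-up: {relative_saving(IFX_COST_PER_10K, RUNNER_UP_COST_PER_10K):.1%}")
```

Under these assumptions, a single P4 serves roughly 648 images per second, and the cost works out to about $0.30 per million images versus $0.80 for the runner‑up, a 62.5 % reduction.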
Typical Deployment Scenarios
Public‑cloud inference: Deploy AI services on Didi Cloud or other public clouds and attach IFX for accelerated serving.
Private‑cloud/on‑premise: Integrate IFX into enterprise data centers to improve throughput and reduce GPU spend.
IoT and edge: Run IFX on devices for smart manufacturing, home automation, autonomous vehicles, robotics, and intelligent transportation, where low latency is critical.
Future Directions
Internal tests indicate that IFX can achieve even lower latency and cost on P4 GPUs with further kernel and scheduling optimizations. The IFX team plans to publish detailed technical notes and expose the acceleration engine to external customers via standard APIs, enabling broader adoption of high‑performance inference.
