IFX: Didi’s In‑House AI Inference Engine Platform – Architecture, Productization, and Performance
This article introduces Didi’s IFX platform: its background; its four‑layer architecture (access, software, engine, compute); its productization work, covering high‑performance optimization, model and engine compression, unified deployment across hardware, multi‑framework support, automation, and security; and the team’s future plans.
Background – With the rapid development of artificial‑intelligence technologies, deep learning has become pervasive in industry. Didi leverages massive ride‑hailing data, driver‑side devices, in‑car cameras, and GPU clusters to build a cloud‑edge‑device AI ecosystem. In September 2018 the Didi machine‑learning team began building IFX, its self‑developed inference engine platform. IFX went live internally in December 2018 and now serves millions of devices with daily call volumes exceeding ten trillion.
Architecture
Access Layer – Provides SDKs for local inference in various programming languages and standard service APIs (HTTP/Thrift/gRPC) for remote inference, along with authorization and telemetry for device and inference metrics.
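The split between local SDK calls and remote service calls can be pictured as one client interface with interchangeable backends. The sketch below is only an illustration: `InferenceClient`, `LocalBackend`, `RemoteBackend`, and the injected `transport` callable are hypothetical names, not part of IFX, and the "telemetry" is reduced to a call counter.

```python
from typing import Callable, Dict, List, Protocol


class InferenceBackend(Protocol):
    """Anything that can turn an input vector into an output vector."""

    def infer(self, inputs: List[float]) -> List[float]: ...


class LocalBackend:
    """Stands in for an on-device SDK call (hypothetical)."""

    def __init__(self, model_fn: Callable[[List[float]], List[float]]) -> None:
        self._model_fn = model_fn

    def infer(self, inputs: List[float]) -> List[float]:
        return self._model_fn(inputs)


class RemoteBackend:
    """Stands in for an HTTP/Thrift/gRPC call.

    The transport is injected so the sketch runs without a network.
    """

    def __init__(self, transport: Callable[[Dict], Dict]) -> None:
        self._transport = transport

    def infer(self, inputs: List[float]) -> List[float]:
        response = self._transport({"inputs": inputs})
        return response["outputs"]


class InferenceClient:
    """Single entry point; counts calls as a minimal form of telemetry."""

    def __init__(self, backend: InferenceBackend) -> None:
        self._backend = backend
        self.call_count = 0

    def infer(self, inputs: List[float]) -> List[float]:
        self.call_count += 1
        return self._backend.infer(inputs)
```

Calling code stays the same whether the backend runs on the device or behind a service endpoint, which is the property the access layer is described as providing.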
Software Layer – Handles model parsing and management, offering model slimming, encryption, version control, and automated testing to ensure consistency between training and inference models and to evaluate performance on target hardware.
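A consistency check between a training‑side model and its converted inference model usually comes down to comparing outputs on the same inputs within a tolerance. The following is a minimal sketch under that assumption; the function name `outputs_match` and the tolerance defaults are illustrative choices, not values from the source.

```python
import math
from typing import Sequence


def outputs_match(
    reference: Sequence[float],
    candidate: Sequence[float],
    rel_tol: float = 1e-3,
    abs_tol: float = 1e-5,
) -> bool:
    """Return True if both output vectors agree element-wise within tolerance."""
    if len(reference) != len(candidate):
        return False
    return all(
        math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
        for r, c in zip(reference, candidate)
    )
```

In practice the tolerance would depend on the precision of the target hardware; a quantized model typically needs a looser bound than an FP32 one.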
Engine Layer – Centralizes engine‑level optimizations: performance diagnostics, engine slimming and obfuscation, operator optimizations (low‑precision, graph, heterogeneous scheduling, assembly‑level auto‑tuning), and system‑level improvements such as scheduling, I/O, and pre/post‑processing.
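As one concrete example of a low‑precision optimization, the sketch below shows symmetric per‑tensor int8 quantization in plain Python. It illustrates the general technique only; the source does not say which quantization scheme IFX uses, and the function names are made up for this example.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using a single shared scale."""
    if not values:
        return [], 1.0
    max_abs = max(abs(v) for v in values)
    # Avoid dividing by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the quantized integers."""
    return [q * scale for q in quantized]
```

Each recovered value differs from the original by at most about half of `scale`, which is the accuracy cost traded for smaller weights and integer arithmetic.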
Compute Layer – Supports a wide range of hardware (NVIDIA GPUs, ARM, x86, Cambricon, etc.) across cloud, edge, and device scenarios.
Productization
High Performance – Assembly‑level kernel optimizations and full‑stack (pre‑/post‑processing, network) improvements yield 40‑200% model speedups and 30‑260% service‑level gains.
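The source does not say how the speedup percentages are defined. Assuming they mean a throughput increase relative to the baseline, a speedup of p percent corresponds to a new latency of baseline / (1 + p/100). The helper below encodes that assumption; the function name is illustrative.

```python
def latency_after_speedup(baseline_ms: float, speedup_percent: float) -> float:
    """New latency, assuming 'speedup' means a throughput increase of that percent."""
    return baseline_ms / (1.0 + speedup_percent / 100.0)
```

Under this reading, a 40% speedup cuts latency to roughly 71% of the baseline, and a 200% speedup cuts it to one third.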
Compactness – Model compression (<25% size reduction without accuracy loss) and binary ELF compression (~50% SDK size reduction) reduce app package size and improve user experience.
Uniformity – A single model can be deployed to diverse hardware platforms using a unified deployment scheme.
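One way to picture unified deployment is a single hardware‑independent operator list that is bound to target‑specific kernels only at run time. The sketch below assumes that design; the `KERNELS` table, the target names, and `run_model` are hypothetical, and both targets share a reference kernel where a real engine would bind to tuned implementations.

```python
from typing import Callable, Dict, List

Kernel = Callable[[List[float]], List[float]]


def relu_reference(xs: List[float]) -> List[float]:
    """Portable reference kernel; a real target would use a tuned version."""
    return [x if x > 0 else 0.0 for x in xs]


# Hypothetical kernel tables, one per hardware target.
KERNELS: Dict[str, Dict[str, Kernel]] = {
    "x86": {"relu": relu_reference},
    "arm": {"relu": relu_reference},
}


def run_model(ops: List[str], inputs: List[float], target: str) -> List[float]:
    """Run the same operator list on whichever target is requested."""
    if target not in KERNELS:
        raise ValueError(f"unsupported target: {target}")
    table = KERNELS[target]
    outputs = inputs
    for op in ops:
        outputs = table[op](outputs)
    return outputs
```

The model description (`ops`) never mentions hardware, so adding a target means adding a kernel table rather than re‑exporting the model.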
Multi‑Framework Support – IFX converts models from TensorFlow, PyTorch, Caffe, Darknet, etc., ensuring compatibility and smooth upgrades.
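Multi‑framework conversion is commonly structured as a set of per‑framework converters that all emit one shared intermediate representation. The sketch below assumes that structure. The input dictionaries are toy stand‑ins that only loosely mirror the `layer` and `node` lists of Caffe and TensorFlow graph definitions, and the registry and function names are hypothetical.

```python
from typing import Callable, Dict, List

# Shared intermediate representation: a list of {"op", "name"} records.
IR = List[Dict[str, str]]

CONVERTERS: Dict[str, Callable[[dict], IR]] = {}


def register(framework: str):
    """Decorator that records a converter under its framework name."""
    def wrap(fn: Callable[[dict], IR]) -> Callable[[dict], IR]:
        CONVERTERS[framework] = fn
        return fn
    return wrap


@register("caffe")
def from_caffe(model: dict) -> IR:
    return [{"op": l["type"].lower(), "name": l["name"]} for l in model["layer"]]


@register("tensorflow")
def from_tensorflow(model: dict) -> IR:
    return [{"op": n["op"].lower(), "name": n["name"]} for n in model["node"]]


def convert(framework: str, model: dict) -> IR:
    """Dispatch to the registered converter, failing loudly if none exists."""
    if framework not in CONVERTERS:
        raise ValueError(f"no converter registered for {framework}")
    return CONVERTERS[framework](model)
```

Because every converter produces the same representation, the engine layer below it only has to understand one format, regardless of the training framework.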
Automation – Automates SDK generation, service load testing, model correctness verification, and power/CPU‑load testing.
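Automated service load testing can be reduced, at its simplest, to issuing repeated calls and summarizing the latency distribution. The sketch below is a single‑threaded illustration of that idea; `load_test` and the reported field names are assumptions for this example, and a real harness would add concurrency and warm‑up runs.

```python
import statistics
import time
from typing import Callable, Dict


def load_test(call: Callable[[], object], requests: int) -> Dict[str, float]:
    """Invoke `call` repeatedly and report median and tail latency in ms."""
    if requests <= 0:
        raise ValueError("requests must be positive")
    latencies = []
    for _ in range(requests):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000.0)
    latencies.sort()
    # Nearest-rank style index for the 99th percentile.
    p99_index = min(len(latencies) - 1, int(0.99 * len(latencies)))
    return {
        "count": len(latencies),
        "p50_ms": statistics.median(latencies),
        "p99_ms": latencies[p99_index],
    }
```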
Security – Implements offline/online authorization, code obfuscation for iOS/Android/Linux, function‑level encryption in the engine, and model file encryption to protect AI assets.
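The source does not describe how offline authorization works. One common approach, shown here purely as an assumption, is a license bound to a device ID and an expiry time and signed with an HMAC, so the device can verify it without contacting a server. The function names and the `device_id|expires_at` message layout are hypothetical.

```python
import hashlib
import hmac


def sign_license(secret: bytes, device_id: str, expires_at: int) -> str:
    """Produce a hex signature over the device ID and expiry timestamp."""
    message = f"{device_id}|{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_license(
    secret: bytes, device_id: str, expires_at: int, signature: str, now: int
) -> bool:
    """Accept only an untampered license that has not yet expired."""
    expected = sign_license(secret, device_id, expires_at)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature) and now < expires_at
```

A shared‑secret scheme like this requires the secret to be present on the device, which is one reason it would be paired with the obfuscation and encryption measures listed above.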
Conclusion – IFX now powers many internal Didi services, yet several inefficiencies remain. The team plans to further automate the development‑to‑production pipeline, unify the development environment, and integrate testing, verification, analysis, and deployment processes.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
