Tagged articles
110 articles
TiPaiPai Technical Team
Jun 25, 2021 · Artificial Intelligence

Mastering TensorRT: Deploy Deep Learning Models Efficiently

This article introduces TensorRT, explains its deployment workflow from model training to engine generation, shows how to register custom operators for ONNX and create TensorRT plugins, and explores deformable convolution (DCN) implementation strategies for high‑performance AI inference.

AI inference · CUDA · Custom Operators
8 min read
iQIYI Technical Product Team
May 28, 2021 · Artificial Intelligence

iQIYI GPU Virtual Sharing for AI Inference: Architecture, Isolation, and Scheduling

iQIYI created a custom GPU‑virtual‑sharing system that intercepts CUDA calls to enforce per‑container memory limits, rewrites kernel launches for compute isolation, and integrates with a Kubernetes scheduler extender, allowing multiple AI inference containers to share a single V100 with minimal overhead and more than doubling overall GPU utilization.

AI inference · CUDA · GPU virtualization
16 min read
Alibaba Cloud Infrastructure
Apr 22, 2021 · Artificial Intelligence

Alibaba Cloud Breaks MLPerf Inference Performance Records with Zhenduan Heterogeneous Computing Platform

Alibaba Cloud's Zhenduan heterogeneous computing acceleration platform achieved historic breakthroughs in the MLPerf inference benchmark, processing over 1.07 million images per second on 8 NVIDIA A100 GPUs, setting multiple first‑place records and dramatically improving e‑commerce recommendation speed and overall AI workload efficiency.

AI inference · Alibaba Cloud · GPU Acceleration
7 min read
DataFunTalk
Aug 16, 2020 · Artificial Intelligence

IFX: Didi’s In‑House AI Inference Engine Platform – Architecture, Productization, and Performance

This article introduces Didi’s IFX platform: its background, its four‑layer architecture (access, software, engine, compute), and its productization work, including high‑performance optimizations, model and engine compression, unified deployment across hardware, multi‑framework support, automation, and security enhancements. It concludes with future plans.

AI inference · Didi · Security
8 min read
Didi Tech
Aug 5, 2020 · Artificial Intelligence

DiDi IFX AI Inference Platform: Architecture, Performance, and Productization

DiDi’s IFX AI inference platform, in development since 2018, uses a four‑layer architecture (access, software, engine, and compute) to deliver cloud, edge, and on‑device inference. It combines high‑performance kernel optimizations, model and binary compression, uniform multi‑framework deployment, automated testing, and end‑to‑end security, and serves billions of daily calls.

AI inference · Edge Computing · Performance Optimization
9 min read
Ctrip Technology
Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference, outlines general optimization methodologies—including system-level and model-level techniques—presents practical case studies such as Transformer translation model improvements, and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Deep Learning · Performance Optimization
15 min read
58 Tech
Mar 27, 2020 · Artificial Intelligence

dl_inference: Open‑Source General Deep Learning Inference Service

dl_inference is an open‑source inference platform that simplifies deploying TensorFlow and PyTorch models in production. It offers unified gRPC access, load‑balanced multi‑node serving, GPU and CPU deployment options, customizable pre‑ and post‑processing, and an extensible architecture for future AI workloads.

AI inference · Deep Learning · Model Serving
11 min read
Didi Tech
Aug 17, 2019 · Artificial Intelligence

Didi’s Elastic Inference Service & IFX Engine: Achieving World‑Class AI Inference

Didi’s Elastic Inference Service (EIS) and its IFX AI acceleration engine provide a distributed, cost‑effective inference platform that automatically scales resources based on QPS and latency requirements and supports major deep‑learning frameworks. The platform targets public‑cloud, private‑cloud, IoT, and edge scenarios, and achieved top‑ranked DAWNBench latency and cost scores on ImageNet with P4 GPUs.

AI inference · Benchmark · Cloud AI
7 min read
Alibaba Cloud Developer
Jun 11, 2019 · Artificial Intelligence

How ACE Powers Edge AI: A Heterogeneous Compute Engine for Real‑Time Inference

This article explains the design of ACE (AI Labs Compute Engine), a heterogeneous edge compute platform that combines model quantization, GPU/DSP/VPU acceleration, cloud‑edge model management, and custom algorithm integration to enable low‑latency AI services such as gesture, pet, and pen‑tip detection on resource‑constrained devices.

AI inference · Edge Computing · Embedded AI
13 min read
Tencent Architect
Oct 20, 2017 · Artificial Intelligence

Design and Performance of a General‑Purpose FPGA CNN Accelerator for Real‑Time AI Services

This article presents an overview of a general‑purpose FPGA‑based CNN accelerator, detailing its motivation, flexible architecture, compiler workflow, and memory and compute unit designs, along with performance comparisons that show significant latency and cost advantages over CPU and GPU solutions for real‑time AI inference.

AI inference · CNN acceleration · FPGA
13 min read