Tagged articles

110 articles

Jun 25, 2021 · Artificial Intelligence

Mastering TensorRT: Deploy Deep Learning Models Efficiently

This article introduces TensorRT, explains its deployment workflow from model training to engine generation, shows how to register custom operators for ONNX and create TensorRT plugins, and explores deformable convolution (DCN) implementation strategies for high‑performance AI inference.

AI inference · CUDA · Custom Operators

0 likes · 8 min read

iQIYI Technical Product Team

May 28, 2021 · Artificial Intelligence

iQIYI GPU Virtual Sharing for AI Inference: Architecture, Isolation, and Scheduling

iQIYI created a custom GPU‑virtual‑sharing system that intercepts CUDA calls to enforce per‑container memory limits, rewrites kernel launches for compute isolation, and integrates with a Kubernetes scheduler extender, allowing multiple AI inference containers to share a single V100 with minimal overhead and more than doubling overall GPU utilization.

AI inference · CUDA · GPU virtualization

0 likes · 16 min read

Alibaba Cloud Infrastructure

Apr 22, 2021 · Artificial Intelligence

Alibaba Cloud Breaks MLPerf Inference Performance Records with Zhenduan Heterogeneous Computing Platform

Alibaba Cloud's Zhenduan heterogeneous computing acceleration platform set multiple first‑place records in the MLPerf Inference benchmark, processing over 1.07 million images per second on 8 NVIDIA A100 GPUs. The article also describes how the platform substantially speeds up e‑commerce recommendation and improves overall AI workload efficiency.

AI inference · Alibaba Cloud · GPU Acceleration

0 likes · 7 min read

DataFunTalk

Aug 16, 2020 · Artificial Intelligence

IFX: Didi’s In‑House AI Inference Engine Platform – Architecture, Productization, and Performance

The article introduces Didi’s IFX platform: its background; its four‑layer architecture (access, software, engine, compute); its productization work, including high‑performance optimizations, model and engine compression, unified deployment across hardware, multi‑framework support, automation, and security enhancements; and its future plans.

AI inference · Didi · Security

0 likes · 8 min read

Didi Tech

Aug 5, 2020 · Artificial Intelligence

DiDi IFX AI Inference Platform: Architecture, Performance, and Productization

DiDi’s IFX AI inference platform, in development since 2018, uses a four‑layer architecture (access, software, engine, and compute) to deliver cloud, edge, and device inference. It combines high‑performance kernel optimizations, model and binary compression, uniform multi‑framework deployment, automated testing, and end‑to‑end security, and serves billions of calls per day.

AI inference · Edge Computing · Performance Optimization

0 likes · 9 min read

Ctrip Technology

Jul 23, 2020 · Artificial Intelligence

Inference Performance Optimization for AI Applications: Methods, Case Studies, and Future Directions

This article examines the challenges of deep learning inference; outlines general optimization methodologies, including system-level and model-level techniques; presents practical case studies such as improvements to a Transformer translation model; and discusses future trends in automated compilation and performance tuning for AI services.

AI inference · Deep Learning · Performance Optimization

0 likes · 15 min read

58 Tech

Mar 27, 2020 · Artificial Intelligence

dl_inference: Open‑Source General Deep Learning Inference Service

dl_inference is an open‑source inference platform that simplifies production deployment of TensorFlow and PyTorch models. It offers unified gRPC access, load‑balanced multi‑node serving, GPU and CPU deployment options, customizable pre‑ and post‑processing, and an extensible architecture for future AI workloads.

AI inference · Deep Learning · Model Serving

0 likes · 11 min read

Didi Tech

Aug 17, 2019 · Artificial Intelligence

Didi’s Elastic Inference Service & IFX Engine: Achieving World‑Class AI Inference

Didi’s Elastic Inference Service (EIS) and its IFX AI acceleration engine form a distributed, cost‑effective inference platform that automatically scales resources to meet QPS and latency requirements, supports the major deep‑learning frameworks, and targets public‑cloud, private‑cloud, IoT, and edge scenarios. The system achieved top‑ranked DAWNBench inference latency and cost scores on ImageNet using P4 GPUs.

AI inference · Benchmark · Cloud AI

0 likes · 7 min read

Alibaba Cloud Developer

Jun 11, 2019 · Artificial Intelligence

How ACE Powers Edge AI: A Heterogeneous Compute Engine for Real‑Time Inference

This article explains the design of ACE (AI Labs Compute Engine), a heterogeneous edge compute platform that combines model quantization, GPU/DSP/VPU acceleration, cloud‑edge model management, and custom algorithm integration to enable low‑latency AI services such as gesture, pet, and pen‑tip detection on resource‑constrained devices.

AI inference · Edge Computing · Embedded AI

0 likes · 13 min read

Tencent Architect

Oct 20, 2017 · Artificial Intelligence

Design and Performance of a General‑Purpose FPGA CNN Accelerator for Real‑Time AI Services

This article gives a comprehensive overview of a general‑purpose FPGA‑based CNN accelerator, covering its motivation, flexible architecture, compiler workflow, and memory and compute unit designs, along with performance comparisons showing significant latency and cost advantages over CPU and GPU solutions for real‑time AI inference.

AI inference · CNN acceleration · FPGA

0 likes · 13 min read
