Tag

heterogeneous architecture

1 views collected around this technical thread.

JD Retail Technology
JD Retail Technology
Jan 25, 2024 · Artificial Intelligence

Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration

This article describes how JD Retail's advertising technology team tackled the high‑compute demands of modern recommendation models by designing a distributed graph‑partitioned heterogeneous computing framework, introducing TensorBatch request aggregation, leveraging deep‑learning compiler bucketing and asynchronous compilation, and implementing a multi‑stream GPU architecture to dramatically improve online inference throughput and latency.

GPU accelerationOnline InferenceRecommendation systems
0 likes · 13 min read
Optimizing High‑Concurrency Online Inference for Recommendation Models with Distributed Heterogeneous Computing and GPU Acceleration