Alibaba Cloud Native
Jan 17, 2024 · Artificial Intelligence

Boost LLM Inference with TensorRT‑LLM on Alibaba Cloud ACK: A Step‑by‑Step Guide

This article explains how TensorRT‑LLM accelerates large language model inference through quantization, in‑flight batching, advanced attention variants, and graph rewriting, then walks through a complete deployment on Alibaba Cloud Container Service for Kubernetes (ACK), covering environment setup, model compilation, benchmarking, and a performance comparison.

Cloud Native AI · In‑Flight Batching · LLM Inference
13 min read
Baidu Intelligent Cloud Tech Hub
Dec 14, 2022 · Artificial Intelligence

How Cloud‑Native AI Boosts Resource Efficiency with PaddleFlow

This article explains how cloud‑native AI leverages container‑based architectures and advanced scheduling algorithms—such as resource queues, gang scheduling, bin‑packing, and GPU topology‑aware and ToR (top‑of‑rack)‑aware dispatch—to improve resource and engineering efficiency, and introduces Baidu’s AI workflow engine PaddleFlow, covering its design, features, and deployment options.

AI Workflow · Cloud Native AI · GPU Virtualization
25 min read