Tagged articles

KServe

8 articles

Feb 10, 2025 · Artificial Intelligence

Hybrid Cloud Elastic LLM Inference Solution with ACK Edge and KServe

This article presents a hybrid‑cloud solution that uses ACK Edge and KServe to dynamically allocate on‑premise and cloud GPU resources for large‑language‑model inference, addressing tidal traffic patterns, reducing costs, and ensuring high availability through elastic scaling and custom scheduling policies.

ACK@Edge · Auto Scaling · Hybrid Cloud

13 min read

Alibaba Cloud Infrastructure

Feb 8, 2025 · Artificial Intelligence

Deploying a Production‑Ready DeepSeek‑R1 Inference Service on Alibaba Cloud ACK with KServe

This guide explains how to deploy a production‑ready DeepSeek‑R1 inference service on Alibaba Cloud ACK using KServe, covering model preparation, storage configuration, service deployment, observability, autoscaling, model acceleration, gray release, and shared‑GPU inference.

DeepSeek · GPU · KServe

13 min read

Alibaba Cloud Infrastructure

Sep 5, 2024 · Artificial Intelligence

Deploying NVIDIA NIM on Alibaba Cloud ACK with Cloud‑Native AI Suite: A Step‑by‑Step Guide

This guide explains how to quickly build a high‑performance, observable, and elastically scalable LLM inference service by deploying NVIDIA NIM on an Alibaba Cloud ACK cluster using the Cloud‑Native AI Suite, KServe, Prometheus, Grafana, and custom autoscaling based on request‑queue metrics.

Alibaba Cloud ACK · KServe · LLM Inference

15 min read

Alibaba Cloud Native

Sep 4, 2024 · Cloud Native

Deploy NVIDIA NIM LLM Inference on Alibaba Cloud ACK with Auto‑Scaling and Monitoring

This guide walks you through deploying NVIDIA NIM for LLM inference on Alibaba Cloud ACK, integrating the Cloud Native AI Suite, configuring KServe, setting up Prometheus and Grafana monitoring, and implementing custom autoscaling based on request queue metrics.

ACK · KServe · LLM

15 min read

Alibaba Cloud Native

Jun 29, 2024 · Cloud Native

Deploy TensorRT‑LLM Optimized Llama‑2 on KServe with Alibaba Cloud ASM

This guide walks through enabling KServe on Alibaba Cloud ASM, preparing the Llama‑2‑7B model with TensorRT‑LLM, creating the necessary Kubernetes resources, and deploying a serverless AI inference service that can be queried via a simple curl request.

AI inference · KServe · LLM

14 min read

Alibaba Cloud Infrastructure

Jun 12, 2024 · Artificial Intelligence

Deploy Llama‑2 on ACK with KServe, Triton, and TensorRT‑LLM – Step‑by‑Step Guide

This tutorial walks through deploying the Llama‑2‑7b‑hf model on Alibaba Cloud Container Service for Kubernetes (ACK) using KServe and Triton Inference Server with the TensorRT‑LLM backend, covering prerequisites, model preparation, YAML configuration, PV/PVC setup, runtime creation, and troubleshooting steps.

AI inference · KServe · Llama 2

13 min read

Alibaba Cloud Native

Nov 22, 2023 · Cloud Native

Build a Sidecarless AI Application with Alibaba Cloud Service Mesh ASM – Step‑by‑Step Guide

This guide walks you through creating a sidecarless AI demo on Alibaba Cloud Service Mesh ASM, covering environment setup, multi‑model serving with KServe, PVC storage, InferenceService configuration, business service deployment, gateway and waypoint creation, traffic routing rules, and OIDC single sign‑on integration.

AI · ASM · KServe

28 min read

Alibaba Cloud Native

Jun 23, 2023 · Cloud Native

Accelerating LLM Inference on Alibaba Cloud with KServe and Fluid

This guide explains how to deploy large language models on Alibaba Cloud ACK using KServe for serverless inference and how to integrate Fluid for distributed data caching to cut cold‑start latency. It provides step‑by‑step commands, performance benchmarks, and practical tips for production‑grade AI model serving.

Cloud Native · Fluid · KServe

22 min read