Tag: LLM deployment


Alibaba Cloud Infrastructure
Apr 30, 2025 · Cloud Native

Deploying Qwen3-8B Large Language Model on Alibaba Cloud ACK with ACS GPU Acceleration

This guide explains how to prepare, deploy, and verify the Qwen3‑8B large language model on an Alibaba Cloud Container Service for Kubernetes (ACK) cluster using ACS GPU resources, covering prerequisites, model download, storage setup, Kubernetes manifests, and testing the inference service.
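A minimal manifest sketch for the kind of deployment the article describes, assuming a vLLM-style serving image and an ACS GPU compute class; the image name, labels, and resource values below are illustrative assumptions, not the article's actual manifest:

```yaml
# Illustrative sketch only: image, labels, and GPU request are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qwen3-8b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: qwen3-8b
  template:
    metadata:
      labels:
        app: qwen3-8b
        alibabacloud.com/compute-class: gpu   # target ACS GPU capacity (assumed label)
    spec:
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest        # assumed serving image
        args: ["--model", "/models/Qwen3-8B"] # assumed model mount path
        resources:
          limits:
            nvidia.com/gpu: "1"               # one GPU per replica
```

In practice the model weights would be mounted from the storage set up earlier in the guide (e.g. a PVC), and the article's own manifest should be followed for the exact labels and resource classes.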

ACS, ACK, GPU
8 min read
Architecture Digest
Feb 6, 2025 · Artificial Intelligence

Deploying DeepSeek R1 671B Model Locally with Ollama and Dynamic Quantization

This guide explains how to deploy the full 671B DeepSeek R1 model on local hardware using Ollama, leveraging dynamic quantization to shrink model size, detailing hardware requirements, step‑by‑step installation, configuration, performance observations, and practical recommendations.
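The abstract notes that dynamic quantization shrinks the 671B model enough to run locally. A back-of-envelope size estimate illustrates why; the bit widths here are illustrative assumptions (dynamic quantization mixes precisions per layer), not the article's measured figures:

```python
# Rough on-disk size estimate for a quantized model.
# Assumption: an effective average of ~1.58 bits/weight, a common
# dynamic-quantization target; the article's exact figures may differ.

def quantized_size_gb(n_params: float, avg_bits: float) -> float:
    """Approximate model size in decimal gigabytes."""
    return n_params * avg_bits / 8 / 1e9

full_fp16 = quantized_size_gb(671e9, 16)    # FP16 baseline: ~1342 GB
dyn_quant = quantized_size_gb(671e9, 1.58)  # ~1.58-bit average: ~133 GB
```

The roughly 10x reduction is what moves the model from datacenter-only territory into reach of a well-equipped workstation, at some cost in output quality.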

DeepSeek, Dynamic Quantization, GPU
12 min read
DataFunTalk
Jan 4, 2024 · Artificial Intelligence

Using OpenLLM to Quickly Build and Deploy Large Language Model Applications

This presentation explains how OpenLLM, an open‑source LLM framework, together with BentoML, addresses the challenges of deploying large language models by offering model switching, memory optimizations, multi‑GPU support, observability, and easy containerized deployment for production AI applications.
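OpenLLM exposes served models over an OpenAI-compatible HTTP API, so clients talk to it like any chat-completion endpoint. A sketch of building such a request; the port, endpoint path, and model name are assumptions for illustration:

```python
# Sketch: calling an OpenLLM server via its OpenAI-compatible API.
# Assumptions: server at localhost:3000 (port is an assumption) and an
# illustrative model name; adapt both to your actual deployment.
import json

def chat_payload(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

body = json.dumps(chat_payload("my-llm", "Summarize BentoML in one line."))
# POST `body` to http://localhost:3000/v1/chat/completions with any HTTP
# client, or point an OpenAI SDK client at that base URL.
```

Because the wire format is the standard chat-completion schema, swapping models behind the server does not require client changes, which is part of the model-switching story the talk describes.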

AI optimization, BentoML, LLM deployment
18 min read