Tagged articles
7 articles
Old Zhang's AI Learning
Apr 25, 2026 · Artificial Intelligence

Deploying DeepSeek‑V4‑Flash Locally on 2 × NVIDIA H20 (96 GB) – Quick Performance Test

This article walks through deploying DeepSeek‑V4‑Flash on a server with two NVIDIA H20 GPUs (96 GB each). It covers model download, Docker image preparation, launch script tweaks, and reducing per‑GPU memory use via FP8 and expert parallelism, then reports the observed concurrency limits and tokens‑per‑second speeds, including a test with the model's thinking mode disabled.

DeepSeek-V4 · Docker · FP8 quantization
6 min read
Meituan Technology Team
Mar 6, 2025 · Artificial Intelligence

INT8 Quantization and Inference Optimization of DeepSeek R1 Model

Meituan’s search and recommendation team converted the FP8‑only DeepSeek‑R1 model to INT8 by first casting the weights to BF16 and then applying block‑wise or channel‑wise quantization. The INT8 model preserves GSM8K and MMLU accuracy while delivering 33% to 50% higher throughput on A100‑80G GPUs. The team publicly released the SGLang‑based inference scripts and the quantized weights, enabling deployment on older NVIDIA hardware.

DeepSeek-R1 · GPU deployment · INT8 Quantization
11 min read
Tencent Cloud Developer
Apr 20, 2023 · Artificial Intelligence

Master Stable Diffusion: From Hardware Setup to Advanced Prompt Engineering

This comprehensive guide walks you through the hardware requirements, environment deployment, key parameters, prompt techniques, ControlNet integration, model download and installation, as well as style and character training for Stable Diffusion, providing practical code snippets and visual examples for each step.

AI image generation · ControlNet · GPU deployment
38 min read
DataFunSummit
Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.

ASR · Conversational AI · GPU deployment
14 min read
Airbnb Technology Team
Nov 11, 2021 · Artificial Intelligence

Airbnb’s Task‑Oriented Dialogue System for Mutual Cancellation: Architecture, Data Collection, Modeling, and Deployment

Airbnb’s ATIS task‑oriented dialogue system for Mutual Cancellation combines hierarchical domain classification, Q&A‑style intent annotation, large‑scale RoBERTa pre‑training with multilingual fine‑tuning, multi‑turn context handling, GPU‑accelerated inference, and contextual‑bandit reinforcement learning to deliver a scalable, efficient customer‑support solution.

AI · GPU deployment · multilingual
22 min read