Tag: GPU deployment

Ops Development Stories
Jun 15, 2025 · Artificial Intelligence

How to Deploy vLLM for Fast LLM Inference on GPU and CPU – A Step‑by‑Step Guide

This article walks through deploying the high‑performance vLLM inference framework, covering GPU and CPU backend installation, environment setup, offline and online serving, API usage, and a performance comparison showing a roughly ten‑fold speed advantage for the GPU backend over the CPU backend.
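vLLM's online-serving mode exposes an OpenAI-compatible HTTP API, so a completion request is just a JSON payload POSTed to the server. A minimal sketch of building that payload follows; the model name and port are illustrative assumptions, not values taken from the article:

```python
import json

# vLLM's online server speaks the OpenAI-compatible REST API.
# The model name and host/port below are placeholders.
def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 64,
                             temperature: float = 0.7) -> str:
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_completion_request("Qwen/Qwen2.5-7B-Instruct", "Hello, vLLM!")
print(body)  # POST this body to http://<host>:8000/v1/completions
```

The same payload shape works for both the GPU and CPU backends, since the serving API is independent of the compute backend.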

CPU deployment · GPU deployment · LLM inference
0 likes · 38 min read
DataFunSummit
Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.
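Triton's HTTP/REST endpoint follows the KServe v2 inference protocol: a request names each input tensor and gives its shape, datatype, and data. A minimal sketch of constructing such a request for a speech model follows; the tensor name and shape are illustrative assumptions:

```python
import json

# KServe v2 inference request: POST /v2/models/<name>/infer
# The input tensor name, shape, and datatype here are illustrative
# placeholders for an ASR model's audio input.
def build_infer_request(input_name: str, audio_samples: list) -> str:
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(audio_samples)],  # batch of 1 waveform
                "datatype": "FP32",
                "data": audio_samples,
            }
        ]
    }
    return json.dumps(payload)

body = build_infer_request("AUDIO_SIGNAL", [0.0, 0.1, -0.1])
print(body)
```

In practice the `tritonclient` Python package wraps this protocol, but seeing the raw payload clarifies what the server expects from any HTTP client.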

ASR · GPU deployment · Speech AI
0 likes · 14 min read
Airbnb Technology Team
Nov 11, 2021 · Artificial Intelligence

Airbnb’s Task‑Oriented Dialogue System for Mutual Cancellation: Architecture, Data Collection, Modeling, and Deployment

Airbnb’s ATIS task‑oriented dialogue system for Mutual Cancellation combines hierarchical domain classification, Q&A‑style intent annotation, large‑scale RoBERTa pre‑training with multilingual fine‑tuning, multi‑turn context handling, GPU‑accelerated inference, and contextual‑bandit reinforcement learning to deliver a scalable, efficient customer‑support solution.
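To make the contextual-bandit component concrete, here is a generic epsilon-greedy bandit sketch: explore a random action with small probability, otherwise exploit the action with the best observed mean reward. This is a textbook illustration under toy assumptions, not Airbnb's actual algorithm or reward model:

```python
import random

# Generic epsilon-greedy bandit (illustrative only).
class EpsilonGreedyBandit:
    def __init__(self, n_actions: int, eps: float = 0.1, seed: int = 0):
        self.eps = eps
        self.counts = [0] * n_actions      # pulls per action
        self.values = [0.0] * n_actions    # observed mean reward per action
        self.rng = random.Random(seed)

    def select(self) -> int:
        # Explore with probability eps, otherwise exploit the best mean.
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.counts)), key=lambda a: self.values[a])

    def update(self, action: int, reward: float) -> None:
        self.counts[action] += 1
        # Incremental mean update
        self.values[action] += (reward - self.values[action]) / self.counts[action]

bandit = EpsilonGreedyBandit(n_actions=3)
for _ in range(200):
    a = bandit.select()
    reward = 1.0 if a == 2 else 0.0  # action 2 is best in this toy setup
    bandit.update(a, reward)
print(bandit.values)
```

A production system would condition action selection on context features (e.g. conversation state), which an algorithm such as LinUCB adds on top of this basic explore/exploit loop.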

AI · Customer Support · GPU deployment
0 likes · 22 min read