Tag: Distributed Inference

Alibaba Cloud Infrastructure
Apr 16, 2025 · Artificial Intelligence

Optimizing Multi‑Node Distributed LLM Inference with ACK Gateway and vLLM

This article presents a step‑by‑step guide for deploying and optimizing large‑language‑model inference across multiple GPU‑enabled nodes using ACK Gateway with Inference Extension, vLLM’s tensor‑ and pipeline‑parallel techniques, and Kubernetes resources such as LeaderWorkerSet, PVCs, and custom routing policies, followed by performance benchmarking and analysis.

ACK Gateway · Distributed Inference · Kubernetes
19 min read
ByteDance Cloud Native
Mar 20, 2025 · Artificial Intelligence

How to Deploy DeepSeek‑R1 671B on AIBrix: Multi‑Node GPU Inference in Hours

This guide explains how to use the AIBrix distributed inference platform to deploy the massive DeepSeek‑R1 671B model across multiple GPU nodes, covering cluster setup, custom vLLM images, storage options, RDMA networking, autoscaling, request handling, and observability, turning a weeks‑long deployment into an hour‑scale process.

AIBrix · DeepSeek-R1 · Distributed Inference
14 min read
DeWu Technology
Feb 17, 2025 · Artificial Intelligence

Optimizing Large Model Inference: High‑Performance Frameworks and Techniques

The article reviews high‑performance inference strategies for large language models such as DeepSeek‑R1, detailing CPU‑GPU process separation, PagedAttention and RadixAttention, chunked prefill, output‑length reduction, tensor‑parallel multi‑GPU scaling, and speculative decoding, each shown to markedly boost throughput and cut latency in real deployments.

AI · Distributed Inference · GPU Acceleration
22 min read
IT Services Circle
Feb 7, 2025 · Artificial Intelligence

Building Low‑Cost AI Clusters with Old Phones Using Exo and Open WebUI

This article introduces Exo, an open‑source platform that lets you turn idle smartphones, tablets, and laptops into a distributed AI cluster capable of running large language models, and shows how Open WebUI provides a user‑friendly interface for deploying private AI assistants.

AI clustering · Distributed Inference · Exo
6 min read
DataFunTalk
Jul 8, 2023 · Big Data

Key Technologies and Applications of Semantic Knowledge Management in Ant Financial Knowledge Graph Platform

This article presents Ant Group's large‑scale financial knowledge graph platform, detailing its semantic knowledge representation, hybrid graph model, distributed management architecture, and core capabilities such as knowledge evolution and cross‑domain fusion, and showcasing applications such as anti‑fraud capital‑flow analysis and future DataFabric‑oriented knowledge sharing.

Big Data · Distributed Inference · data fabric
18 min read
YunZhu Net Technology Team
Oct 22, 2021 · Artificial Intelligence

Deep Learning Overview and Introduction to the Lightweight Distributed Inference Engine Avior

This article reviews deep learning and AI frameworks, highlights the challenges of online model serving, and presents Avior, a lightweight distributed inference engine designed for high‑performance AI services, detailing its architecture, layer design, benchmark results, and future development plans.

AI frameworks · Avior · Distributed Inference
8 min read