Tagged articles

Inference Serving

2 articles

Jul 5, 2026 · Artificial Intelligence

Is the Router’s Role Underrated? How vLLM Turns a Single Call into a Model Collaboration Squad

The article analyzes how routers have evolved from simple request forwarders into intelligent orchestrators that manage cost, safety, and cloud‑edge collaboration, detailing vLLM’s Semantic Router, its micro‑agent loop patterns, experimental benchmarks, and the resulting hybrid model serving architecture.

AI Routing · Inference Serving · Micro-Agents

Zuoyebang Tech Team

Nov 17, 2022 · Artificial Intelligence

Scaling Deep Learning Model Serving: High‑Concurrency, Low‑Latency Solutions

This article examines the challenges of deploying dozens of deep‑learning models at Zuoyebang and compares three serving architectures—Gunicorn + Flask + Transformers, Tornado + PyTorch, and Tornado + Triton—highlighting performance trade‑offs and presenting a final high‑concurrency, low‑latency solution in production.

Deep Learning · Inference Serving · Low latency
