Tagged articles

Semantic Cache

4 articles
Su San Talks Tech
May 11, 2026 · Artificial Intelligence

Designing a Production‑Ready LLM Gateway: Architecture, Routing, Fallback, and Observability

This article outlines a production‑grade LLM Gateway design. It covers a three‑layer architecture, routing strategies based on capability, cost, latency, and semantics, multi‑level fallback mechanisms, specialized load balancing, unified API adaptation, semantic caching, and observability, then compares popular open‑source implementations.

Fallback · LLM · Observability
17 min read
Linyb Geek Road
May 5, 2026 · Artificial Intelligence

Optimizing Retrieval and Generation Latency in High‑Concurrency RAG Agents

The article dissects latency in high‑concurrency RAG Agent pipelines, showing how retrieval, re‑ranking, and LLM generation each add milliseconds of delay. It then presents system‑level tactics for cutting end‑to‑end response time: ANN index tuning, partitioned search, vLLM PagedAttention, continuous batching, speculative decoding, model quantization, routing, semantic caching, and pipeline parallelism.

ANN · LLM · RAG
15 min read
Linyb Geek Road
Apr 27, 2026 · Artificial Intelligence

Designing a Production LLM Gateway: Architecture, Routing, and Fallback

The article outlines a production‑grade LLM Gateway architecture divided into ingress, decision, and egress layers. It details capability‑based, cost‑aware, latency‑aware, and semantic routing, multi‑stage fallback mechanisms, specialized load balancing, protocol unification, semantic caching, and observability, and evaluates open‑source solutions such as LiteLLM, RouteLLM, and Portkey.

Fallback · LLM Gateway · Observability
18 min read