Tagged articles

Throughput Optimization

3 articles · Page 1 of 1

Jun 29, 2026 · Artificial Intelligence

DeepSeek Opens DSpark: A New Speculative Decoding Framework for Large Language Models

DeepSeek releases DSpark, an open‑source speculative decoding system that combines semi‑autoregressive generation with confidence‑scheduled verification, delivering 60‑85% per‑user speed gains, lower latency, and superior acceptance rates compared with Eagle3 and DFlash across multiple LLM benchmarks.

Confidence Scheduling · Semi-Autoregressive Generation · Speculative Decoding

14 min read

DevOps Coach

Apr 26, 2026 · Backend Development

Forget Kafka: A Lightweight Go Queue Achieves 2 Million Messages per Second

The article analyzes how replacing Kafka with a simple in‑memory Go queue reduced architectural complexity, raised throughput from 240‑330 K to 1.8‑2.0 M messages per second, and made debugging easier, while acknowledging the scenarios where Kafka remains the better choice.

Backend Performance · Go · In‑Memory Ring Buffer

8 min read

Alibaba Cloud Big Data AI Platform

Sep 17, 2024 · Artificial Intelligence

Boosting LLM Inference: How NanoFlow Doubles Throughput

The article introduces NanoFlow, a serving framework that uses intra‑device parallelism, operation‑level pipelining, and asynchronous scheduling to improve large language model serving throughput by up to 1.91×, and describes its integration with Alibaba Cloud PAI.

Alibaba Cloud PAI · GPU Scheduling · LLM Serving

7 min read
