Tagged articles
27 articles
Old Zhang's AI Learning
May 13, 2026 · Artificial Intelligence

Why vLLM Now Leads Open‑Source LLM Inference Benchmarks

vLLM tops the Artificial Analysis ranking by delivering the highest throughput for DeepSeek V3.2, Qwen 3.5 397B, and MiniMax‑M2.5 on identical NVIDIA Blackwell Ultra hardware, thanks to extensive kernel‑fusion optimizations that remain in the main branch.

DeepSeek · LLM inference · Qwen
7 min read
Machine Heart
May 9, 2026 · Artificial Intelligence

Can QuantClaw Cut OpenClaw Costs by 21% and Speed Up Inference by 15%?

QuantClaw, an open‑source plug‑in for the OpenClaw AI agent framework, builds on a systematic quantization study to route each task dynamically to a suitable model precision, achieving up to 21% cost reduction, 8‑15% lower latency, and even higher task scores across diverse workloads.

AI agents · Cost Optimization · Model Quantization
8 min read
Code Mala Tang
Apr 28, 2026 · Backend Development

Redis No Longer Dominates: Discover the Best Python Caching Alternatives

A benchmark of Redis, Memcached, DragonflyDB, and Cashews using the same FastAPI workload reveals that Redis falls behind on latency, throughput, and memory efficiency, while DragonflyDB and Cashews offer superior performance and developer experience for Python caching.

Cashews · DragonflyDB · Memcached
11 min read
PaperAgent
Apr 5, 2026 · Artificial Intelligence

Can AI Make Code Faster? Problem‑Oriented Optimization and Anchor Verification Breakthrough

A recent ICLR 2026 study from Zhejiang University, Ant Group, and Stony Brook introduces a problem‑oriented dataset and an anchor‑verification framework that enable large language models to not only generate correct code but also significantly improve its execution speed, achieving up to six‑fold acceleration while maintaining high correctness.

AI code generation · Code Optimization · anchor verification
8 min read
Network Intelligence Research Center (NIRC)
Dec 23, 2025 · Artificial Intelligence

ClusterAttn: Compressing KV Cache with Intrinsic Attention Clustering

ClusterAttn tackles the KV‑cache bottleneck of large language models by exploiting the natural clustering of attention scores, achieving up to 92% compression without accuracy loss, boosting throughput 2.6–4.8×, handling 128K‑token sequences on a single GPU, and outperforming existing training‑free compression methods.

KV cache compression · attention clustering · density clustering
8 min read
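To put the KV‑cache bottleneck from the ClusterAttn summary into numbers, the sketch below applies the standard size estimate (one key and one value vector per layer, KV head, and token). The model dimensions are hypothetical and not taken from the article; the 92% figure is simply applied as a ratio to show what that level of compression would leave.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch_size: int = 1) -> int:
    """KV-cache size: one key and one value vector per layer, KV head, and token."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch_size


def after_compression(num_bytes: int, compression_ratio: float) -> int:
    """Bytes remaining after removing `compression_ratio` of the cache (0.92 means 92%)."""
    return round(num_bytes * (1.0 - compression_ratio))


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 32 KV heads of size 128, fp16 (2-byte) cache.
    full = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=128 * 1024)
    print(f"full KV cache:         {full / 2**30:.1f} GiB")
    print(f"after 92% compression: {after_compression(full, 0.92) / 2**30:.1f} GiB")
```

With these assumed dimensions, a single 128K‑token sequence needs 64 GiB of fp16 KV cache, or about 5.1 GiB after 92% compression. Models that use grouped‑query attention (fewer KV heads) start from a smaller figure.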
Linux Kernel Journey
Feb 27, 2025 · Cloud Native

Designing FUSE: From Kernel VFS to Userspace and JuiceFS Performance

This article explains the evolution of file system architecture from kernel‑level VFS to userspace via FUSE, reviews the historical role of NFS, details JuiceFS's implementation on top of FUSE, and presents benchmark results that demonstrate its high throughput and practical limitations.

FUSE · JuiceFS · Linux kernel
15 min read
AIWalker
Jan 14, 2025 · Artificial Intelligence

Pure 3×3 Convolutions for Image‑Generation Diffusion Models: The DiC Approach

The paper introduces DiC, a fully convolutional diffusion model that rethinks 3×3 convolutions, adds sparse skip connections, stage‑specific embeddings and conditional gating, and demonstrates superior FID/IS scores and faster inference compared to diffusion Transformers across multiple scales.

AI · convolutional networks · diffusion models
19 min read
Alibaba Cloud Big Data AI Platform
Dec 18, 2024 · Artificial Intelligence

Can GPU Graph Algorithms Boost Vector Search Performance by 10×?

This article explains how OpenSearch's GPU‑accelerated vector search leverages parallel graph algorithms to achieve up to tenfold speed improvements over CPU solutions, detailing ANNS techniques, performance benchmarks, and practical GPU specifications for high‑QPS AI applications.

GPU Acceleration · OpenSearch · approximate nearest neighbor
11 min read
Alibaba Cloud Developer
Dec 3, 2024 · Operations

How to Boost Logtail Multiline Log Collection Speed by Up to 7×

This article investigates why enabling line‑prefix regex for multiline logs slows Logtail down, explains the underlying regex matching mechanism, and demonstrates how switching from boost::regex_match to boost::regex_search with proper flags can dramatically improve collection throughput, achieving a seven‑fold speed increase.

boost::regex · log collection · logtail
10 min read
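The Logtail summary attributes the slowdown to how the line‑prefix regex is matched. Below is a minimal Python sketch of the distinction, assuming the fix amounts to matching only an anchored prefix instead of the whole line. Here `re.fullmatch` plays the role of `boost::regex_match`, and `re.match` (anchored at the start, free to stop early) loosely stands in for `boost::regex_search` with a flag that pins the match to the line start. Boost offers `match_continuous` for that purpose, but whether the article uses exactly that flag is an assumption. The pattern and helper names are hypothetical.

```python
import re
from typing import Callable, Iterable, List

# Hypothetical line-start pattern: a new log record begins with "YYYY-MM-DD ".
LINE_START = r"\d{4}-\d{2}-\d{2} "

# Whole-line matching (the role boost::regex_match plays): the regex must consume
# every character of the line, so the trailing ".*" always scans to the end.
FULL_LINE = re.compile(LINE_START + r".*")

# Prefix matching anchored at column 0: the engine can stop as soon as the
# prefix has matched, no matter how long the rest of the line is.
PREFIX = re.compile(LINE_START)


def starts_record_full(line: str) -> bool:
    return FULL_LINE.fullmatch(line) is not None


def starts_record_prefix(line: str) -> bool:
    return PREFIX.match(line) is not None


def group_records(lines: Iterable[str], starts_record: Callable[[str], bool]) -> List[str]:
    """Attach continuation lines (e.g. stack-trace lines) to the preceding record."""
    records: List[str] = []
    for line in lines:
        if starts_record(line) or not records:
            records.append(line)
        else:
            records[-1] += "\n" + line
    return records


if __name__ == "__main__":
    sample = [
        "2024-12-03 10:00:00 ERROR request failed",
        "Traceback (most recent call last):",
        '  File "app.py", line 1, in <module>',
        "2024-12-03 10:00:01 INFO retry succeeded",
    ]
    # Both predicates split the sample into the same two records.
    print(len(group_records(sample, starts_record_prefix)))
```

Both predicates accept the same record‑start lines here. The difference is only in how much of each line the engine must examine, which is where the article locates the cost.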
Alibaba Cloud Big Data AI Platform
Sep 16, 2024 · Artificial Intelligence

How TAG Makes LLM Inference Fully Asynchronous for Higher Throughput

With the growing complexity of LLM architectures like GQA, MLA, and MoE, runtime overhead has become a bottleneck; this article analyzes Python performance, communication costs, and synchronous execution in current inference frameworks, introduces the fully asynchronous TAG architecture, and demonstrates its superior throughput and latency through benchmarks.

GPU utilization · LLM inference · Runtime Optimization
12 min read
21CTO
Apr 23, 2024 · Artificial Intelligence

Deploy Large Language Models with vLLM and Quantization for Low Latency

This guide explains how to deploy open‑source large language models using vLLM, benchmark latency and throughput, and apply 8‑bit/4‑bit quantization techniques such as bitsandbytes and NF4 to achieve faster inference on limited‑GPU hardware.

LLM deployment · Python · large language models
13 min read
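A rough sketch of why 8‑bit and 4‑bit weights matter on limited GPUs: weight memory scales linearly with bits per parameter. The 7B parameter count is a hypothetical example. The estimate ignores the KV cache, activations, and the per‑block scaling constants that NF4 stores, so real usage is higher.

```python
BITS_PER_PARAM = {"fp16": 16, "int8": 8, "nf4": 4}


def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate weight-only memory in GiB (no KV cache, activations, or quant metadata)."""
    return num_params * bits_per_param / 8 / 2**30


if __name__ == "__main__":
    params = 7e9  # hypothetical 7B-parameter model
    for fmt, bits in BITS_PER_PARAM.items():
        print(f"{fmt:>5}: {weight_memory_gib(params, bits):5.1f} GiB")
```

For the assumed 7B model this gives roughly 13.0, 6.5, and 3.3 GiB, which is the difference between needing a large data‑center GPU and fitting on a consumer card.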
Aikesheng Open Source Community
Nov 29, 2023 · Databases

When to Use Distributed vs. Centralized Databases: Analysis, Benchmarks, and Best Practices

This article examines the trade‑offs between centralized and distributed OLTP databases, presents industry usage statistics, performance benchmarks, practical questions for migration, and detailed guidance on sharding, SQL design, and operational considerations to help decide when a distributed solution is truly needed.

Database Architecture · OLTP · centralized databases
12 min read
Architects' Tech Alliance
Aug 31, 2022 · Artificial Intelligence

Performance Evaluation of Transformer Models on the Inspur NF5488A5 GPU Server

This article presents a detailed benchmark of four Transformer models of varying sizes trained on the high‑end Inspur NF5488A5 GPU server, compares its NVSwitch‑based interconnect with a PCIe‑based system, and analyzes the impact of model scale, tensor parallelism, and hardware bandwidth on training efficiency.

DeepSpeed · GPU server · Megatron-LM
12 min read
Baidu Tech Salon
Jun 28, 2022 · Artificial Intelligence

How Kunlun XPU‑R Redefines AI Compute: Architecture, Performance, and Future Trends

The article presents a detailed technical review of Kunlun Chip's XPU‑R AI accelerator, covering its evolution from early FPGA prototypes to the current 7nm, 256 TOPS chip, the architectural choices that address AI workload demands, performance advantages over CPUs/GPUs, and the product ecosystem supporting diverse AI scenarios.

AI acceleration · AI hardware · Chip Design
20 min read
Code DAO
May 21, 2022 · Artificial Intelligence

How Quantization and Fusion Accelerate CNN Inference on Edge Devices

The article explains CNN inference optimization by applying PyTorch quantization and module‑fusion techniques, compares model size and latency before and after quantization, shows code for building, quantizing, and fusing a simple CNN, and presents benchmark results on CPU, highlighting a four‑fold size reduction and up to 1.7× speed‑up.

CNN · PyTorch · edge inference
11 min read
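The four‑fold size reduction in the CNN summary follows from storing int8 codes (1 byte) instead of float32 values (4 bytes). Below is a minimal pure‑Python sketch of per‑tensor affine quantization using the standard scale‑plus‑zero‑point mapping that int8 schemes such as PyTorch's are built on. It is not the article's code, it does not cover module fusion, and the function names are hypothetical.

```python
from array import array
from typing import List, Tuple


def quantize_per_tensor(values: List[float], qmin: int = -128,
                        qmax: int = 127) -> Tuple[array, float, int]:
    """Affine int8 quantization: q = clamp(round(x / scale) + zero_point, qmin, qmax)."""
    lo = min(min(values), 0.0)  # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for an all-zero tensor
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    # array("b") holds signed 8-bit codes, so qmin/qmax must stay within -128..127.
    q = array("b", (max(qmin, min(qmax, round(x / scale) + zero_point)) for x in values))
    return q, scale, zero_point


def dequantize(q: array, scale: float, zero_point: int) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [(v - zero_point) * scale for v in q]


if __name__ == "__main__":
    weights = [-1.5, -0.2, 0.0, 0.7, 2.3]
    q, scale, zp = quantize_per_tensor(weights)
    restored = dequantize(q, scale, zp)
    print("codes:", list(q), "zero_point:", zp)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
    print("bytes per value:", array("f").itemsize, "->", q.itemsize)
```

The round‑trip error per value is at most half a quantization step (`scale / 2`). That rounding error, rather than any change to the network's structure, is the accuracy cost traded for the smaller, faster model.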
Volcano Engine Developer Services
Mar 16, 2022 · Artificial Intelligence

How veGiantModel Boosts Large Language Model Training Up to 6.9× Faster

The article introduces Volcano Engine's veGiantModel, a high‑performance large‑model training framework built on PyTorch, Megatron and DeepSpeed, details its distributed parallel strategies, hardware setups, benchmark results showing up to 6.9× speedup over Megatron and DeepSpeed, and provides open‑source links for further use.

ByteCCL · Distributed Training · large language models
6 min read
FunTester
Jul 5, 2021 · Industry Insights

Which Load Testing Tool Wins at 100k QPS? K6 vs Gatling vs FunTester Benchmarks

In a series of local benchmarks on a 2.6 GHz six‑core Intel i7 machine, the author compares K6, Gatling, and FunTester under 10k to 20k QPS loads, detailing CPU, memory, and response‑time metrics, analyzing script languages, JVM settings, and offering optimization suggestions for FunTester.

FunTester · Gatling · JVM
11 min read
NetEase Media Technology Team
Jan 15, 2021 · Backend Development

Go Language Practice and Ngo Framework Development at NetEase Media

Facing high memory usage and slow startup after containerizing its Java services, NetEase Media adopted Go in 2020, leveraging its fast compilation, low‑resource footprint and goroutine‑based concurrency to build the high‑performance Ngo framework, which outperforms Spring‑Boot in throughput while using far less memory.

Backend Development · Go language · Goroutine
32 min read
ITPUB
Dec 24, 2020 · Databases

How TDSQL Achieves Multi‑Level Strong Consistency with 4×‑3× Performance Gains

This article explains how Tencent's TDSQL database tackles the combined challenges of transaction and distributed consistency by introducing a multi‑level strong consistency model that delivers several‑fold performance improvements over Spanner, CockroachDB, and native Greenplum while preserving ACID guarantees.

Database Research · Distributed Transactions · TDSQL
12 min read
Xiao Lou's Tech Notes
May 19, 2020 · Backend Development

Can You Build a Faster Counter Than Java’s LongAdder? A Deep Dive

An in‑depth Java performance study explores LongAdder, compares it with AtomicLong and lock‑based counters using JMH, and walks through successive custom implementations (V0‑V5) that apply striping, modulo optimization, false‑sharing elimination, and advanced hash probing to approach or surpass LongAdder’s throughput.

JMH · Java concurrency · false sharing
16 min read
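The striping idea behind LongAdder, together with the power‑of‑two masking that presumably underlies the "modulo optimization" step, can be sketched as below. This is a toy written in Python, not the article's V0 to V5 code. Each thread draws a random per‑thread probe (loosely like the thread probe that `Striped64` uses), the cell index is `probe & (n - 1)` instead of `probe % n`, and `sum()` adds up all cells. Because of Python's GIL, the sketch shows the structure rather than any speedup. Cache‑line padding against false sharing has no Python equivalent, so it is omitted. Class and method names are hypothetical.

```python
import random
import threading
from typing import List


class StripedCounter:
    """Toy striped counter: updates are spread across cells to reduce contention."""

    def __init__(self, num_cells: int = 8) -> None:
        if num_cells <= 0 or num_cells & (num_cells - 1):
            raise ValueError("num_cells must be a power of two")
        self._mask = num_cells - 1  # probe & mask == probe % num_cells for powers of two
        self._cells: List[int] = [0] * num_cells
        self._locks = [threading.Lock() for _ in range(num_cells)]
        self._local = threading.local()

    def _probe(self) -> int:
        # Each thread draws one random probe and reuses it for every increment.
        probe = getattr(self._local, "probe", None)
        if probe is None:
            probe = random.getrandbits(32)
            self._local.probe = probe
        return probe

    def increment(self, amount: int = 1) -> None:
        idx = self._probe() & self._mask
        with self._locks[idx]:  # only threads mapped to the same cell contend here
            self._cells[idx] += amount

    def sum(self) -> int:
        # Like LongAdder.sum(), this is not an atomic snapshot while updates are in flight.
        return sum(self._cells)


if __name__ == "__main__":
    counter = StripedCounter(num_cells=8)

    def worker() -> None:
        for _ in range(10_000):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.sum())  # 40000 once all threads have joined
```

The design trade‑off mirrors LongAdder's: writes get cheaper because they rarely share a cell, while reads get more expensive and are only exact once writers have stopped.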
360 Zhihui Cloud Developer
Sep 3, 2019 · Big Data

QuickSQL: 360’s Unified Multi-Source Query Engine Explained

This article outlines how 360’s data center built QuickSQL, a federated SQL engine that unifies queries across heterogeneous sources such as Hive, MySQL, and Elasticsearch, detailing the business challenges, architectural design, performance benchmarks, and future roadmap for multi‑source data analysis.

Big Data · Data Integration · SQL Engine
12 min read
21CTO
May 22, 2017 · Backend Development

Why Rewriting a Laravel App in Go Boosted Performance and Simplicity

The author rewrote a Laravel‑based Boxzilla application in Go, detailing migration steps, code‑size reduction, benchmark results, and testing advantages, showing how Go delivers lower latency and a more maintainable backend.

Code size reduction · Go · Laravel migration
7 min read