
Redesigning Twemproxy with Nginx Multi‑Process Architecture for High‑Performance Caching

This article analyzes the limitations of native Twemproxy, describes how Nginx's master‑worker multi‑process model and related Linux kernel features were integrated to create a high‑performance, highly available cache proxy, and presents extensive online and benchmark results showing significant latency and QPS improvements.


Mainstream open-source cache proxies such as Twemproxy and Codis run as single-process, single-threaded services that lack native cluster support and cannot fully utilize multi-core CPUs, which caps QPS and makes latency unstable.

Twemproxy, while fast and lightweight, suffers from several bottlenecks: a single master process, high CPU usage once short-connection QPS exceeds 8,000, I/O blocking under heavy load, high maintenance cost for scaling, and difficult upgrades.

Nginx, developed by Igor Sysoev, provides an event‑driven, master‑worker architecture that leverages epoll, CPU affinity, modular design, hot deployment, and low memory consumption, making it a proven high‑performance web server.

Adapting Nginx's master-worker model to Twemproxy, a master process manages multiple worker processes, each handling client and backend events with non-blocking I/O; workers share the listening sockets through an accept lock, a load-balancing threshold, and the reuseport feature. Inter-process communication uses signals and a socketpair-based channel, which enables hot configuration reloads and rapid recovery of crashed workers.

Additional network optimizations include enabling RSS (multiple NIC queues) and setting TCP_QUICKACK after recv() to reduce latency spikes.

Extensive online traffic measurements and controlled benchmark tests demonstrate that the redesigned Twemproxy roughly halves client-side latency when one proxy is replaced, and reduces it up to threefold when both are upgraded, while maintaining or improving QPS.

The final conclusions favor a multi-process over a multi-threaded design for reliability and ease of implementation, enumerate the new features (master-worker model, support for older kernels, TCP quickack, reuseport, dynamic memory tuning, multi-tenant support, etc.), and outline future work such as hot-loading configuration files and hot-upgrading code.

Performance · Caching · Multi-Process · Network Optimization · nginx · Twemproxy
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
