Redesigning Twemproxy with Nginx Multi‑Process Architecture for High‑Performance Caching
This article analyzes the limitations of native Twemproxy, describes how Nginx's master‑worker multi‑process model and related Linux kernel features were integrated to create a high‑performance, highly available cache proxy, and presents extensive online and benchmark results showing significant latency and QPS improvements.
Existing open‑source cache proxies such as Twemproxy and Codis are single‑process, single‑threaded solutions that lack cluster support and cannot fully utilize multi‑core CPUs, resulting in low QPS and unstable latency.
Twemproxy, while fast and lightweight, suffers from several bottlenecks: its single‑process design, high CPU usage once short‑connection QPS exceeds 8,000, I/O blocking under heavy load, high maintenance cost for scaling, and difficult upgrades.
Nginx, developed by Igor Sysoev, provides an event‑driven, master‑worker architecture that leverages epoll, CPU affinity, modular design, hot deployment, and low memory consumption, making it a proven high‑performance web server.
Adapting Nginx's master‑worker model to Twemproxy, a master process manages multiple worker processes, each handling client and backend events via non‑blocking accept locks, load‑balancing thresholds, and the reuseport feature. Inter‑process communication relies on signals and a socketpair‑based channel, enabling hot configuration reloads and rapid worker recovery.
Additional network optimizations include enabling RSS (multiple NIC queues) and setting TCP_QUICKACK after recv() to reduce latency spikes.
Extensive online traffic measurements and controlled benchmark tests demonstrate that the redesigned Twemproxy reduces client‑side latency by roughly half when one proxy is replaced, and by up to a factor of three when both are upgraded, while maintaining or improving QPS.
The final conclusions favor a multi‑process over a multi‑threaded design for reliability and ease of implementation, enumerate the new features (master‑worker model, support for older kernels, tcp_quickack, reuseport, dynamic memory tuning, multi‑tenant support, etc.), and outline future work such as hot‑loading configuration files and hot‑upgrading code.