Design and Performance Optimization of a High‑Concurrency Volunteer Registration System
This article recounts the end‑to‑end design, testing, and iterative optimization of a volunteer registration platform that required tens of thousands of QPS, covering requirement analysis, middleware benchmarking, data partitioning, compression, connection‑limit handling, and final deployment lessons.
The project was a volunteer registration system with very high concurrency targets: the candidate-side login and query interfaces had to sustain 40 k QPS, the save-volunteer interface had to sustain 20 k TPS, and data accuracy was a strict requirement.
Initial analysis identified concurrency and data correctness as the main challenges, leading to a proposed stack of Redis + RocketMQ + MySQL.
MySQL single‑node tests showed only ~5 k TPS and 12 k QPS, insufficient for the target, and master‑slave replication further reduced performance, so MySQL alone was ruled out.
Redis single‑node benchmarks demonstrated 100 k QPS for GET and 80 k TPS for SET, prompting the decision to cache all data in Redis while persisting asynchronously to MySQL via RocketMQ.
The save-volunteer workflow was defined as: (1) start a Redis transaction and update the data, (2) send the message to RocketMQ with synchronous disk write, (3) commit the Redis transaction, (4) write to MySQL asynchronously.
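The four steps above can be sketched as follows. This is a minimal illustration, not the project's code: `StagedCache` and `DurableQueue` are hypothetical in-memory stand-ins for a Redis transaction and a RocketMQ producer with synchronous disk write, and discarding the staged update when the queue write fails is an assumption about why the commit comes after the send.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of the four-step save flow. Every type here is a hypothetical stand-in. */
class SaveFlowSketch {

    /** Stand-in for a Redis transaction: writes are staged, then committed or discarded. */
    static class StagedCache {
        final Map<String, String> committed = new HashMap<>();
        private final Map<String, String> staged = new HashMap<>();

        void stage(String key, String value) { staged.put(key, value); }
        void commit() { committed.putAll(staged); staged.clear(); }
        void discard() { staged.clear(); }
    }

    /** Stand-in for a producer whose send returns only after the broker has persisted the message. */
    interface DurableQueue {
        void sendSync(String key, String payload);
    }

    /** In-memory queue. A separate consumer would drain it into MySQL (step 4). */
    static class InMemoryQueue implements DurableQueue {
        final List<String> messages = new ArrayList<>();
        boolean failNext = false;

        @Override
        public void sendSync(String key, String payload) {
            if (failNext) {
                failNext = false;
                throw new IllegalStateException("broker unavailable");
            }
            messages.add(key + "=" + payload);
        }
    }

    /** Returns true only if the update is both queued durably and visible in the cache. */
    static boolean save(StagedCache cache, DurableQueue queue, String candidateId, String payloadJson) {
        cache.stage(candidateId, payloadJson);        // (1) open the transaction and stage the update
        try {
            queue.sendSync(candidateId, payloadJson); // (2) durable write to the message queue
        } catch (RuntimeException e) {
            cache.discard();                          // assumed behavior: drop the staged update
            return false;
        }
        cache.commit();                               // (3) make the update visible to readers
        return true;                                  // (4) the MySQL write happens later, off this path
    }

    public static void main(String[] args) {
        StagedCache cache = new StagedCache();
        InMemoryQueue queue = new InMemoryQueue();
        System.out.println(save(cache, queue, "c-1", "{\"v\":1}"));
        queue.failNext = true;
        System.out.println(save(cache, queue, "c-2", "{\"v\":2}"));
    }
}
```

The ordering matters for accuracy: a reader never sees an update in the cache that has not already been recorded durably for the later MySQL write.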
The first stress test preserved data consistency but reached only ~4 k TPS, and it showed that adding nodes alone did not remove the bottleneck.
Further profiling with Arthas showed the Redis update step was unexpectedly slow due to storing large JSON strings; the team switched to gzip‑compressed payloads, which improved throughput to ~8 k TPS.
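A minimal sketch of compressing a JSON payload with gzip before it is written to Redis, using only the JDK's `java.util.zip` classes. The class name, method names, and sample JSON are illustrative assumptions. The article does not show the team's actual code or say which Redis client they used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: gzip a JSON string into bytes suitable for a binary Redis value. */
class GzipPayloadSketch {

    static byte[] compress(String json) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the buffer only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(json.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static String decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a repetitive sample document; real payloads will compress differently. */
    static String sampleJson(int entries) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < entries; i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"order\":").append(i).append(",\"school\":\"example\",\"major\":\"example\"}");
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        String json = sampleJson(50);
        byte[] packed = compress(json);
        System.out.println("raw bytes: " + json.getBytes(StandardCharsets.UTF_8).length);
        System.out.println("gzip bytes: " + packed.length);
        System.out.println("round trip ok: " + json.equals(decompress(packed)));
    }
}
```

Compression trades CPU on the Java side for fewer bytes on the wire and in Redis, so it tends to help when payloads are large and repetitive, as JSON usually is.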
When TPS still fell short of the 20 k target, the bottleneck shifted to RocketMQ’s synchronous disk write; horizontal scaling of RocketMQ clusters and Redis hash‑slot partitioning based on ID suffixes were introduced.
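Partitioning by ID suffix can be sketched as below. The article does not give the exact rule, so the two-digit suffix, the assumption that IDs end in decimal digits, and the `shardFor` name are all illustrative.

```java
/** Hypothetical router: picks a shard from the trailing digits of a candidate ID. */
class SuffixShardRouter {

    /** Maps the last two digits of the ID (or the whole ID if shorter) onto [0, shardCount). */
    static int shardFor(String candidateId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        if (candidateId == null || candidateId.isEmpty()) {
            throw new IllegalArgumentException("candidateId must not be empty");
        }
        int start = Math.max(0, candidateId.length() - 2);
        int suffix = 0;
        for (int i = start; i < candidateId.length(); i++) {
            char c = candidateId.charAt(i);
            if (c < '0' || c > '9') {
                throw new IllegalArgumentException("ID suffix must be numeric: " + candidateId);
            }
            suffix = suffix * 10 + (c - '0');
        }
        return suffix % shardCount;
    }

    public static void main(String[] args) {
        System.out.println(shardFor("2024001237", 4)); // 37 % 4 = 1
        System.out.println(shardFor("2024001200", 4)); // 0 % 4 = 0
    }
}
```

With a two-digit suffix and 4 shards, each shard receives exactly 25 of the 100 possible suffixes, so the load is even as long as the suffixes themselves are evenly distributed.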
Later tests uncovered a 20‑30 % performance drop caused by Redis hash‑slot calculation on the master node; moving the slot calculation to the client reduced latency and raised QPS to ~20 k.
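The article does not say how the client-side calculation was implemented. If the deployment used standard Redis Cluster, a key's slot is CRC16 of the key modulo 16384, and a client that computes this itself can send each command straight to the owning node. The sketch below implements that formula with the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0). It deliberately omits hash-tag handling (the `{...}` substring rule), and the class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

/** Sketch of client-side Redis Cluster slot calculation, without hash-tag support. */
class ClientSideSlotSketch {

    static final int SLOT_COUNT = 16384;

    /** CRC16/XMODEM: polynomial 0x1021, initial value 0, no reflection. */
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                if ((crc & 0x8000) != 0) {
                    crc = (crc << 1) ^ 0x1021;
                } else {
                    crc <<= 1;
                }
                crc &= 0xFFFF; // keep the register at 16 bits
            }
        }
        return crc;
    }

    /** Slot for a key, assuming the key contains no hash tag. */
    static int slotFor(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // "123456789" is the standard CRC check string; its CRC16/XMODEM value is 0x31C3.
        System.out.println(Integer.toHexString(crc16("123456789".getBytes(StandardCharsets.UTF_8)))); // 31c3
        System.out.println(slotFor("123456789")); // 12739
    }
}
```

Cluster-aware client libraries typically perform this calculation internally and cache the slot-to-node map, so hand-written code like this is mainly useful for understanding or for custom routing.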
Subsequent bottlenecks were traced to Nginx bandwidth limits and to Tomcat's connection limit (about 1 k per instance, as reported), which capped overall TPS. Adding Java service instances and cutting response time from 300 ms to under 100 ms allowed TPS to approach 25 k.
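Why a connection cap limits TPS follows from Little's law: if each in-flight request holds one connection for its whole response time, an instance cannot complete more than (connections / response time) requests per second. The sketch below works through the article's figures of roughly 1,000 connections at 300 ms and at 100 ms. It is a rough upper bound under that one-connection-per-request assumption, not a measurement, and the method name is illustrative.

```java
/** Back-of-the-envelope throughput ceiling for a connection-limited server. */
class ConnectionCeilingSketch {

    /** Upper bound on requests per second for one instance, by Little's law. */
    static long maxRequestsPerSecond(long maxConnections, long responseTimeMillis) {
        if (maxConnections <= 0 || responseTimeMillis <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        return maxConnections * 1000 / responseTimeMillis;
    }

    public static void main(String[] args) {
        System.out.println(maxRequestsPerSecond(1000, 300)); // 3333
        System.out.println(maxRequestsPerSecond(1000, 100)); // 10000
    }
}
```

Under these assumptions, cutting response time from 300 ms to 100 ms triples the per-instance ceiling, and adding instances multiplies it again, which is consistent with the two levers the team used.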
Final optimizations removed unnecessary Redis transactions and locks, introduced pipelining for management‑side batch reads, and added monitoring‑driven failover scripts for Redis node failures.
The system was deployed with 8 candidate‑side Java nodes, 4 management nodes, 4 RocketMQ brokers, 4 Redis shards, a MySQL primary‑replica pair, and an Elasticsearch instance, achieving stable performance and meeting the high‑concurrency requirements.
Beyond the technical achievements, the author reflects on the importance of precise monitoring, iterative profiling, and balancing data accuracy with performance in real‑world backend projects.
