How to Design System Capacity: From Real-World Event Planning to QPS Estimation
This article explains how to assess and design system capacity by translating a real‑world sports event scenario into concrete metrics such as daily visits, average and peak QPS, concurrency, and instance limits, while outlining practical steps, formulas, and a book‑reservation case study.
Background
Every year our organization holds a sports meet with a 2000 m race. Typically 40 men and 20 women register, and only ten runners can compete simultaneously, requiring at least six heats. Each heat lasts 30 minutes (20 minutes race time plus preparation and cleanup).
When the 4000 m race was cancelled this year, 2000 m registrations jumped by 50 to about 110, which would need eleven heats instead of the usual six. That exceeded the planned capacity, so half of the participants had to race the following weekend, which drew complaints.
This story illustrates the importance of capacity design: when business demand changes, failing to anticipate the impact leads to disruption.
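The heat arithmetic in the story can be checked with a few lines of Python. This is only a sketch: the function name `heats_needed` is made up for illustration, and it assumes the 50 extra registrations come on top of the usual 40 + 20.

```python
import math

def heats_needed(runners: int, per_heat: int = 10) -> int:
    """Heats required when at most `per_heat` runners can race at once."""
    return math.ceil(runners / per_heat)

usual = heats_needed(40 + 20)        # typical year
surge = heats_needed(40 + 20 + 50)   # year the 4000 m race was cancelled

# Each heat occupies the track for 30 minutes.
print(usual, usual * 30 / 60)  # heats, hours of track time
print(surge, surge * 30 / 60)
```

Under these assumptions the event goes from 6 heats (3 hours) to 11 heats (5.5 hours), which is why a schedule sized for the usual turnout could not absorb the change.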
Concept
Capacity design is the technical process of estimating system capacity through strategic analysis; it is a core skill for architects.
Capacity design requires concrete data such as data volume, concurrency, bandwidth, registered and active user counts, message size, image size, storage, CPU, and memory.
We will use concurrency as an example to demonstrate the analysis.
Analysis Process
Understanding Key Metrics
TPS (Transactions Per Second) measures transaction throughput.
QPS (Queries Per Second) measures request throughput.
Concurrency is the number of simultaneous requests a system can handle.
Peak QPS calculation: 80% of daily traffic occurs in 20% of the time (the peak window).
Formula: Peak QPS = (Total PV × 80%) / (Seconds per day × 20%)
Here PV = page views, UV = unique visitors, throughput = processed requests per unit time, and RT = average response time.
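The 80/20 formula above can be written as a small helper. This is a minimal sketch: the function name, parameter names, and the 10 000 000 PV input are illustrative assumptions, not figures from the article.

```python
def peak_qps_80_20(total_pv: int,
                   window_seconds: int = 86_400,
                   traffic_share: float = 0.8,
                   time_share: float = 0.2) -> float:
    """Peak QPS when `traffic_share` of the traffic arrives in
    `time_share` of the time window (the 80/20 rule by default)."""
    return (total_pv * traffic_share) / (window_seconds * time_share)

# Hypothetical input: 10 million page views spread over a full day.
print(round(peak_qps_80_20(10_000_000), 2))
```

With the default shares, the peak works out to four times the plain average over the same window, since 80% / 20% = 4.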
Relationship: QPS = Concurrency / Average RT, and equivalently Concurrency = QPS × Average RT.
When to Evaluate System Capacity
1. Temporary traffic spikes (e.g., 618, Double‑11, holiday promotions).
2. Initial system capacity assessment before launch.
3. Changes in capacity baseline as features grow, data volume increases, or daily active users rise.
Capacity evaluation includes data volume, concurrency, bandwidth, CPU, memory, and disk.
Evaluation Steps
1. Analyze Daily Total Visits
Gather realistic daily PV/UV numbers from product or operations, or estimate for a new system.
2. Estimate Average QPS
Assume active hours of ~11 hours (≈40 000 seconds). Average QPS = Daily visits ÷ 40 000.
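As a rough sketch, the average-QPS estimate is a single division. The daily-visit figure below is an assumption, chosen only so that the result lines up with the 2 900 average QPS used in the next step.

```python
ACTIVE_SECONDS = 40_000  # about 11 active hours per day, as assumed above

def average_qps(daily_visits: int, active_seconds: int = ACTIVE_SECONDS) -> float:
    """Average requests per second over the active part of the day."""
    return daily_visits / active_seconds

# Hypothetical input: 116 million visits per day.
print(average_qps(116_000_000))
```

Dividing by active seconds rather than the full 86 400 seconds keeps overnight idle time from diluting the average.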
3. Estimate Peak QPS
Use the traffic curve or the 80/20 rule. Example: if average QPS is 2 900 and the peak is about 2.58 × the average, peak QPS ≈ 2 900 × 2.58 = 7 482.
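One way to get a peak factor from a traffic curve is to divide the busiest hour by the mean hour. The sketch below assumes the curve is available as hourly request counts covering the active hours only; both function names and the sample curve are hypothetical.

```python
from typing import Sequence

def peak_factor_from_curve(hourly_requests: Sequence[int]) -> float:
    """Ratio of the busiest hour's request count to the mean hourly count."""
    mean = sum(hourly_requests) / len(hourly_requests)
    return max(hourly_requests) / mean

def peak_qps(average_qps: float, factor: float) -> float:
    """Scale the average QPS by the peak factor."""
    return average_qps * factor

# Made-up four-hour curve, only to show the factor calculation.
print(peak_factor_from_curve([100, 100, 200, 400]))

# The article's example: average 2 900 QPS, peak factor 2.58.
print(round(peak_qps(2900, 2.58)))
```

Hourly buckets smooth out short bursts, so a curve with finer granularity would generally give a higher factor.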
4. Determine Single‑Instance QPS Limit
Conduct load testing (e.g., with nGrinder or JMeter). Our standard: a response time above 2 s indicates a bottleneck, and we aim for 1 s or less, so we take the single-instance limit as the highest QPS at which response time stays within that target.
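Picking the limit from load-test results might look like the sketch below. The `(qps, average_rt_seconds)` tuples are invented sample data, not real nGrinder or JMeter output, and `safe_qps_limit` is a hypothetical helper.

```python
from typing import Sequence, Tuple

def safe_qps_limit(results: Sequence[Tuple[int, float]],
                   target_rt: float = 1.0) -> int:
    """Highest tested QPS whose average response time meets the target.
    Returns 0 if no tested level meets it."""
    passing = [qps for qps, rt in results if rt <= target_rt]
    return max(passing) if passing else 0

# Invented results: (offered QPS, average response time in seconds).
load_test = [(500, 0.2), (1000, 0.4), (2000, 0.9), (2500, 1.6), (3000, 2.4)]

print(safe_qps_limit(load_test))                 # limit at the 1 s target
print(safe_qps_limit(load_test, target_rt=2.0))  # limit at the 2 s bottleneck line
```

With this sample data the 1 s target gives a lower limit than the 2 s bottleneck threshold, which is the adjustment the step describes.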
5. Confirm with Redundancy
If peak QPS is 7 500 and a single web instance can safely handle 2 000 QPS, at least four instances are needed (7 500 ÷ 2 000 = 3.75, rounded up), before adding any spare instances for redundancy.
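The instance count is a ceiling division. In this sketch the `spare` parameter is an assumption added to illustrate redundancy (for example one extra instance); the article itself only states the minimum of four.

```python
import math

def instances_needed(peak_qps: float, per_instance_qps: float, spare: int = 0) -> int:
    """Minimum instances to carry the peak, plus optional spare instances."""
    return math.ceil(peak_qps / per_instance_qps) + spare

print(instances_needed(7500, 2000))           # minimum to carry the peak
print(instances_needed(7500, 2000, spare=1))  # with one spare instance (assumed policy)
```

Rounding up matters here: 3.75 instances of load cannot be served by three machines without exceeding the tested limit.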
Case Study: Book Reservation System
Applying the 80/20 rule to a 9‑hour active window (32 400 seconds) with total PV = 1 500 000 gives peak QPS = (1 500 000 × 80%) / (32 400 × 20%) ≈ 185. Concurrency = QPS × Avg RT = 185 × 0.5 s ≈ 92.5, rounded up to 100 and then raised to 200 as the load‑test target.
Performance testing confirmed the system supports > 200 concurrent users with response times of 50–100 ms.
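The case-study numbers can be reproduced as follows. This is a sketch using only the figures quoted above; the variable names are illustrative.

```python
TOTAL_PV = 1_500_000
ACTIVE_SECONDS = 9 * 3600   # 32 400 seconds
AVG_RT_SECONDS = 0.5        # assumed average response time from the case study

# 80% of the page views arrive in 20% of the active window.
peak_qps = (TOTAL_PV * 0.8) / (ACTIVE_SECONDS * 0.2)

# Concurrency = QPS x average response time.
concurrency = peak_qps * AVG_RT_SECONDS

print(round(peak_qps, 2), round(concurrency, 2))
```

The unrounded values are slightly above the article's 185 and 92.5, which is consistent with rounding up to 100 before choosing the 200-user test target.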
Summary
System capacity design should be performed ahead of expected temporary traffic spikes, as part of the initial pre-launch assessment, and whenever the baseline changes due to growth.
The steps are: analyze daily visits, calculate average QPS, estimate peak QPS (traffic curve or 80/20 rule), perform load testing to find instance limits, and adjust based on redundancy.
Applying these steps to the opening sports event example shows that early capacity re‑evaluation could have prevented the scheduling issues.
ITFLY8 Architecture Home
