
How to Accurately Evaluate and Guarantee System Capacity for High‑Traffic Services

This article explains why capacity assessment and guarantee are essential for high‑traffic services, outlines the core factors influencing system capacity such as thread count, response time, CPU, memory and database resources, presents calculation formulas, describes load‑testing methods, shares practical benchmark results for Tomcat and Undertow, and offers actionable recommendations for improving throughput and stability.

Huolala Tech
"Under a noble goal, keep working, even slowly, and success will come." — Einstein

1. Preface

The goals of capacity evaluation and guarantee are twofold:

Ensure services remain functional for a massive number of users and drivers during peak traffic (e.g., the national holiday surge).

Provide sufficient redundancy to buy time during unexpected traffic spikes, which requires an objective assessment of current service limits and redundancy.

"Know yourself and know the enemy, and you will never be defeated."

2. Background

During system maintenance we may face the following questions:

Can the application handle the projected peak traffic and transition smoothly? Is pre‑scaling required?

Does the application have performance bottlenecks?

Can the application support business complexity and sudden spikes?

3. Understanding Capacity Evaluation and Guarantee

3.1 What Is System Capacity?

System capacity is the maximum volume of correctly processed requests (QPS) the system can sustain per unit time. When actual load exceeds this limit, two main problems may appear:

Upstream requests queue, CPU usage stays low, but response time grows and the service becomes unavailable.

System resources are exhausted (CPU hits 100%, memory spikes), causing immediate failure. This scenario is easier to detect because CPU saturation appears before total collapse.

Evaluating capacity early helps discover and fix bottlenecks before they cause outages.

3.2 Capacity Guarantee

Guarantee means using tools and methods to ensure the system can handle the expected load, while providing redundancy and elasticity (auto‑scaling and self‑healing) for unexpected spikes.
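One way to make "handle the expected load with redundancy" concrete is to size the instance count from a projected peak. The sketch below is only an illustration: the function name `instances_needed`, the 50% redundancy margin, and all traffic figures are hypothetical, and the per-instance capacity is assumed to come from a load test rather than from theory.

```python
import math


def instances_needed(projected_peak_qps: float,
                     per_instance_qps: float,
                     redundancy: float = 0.5) -> int:
    """Instances required to serve the projected peak plus a redundancy margin.

    redundancy=0.5 means "keep 50% headroom above the projected peak";
    this margin is an assumption, not a recommendation from the article.
    """
    if per_instance_qps <= 0:
        raise ValueError("per_instance_qps must be positive")
    return math.ceil(projected_peak_qps * (1.0 + redundancy) / per_instance_qps)


# Hypothetical figures: 3000 QPS projected peak, 900 QPS measured per instance.
needed = instances_needed(3000, 900, redundancy=0.5)
current = 4
print(f"need {needed} instances, have {current}, pre-scale: {needed > current}")
```

If the result exceeds the current instance count, pre-scaling before the peak is the safer choice; auto-scaling then covers what the projection missed.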

3.3 Challenges of Capacity Guarantee

Uncertainty of traffic spikes due to complex business flows.

Complexity of evaluating capacity in a micro‑service, Kubernetes‑based architecture.

Precision of capacity testing: full‑link pressure tests, single‑instance tests, fault‑injection drills.

Continuous nature of capacity planning as product iterations and code changes occur.

3.4 Practical Methods

Based on experience at Huolala, the following methods are used for peak‑capacity guarantee:

4. How to Evaluate Capacity

4.1 Core Factors Influencing System Capacity (JVM Applications)

Container threads : More threads increase concurrent processing. Example: Tomcat default worker threads = 200, so a single instance can handle ~200 QPS if each request takes 1 s.

Response time : Shorter RT yields higher throughput. 1 s × 200 threads = 200 QPS; 200 ms × 200 threads = 1000 QPS (5× increase).

Dependent resources : Database connections and downstream services. Database connection pool size limits SQL QPS; downstream services may become bottlenecks.

Computation cost : CPU consumption of complex logic; can be discovered via load tests.

Container CPU & memory : Monitored as health indicators; optimization should focus on reducing CPU/memory pressure.

Theoretical QPS = concurrency × (1000 / average RT in ms):

256 * (1000 / 25) = 10240  // Example: 256 threads, 25 ms avg RT → 10,240 QPS
200 * (1000 / 10) = 20000  // Example: 200 DB connections, 10 ms avg RT → 20,000 QPS
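The arithmetic above can be wrapped in a small helper. This is a minimal sketch: the function name `theoretical_qps` is an assumption, and the result is an upper bound that ignores CPU, memory, and queueing effects.

```python
def theoretical_qps(concurrency: int, avg_rt_ms: float) -> float:
    """Upper bound on throughput for a pool of `concurrency` slots
    (worker threads or DB connections) when each unit of work holds
    one slot for `avg_rt_ms` milliseconds on average."""
    if concurrency <= 0 or avg_rt_ms <= 0:
        raise ValueError("concurrency and avg_rt_ms must be positive")
    return concurrency * (1000.0 / avg_rt_ms)


container_qps = theoretical_qps(256, 25)  # 256 worker threads, 25 ms avg RT
db_qps = theoretical_qps(200, 10)         # 200 DB connections, 10 ms avg RT
print(container_qps, db_qps)
```

The same helper reproduces the earlier Tomcat examples: 200 threads at 1 s per request gives 200 QPS, and at 200 ms gives 1000 QPS.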

Introduce a safety threshold to evaluate overall system safety.

For Service A (4 machines, peak QPS ≈ 3000, DB QPS ≈ 7000):

Business container safety factor = 4 × 10240 / 3000 ≈ 13.7

Database safety factor = 4 × 20000 / 7000 ≈ 11.4

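The theoretical capacity is more than 10× the current peak. Because the database has the smaller safety factor, it is likely to become the bottleneck first. CPU and memory are not considered here, but in practice they are likely to become limiting before these theoretical thresholds are reached.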
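The safety-factor comparison for Service A can be sketched as follows. The function name `safety_factor` and the dictionary layout are assumptions made for illustration; the inputs are the Service A figures given above.

```python
def safety_factor(instances: int, per_instance_qps: float, peak_qps: float) -> float:
    """Ratio of total theoretical capacity to the observed peak load."""
    if peak_qps <= 0:
        raise ValueError("peak_qps must be positive")
    return instances * per_instance_qps / peak_qps


factors = {
    "business container": safety_factor(4, 10240, 3000),
    "database": safety_factor(4, 20000, 7000),
}

# The resource with the smallest margin is the one expected to saturate first.
bottleneck = min(factors, key=factors.get)

for name, value in factors.items():
    print(f"{name}: {value:.2f}x peak")
print("smallest margin:", bottleneck)
```

Comparing the factors side by side is what identifies the likely first bottleneck; the absolute values matter less, since they ignore CPU and memory.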

4.2 Evaluation Methods

4.2.1 Criteria for Reaching the Limit

Response time stabilizes (no further increase).

CPU approaches 100%.

Both conditions must be met to claim the service has reached its true capacity.
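A load-test script could check both criteria automatically. The sketch below is one possible reading of them, not the authors' tooling: the function name `reached_capacity`, the window of 3 samples, the 5% RT tolerance, and the 95% CPU threshold are all assumptions.

```python
from statistics import mean


def reached_capacity(rt_ms, cpu_pct, window=3,
                     rt_tolerance=0.05, cpu_threshold=95.0) -> bool:
    """Return True when RT has stopped rising and CPU is near 100%.

    rt_ms and cpu_pct are per-step samples collected while load ramps up.
    RT counts as stable when the mean of the last `window` samples differs
    from the mean of the preceding `window` samples by at most rt_tolerance.
    """
    if len(rt_ms) < 2 * window or len(cpu_pct) < window:
        return False  # not enough samples to judge
    previous = mean(rt_ms[-2 * window:-window])
    recent = mean(rt_ms[-window:])
    rt_stable = abs(recent - previous) <= rt_tolerance * previous
    cpu_saturated = mean(cpu_pct[-window:]) >= cpu_threshold
    return rt_stable and cpu_saturated


# Illustrative samples only; these are not measurements from the article.
rt = [100, 150, 200, 240, 242, 241, 243, 240, 242]
cpu = [60, 70, 80, 90, 96, 97, 98, 97, 98]
print(reached_capacity(rt, cpu))
```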

4.2.2 Three Load‑Testing Approaches

Single‑interface test : Focus on a core endpoint; simple but may have large error.

Core interface + DB test : Include all critical endpoints and all involved databases; higher cost but more accurate.

Real‑traffic test : Replicate production traffic ratios across all services; most accurate but expensive.
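For the real-traffic approach, the target load has to be split across endpoints in the same proportions as production. A minimal sketch, assuming request counts per endpoint are available from access logs; the endpoint paths, counts, and the function name `per_endpoint_targets` are hypothetical.

```python
def per_endpoint_targets(observed_counts: dict, target_total_qps: float) -> dict:
    """Split a total target QPS across endpoints in proportion to
    the request counts observed in production."""
    total = sum(observed_counts.values())
    if total <= 0:
        raise ValueError("observed_counts must contain at least one request")
    return {endpoint: target_total_qps * count / total
            for endpoint, count in observed_counts.items()}


# Hypothetical production sample: requests seen per endpoint in one window.
observed = {"/order/create": 2000, "/order/query": 6000, "/price/estimate": 2000}
targets = per_endpoint_targets(observed, target_total_qps=3000)
print(targets)
```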

4.2.3 Single‑Interface Test Configuration

Key parameters:

Number of concurrent threads.

Timeout settings (no timeout vs. fixed timeout, e.g., 600 ms).

Observed behaviors:

No timeout : RT may increase dramatically, CPU stabilizes; eventual bottleneck appears as queueing.

Timeout 600 ms : System initially responds, then degrades; possible memory exhaustion or CPU saturation.
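The two parameters, concurrent threads and timeout, can be illustrated with a small closed-loop load runner. This is a sketch under stated assumptions, not a replacement for a real load-testing tool: `run_load` and `fake_request` are hypothetical names, the request is simulated with a sleep, and responses slower than the timeout are only counted after the fact, whereas a real client would abort the request.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load(send, threads: int, requests_per_thread: int, timeout_ms=None) -> dict:
    """Run `threads` workers, each calling send() sequentially, and summarize.

    timeout_ms=None models the "no timeout" configuration; otherwise any
    response slower than timeout_ms is counted as a timeout.
    """
    def worker():
        latencies = []
        for _ in range(requests_per_thread):
            start = time.perf_counter()
            send()
            latencies.append((time.perf_counter() - start) * 1000.0)
        return latencies

    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        latencies = [ms for f in futures for ms in f.result()]
    elapsed = time.perf_counter() - started

    timed_out = 0 if timeout_ms is None else sum(ms > timeout_ms for ms in latencies)
    return {
        "requests": len(latencies),
        "qps": len(latencies) / elapsed,
        "avg_rt_ms": sum(latencies) / len(latencies),
        "timeout_rate": timed_out / len(latencies),
    }


def fake_request():
    """Stand-in for a real HTTP call: simulates roughly 5 ms of service time."""
    time.sleep(0.005)


stats = run_load(fake_request, threads=8, requests_per_thread=10, timeout_ms=600)
print(stats)
```

Running the same scenario with `timeout_ms=None` and with `timeout_ms=600` while increasing `threads` makes it easier to compare the two behaviors described above.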

5. Practical Insights

5.1 Benchmark Results for Tomcat and Undertow

Various configurations of io‑threads, worker‑threads, and DB connection pools were tested. Key observations:

Undertow with default worker‑threads (8) quickly crashes under load.

Increasing worker‑threads to 256 improves peak QPS but still shows instability.

Combining high worker‑threads (256) with a larger DB pool (200) yields stable performance up to 3000 QPS with <1% error rate.

Tomcat (maxThreads 200) consistently delivers higher throughput than Undertow on the same hardware.

5.2 Basic Conclusion

When worker‑threads are low and business interfaces lack sufficient resources, overall throughput stabilizes around 850 QPS for Undertow and slightly below 900 QPS for Tomcat; exceeding this threshold causes request queuing and increased RT.

5.3 Core Changes

Raising worker‑threads alone does not increase throughput but raises request latency.

Increasing both worker‑threads and DB connection pool size raises single‑machine throughput beyond 2000 QPS.
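A simple bottleneck model shows why these two changes behave differently. It is an illustrative sketch only: the function name `saturated_instance` and the timings (5 ms of application work, 20 ms holding a DB connection) are hypothetical and do not reproduce the benchmark figures. The model ignores CPU and memory limits and assumes every worker is busy at saturation.

```python
def saturated_instance(worker_threads: int, db_pool_size: int,
                       app_time_ms: float, db_time_ms: float):
    """Return (max_qps, rt_ms_at_saturation) for one instance.

    Each request spends app_time_ms in application code and holds one
    DB connection for db_time_ms. Throughput is capped by whichever
    pool runs out first; RT follows from Little's law (RT = N / X)
    with all worker threads occupied.
    """
    worker_cap = worker_threads * 1000.0 / (app_time_ms + db_time_ms)
    db_cap = db_pool_size * 1000.0 / db_time_ms
    max_qps = min(worker_cap, db_cap)
    rt_ms = worker_threads * 1000.0 / max_qps
    return max_qps, rt_ms


# Hypothetical configurations: (worker threads, DB pool size).
for workers, pool in [(32, 10), (256, 10), (256, 200)]:
    qps, rt = saturated_instance(workers, pool, app_time_ms=5, db_time_ms=20)
    print(f"workers={workers:3d} pool={pool:3d} -> {qps:7.0f} QPS, RT {rt:6.1f} ms")
```

In this model, going from 32 to 256 workers with a pool of 10 leaves throughput capped by the DB pool while requests wait longer for a connection; only enlarging the pool as well lifts the cap.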

5.4 Recommended Actions

For I/O‑intensive applications, increase worker‑threads (or custom thread pools) to improve request‑handling capacity; monitor CPU and RT to intervene before overload.

Adjust DB connection pool size for load‑test scenarios; however, raising worker‑threads may expose RT issues, so treat it as a tuning step rather than a default.

Overall, with identical hardware and external conditions, Tomcat demonstrates higher throughput than Undertow.

