
How to Maximize System Load Capacity: Metrics, Bottlenecks, and Tuning Strategies

This article explains how to measure a system's load capacity, identifies key factors such as bandwidth, hardware, and OS settings, and provides practical optimization techniques for Linux, Nginx, Tomcat, and databases to achieve higher concurrency and better performance.

21CTO

Nov 2, 2015

1. Measurement Metrics

The primary metric for load capacity is requests per second (RPS), counting only requests that receive a successful response. As the number of concurrent users grows, RPS rises until it reaches a tipping point; beyond that point, adding users makes RPS fall and response time climb. The RPS at the tipping point is the system's maximum load capacity.
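To make the tipping point concrete, here is a minimal Python sketch that picks the peak out of a series of load-test runs. The function name find_peak and the sample figures are illustrative assumptions, not measurements from this article.

```python
from typing import List, Tuple

def find_peak(samples: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (concurrent_users, rps) pair with the highest RPS.

    samples: one (concurrent_users, successful_requests_per_second)
    pair per load-test run.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return max(samples, key=lambda s: s[1])

# Hypothetical load-test results: RPS climbs, peaks, then falls.
samples = [(50, 480.0), (100, 930.0), (200, 1650.0), (400, 1720.0), (800, 1310.0)]
users, rps = find_peak(samples)
print(f"peak: {rps:.0f} RPS at {users} concurrent users")
```

In practice each sample would come from a separate run of a load-testing tool at a fixed concurrency level, with failed requests excluded from the count.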

2. Related Factors

Key factors influencing concurrency include bandwidth, hardware configuration, system configuration, application server configuration, and program logic. Bandwidth and hardware set the upper bound, while the other factors determine how close to that bound the system can operate.

2.1 Bandwidth

Bandwidth, measured in Mbps, caps the rate at which data can be transferred, much as the width of a pipe caps the flow of water.
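As a rough illustration of how bandwidth bounds throughput, the sketch below divides link capacity by the average response size. The function name and the example numbers are assumptions chosen for the arithmetic, and protocol overhead is ignored.

```python
def bandwidth_rps_ceiling(bandwidth_mbps: float, avg_response_bytes: float) -> float:
    """Upper bound on responses per second that the link can carry.

    Ignores protocol overhead (TCP/IP headers, TLS, retransmits),
    so the real ceiling is somewhat lower.
    """
    if avg_response_bytes <= 0:
        raise ValueError("avg_response_bytes must be positive")
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8  # megabits -> bytes
    return bytes_per_second / avg_response_bytes

# Example: a 100 Mbps link serving responses that average 50,000 bytes.
print(bandwidth_rps_ceiling(100, 50_000))  # 250.0
```

If the measured peak RPS sits close to this figure, the link rather than the server is the likely bottleneck.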

2.2 Hardware Configuration

Critical hardware parameters are CPU frequency/cores, memory size and speed, and disk speed (SSD vs. HDD). Upgrading these components directly raises the load ceiling.

2.3 System Configuration

On Linux, several kernel and limit settings affect load capacity:

File descriptor limits: /proc/sys/fs/file-max and per‑process limits via ulimit -n or /etc/security/limits.conf.

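Process/thread limits: ulimit -u for the per-user process count (on Linux this count includes threads), /proc/sys/kernel/threads-max for the system-wide thread total, and PTHREAD_THREADS_MAX for the per-process thread cap. PTHREAD_THREADS_MAX is a compile-time constant of the threading library; the older LinuxThreads implementation defined it, while NPTL-based glibc sets no fixed cap.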

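TCP kernel parameters: net.ipv4.tcp_syncookies, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_tw_recycle, net.ipv4.tcp_fin_timeout, and others in /etc/sysctl.conf. tcp_syncookies keeps the server accepting connections when the SYN queue overflows; tcp_tw_reuse and tcp_tw_recycle target TIME_WAIT buildup; tcp_fin_timeout shortens how long orphaned connections linger in FIN_WAIT_2. Note that tcp_tw_recycle is known to break clients behind NAT and was removed in Linux 4.12, so it is best left disabled.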
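Before changing any of these settings it helps to see the current values. The following is a minimal sketch, assuming a Linux host with /proc mounted; the helper names read_sysctl and current_limits are hypothetical, while the resource module and the /proc/sys paths are standard.

```python
import resource
from pathlib import Path
from typing import Dict, Optional

def read_sysctl(name: str) -> Optional[str]:
    """Read a kernel parameter such as 'net.ipv4.tcp_fin_timeout' from /proc/sys.

    Returns None when the parameter does not exist on this kernel
    (for example tcp_tw_recycle on Linux 4.12 and later).
    """
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None

def current_limits() -> Dict[str, object]:
    """Collect the limits discussed above for this process and kernel."""
    nofile_soft, nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return {
        "nofile_soft": nofile_soft,      # what `ulimit -n` reports
        "nofile_hard": nofile_hard,
        "nproc_soft": nproc_soft,        # what `ulimit -u` reports
        "nproc_hard": nproc_hard,
        "fs.file-max": read_sysctl("fs.file-max"),
        "kernel.threads-max": read_sysctl("kernel.threads-max"),
        "net.ipv4.tcp_syncookies": read_sysctl("net.ipv4.tcp_syncookies"),
        "net.ipv4.tcp_tw_reuse": read_sysctl("net.ipv4.tcp_tw_reuse"),
        "net.ipv4.tcp_fin_timeout": read_sysctl("net.ipv4.tcp_fin_timeout"),
    }

for key, value in current_limits().items():
    print(f"{key} = {value}")
```

The script only reads values. Changing them still goes through ulimit, /etc/security/limits.conf, or /etc/sysctl.conf as described above.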

2.4 Application Server Configuration

Application servers use different concurrency models:

Multi‑process (one process per request)

Prefork (process pool)

Worker (one thread per request)

Master/worker (event‑driven, non‑blocking I/O, used by Nginx)
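The event-driven model is the least intuitive of the four, so here is a small Python sketch of the idea using asyncio. It is not how Nginx is implemented (Nginx is written in C on top of epoll and similar mechanisms); it only illustrates how a single thread can interleave many requests that spend their time waiting on I/O. The function names and the simulated delay are assumptions.

```python
import asyncio
import time
from typing import List, Tuple

async def handle_request(request_id: int, io_delay: float) -> int:
    """Simulate one request whose only cost is waiting on I/O."""
    await asyncio.sleep(io_delay)  # yields to the event loop instead of blocking
    return request_id

async def serve(n_requests: int, io_delay: float) -> List[int]:
    """Handle all requests concurrently on a single thread."""
    tasks = [handle_request(i, io_delay) for i in range(n_requests)]
    return await asyncio.gather(*tasks)

def run_demo(n_requests: int = 200, io_delay: float = 0.05) -> Tuple[int, float]:
    """Return (requests handled, wall-clock seconds taken)."""
    start = time.perf_counter()
    results = asyncio.run(serve(n_requests, io_delay))
    return len(results), time.perf_counter() - start

if __name__ == "__main__":
    handled, elapsed = run_demo()
    print(f"{handled} requests in {elapsed:.2f}s on one thread")
```

A one-thread-per-request server would need 200 threads to match this; handling the same requests one after another would take about 200 × 0.05 = 10 seconds.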

2.4.1 Nginx/Tengine

Key settings include setting worker_processes to the number of CPU cores, choosing an appropriate keepalive_timeout, raising worker_rlimit_nofile, and enabling HTTP/1.1 keep-alive on upstream connections.
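The sketch below renders an nginx.conf fragment that touches each of these settings. The directives are standard Nginx directives, but the helper render_nginx_fragment, the upstream name app_backend, the backend address, and every numeric default are placeholder assumptions to be tuned per workload.

```python
import os

def render_nginx_fragment(cpu_cores: int,
                          max_open_files: int = 65535,
                          worker_connections: int = 10240,
                          keepalive_timeout_s: int = 30,
                          upstream_addr: str = "127.0.0.1:8080",
                          upstream_keepalive: int = 32) -> str:
    """Render an nginx.conf fragment covering the settings named above."""
    lines = [
        f"worker_processes {cpu_cores};",          # one worker per CPU core
        f"worker_rlimit_nofile {max_open_files};",  # per-worker open-file limit
        "",
        "events {",
        f"    worker_connections {worker_connections};",
        "}",
        "",
        "http {",
        f"    keepalive_timeout {keepalive_timeout_s}s;",  # client keep-alive
        "",
        "    upstream app_backend {",
        f"        server {upstream_addr};",
        f"        keepalive {upstream_keepalive};",  # idle upstream connections kept open
        "    }",
        "",
        "    server {",
        "        listen 80;",
        "        location / {",
        "            proxy_pass http://app_backend;",
        "            proxy_http_version 1.1;",          # needed for upstream keep-alive
        '            proxy_set_header Connection "";',  # do not forward "close"
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines) + "\n"

print(render_nginx_fragment(cpu_cores=os.cpu_count() or 1))
```

worker_connections should stay below worker_rlimit_nofile, because each proxied request holds two descriptors in the worker: one to the client and one to the upstream.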

2.4.2 Tomcat

Tomcat tuning involves JVM options (heap size -Xms and -Xmx, young generation -Xmn, thread stack size -Xss) and connector parameters such as protocol (APR is recommended here, although recent Tomcat versions default to NIO), connectionTimeout, maxThreads, minSpareThreads, acceptCount, and maxConnections. A single Tomcat instance with a very large heap can suffer long GC pauses; running several smaller instances in a cluster keeps pauses short and improves scalability and fault tolerance.
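These parameters interact, and some back-of-the-envelope arithmetic shows how. The following sketch is a rough estimate only: it assumes blocking, thread-per-request processing, counts only heap and thread stacks (not metaspace or direct buffers), and uses a hypothetical helper name and example values.

```python
from typing import Dict

def tomcat_estimates(xmx_mb: int, max_threads: int, xss_kb: int,
                     max_connections: int, accept_count: int,
                     avg_service_time_ms: float) -> Dict[str, float]:
    """Rough sizing figures for one Tomcat instance."""
    if avg_service_time_ms <= 0:
        raise ValueError("avg_service_time_ms must be positive")
    # Each request thread reserves its own stack outside the heap.
    thread_stack_mb = max_threads * xss_kb / 1024
    return {
        # Heap plus stacks if every request thread is in use.
        "heap_plus_stacks_mb": xmx_mb + thread_stack_mb,
        # Connections being processed or held, plus those queued by the OS.
        "max_admitted_connections": max_connections + accept_count,
        # Each thread completes 1000 / service_time_ms requests per second.
        "throughput_ceiling_rps": max_threads * 1000 / avg_service_time_ms,
    }

# Example: -Xmx2048m, maxThreads=400, -Xss512k, maxConnections=10000,
# acceptCount=100, and an average service time of 50 ms per request.
estimates = tomcat_estimates(2048, 400, 512, 10000, 100, 50)
for name, value in estimates.items():
    print(f"{name}: {value}")
```

With these example values the thread stacks alone account for 200 MB, which is one reason raising maxThreads indefinitely does not scale.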

2.4.3 Database

MySQL is a common choice of relational database, but it can become the bottleneck under high load. Strategies include vertical and horizontal partitioning (sharding), placing Redis in front of the database as a cache layer, and separating reads from writes with master/slave replication.
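The cache layer and read-write separation can be sketched together. The example below uses in-memory dictionaries as stand-ins for MySQL and Redis so that it runs without external services; the class names, the round-robin replica choice, and the instantaneous replication are all simplifying assumptions.

```python
import itertools
from typing import Dict, List, Optional

class FakeDatabase:
    """In-memory stand-in for one MySQL server."""
    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}

class DataAccess:
    """Writes go to the master; reads try the cache, then a slave."""

    def __init__(self, master: FakeDatabase, slaves: List[FakeDatabase],
                 cache: Dict[str, str]) -> None:
        self.master = master
        self.slaves = slaves
        self.cache = cache                           # dict stand-in for Redis
        self._next_slave = itertools.cycle(slaves)   # round-robin selection
        self.slave_reads = 0                         # counter for the demo

    def write(self, key: str, value: str) -> None:
        self.master.rows[key] = value
        for slave in self.slaves:        # real replication is asynchronous
            slave.rows[key] = value
        self.cache.pop(key, None)        # invalidate the stale cache entry

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:            # cache hit: no database access
            return self.cache[key]
        slave = next(self._next_slave)   # cache miss: read from a slave
        self.slave_reads += 1
        value = slave.rows.get(key)
        if value is not None:
            self.cache[key] = value      # populate the cache for next time
        return value

dao = DataAccess(FakeDatabase(), [FakeDatabase(), FakeDatabase()], cache={})
dao.write("user:1", "alice")
print(dao.read("user:1"), dao.read("user:1"), dao.slave_reads)  # alice alice 1
```

With real replication a read issued immediately after a write may reach a slave that has not yet applied the change, so reads that must see the latest data are usually routed to the master.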

3. Typical Architecture

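A common web stack is LVS + Nginx + Tomcat + MySQL + Redis: LVS balances load at the transport layer, Nginx acts as the reverse proxy, Tomcat runs the application, MySQL stores the data, and Redis serves as the cache.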
