Mastering System Load: Metrics, Bottlenecks, and Optimization Strategies
This article explains how to measure a system's load capacity, identifies key factors such as bandwidth, hardware, OS and application configurations, and provides practical optimization techniques for Linux, Tomcat, Nginx, MySQL, and Redis to handle high concurrency efficiently.
In the Internet era, high concurrency is a classic challenge; the number of requests a web site or app can handle at peak is a key performance indicator, as demonstrated by Alibaba’s Double‑11 traffic.
1. Measurement Metrics
The common metric is Requests per Second (RPS), which counts only successfully responded requests. As concurrent users increase, RPS rises until the server becomes saturated; beyond that point RPS drops and response time grows, marking the system’s maximum load capacity.
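A minimal sketch of how the saturation point could be read off load-test results. The sample numbers and the function names are hypothetical and serve only to illustrate the rise-then-fall shape described above.

```python
from typing import List, Tuple


def rps(successful_requests: int, duration_seconds: float) -> float:
    """Requests per second, counting only successful responses."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return successful_requests / duration_seconds


def find_saturation_point(samples: List[Tuple[int, float]]) -> Tuple[int, float]:
    """Return the (concurrent_users, rps) sample with the highest RPS."""
    if not samples:
        raise ValueError("samples must not be empty")
    return max(samples, key=lambda sample: sample[1])


# Hypothetical load-test results: (concurrent users, measured RPS).
samples = [(50, 900.0), (100, 1700.0), (200, 2400.0), (400, 2100.0), (800, 1500.0)]

peak_users, peak_rps = find_saturation_point(samples)
print(f"peak of {peak_rps} RPS at {peak_users} concurrent users")
```

Past the reported concurrency level, adding users only lengthens response times, so the peak sample is a reasonable estimate of maximum load capacity.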
2. Related Factors
Bandwidth
Hardware configuration
System configuration
Application server configuration
Program logic
System architecture
2.1 Bandwidth
Bandwidth (Mbps) determines the data transmission speed and directly limits the system’s load capacity, similar to the diameter of a water pipe.
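As a rough illustration of the pipe analogy, the sketch below estimates the ceiling that bandwidth alone places on throughput. The 100 Mbps link and 50 KB average response are assumed figures, and protocol overhead is ignored.

```python
def max_rps_for_bandwidth(bandwidth_mbps: float, avg_response_kb: float) -> float:
    """Upper bound on responses per second that fit through the link."""
    if bandwidth_mbps <= 0 or avg_response_kb <= 0:
        raise ValueError("both arguments must be positive")
    bytes_per_second = bandwidth_mbps * 1_000_000 / 8  # megabits/s -> bytes/s
    return bytes_per_second / (avg_response_kb * 1024)  # KB -> bytes


# Assumed figures: a 100 Mbps link and 50 KB per response.
ceiling = max_rps_for_bandwidth(100, 50)
print(f"bandwidth ceiling: about {ceiling:.0f} responses per second")
```

No amount of server tuning lifts throughput above this ceiling; only smaller responses or a faster link do.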
2.2 Hardware Configuration
Key hardware components include:
CPU frequency and core count – affect processing speed and thread scheduling.
Memory size and speed – larger and faster memory improves data handling.
Disk speed – SSDs provide much faster I/O than traditional HDDs.
2.3 System Configuration
On Linux-based back ends, the settings to consider are:
File descriptor limits – total system limit (/proc/sys/fs/file-max) and per-process limit (ulimit -n or /etc/security/limits.conf).
Process/thread limits – max processes (ulimit -u, nproc) and max threads (kernel.threads-max, PTHREAD_THREADS_MAX).
TCP kernel parameters – tuning /etc/sysctl.conf to improve connection handling.
2.3.1 File Descriptor Limits
echo 1000000 > /proc/sys/fs/file-max   # temporary
fs.file-max = 1000000                  # permanent in /etc/sysctl.conf
When adjusting file descriptor limits, ensure:
Total open descriptors ≤ /proc/sys/fs/file-max.
Descriptors opened by a single process ≤ the nofile soft limit.
Soft limit ≤ hard limit, and hard limit ≤ /proc/sys/fs/nr_open.
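The ordering rules above can be checked before a change is applied. The following is a sketch under assumptions: the helper names are hypothetical, and check_current_process is Linux-only because it reads /proc.

```python
import resource
from typing import List


def check_fd_limits(soft: int, hard: int, nr_open: int) -> List[str]:
    """Return a list of violated rules; an empty list means the values are consistent."""
    problems = []
    if soft > hard:
        problems.append(f"soft limit {soft} exceeds hard limit {hard}")
    if hard > nr_open:
        problems.append(f"hard limit {hard} exceeds fs.nr_open {nr_open}")
    return problems


def read_kernel_int(path: str) -> int:
    """Read a single integer from a /proc/sys file."""
    with open(path) as f:
        return int(f.read().split()[0])


def check_current_process() -> List[str]:
    """Check the limits of the running process (Linux only)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return check_fd_limits(soft, hard, read_kernel_int("/proc/sys/fs/nr_open"))


# Planned values, validated before they are written to limits.conf.
print(check_fd_limits(soft=65535, hard=65535, nr_open=1048576))
```

Calling check_current_process() on a Linux host applies the same rules to the live limits.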
2.3.2 Process/Thread Limits
Maximum processes per user: ulimit -u or nproc in limits.conf.
Maximum threads: view with cat /proc/sys/kernel/threads-max; per-process limit tied to PTHREAD_THREADS_MAX.
Thread stack size influences how many threads can be created (ulimit -s).
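To make the stack-size relationship concrete, here is a back-of-the-envelope sketch. It assumes a fixed memory budget reserved for thread stacks and ignores every other limit (threads-max, nproc, heap usage), so treat the result as a rough bound rather than a guarantee.

```python
def max_threads_by_stack(memory_budget_bytes: int, stack_kb: int) -> int:
    """Rough upper bound on thread count if each thread reserves a full stack."""
    if stack_kb <= 0:
        raise ValueError("stack_kb must be positive")
    return memory_budget_bytes // (stack_kb * 1024)


GIB = 1024 ** 3

# Assumed budget: 8 GiB set aside for thread stacks.
for stack_kb in (8192, 1024, 256):
    bound = max_threads_by_stack(8 * GIB, stack_kb)
    print(f"stack {stack_kb} KB -> at most {bound} threads")
```

Shrinking the stack raises the bound proportionally, which is why ulimit -s (or -Xss for a JVM) matters when a process needs many threads.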
2.3.3 TCP Kernel Parameters
Typical tuning includes:
net.ipv4.tcp_syncookies = 1 – enable SYN cookies.
net.ipv4.tcp_tw_reuse = 1 – allow reuse of TIME-WAIT sockets.
net.ipv4.tcp_tw_recycle = 1 – fast recycle of TIME-WAIT sockets (caution with NAT/LVS; this option was removed in Linux 4.12).
net.ipv4.tcp_fin_timeout = 30 – shorten FIN timeout.
Additional adjustments for port range and connection limits:
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.ip_local_port_range = 10000 65000
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_max_tw_buckets = 5000
2.4 Application Server Configuration
Common concurrency models:
Multi‑process – one process per request.
Prefork – pre‑forked process pool.
Worker – one thread per request.
Master/Worker – non‑blocking I/O with event‑driven workers (used by Nginx).
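The event-driven idea behind the last model can be illustrated with a short sketch. This is not how Nginx is implemented; it only shows, using Python's asyncio, that one thread can keep many requests in flight when I/O does not block. The handler and the simulated delay are assumptions.

```python
import asyncio
import threading
from typing import List, Set, Tuple


async def handle_request(request_id: int, seen_threads: Set[int]) -> int:
    """Handle one request; the sleep stands in for non-blocking network I/O."""
    seen_threads.add(threading.get_ident())
    await asyncio.sleep(0.01)
    return request_id


async def serve(request_count: int) -> Tuple[List[int], Set[int]]:
    """Run all requests concurrently on a single event loop."""
    seen_threads: Set[int] = set()
    results = await asyncio.gather(
        *(handle_request(i, seen_threads) for i in range(request_count))
    )
    return list(results), seen_threads


results, seen_threads = asyncio.run(serve(100))
print(f"handled {len(results)} requests on {len(seen_threads)} thread(s)")
```

A process-per-request or thread-per-request model would need 100 processes or threads for the same workload, which is where the scheduling and memory overhead comes from.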
2.4.1 Nginx/Tengine
Key tuning points:
Set worker processes to match CPU cores.
Adjust keepalive timeout appropriately.
Increase worker_rlimit_nofile for more file descriptors.
Enable HTTP/1.1 keepalive in upstream.
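The sketch below renders an Nginx configuration fragment that touches each of the four points. All values (65535 descriptors, a 65-second keepalive, 10240 worker connections, 32 idle upstream connections), the upstream name app_backend, and the backend address are assumptions for illustration, not recommendations from the original article.

```python
import os


def render_nginx_tuning(cpu_cores: int, nofile: int = 65535,
                        keepalive_timeout_s: int = 65,
                        backend: str = "127.0.0.1:8080") -> str:
    """Build an illustrative nginx.conf fragment as a string."""
    lines = [
        f"worker_processes {cpu_cores};",        # one worker per CPU core
        f"worker_rlimit_nofile {nofile};",       # per-worker descriptor limit
        "",
        "events {",
        "    worker_connections 10240;",
        "}",
        "",
        "http {",
        f"    keepalive_timeout {keepalive_timeout_s}s;",
        "",
        "    upstream app_backend {",
        f"        server {backend};",
        "        keepalive 32;",                  # idle connections kept per worker
        "    }",
        "",
        "    server {",
        "        listen 80;",
        "        location / {",
        "            proxy_pass http://app_backend;",
        "            proxy_http_version 1.1;",    # needed for upstream keepalive
        '            proxy_set_header Connection "";',
        "        }",
        "    }",
        "}",
    ]
    return "\n".join(lines)


print(render_nginx_tuning(os.cpu_count() or 1))
```

Upstream keepalive only takes effect when the proxied connection uses HTTP/1.1 and the Connection header is cleared, which is why those two proxy directives appear together.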
2.4.2 Tomcat
Two main configuration areas:
JVM parameters – -Xms, -Xmx, -Xmn, -XX:PermSize, -XX:MaxPermSize (the permanent generation was removed in Java 8, where Metaspace replaces it), -Xss or -XX:ThreadStackSize.
Connector parameters – protocol (bio/nio/apr, prefer apr), connectionTimeout, maxThreads, minSpareThreads, acceptCount, maxConnections (for nio/apr).
Increasing threads or connections indiscriminately can degrade performance due to CPU scheduling overhead and memory consumption; a balanced setting is essential.
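One way to pick a starting point for maxThreads, rather than raising it blindly, is Little's law (threads in use ≈ arrival rate × average response time). That framing is an assumption added here, not something the original article prescribes, and the traffic figures are hypothetical.

```python
import math


def threads_needed(target_rps: int, avg_response_ms: int) -> int:
    """Little's law: concurrent requests = arrival rate x time in system."""
    return math.ceil(target_rps * avg_response_ms / 1000)


def stack_memory_mb(max_threads: int, xss_kb: int) -> float:
    """Memory reserved for thread stacks at the given -Xss value."""
    return max_threads * xss_kb / 1024


# Hypothetical workload: 2000 requests/s at 100 ms average response time.
threads = threads_needed(2000, 100)
print(f"maxThreads starting point: {threads}")
print(f"stack reservation at -Xss1024k: {stack_memory_mb(threads, 1024)} MB")
```

The estimate is a baseline for load testing; the measured saturation point should decide the final value.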
2.5 System Architecture
2.5.1 Load Balancing
Two categories:
Hardware load balancers (e.g., F5) – high performance but costly.
Software load balancers – layer‑4 (LVS) and layer‑7 (Nginx, HAProxy). LVS modes: NAT, DR, IP‑TUNNEL.
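Whatever the layer, the core job of a load balancer is to spread requests over several back ends. The sketch below uses round-robin selection as the simplest possible policy; the original article does not name a scheduling algorithm, and the backend addresses are made up.

```python
import itertools
from typing import Iterator, List


def round_robin(backends: List[str]) -> Iterator[str]:
    """Yield backends in a repeating cycle."""
    if not backends:
        raise ValueError("at least one backend is required")
    return itertools.cycle(backends)


# Hypothetical backend pool.
picker = round_robin(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
assigned = [next(picker) for _ in range(5)]
print(assigned)
```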
2.5.2 Sync vs Async
Choosing synchronous or asynchronous processing can trade off response time against consistency; asynchronous handling of non‑critical steps often improves throughput.
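A minimal sketch of deferring a non-critical step: the request handler does the critical work synchronously and hands the rest to a background worker through a queue. The order and notification steps are hypothetical, and a production system would more likely use a message queue than an in-process one.

```python
import queue
import threading
from typing import List, Optional

notifications: List[str] = []
tasks: "queue.Queue[Optional[int]]" = queue.Queue()


def worker() -> None:
    """Process deferred tasks until a None sentinel arrives."""
    while True:
        user_id = tasks.get()
        if user_id is None:
            tasks.task_done()
            break
        notifications.append(f"notified user {user_id}")  # non-critical step
        tasks.task_done()


def place_order(user_id: int) -> str:
    """Do the critical step now and defer the notification."""
    order_id = f"order-{user_id}"  # critical step, done synchronously
    tasks.put(user_id)             # returns immediately
    return order_id


thread = threading.Thread(target=worker, daemon=True)
thread.start()
responses = [place_order(user_id) for user_id in (1, 2, 3)]
tasks.join()       # wait for deferred work (only needed for this demo)
tasks.put(None)    # stop the worker
thread.join()
print(responses, notifications)
```

The caller gets its response without waiting for the notification, at the cost of the notification becoming eventually, rather than immediately, consistent.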
2.5.3 The 80/20 Principle
Roughly 20% of features generate 80% of traffic; focus optimization on the high‑traffic 20% while keeping the rest simple.
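Finding the high-traffic features is a matter of sorting by request count and accumulating until the target share is reached. The feature names and counts below are invented for illustration.

```python
from typing import Dict, List


def hot_features(traffic: Dict[str, int], share_percent: int = 80) -> List[str]:
    """Return the busiest features that together cover share_percent of traffic."""
    total = sum(traffic.values())
    covered = 0
    hot: List[str] = []
    for name, hits in sorted(traffic.items(), key=lambda kv: kv[1], reverse=True):
        if covered * 100 >= share_percent * total:
            break
        hot.append(name)
        covered += hits
    return hot


# Hypothetical request counts per feature.
traffic = {
    "feed": 5000, "search": 3000, "profile": 800,
    "settings": 500, "help": 400, "export": 300,
}
print(hot_features(traffic))
```

The features returned are the ones that deserve caching, dedicated capacity, and careful tuning; the remainder can stay simple.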
3. Typical Architecture
A common Java backend stack looks like:
In the architecture diagram, the dashed part represents the database layer, usually a master-slave setup, which can be replaced by Redis clusters, MySQL clusters (Cobar, etc.), or other distributed solutions.
Author: 飒然Hang, architect/backend engineer, working at 中华万年历
Source: http://superhj1987.github.com
