
How to Measure and Optimize System Load Capacity for High‑Concurrency Services

This article explains what system load capacity is and how to measure it in requests per second, identifies the key factors (bandwidth, hardware, OS and application server configuration), and gives practical tuning steps for Linux, TCP, Nginx, Tomcat and databases to improve scalability.

MaGe Linux Operations

Oct 28, 2015

System Load Capacity Overview

In the Internet era, high concurrency is a common challenge. Whether for a web site or an app, the number of concurrent requests a system can handle at peak is a key performance indicator. Examples like Alibaba's Double‑11 event illustrate the importance of robust load handling.

1. Measurement Metrics

The primary metric is requests per second (RPS): the number of requests the system answers successfully each second. As the number of concurrent users grows, RPS rises until it reaches a tipping point; beyond it, additional users make RPS fall and response time climb. That tipping point is the system's maximum load capacity.
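
As a rough sketch of how the tipping point might be located, the shell function below reads load-test results as "concurrency requests-per-second" pairs and prints the pair with the highest throughput. The function name peak_load and the two-column input format are assumptions made for illustration; they are not the output of any particular benchmarking tool.

```shell
# peak_load: read "concurrency rps" pairs on stdin and print the pair with
# the highest requests-per-second value. The input format is an assumption
# made for this sketch.
peak_load() {
  awk 'NF >= 2 && ($2 + 0 > best) { best = $2 + 0; conc = $1 }
       END { if (conc != "") print conc, best }'
}

# Example: throughput rises up to 200 concurrent users, then falls.
printf '%s\n' '50 1200' '100 2100' '200 2600' '400 2300' | peak_load
# prints: 200 2600
```

In practice each pair would come from one run of a load-testing tool at a fixed concurrency level, repeated over increasing levels.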

2. Influencing Factors

Key factors affecting load capacity include:

Bandwidth

Hardware configuration

System configuration

Application server configuration

Program logic

Bandwidth and hardware are decisive; they can only be improved by scaling or upgrading. The focus is on maximizing load capacity within given bandwidth and hardware limits.

2.1 Bandwidth

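Bandwidth, measured in Mbps (megabits per second), determines how much data can be transmitted per second, much like the diameter of a water pipe. Because the unit counts bits rather than bytes, a 100 Mbps link carries at most about 12.5 MB of data per second.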
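
To put numbers on the pipe analogy, here is a minimal sketch that estimates the upper bound on responses per second for a given link speed and average response size. The function name max_rps_for_bandwidth is hypothetical, and the calculation deliberately ignores protocol overhead and request traffic.

```shell
# max_rps_for_bandwidth MBPS RESPONSE_BYTES
# Upper bound on responses per second that a link can carry, ignoring
# protocol overhead (an intentional simplification for this sketch).
max_rps_for_bandwidth() {
  mbps=$1
  bytes=$2
  # 1 Mbps = 1,000,000 bits/s; divide by 8 to get bytes per second.
  echo $(( mbps * 1000000 / 8 / bytes ))
}

max_rps_for_bandwidth 100 50000   # 100 Mbps link, 50 kB average response
# prints: 250
```

If the measured request rate is already close to this bound, tuning the OS or the application server is unlikely to help until the link itself is upgraded.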

2.2 Hardware Configuration

Critical hardware parameters include:

CPU frequency and core count – affect processing speed and thread scheduling.

Memory size and speed – larger and faster memory improves data handling.

Disk speed – SSDs provide significantly faster I/O than traditional HDDs.

Using SSD storage is a common optimization.

2.3 System Configuration

Linux system settings that impact load capacity:

File descriptor limits (e.g., /proc/sys/fs/file-max and per‑process limits).

Process/thread limits (e.g., ulimit -u, /proc/sys/kernel/threads-max).

TCP kernel parameters (e.g., net.ipv4.tcp_syncookies, net.ipv4.tcp_tw_reuse, net.ipv4.tcp_fin_timeout).

2.3.1 File Descriptor Limits

# Temporary change (lost on reboot)
echo 1000000 > /proc/sys/fs/file-max
# Permanent change: add this line to /etc/sysctl.conf, then run `sysctl -p`
fs.file-max = 1000000

Keep the total number of open descriptors below the system-wide maximum, and keep each process's soft limit (ulimit -n) at or below its hard limit.
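
One way to check how close a host is to the system-wide maximum is to read /proc/sys/fs/file-nr, which holds three numbers: allocated handles, allocated-but-unused handles, and the maximum. The sketch below turns them into a usage percentage; the function name fd_usage_percent is an assumption for illustration.

```shell
# fd_usage_percent: read the three fields of /proc/sys/fs/file-nr
# (allocated handles, allocated-but-unused handles, system maximum) on stdin
# and print the share of the maximum in use, as a whole percent.
fd_usage_percent() {
  awk 'NF == 3 && $3 > 0 { printf "%d\n", ($1 - $2) * 100 / $3 }'
}

# On a Linux host:
#   fd_usage_percent < /proc/sys/fs/file-nr
# Per-process soft and hard limits for the current shell:
#   ulimit -Sn; ulimit -Hn
echo '250000 0 1000000' | fd_usage_percent
# prints: 25
```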

2.3.2 Process/Thread Limits

Process limit: ulimit -u, or the nproc item in /etc/security/limits.conf.

Thread limit: read the system-wide maximum from /proc/sys/kernel/threads-max; the per-thread stack size set with ulimit -s also bounds how many threads fit in memory.
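
The relationship between stack size and thread count can be sketched with simple arithmetic. The function below (max_threads_by_stack, a hypothetical name) gives a rough upper bound under the assumption that every thread's stack is fully backed by memory; in reality stacks are committed lazily, so this is a conservative estimate rather than a hard limit.

```shell
# max_threads_by_stack MEM_MB STACK_KB
# Rough upper bound on thread count if every thread's stack were fully
# backed by memory. Real limits also depend on threads-max, ulimit -u and
# how much of each stack is actually touched.
max_threads_by_stack() {
  mem_mb=$1
  stack_kb=$2
  echo $(( mem_mb * 1024 / stack_kb ))
}

# Example: 8192 MB set aside for stacks, 8192 kB per stack
# (ulimit -s reports the stack size in kB).
max_threads_by_stack 8192 8192
# prints: 1024
```

Halving the stack size doubles the estimate, which is why a smaller stack is a common adjustment for services that run many threads.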

2.3.3 TCP Kernel Parameters

Adjusting TCP settings can relieve load when hardware upgrades are not possible.

netstat -n | awk '/^tcp/ {++S[$NF]} END {for(a in S) print a, S[a]}'

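Typical output lists the number of connections in each state. A large TIME_WAIT count can be eased with net.ipv4.tcp_tw_reuse (which applies to outgoing connections) and capped with net.ipv4.tcp_max_tw_buckets. net.ipv4.tcp_tw_recycle is often suggested as well, but it breaks clients behind NAT and was removed in Linux 4.12, so it should be avoided on current kernels.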
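
On hosts where netstat is not installed, ss from iproute2 can supply the same information. The sketch below swaps in `ss -tan` for the netstat command above and reports the share of connections in TIME-WAIT; the function name time_wait_percent is an assumption, and it relies on ss printing a header line followed by one connection per line with the state in the first column.

```shell
# time_wait_percent: read `ss -tan` style output on stdin (first line is a
# header, first column of each following line is the connection state) and
# print the percentage of connections that are in TIME-WAIT.
time_wait_percent() {
  awk 'NR > 1 { total++; if ($1 == "TIME-WAIT") tw++ }
       END { if (total > 0) printf "%d\n", tw * 100 / total }'
}

# On a Linux host with iproute2 installed:
#   ss -tan | time_wait_percent
```

Tracking this percentage before and after a sysctl change gives a quick indication of whether the change had any effect.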

2.4 Application Server Configuration

Common server concurrency models:

Multi‑process: one process per request.

Prefork: pre‑spawned processes (process pool).

Worker: one thread per request.

Master/worker (event‑driven): Nginx style, suitable for I/O‑bound workloads.

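Event‑driven servers like Nginx can hold a very large number of concurrent connections on modest hardware, because an idle connection does not tie up a process or thread of its own; the practical ceiling is set mainly by memory and file descriptor limits.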

2.4.1 Nginx/Tengine

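Match the worker count to the number of CPU cores (worker_processes).

Set an appropriate keepalive timeout (keepalive_timeout).

Increase worker_rlimit_nofile for more file descriptors.

Enable HTTP/1.1 keepalive for upstream connections (proxy_http_version 1.1 together with an upstream keepalive pool).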
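
The four points above might translate into a configuration along the following lines. This is a sketch, not a recommended configuration: the shell function nginx_tuning_snippet is hypothetical, the upstream name "backend" and its address are placeholders, and all numeric values are illustrative. worker_connections is set to half the descriptor limit on the assumption that each proxied client connection uses two descriptors (one to the client, one to the upstream).

```shell
# nginx_tuning_snippet CORES NOFILE
# Print an illustrative nginx.conf fragment. The upstream name "backend",
# its address and all numeric values are placeholders, not recommendations.
nginx_tuning_snippet() {
  cores=$1
  nofile=$2
  cat <<EOF
worker_processes $cores;
worker_rlimit_nofile $nofile;
events {
    worker_connections $(( nofile / 2 ));
}
http {
    keepalive_timeout 30s;
    upstream backend {
        server 127.0.0.1:8080;
        keepalive 32;
    }
    server {
        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
EOF
}

# Example: a 4-core host with a 65535 descriptor limit per worker.
# On Linux the core count could be taken from `nproc` instead.
nginx_tuning_snippet 4 65535
```

Clearing the Connection header is what allows Nginx to reuse upstream connections from the keepalive pool instead of closing them after each request.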

2.4.2 Tomcat

Key configuration areas:

JVM parameters (heap size, young generation size, PermSize or, on Java 8 and later, Metaspace size, and thread stack size).

Connector settings (protocol, connectionTimeout, maxThreads, minSpareThreads, acceptCount, maxConnections).

A commonly cited maxThreads value is 150 (Tomcat's connector default is 200); beyond roughly 250 concurrent users, clustering is recommended. Setting the thread count too high increases context-switching overhead and memory usage.
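
A starting value for maxThreads can be estimated with Little's law: the number of requests in flight equals the arrival rate multiplied by the average time each request spends in the system. The sketch below assumes a blocking connector in which every in-flight request holds one thread; the function name threads_needed is hypothetical.

```shell
# threads_needed RPS AVG_RESPONSE_MS
# Little's law: requests in flight = arrival rate x time in system.
# Rounds up so that a fractional result still reserves a whole thread.
threads_needed() {
  rps=$1
  avg_ms=$2
  echo $(( (rps * avg_ms + 999) / 1000 ))
}

threads_needed 500 300   # 500 requests/s at 300 ms each
# prints: 150
```

The result is only a baseline for load testing: if response time grows under load, the number of threads required grows with it.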

2.4.3 Database

MySQL is widely used but can become a bottleneck. Strategies include vertical/horizontal sharding, caching with Redis, and read/write splitting using master/slave replication.

3. Typical Architecture

A common web application stack looks like: [architecture diagram not reproduced]

This article focuses on load analysis; further details on LVS will be added later.
