
How to Diagnose and Fix High Load with Low CPU Utilization in Nginx

This article walks through Linux process states, fork/exec creation, CPU vs load metrics, network I/O models, and a real‑world Nginx high‑load case, showing how strace, accept_mutex, and SO_REUSEPORT can resolve the issue.


In this SRE case study we explore why a 32‑core Nginx gateway showed high load average while CPU utilization remained low, and we detail the step‑by‑step troubleshooting and remediation techniques.

Process State Codes

Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:
    D    uninterruptible sleep (usually IO)
    R    running or runnable (on run queue)
    S    interruptible sleep (waiting for an event to complete)
    T    stopped by job control signal
    t    stopped by debugger during tracing
    W    paging (not valid since the 2.6.xx kernel)
    X    dead (should never be seen)
    Z    defunct ("zombie") process, terminated but not reaped by its parent

Process Creation Methods

fork : the child copies the parent’s memory (copy‑on‑write) and inherits all open file descriptors, including the listening socket used later in the problem.

exec : replaces the current process image with a new program; the PID stays the same and file descriptors are inherited unless explicitly closed.

CPU Utilization vs Load Average

CPU utilization measures the fraction of time a process spends on the CPU, while load average counts the number of runnable or uninterruptible tasks (including those in D state). High load does not always mean high CPU usage; many tasks may be blocked in I/O.

Network I/O Models

Three common models are:

accept : a blocking accept() loop on a single listening socket, typically with one process or thread per connection; it does not scale under heavy traffic.

select/poll : one thread can monitor many sockets, but every call linearly scans the whole descriptor set, so the cost grows with the number of connections (and select is further capped at 1024 descriptors).

epoll : event‑driven; the kernel tracks readiness and returns only the sockets that have events, so it scales to tens of thousands of connections and is used by high‑performance servers such as Nginx.

Case Study: High Load on Nginx

The test environment showed a load average around 30 on a 32‑core machine, while most Nginx workers were in D state.

Diagnosing with strace

Running strace -C -T -ttt -p <pid> -o strace.log against a worker revealed 35,518 accept calls, of which 32,412 returned failure.

The pattern matched the classic “thundering herd” problem: all workers wake on a new connection but only one handles it, causing excessive context switches.

Accept Mutex Solution

Enabling accept_mutex (the accept_mutex on; directive in the events block) ensures that only the worker currently holding the lock calls accept, so a new connection wakes a single worker instead of all of them, eliminating the herd effect.
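A minimal sketch of the relevant nginx.conf fragment (values are illustrative, not from the case study's configuration):

```nginx
# events context of nginx.conf: serialize accept() across workers
events {
    worker_connections 10240;
    accept_mutex on;           # only the lock holder calls accept()
    accept_mutex_delay 500ms;  # how long a worker waits before retrying the lock
}
```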

SO_REUSEPORT Feature

Linux’s SO_REUSEPORT socket option (available since kernel 3.9) allows multiple sockets to bind the same address and port, giving each worker its own listening socket and its own kernel accept queue, which eliminates contention on a shared lock entirely.

Kernel Lock Analysis

static inline void lock_sock(struct sock *sk)
{
    lock_sock_nested(sk, 0);
}

void lock_sock_nested(struct sock *sk, int subclass)
{
    might_sleep();
    spin_lock_bh(&sk->sk_lock.slock);
    if (sk->sk_lock.owned)
        __lock_sock(sk);
    sk->sk_lock.owned = 1;
    spin_unlock(&sk->sk_lock.slock);
    /* The sk_lock has mutex_lock() semantics here: */
    mutex_acquire(&sk->sk_lock.dep_map, subclass, 0, _RET_IP_);
    local_bh_enable();
}

static void __lock_sock(struct sock *sk)
{
    DEFINE_WAIT(wait);
    for (;;) {
        /* TASK_UNINTERRUPTIBLE: a worker waiting here shows up as D state */
        prepare_to_wait_exclusive(&sk->sk_lock.wq, &wait, TASK_UNINTERRUPTIBLE);
        spin_unlock_bh(&sk->sk_lock.slock);
        schedule();
        spin_lock_bh(&sk->sk_lock.slock);
        if (!sock_owned_by_user(sk))
            break;
    }
    finish_wait(&sk->sk_lock.wq, &wait);
}
#define sock_owned_by_user(sk) ((sk)->sk_lock.owned)

The lock_sock path shows that a process contending for the socket lock waits in TASK_UNINTERRUPTIBLE inside __lock_sock; that uninterruptible sleep is exactly the D state the workers showed during the thundering herd.

Conclusion

Understanding Linux process states, I/O models, and Nginx’s master/worker architecture enables targeted tuning: enable accept_mutex, adopt SO_REUSEPORT, and monitor kernel locks to eliminate high load caused by the thundering herd effect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Nginx, epoll, cpu-utilization, SO_REUSEPORT, high load, Linux performance, accept_mutex
Written by Qu Tech (Qutoutiao technology sharing)