Why Nginx Proxy Stops Accepting New Connections with fastsocket – A Full Diagnosis

The article walks through a real‑world investigation of an Nginx HTTP forward proxy on an Alibaba Cloud server that stops accepting new connections after a while, detailing hypothesis testing, strace analysis, fastsocket feature toggles, accept_mutex behavior, and the final fix by adjusting the NIC's RPS settings.

ITPUB

Problem Description

The author ran an Nginx HTTP forward proxy on an Alibaba Cloud ECS instance. After some time the proxy could no longer accept new connections, even though existing connections remained functional.

Initial Hypotheses and Tests

Network issues were ruled out first: the iptables rules were checked, and restarting Nginx immediately restored connectivity, which pointed away from the external network.

The possibility that Nginx itself was at fault was examined. Because the proxy used the fastsocket optimization module, the author disabled fastsocket and restarted Nginx with the same configuration; the issue persisted, suggesting the root cause lay elsewhere.

Fastsocket Feature Investigation

Fastsocket provides a set of performance‑enhancing switches. By disabling all switches and re‑enabling them one by one, the author discovered that enabling enable_listen_spawn reproduced the hang consistently.
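The one-at-a-time procedure can be sketched as a small harness. This is only an illustration: the probe callback, the placeholder names switch_a and switch_b, and the choice to enable each switch alone (rather than cumulatively) are assumptions, not details from the original investigation. Only enable_listen_spawn is named in the article.

```python
from typing import Callable, FrozenSet, Iterable, List


def isolate_switches(
    switches: Iterable[str],
    reproduces: Callable[[FrozenSet[str]], bool],
) -> List[str]:
    """Return the switches that trigger the problem when enabled alone."""
    # Baseline: with every switch off the service must be healthy,
    # otherwise the one-at-a-time comparison tells us nothing.
    if reproduces(frozenset()):
        raise RuntimeError("problem reproduces with all switches off")
    culprits = []
    for name in switches:
        if reproduces(frozenset([name])):
            culprits.append(name)
    return culprits


def fake_probe(enabled: FrozenSet[str]) -> bool:
    # Stand-in for "apply these switches, restart Nginx, check for the hang".
    return "enable_listen_spawn" in enabled


# switch_a and switch_b are hypothetical placeholders.
found = isolate_switches(["switch_a", "enable_listen_spawn", "switch_b"], fake_probe)
print(found)
```

In practice the probe would have to run long enough for the hang to appear, since the proxy only stopped accepting connections after some time.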

Discussion with a fastsocket maintainer suggested the problem might involve Nginx's accept_mutex. That directive defaults to on in Nginx versions before 1.11.3, so it can be active even when it is not set in the configuration.

Locating the Nginx Hang

Using strace -p on each worker process, the author observed that most workers kept entering and returning from epoll_wait. One worker, however, stayed blocked in a single epoll_wait call indefinitely, confirming a hang.
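A rough way to automate that comparison is to count completed epoll_wait calls in each worker's captured strace output. The sketch below is an assumption-laden illustration: the PIDs and strace lines are invented sample data, not output from the incident, and the heuristic simply flags workers whose capture contains no finished epoll_wait call.

```python
import re
from typing import Dict, List

# A completed call ends with ") = <return value>"; a call that is still
# blocked when strace attaches shows only the opening "epoll_wait(8, ".
_EPOLL_RETURN = re.compile(r"epoll_wait\(.*\)\s*=\s*-?\d+")


def completed_epoll_waits(strace_lines: List[str]) -> int:
    """Count epoll_wait calls that returned during the capture window."""
    return sum(1 for line in strace_lines if _EPOLL_RETURN.search(line))


def suspect_workers(captures: Dict[int, List[str]]) -> List[int]:
    """Return PIDs whose capture shows no epoll_wait call returning."""
    return sorted(
        pid for pid, lines in captures.items()
        if completed_epoll_waits(lines) == 0
    )


# Invented sample data: PID 1001 keeps cycling, PID 1002 never returns.
sample = {
    1001: [
        "epoll_wait(8, [{EPOLLIN, {u32=1, u64=1}}], 512, 500) = 1",
        "epoll_wait(8, [], 512, 500) = 0",
    ],
    1002: ["epoll_wait(8, "],
}
print(suspect_workers(sample))
```

A worker that is merely idle with an infinite timeout looks the same in such a capture, so a flagged PID is a candidate for closer inspection rather than proof of a hang.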

The hang occurred only when enable_listen_spawn was active, meaning fastsocket created per‑CPU local listen sockets.

Understanding the Interaction

When fastsocket creates a per-CPU local listen table, an incoming connection is looked up first in the local table of the CPU that handles the packet and then in the global listen table. A worker's local listen socket therefore only receives connections whose packets arrive on that worker's CPU; if no packets are steered to that CPU, the worker receives no new connections.

The author then found that the server's NIC had Receive Packet Steering (RPS) set to 0000. A zero mask disables RPS, so every packet was processed on the CPU that handled the NIC interrupt, CPU 0 in this case. Workers on CPU 1 never saw new connections, leading to the observed hang.
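To check a machine for this condition, the RPS mask can be read from sysfs and decoded into a CPU list. The following is a minimal sketch; the device name eth0 and queue index 0 are assumptions, since the article does not name the interface.

```python
from pathlib import Path
from typing import List


def cpus_in_rps_mask(mask_text: str) -> List[int]:
    """Decode an rps_cpus bitmap into the list of CPUs it selects."""
    # sysfs prints the bitmap as comma-separated 32-bit hex groups,
    # most significant group first, so joining the groups gives one number.
    value = int(mask_text.strip().replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def read_rps_mask(dev: str = "eth0", queue: int = 0) -> str:
    """Read the raw mask for one receive queue (device name is assumed)."""
    path = Path(f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus")
    return path.read_text()


print(cpus_in_rps_mask("0000"))  # empty list: RPS is off for this queue
print(cpus_in_rps_mask("3"))     # CPUs 0 and 1
```

An empty result means no CPU is selected for steering on that queue, which matches the 0000 value described here.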

Solution

By modifying the NIC’s RPS configuration to distribute packets across CPUs, the worker on CPU 1 began receiving connections, and Nginx operated normally even with accept_mutex enabled and fastsocket’s enable_listen_spawn active.
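The article does not state which mask was finally written, so the sketch below only shows how a mask covering a chosen set of CPUs could be built and applied. The device name eth0, the queue index, and the choice of CPUs 0 and 1 are assumptions; writing the file requires root and the setting does not survive a reboot.

```python
from pathlib import Path
from typing import Iterable


def rps_mask_for(cpus: Iterable[int]) -> str:
    """Build the hex bitmap string that selects the given CPUs."""
    value = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        value |= 1 << cpu
    digits = format(value, "x")
    if len(digits) <= 8:
        return digits
    # Masks wider than 32 bits are written as comma-separated
    # 8-digit groups, most significant group first.
    digits = digits.zfill(-(-len(digits) // 8) * 8)
    return ",".join(digits[i:i + 8] for i in range(0, len(digits), 8))


def apply_rps_mask(mask: str, dev: str = "eth0", queue: int = 0) -> None:
    """Write the mask for one receive queue (needs root; dev is assumed)."""
    path = Path(f"/sys/class/net/{dev}/queues/rx-{queue}/rps_cpus")
    path.write_text(mask)


# Example: steer packets to CPUs 0 and 1 (not applied here, only computed).
print(rps_mask_for([0, 1]))
```

On a NIC with several receive queues, each rx-N directory has its own rps_cpus file, so the same mask would need to be written once per queue.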

Additional Tools

The author also mentions a small kernel-level performance-diagnostic tool, available at git@github.com:gfreewind/unit_perf.git. It provides a macro, UP_PID_INFO_LOG, that logs only for specific PIDs so the kernel log is not flooded.

Conclusion

The root cause was not a bug in fastsocket or Nginx but a server‑level configuration: the NIC’s RPS setting prevented certain CPUs from receiving packets, causing a worker to hang in epoll_wait. Adjusting RPS resolved the issue, and the investigation highlighted the importance of systematic hypothesis testing and low‑level tracing.

