Ensuring TCP Connections Stay Alive for 60 Seconds During Unstable Networks
This article explains how TCP/IP heartbeat and retransmission mechanisms work, shows how to configure Windows keep‑alive parameters, demonstrates using libwebsockets’ keep‑alive settings, and provides non‑blocking socket + select code to detect connection failures and enforce a 60‑second reconnection window for video conferencing applications.
1. Problem Overview
In one project the customer's network was unstable: occasional packet loss and jitter caused the software client to lose its TCP connection to the server, interrupting ongoing video conferences. The customer requires that a meeting survive such an outage for at least 60 seconds, so that the client can rejoin once the network recovers. This is a hard acceptance criterion; without it the project will not be accepted or paid.
The issue mainly concerns the TCP control‑signal connection; the UDP media stream is less problematic. The focus is on TCP disconnection and reconnection.
Network instability may be detected either by the OS TCP/IP stack (its own heartbeat or packet‑loss retransmission) or by an application‑level heartbeat.
By extending the application‑level timeout or leveraging the OS stack’s mechanisms, the client can automatically reconnect and keep meeting resources alive.
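As a minimal sketch of the application-level side, the class below records when the last heartbeat arrived and reports expiry once a configurable timeout has passed. The name HeartbeatMonitor and its methods are hypothetical and not taken from the project; time points are passed in explicitly so the logic is easy to test.

```cpp
#include <chrono>

// Hypothetical helper: decides whether a peer has been silent for too long.
class HeartbeatMonitor {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatMonitor(std::chrono::seconds timeout, Clock::time_point now)
        : timeout_(timeout), lastSeen_(now) {}

    // Call whenever a heartbeat (or any other message) arrives from the peer.
    void onHeartbeat(Clock::time_point now) { lastSeen_ = now; }

    // True once the peer has been silent for longer than the timeout.
    bool isExpired(Clock::time_point now) const {
        return now - lastSeen_ > timeout_;
    }

private:
    std::chrono::seconds timeout_;
    Clock::time_point lastSeen_;
};
```

With a 60-second timeout, the side that owns the meeting resources would release them only after isExpired returns true, which gives a disconnected client the full window to reconnect.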
2. TCP/IP Stack Heartbeat Mechanism
2.1 TCP ACK Mechanism
TCP establishes a connection via a three‑way handshake. Every data segment is acknowledged (ACK); if an ACK is not received, the packet‑loss retransmission mechanism is triggered.
Both connection establishment and data transfer use ACK packets, and the TCP/IP stack’s keep‑alive packets are no exception.
2.2 TCP Keep‑Alive Overview
Windows disables keep‑alive by default for a socket. When enabled, the stack sends a keep‑alive probe at a configurable interval (default 2 hours). If the network is normal, the server replies with an ACK and the interval timer resets. If the network is abnormal, the client resends the probe after a 1‑second timeout; after 10 unanswered probes the stack aborts the connection.
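The parameters are:
keepalivetime (default 2 h): idle time on the connection before the first keep-alive probe is sent.
keepaliveinterval (default 1 s): interval between successive probes while no ACK arrives.
probe count: number of unanswered probes before the connection is aborted (fixed at 10 on Windows).
With the defaults, a broken idle connection can therefore go undetected for about two hours. Keep-alive probes are sent only while the connection is idle; a failure that occurs while data is being exchanged is caught by the retransmission mechanism instead (see Section 4).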
2.3 Modifying Default Keep‑Alive Parameters
Keep‑alive is enabled per‑socket, not globally. The typical sequence on Windows is:
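#include <winsock2.h>
#include <mstcpip.h> // tcp_keepalive, SIO_KEEPALIVE_VALS

SOCKET sock;
// ... create and connect sock ...
int optval = 1;
setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (const char *)&optval, sizeof(optval));
tcp_keepalive alive;
alive.onoff = TRUE;
alive.keepalivetime = 10 * 1000; // idle time before the first probe: 10 seconds (in ms)
alive.keepaliveinterval = 2 * 1000; // interval between unanswered probes: 2 seconds (in ms)
DWORD dwBytesRet = 0;
WSAIoctl(sock, SIO_KEEPALIVE_VALS, &alive, sizeof(alive), NULL, 0, &dwBytesRet, NULL, NULL);
The code first enables the SO_KEEPALIVE option, then uses WSAIoctl with SIO_KEEPALIVE_VALS to set keepalivetime and keepaliveinterval. The probe count cannot be changed through this call; it stays fixed at 10 on Windows.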
When the keep‑alive limit is reached, the stack returns WSAENETRESET for pending calls and WSAENOTCONN for subsequent calls.
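To check that a keep-alive configuration fits inside the 60-second requirement, the worst-case detection time can be estimated as the idle time plus one probe interval per unanswered probe. The function below is a rough sketch of that arithmetic; its name is hypothetical, and the model ignores round-trip time and timer granularity.

```cpp
#include <cstdint>

// Rough upper bound (in ms) on how long a dead, idle connection survives:
// idle time before the first probe, plus one interval per unanswered probe.
constexpr std::uint64_t keepAliveDetectionMs(std::uint64_t idleMs,
                                             std::uint64_t intervalMs,
                                             unsigned probes) {
    return idleMs + intervalMs * probes;
}
```

With the values used above (10 s idle, 2 s interval, 10 probes) the estimate is 10 + 2 × 10 = 30 seconds, while the Windows defaults (2 h idle, 1 s interval) give a little over two hours.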
3. Heartbeat in libwebsockets
The libwebsockets library uses the same TCP keep‑alive mechanism. When a websocket connection is idle for a long time, intermediate network devices may close it.
To prevent this, the library's context creation structure (lws_context_creation_info) provides three keep-alive fields:
int ka_time; // keep-alive timeout (seconds)
int ka_probes; // number of probes before giving up
int ka_interval; // interval between probes (seconds)
Example of creating a context with keep-alive enabled:
static lws_context* CreateContext()
{
    lws_set_log_level(0xFF, NULL);
    lws_context_creation_info info;
    memset(&info, 0, sizeof(info));
    info.port = CONTEXT_PORT_NO_LISTEN;
    info.protocols = protocols;
    info.ka_time = LWS_TCP_KEEPALIVE_TIME; // e.g., 10 s
    info.ka_interval = LWS_TCP_KEEPALIVE_INTERVAL; // e.g., 2 s
    info.ka_probes = LWS_TCP_KEEPALIVE_PROBES; // e.g., 10
    info.options = LWS_SERVER_OPTION_DISABLE_IPV6;
    return lws_create_context(&info);
}
Internally, libwebsockets calls setsockopt to enable SO_KEEPALIVE and then configures the tcp_keepalive structure via WSAIoctl, similar to the raw socket example above.
4. TCP Retransmission Mechanism
If a packet is sent but its ACK is not received due to network failure, the TCP stack retransmits the packet with exponential back‑off. After a platform‑specific maximum number of retransmissions (Windows 5, Linux 15), the stack declares the network faulty and closes the connection.
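Thus, when data is in flight, the stack detects a fault on its own and terminates the connection. How quickly depends on the retransmission limit and the retransmission timeout (RTO): with the Windows default of 5 retransmissions this usually takes seconds to tens of seconds, whereas the Linux default of 15 can take around 15 minutes.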
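The effect of exponential back-off on detection time can be estimated with a simplified model: the timeout starts at an initial RTO, doubles after every retransmission, and is capped at a maximum. The function below is a sketch under those assumptions; its name is hypothetical, and real stacks derive the RTO from the measured round-trip time.

```cpp
#include <algorithm>
#include <cstdint>

// Total time (in ms) from the first transmission until the stack gives up:
// one timeout for the original send plus one per retransmission,
// doubling each time and capped at maxRtoMs.
std::uint64_t retransmissionGiveUpMs(std::uint64_t initialRtoMs,
                                     std::uint64_t maxRtoMs,
                                     unsigned maxRetransmissions) {
    std::uint64_t total = 0;
    std::uint64_t rto = initialRtoMs;
    for (unsigned i = 0; i <= maxRetransmissions; ++i) {
        total += std::min(rto, maxRtoMs);
        rto = std::min(rto * 2, maxRtoMs);
    }
    return total;
}
```

With the Linux values of a 200 ms minimum RTO, a 120 s maximum RTO, and 15 retransmissions, the model gives 924.6 seconds. With 5 retransmissions and an assumed initial RTO of 300 ms, it gives 18.9 seconds; the 300 ms figure is illustrative only.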
5. Non‑Blocking Socket + select for Connect Timeout
5.1 MSDN Description of connect and select
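For a blocking socket, connect returns 0 on success or SOCKET_ERROR on failure. When the peer is unreachable, the call blocks until the OS gives up, which typically takes tens of seconds. For a non-blocking socket, connect normally returns SOCKET_ERROR immediately and WSAGetLastError() reports WSAEWOULDBLOCK, meaning the connection attempt is in progress.
In this case, select can be used to wait until the socket becomes writable (connection succeeded). Winsock reports a failed attempt through the exception set (exceptfds) rather than the write set.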
5.2 Implementing Connect Timeout with select
Typical steps:
Create a TCP socket.
Set it to non-blocking mode (ioctlsocket(..., FIONBIO, ...)).
Call connect; it returns immediately.
Prepare a write‑set containing the socket.
Call select with a timeout (e.g., 1 s).
If select returns ≤ 0, close the socket – the connection timed out or failed.
Otherwise, restore blocking mode and proceed.
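bool ConnectDevice(char* pszIP, int nPort)
{
    SOCKET connSock = socket(AF_INET, SOCK_STREAM, 0);
    if (connSock == INVALID_SOCKET) return false;
    SOCKADDR_IN devAddr = {0};
    devAddr.sin_family = AF_INET;
    devAddr.sin_port = htons((u_short)nPort);
    devAddr.sin_addr.s_addr = inet_addr(pszIP);
    // Switch to non-blocking mode so that connect() returns immediately.
    unsigned long nonBlock = 1;
    ioctlsocket(connSock, FIONBIO, &nonBlock);
    int rc = connect(connSock, (sockaddr*)&devAddr, sizeof(devAddr));
    if (rc == SOCKET_ERROR && WSAGetLastError() != WSAEWOULDBLOCK)
    {
        closesocket(connSock); // immediate failure, nothing to wait for
        return false;
    }
    // Writable means connected; Winsock reports a failed connect in the except set.
    fd_set writefds, exceptfds;
    FD_ZERO(&writefds);
    FD_SET(connSock, &writefds);
    FD_ZERO(&exceptfds);
    FD_SET(connSock, &exceptfds);
    timeval tv = {1, 0}; // 1-second timeout
    if (select(0, NULL, &writefds, &exceptfds, &tv) <= 0 || !FD_ISSET(connSock, &writefds))
    {
        closesocket(connSock); // timed out or connection failed
        return false;
    }
    nonBlock = 0;
    ioctlsocket(connSock, FIONBIO, &nonBlock); // restore blocking mode
    // This helper only checks reachability, so the socket is closed here.
    // To keep using the connection, hand connSock to the caller instead.
    closesocket(connSock);
    return true;
}
This pattern avoids the long default blocking timeout and gives the application precise control over how long it will wait for a TCP connection to succeed.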
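Building on a connect attempt with a short timeout, the 60-second reconnection window can be sketched as a retry loop bounded by a deadline. The function below is an illustration rather than the project's code: reconnectWithinWindow and its parameters are hypothetical names, and the connect attempt is passed in as a callback so the loop stays portable.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Retries tryConnect until it succeeds or the window has elapsed.
// Returns true on success, false if the deadline passed first.
bool reconnectWithinWindow(const std::function<bool()>& tryConnect,
                           std::chrono::milliseconds window,
                           std::chrono::milliseconds retryDelay) {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + window;
    while (Clock::now() < deadline) {
        if (tryConnect()) return true;           // connection restored
        std::this_thread::sleep_for(retryDelay); // back off before retrying
    }
    return false; // window expired; the meeting can be torn down
}
```

A caller might pass a lambda that invokes ConnectDevice with the server address, a window of std::chrono::seconds(60), and a retry delay of one second. Because each attempt can itself block for up to its own timeout, the loop may overrun the deadline by roughly one attempt.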
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
