Ensuring 60‑Second TCP Keep‑Alive During Unstable Networks

Facing frequent packet loss and network jitter that disrupts video conferences, this article examines TCP/IP stack heartbeat and retransmission mechanisms, demonstrates how to configure Windows keepalive parameters, and provides code examples for custom socket options and non‑blocking connect timeouts to maintain session continuity within 60 seconds.

MaGe Linux Operations

Problem Overview

In a recent project the client’s network environment was unstable, causing packet loss and jitter that repeatedly broke the TCP connection between the software client and server, interrupting ongoing video conferences. The client required the meeting to remain connected if the network recovered within 60 seconds.

The client insisted on this feature; without it the project could not pass acceptance and payment would be withheld.

The development team identified two types of connections: TCP for signaling and UDP for media streams. The UDP side was less problematic, so the focus was on TCP disconnection and reconnection.

A network fault can be detected at two levels: by the system TCP/IP stack or by an application‑level heartbeat. The stack itself has two detection paths, its keep‑alive mechanism and the TCP retransmission mechanism.

For application‑level detection, extending the heartbeat timeout is sufficient. This article therefore concentrates on the TCP/IP stack's heartbeat, packet‑loss retransmission, and connection timeout mechanisms. When the stack reports a network fault, the lower layer of the software can reconnect automatically while the meeting resources are preserved, so that the session continues once the network is restored.

TCP/IP Stack Heartbeat Mechanism

2.1 TCP ACK Mechanism

TCP establishes a connection via a three‑way handshake (SYN, SYN‑ACK, ACK).

TCP is reliable because data is sent only after a connection is established and each received segment is acknowledged (ACK). If an ACK is not received, the packet‑loss retransmission mechanism is triggered.

2.2 TCP/IP Stack Heartbeat Description

The TCP/IP stack provides a built‑in keep‑alive mechanism that is configured per socket. It is disabled by default and must be explicitly enabled.

On Windows the default keep‑alive idle time is two hours: if no data has been exchanged for that long, the stack sends a keep‑alive probe. After sending a probe the system behaves as follows:

1) Normal network: the peer replies with an ACK, resetting the two‑hour timer. Any data exchange also counts as activity and restarts the timer.
2) Network fault: if no ACK arrives for the probe, the stack sends the next probe after one second. After ten unanswered probes (the fixed limit on Windows Vista and later) the stack considers the connection broken and closes it.

With these defaults, a broken idle connection is detected only after roughly two hours plus ten seconds, which is far too slow for a 60‑second recovery requirement.
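To make the timing concrete, the detection delay on an idle connection can be estimated from the three keep‑alive parameters. The following is a minimal sketch under a simplified model (first probe after the idle time, then one probe per interval until the probe limit is reached); the function name KeepAliveDetectionMs is hypothetical and not part of any Windows API.

```cpp
#include <cstdint>

// Simplified model, not an exact description of the stack:
// the first probe is sent after idleMs without traffic, and each
// unanswered probe costs another intervalMs until probeCount probes
// have gone unanswered.
// KeepAliveDetectionMs is a hypothetical helper name.
std::int64_t KeepAliveDetectionMs(std::int64_t idleMs,
                                  std::int64_t intervalMs,
                                  int probeCount)
{
    return idleMs + intervalMs * probeCount;
}
```

With the Windows defaults (7,200,000 ms idle, 1,000 ms interval, 10 probes) the estimate is 7,210,000 ms, about two hours. An idle time of 10 seconds with a 2‑second interval gives about 30 seconds, which fits inside a 60‑second window.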

2.3 Modifying Default Heartbeat Parameters

Enabling keep‑alive for a specific socket involves calling setsockopt with SO_KEEPALIVE, then using WSAIoctl to set custom timing values.

    #include <winsock2.h>
    #include <mstcpip.h>   // tcp_keepalive, SIO_KEEPALIVE_VALS

    // Enables keep-alive on an already created socket and shortens its timing.
    bool EnableKeepAlive(SOCKET sock)
    {
        int optval = 1;
        int nRet = setsockopt(sock, SOL_SOCKET, SO_KEEPALIVE, (const char *)&optval, sizeof(optval));
        if (nRet != 0) return false;

        tcp_keepalive alive;
        alive.onoff = TRUE;
        alive.keepalivetime = 10 * 1000;     // idle time before the first probe: 10 seconds
        alive.keepaliveinterval = 2 * 1000;  // interval between unanswered probes: 2 seconds

        DWORD dwBytesRet = 0;
        nRet = WSAIoctl(sock, SIO_KEEPALIVE_VALS, &alive, sizeof(alive), NULL, 0, &dwBytesRet, NULL, NULL);
        return nRet == 0;
    }

The structure tcp_keepalive contains:

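1) onoff : enables (non‑zero) or disables (zero) keep‑alive on the socket.
2) keepalivetime : idle time, in milliseconds, before the first keep‑alive probe is sent (default 2 hours).
3) keepaliveinterval : interval, in milliseconds, between successive probes when no ACK is received (default 1 second).

The number of unanswered probes before the connection is dropped is not a member of the structure; it is fixed at 10 on Windows Vista and later.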

When the limit is reached, Windows returns WSAENETRESET for ongoing socket calls and WSAENOTCONN for subsequent calls.
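An application that wants to reconnect automatically has to tell these "connection lost" errors apart from harmless ones. The following is a minimal sketch of such a check; IsConnectionLost is a hypothetical helper, and the numeric constants are repeated only so that the sketch also compiles outside Windows.

```cpp
#ifdef _WIN32
#include <winsock2.h>
#else
// Values as defined in the Windows SDK headers, duplicated here only
// so that the sketch builds on other platforms.
enum {
    WSAEWOULDBLOCK = 10035,
    WSAENETRESET   = 10052,
    WSAENOTCONN    = 10057
};
#endif

// Hypothetical helper: returns true when the error code indicates that
// keep-alive has dropped the connection and a reconnect should be started.
bool IsConnectionLost(int wsaError)
{
    switch (wsaError) {
    case WSAENETRESET:  // reported to calls in progress when keep-alive fails
    case WSAENOTCONN:   // reported to calls made after the drop
        return true;
    default:            // e.g. WSAEWOULDBLOCK is not a connection loss
        return false;
    }
}
```

In a real client the argument would typically be the value of WSAGetLastError() after a failed send or recv. A production version would probably also treat other codes as fatal; this sketch covers only the two described above.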

Using libwebsockets Keep‑Alive

The libwebsockets library leverages the same TCP keep‑alive mechanism. By setting ka_time, ka_interval, and ka_probes in the lws_context_creation_info structure, a websocket context can maintain idle connections.

    #include <libwebsockets.h>

    // protocols and the LWS_TCP_KEEPALIVE_* constants are defined elsewhere in the application.
    static lws_context* CreateContext()
    {
        lws_set_log_level(0xFF, NULL);

        lws_context_creation_info tCreateinfo = {0};
        tCreateinfo.port = CONTEXT_PORT_NO_LISTEN;               // client only, no listening socket
        tCreateinfo.protocols = protocols;
        tCreateinfo.ka_time = LWS_TCP_KEEPALIVE_TIME;            // idle time before probing starts
        tCreateinfo.ka_interval = LWS_TCP_KEEPALIVE_INTERVAL;    // wait between probes
        tCreateinfo.ka_probes = LWS_TCP_KEEPALIVE_PROBES;        // probe attempts before giving up
        tCreateinfo.options = LWS_SERVER_OPTION_DISABLE_IPV6;

        return lws_create_context(&tCreateinfo);
    }

On Windows, libwebsockets internally enables keep‑alive on the socket and configures the timing values via setsockopt and WSAIoctl.

TCP/IP Packet‑Loss Retransmission

If a TCP data packet is sent but its ACK is not received due to a network fault, the sender retransmits the packet. The retransmission interval doubles with each attempt, and after a system‑defined maximum number of retransmissions (by default 5 on Windows, 15 on Linux) the stack aborts the connection.

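Thus, during a fault with ongoing data exchange, the stack detects the problem through failed retransmissions and closes the connection, usually much sooner than an idle keep‑alive timer would. How long this takes depends on the initial retransmission timeout, which is derived from the measured round‑trip time.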
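The doubling behaviour makes the total time before the stack gives up easy to estimate. The sketch below assumes a simplified model: one timeout for the original send and one for each retransmission, each twice as long as the previous one, with no upper cap on the timeout. The name RetransmitAbortMs and the initial timeout values used afterwards are assumptions for illustration only; real stacks derive the initial timeout from the round‑trip time and apply caps.

```cpp
#include <cstdint>

// Simplified model of exponential retransmission backoff.
// initialRtoMs:   timeout before the first retransmission (assumed value)
// maxRetransmits: number of retransmissions before the stack aborts
// Returns the total time from the original send until the abort.
// RetransmitAbortMs is a hypothetical helper name.
std::int64_t RetransmitAbortMs(std::int64_t initialRtoMs, int maxRetransmits)
{
    std::int64_t total = 0;
    std::int64_t rto = initialRtoMs;
    // One timeout for the original send plus one per retransmission.
    for (int i = 0; i <= maxRetransmits; ++i) {
        total += rto;
        rto *= 2;  // the interval doubles with each attempt
    }
    return total;
}
```

Under this model, 5 retransmissions with an assumed initial timeout of 300 ms add up to 18,900 ms, while an initial timeout of 1,000 ms adds up to 63,000 ms. Whether the stack aborts the connection before a 60‑second recovery window has passed therefore depends strongly on the link's round‑trip time.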

Non‑Blocking Socket and Select for Connect Timeout

Using a non‑blocking socket, connect returns immediately. If the connection is still in progress, WSAGetLastError yields WSAEWOULDBLOCK. The select function can then be used to wait for writability within a timeout.

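    #include <winsock2.h>

    // Checks whether pszIP:nPort accepts a TCP connection within 1 second.
    // WSAStartup must already have been called.
    bool ConnectDevice(const char* pszIP, int nPort)
    {
        SOCKET connSock = socket(AF_INET, SOCK_STREAM, 0);
        if (connSock == INVALID_SOCKET) return false;

        SOCKADDR_IN devAddr = {0};
        devAddr.sin_family = AF_INET;
        devAddr.sin_port = htons((u_short)nPort);
        devAddr.sin_addr.s_addr = inet_addr(pszIP);

        // Non-blocking mode: connect returns immediately.
        unsigned long ulnoblock = 1;
        ioctlsocket(connSock, FIONBIO, &ulnoblock);

        int nRet = connect(connSock, (sockaddr*)&devAddr, sizeof(devAddr));
        if (nRet == SOCKET_ERROR) {
            if (WSAGetLastError() != WSAEWOULDBLOCK) {
                closesocket(connSock);
                return false; // immediate failure
            }
            // Connection in progress: wait up to 1 second for writability.
            // On Windows a failed connect is reported through the exception set.
            fd_set writefds; FD_ZERO(&writefds); FD_SET(connSock, &writefds);
            fd_set exceptfds; FD_ZERO(&exceptfds); FD_SET(connSock, &exceptfds);
            timeval tv; tv.tv_sec = 1; tv.tv_usec = 0;
            if (select(0, NULL, &writefds, &exceptfds, &tv) <= 0 || !FD_ISSET(connSock, &writefds)) {
                closesocket(connSock);
                return false; // timeout or connection failure
            }
        }

        // Restore blocking mode.
        ulnoblock = 0;
        ioctlsocket(connSock, FIONBIO, &ulnoblock);
        // This function only tests reachability, so the socket is closed before returning.
        closesocket(connSock);
        return true;
    }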

This approach avoids blocking for the stack's default connect timeout, which can last tens of seconds (75 seconds on classic BSD‑derived stacks), and allows precise timeout control.
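A short connect timeout is what makes repeated reconnect attempts inside the 60‑second window practical. The following is a minimal sketch of such a retry loop; ReconnectWithin is a hypothetical helper, and the budget and pause values are chosen by the caller rather than taken from the original project.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper: calls tryConnect repeatedly until it succeeds
// or the time budget is used up. Returns true on success.
// At least one attempt is always made, even with a zero budget.
bool ReconnectWithin(std::chrono::milliseconds budget,
                     std::chrono::milliseconds pause,
                     const std::function<bool()>& tryConnect)
{
    const auto deadline = std::chrono::steady_clock::now() + budget;
    do {
        if (tryConnect()) {
            return true;                   // connection re-established
        }
        std::this_thread::sleep_for(pause); // wait before the next attempt
    } while (std::chrono::steady_clock::now() < deadline);
    return false;                           // budget exhausted
}
```

For the scenario in this article, the budget would be 60 seconds and tryConnect could wrap a timed connect such as ConnectDevice. Because each attempt blocks for at most its own connect timeout, the loop can overrun the budget by at most one attempt plus one pause.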

Source: https://zhuanlan.zhihu.com/p/618044850 (© original author)

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
