
Why TCP Congestion Control Adds Unexpected RTT Delays and How to Fix Them

The article analyzes how TCP's congestion control and slow‑start mechanisms introduce extra round‑trip times, causing service latency to far exceed network RTT, and explains how TCP connection setup, cwnd limits, and long‑lived connections affect overall response times.

MaGe Linux Operations

This article looks at how TCP congestion control affects request latency. After a service's latency increased, a packet capture showed that the delay came from TCP itself: the client did not send the whole request at once. It sent part of the request, waited for the server's ACK, and only then continued, so congestion control added extra RTTs.

The problem arose when moving from an internal network with <1 ms RTT to a new environment with 100 ms RTT. Expected service latency was around 102 ms, but the observed 99th‑percentile latency rose to about 300 ms.

The logs showed request latencies of 200 ms, 300 ms, and even 400 ms, which are whole multiples of the RTT and point to several extra round trips. Packet capture showed that after the TCP three‑way handshake, the client sent only part of the data, waited for an ACK, and then sent the rest. This behavior comes from the TCP congestion window (cwnd) and slow start.
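The reasoning above (latencies clustering at multiples of the RTT) can be checked with a few lines of code. This is a minimal sketch, not the tooling used in the original investigation: the function name `rtt_multiples` and the sample latencies are hypothetical, and it assumes the latencies have already been parsed from the service logs.

```python
from collections import Counter


def rtt_multiples(latencies_ms, rtt_ms):
    """Count requests by the nearest whole number of RTTs they took."""
    counts = Counter()
    for latency in latencies_ms:
        # int(x + 0.5) rounds half up; Python's round() rounds half to even.
        counts[int(latency / rtt_ms + 0.5)] += 1
    return dict(sorted(counts.items()))


# Hypothetical latencies in milliseconds, as they might be parsed from a log.
sample = [103, 204, 198, 305, 301, 402, 207, 299]
print(rtt_multiples(sample, rtt_ms=100))
```

If most requests land in buckets 2, 3, and 4 rather than being spread evenly, the extra time is probably being spent in whole round trips rather than in application processing.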

TCP connection establishment itself adds one RTT. The handshake timeline is:

+0       A -> B SYN
+0.5 RTT B -> A SYN+ACK
+1 RTT   A -> B ACK
+1 RTT   A -> B Data

The client can send the data packet immediately after its ACK (the third packet of the handshake) without waiting for a reply, so the handshake effectively adds one RTT. Combined with one RTT for the data exchange, the expected maximum is two RTTs (200 ms). The additional RTTs appear because the client sends data in bursts limited by cwnd.

In Linux the default initial congestion window is 10 MSS, about 14 KB (14 480 bytes in this environment, implying an MSS of 1 448 bytes). When the payload fits in that window, the entire request completes in two RTTs (one for the handshake, one for data). In the capture, sending 14 480 bytes in the 100 ms RTT environment took exactly 200 ms.

If the payload exceeds 14 480 bytes by even one byte, an extra RTT is needed, increasing latency to about 300 ms. The capture for 14 481 bytes showed exactly this extra 100 ms.
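The 14 480 versus 14 481 byte boundary can be reproduced with a small model. This is a sketch under idealized assumptions that are not part of the original measurements: cwnd doubles every RTT, there is no loss, no delayed ACK, and the receiver window never limits the sender. The function names are hypothetical; the defaults (MSS 1 448, initial cwnd 10, RTT 100 ms) are taken from the figures in the text.

```python
def flights_needed(payload_bytes, mss=1448, initcwnd=10):
    """Number of cwnd-limited bursts needed under idealized slow start."""
    segments = -(-payload_bytes // mss)  # ceiling division
    cwnd = initcwnd
    flights = 0
    while segments > 0:
        segments -= cwnd  # one burst of up to cwnd segments per RTT
        cwnd *= 2         # idealized slow start: cwnd doubles each RTT
        flights += 1
    return flights


def upload_latency_ms(payload_bytes, rtt_ms=100, mss=1448, initcwnd=10):
    """Handshake (1 RTT) plus 1 RTT per burst, until the last ACK returns."""
    return (1 + flights_needed(payload_bytes, mss, initcwnd)) * rtt_ms


for size in (14480, 14481):
    print(size, "bytes ->", upload_latency_ms(size), "ms")
```

A single extra byte creates an eleventh segment, which cannot leave until the first ten are acknowledged, and that costs a full round trip.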

Slow start applies only to the initial phase; once the first burst is acknowledged, cwnd grows and more data can be sent per RTT. Each direction has its own cwnd, so for a typical request and response of about 30 KB each, the total comes to 4 RTTs: one for the TCP handshake, one for the basic request/response exchange, one extra for the request, and one extra for the response. This explains the observed 300–400 ms.
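The 4 RTT breakdown for a 30 KB request and a 30 KB response can be written out as arithmetic. This is a rough sketch under the same kind of idealized assumptions (cwnd doubling per RTT, no loss, no delayed ACK, both sides starting at an initial cwnd of 10 with an MSS of 1 448); the function names are hypothetical and 30 KB is taken as 30 × 1024 bytes.

```python
def bursts(payload_bytes, mss=1448, initcwnd=10):
    """Bursts needed to send a payload when cwnd doubles after each burst."""
    segments = -(-payload_bytes // mss)  # ceiling division
    cwnd, count = initcwnd, 0
    while segments > 0:
        segments -= cwnd
        cwnd *= 2
        count += 1
    return count


def exchange_breakdown(request_bytes, response_bytes):
    """RTTs spent in each phase of one request/response on a new connection."""
    return {
        "handshake": 1,
        "basic request/response": 1,
        # Every burst after the first waits one RTT for the previous ACKs.
        "extra for request": max(0, bursts(request_bytes) - 1),
        "extra for response": max(0, bursts(response_bytes) - 1),
    }


parts = exchange_breakdown(30 * 1024, 30 * 1024)
total_rtts = sum(parts.values())
print(parts, "->", total_rtts * 100, "ms at 100 ms RTT")
```

With these inputs each direction needs two bursts, so each contributes one extra RTT on top of the handshake and the basic exchange.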

A simple solution is to use TCP persistent connections, eliminating the handshake overhead for subsequent requests.
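As an illustration of connection reuse, the sketch below sends two requests over one TCP connection using Python's standard `http.client`. It is not the service from the article: the local test server, the handler, and the function names are all hypothetical, and HTTP is used only because the standard library makes keep-alive easy to demonstrate. The local port is recorded after each request because it can only change if a new connection (and therefore a new handshake) was made.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LengthHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 keeps the connection open between requests by default.
    protocol_version = "HTTP/1.1"

    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reply = str(len(body)).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def post_on_one_connection(host, port, payloads):
    """Send every payload over a single TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    replies, local_ports = [], []
    try:
        for payload in payloads:
            conn.request("POST", "/", body=payload)
            replies.append(conn.getresponse().read())
            # The local port only changes if a new connection was opened.
            local_ports.append(conn.sock.getsockname()[1])
    finally:
        conn.close()
    return replies, local_ports


def run_demo():
    """Start a throwaway local server and post two 20 000-byte bodies."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), LengthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return post_on_one_connection(
            "127.0.0.1", server.server_address[1],
            [b"a" * 20000, b"b" * 20000])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(run_demo())
```

Both requests should report the same local port, which means the second one paid no handshake RTT.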

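Adjusting the initial cwnd from the application is not possible on mainline Linux. A socket option of the following form is sometimes suggested, but TCP_CWND is not defined in the mainline kernel headers, so treat this call as illustrative only:

setsockopt(fd, IPPROTO_TCP, TCP_CWND, &val, sizeof(val))

The argument against such an option is that increasing cwnd arbitrarily can be unsafe: if the network is congested, a large initial cwnd may exacerbate congestion and prevent automatic recovery. The appropriate cwnd value depends on the characteristics of the route, not on the application, which is why Linux exposes it as a per-route setting (initcwnd) configured with ip route.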

Experiments that raised the initial cwnd to 40 (the initcwnd route attribute, set via ip route) showed that the client could send more data in the first RTT, reducing latency for larger payloads.
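One common way to apply this setting is to re-apply an existing route with `initcwnd` appended. The sketch below only builds the `ip route change` argument list from a line of `ip route show` output and does not execute it, since changing routes requires root. The helper name `initcwnd_command`, the gateway 192.0.2.1, and the device eth0 are hypothetical, and it assumes the route line contains only attributes that `ip route change` accepts back.

```python
import shlex


def initcwnd_command(route_line, initcwnd):
    """Build an `ip route change` argv that sets initcwnd on a route."""
    cleaned = []
    skip_next = False
    for token in shlex.split(route_line):
        if skip_next:          # drop the value of an existing initcwnd
            skip_next = False
            continue
        if token == "initcwnd":
            skip_next = True
            continue
        cleaned.append(token)
    return ["ip", "route", "change", *cleaned, "initcwnd", str(initcwnd)]


# Hypothetical route line, in the format printed by `ip route show`.
line = "default via 192.0.2.1 dev eth0 proto dhcp metric 100"
print(" ".join(initcwnd_command(line, 40)))
```

The resulting list could be passed to `subprocess.run(..., check=True)` by a privileged process. The setting applies to new connections over that route and should be tested on the actual path before being rolled out.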

In practice, diagnosing such latency issues requires:

Identifying high‑latency requests from logs (monitoring only signals a problem, not its cause).

Analyzing which phase of the request consumes time.

Using packet capture or similar tools to verify the hypothesis, which can be complex due to many concurrent connections.

Large organizations often involve multiple teams, making root‑cause analysis difficult without coordinated effort.

Afterword

Understanding TCP fundamentals makes the root cause clear, but monitoring alone rarely reveals which specific requests are responsible, leading to blame‑shifting between middleware, platform, and network teams.

For further details, see the original article: https://www.cnblogs.com/edisonfish/p/17970734

