Why Does Redis Return “Server Went Away”? Diagnosing Timeout and TCP Keepalive Issues
This article walks through a real-world Redis timeout problem, examining client-side socket settings, server configuration, system metrics, and kernel TCP parameters to pinpoint why connections were dropped and how adjusting the timeout and TCP keepalive settings resolved the problem.
Symptoms
A PHP Redis client using connect() frequently reported "redis server went away" errors.
Investigation
The first suspects were the timeout settings: Redis's timeout and tcp-keepalive, and PHP's default_socket_timeout (set to 300 seconds).
127.0.0.1:6381> CONFIG GET *
17) "timeout"
18) "0"
19) "tcp-keepalive"
20) "0"
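In the output above, timeout 0 means Redis never closes idle client connections, and tcp-keepalive 0 means Redis does not enable TCP keepalive on client sockets. As a rough illustration of what a non-zero tcp-keepalive does, the Python sketch below sets comparable keepalive options on a socket. It assumes Linux (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are Linux-specific), the probe spacing of interval/3 with three probes is our understanding of Redis's own networking code rather than something stated in this article, and the helper name enable_keepalive is hypothetical.

```python
import socket


def enable_keepalive(sock: socket.socket, interval: int) -> None:
    """Hypothetical helper: enable TCP keepalive roughly the way a non-zero
    Redis tcp-keepalive setting does. Linux-only socket options."""
    # Turn keepalive on for this socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Send the first probe after `interval` seconds without traffic.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, interval)
    # Space later probes interval/3 seconds apart (at least 1 second).
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, max(interval // 3, 1))
    # Treat the peer as dead after three unanswered probes.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)
```

For example, calling enable_keepalive(s, 60) on a fresh TCP socket sets a 60-second idle time, 20-second probe interval and three probes, which can be confirmed with s.getsockopt().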
vim xxx/php_path/php.ini
default_socket_timeout = 300
Note: setting the socket timeout to 0 causes failures.
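A zero timeout is easy to misread because in many socket APIs it does not mean "no timeout". PHP's exact handling is not shown here; purely as an analogy, Python's socket API treats settimeout(0) as non-blocking mode, so a read with no data ready fails at once, whereas a positive value waits that long before giving up. The function read_with_timeout is a hypothetical name used only for this sketch.

```python
import socket


def read_with_timeout(timeout):
    """Try to read from a socket that never receives data, using `timeout`."""
    # socketpair() returns two connected sockets; nothing is written to `b`,
    # so a read on `a` can never find data.
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout)
        a.recv(1)
        return "data"
    except BlockingIOError:
        # timeout == 0 means non-blocking mode: the read fails immediately.
        return "failed immediately"
    except socket.timeout:
        # timeout > 0 means wait up to `timeout` seconds, then give up.
        return "timed out"
    finally:
        a.close()
        b.close()
```

read_with_timeout(0) returns "failed immediately", while read_with_timeout(0.1) returns "timed out" after roughly 100 ms.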
Adjusting these settings did not resolve the issue, and switching to pconnect (persistent connections) did not help either. System resource usage was checked next:
# vmstat 1 3
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd    free   buff    cache   si   so    bi    bo   in   cs us sy  id wa st
 0  0      0 6022256 383340 10371320    0    0     0    25    0    0  0  0 100  0  0
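In this vmstat sample the CPU is 100% idle (id), there is no I/O wait (wa), and nothing is being swapped in or out (si/so), so the host was not under resource pressure. As an illustration only, the Python sketch below pairs the column header with the data row and flags a few common warning signs; the helper names and thresholds are arbitrary choices, not part of the original investigation.

```python
def parse_vmstat_row(header: str, row: str) -> dict:
    """Map vmstat column names (e.g. 'r', 'free', 'id') to integer values."""
    names = header.split()
    values = [int(v) for v in row.split()]
    return dict(zip(names, values))


def pressure_signs(stats: dict) -> list:
    """Return simple warning signs; the thresholds are illustrative only."""
    signs = []
    if stats["si"] or stats["so"]:
        signs.append("swapping")
    if stats["wa"] > 5:
        signs.append("high I/O wait")
    if stats["id"] < 20:
        signs.append("CPU busy")
    return signs


HEADER = "r b swpd free buff cache si so bi bo in cs us sy id wa st"
ROW = "0 0 0 6022256 383340 10371320 0 0 0 25 0 0 0 0 100 0 0"

stats = parse_vmstat_row(HEADER, ROW)
print(stats["id"], pressure_signs(stats))  # prints: 100 []
```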
# iostat -x -k 1
Linux 2.6.18-308.el5 (yq-bbsrqueue1) 12/24/2015
avg-cpu: %user %nice %system %iowait %steal %idle
0.07 0.00 0.05 0.00 0.00 99.87
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0 0.00 2.52 0.00 0.51 0.20 12.12 48.39 0.00 0.47 0.25 0.01
...
Network checks also showed no issues.
With system resources and the network ruled out, the potential causes considered next were latency from AOF and disk I/O, latency from key expiration, and the Redis software watchdog.
The iostat output shows the disk nearly idle (%util 0.01, await 0.47 ms), so AOF and disk I/O latency was negligible.
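The device line is what supports that conclusion: the disk handled about 0.51 writes per second with an average wait (await) of 0.47 ms at 0.01% utilization. A small Python sketch of the same check, with hypothetical helper names and an arbitrary 10% threshold, might look like this:

```python
def parse_iostat_device(header: str, row: str) -> dict:
    """Pair iostat -x column names with the values on one device line."""
    names = header.split()
    fields = row.split()
    # The first column is the device name; the rest are numeric.
    stats = {names[0].rstrip(":"): fields[0]}
    for name, value in zip(names[1:], fields[1:]):
        stats[name] = float(value)
    return stats


def disk_is_idle(stats: dict, max_util: float = 10.0) -> bool:
    """Treat the disk as idle if %util is below an (arbitrary) threshold."""
    return stats["%util"] < max_util


HEADER = "Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util"
ROW = "cciss/c0d0 0.00 2.52 0.00 0.51 0.20 12.12 48.39 0.00 0.47 0.25 0.01"

stats = parse_iostat_device(HEADER, ROW)
print(stats["Device"], stats["await"], stats["%util"], disk_is_idle(stats))
# prints: cciss/c0d0 0.47 0.01 True
```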
Key expiration can happen lazily (when a key is accessed) or actively (in a cycle that runs every 100 ms).
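A toy Python model can make the two mechanisms concrete. It is a simplified sketch, not Redis's actual implementation: the class name, the sample of 20 keys per cycle, and the explicit `now` clock argument are illustrative choices, and the documented active cycle also repeats while a large share of sampled keys turn out to be expired, which this sketch omits.

```python
import random
from typing import Optional


class ExpiringStore:
    """Toy key-value store illustrating lazy and active expiration."""

    def __init__(self, sample_size: int = 20):
        self.data = {}      # key -> value
        self.expires = {}   # key -> absolute expiry time in seconds
        self.sample_size = sample_size

    def set(self, key, value, now: float, ttl: Optional[float] = None):
        self.data[key] = value
        if ttl is None:
            self.expires.pop(key, None)
        else:
            self.expires[key] = now + ttl

    def _expired(self, key, now: float) -> bool:
        return key in self.expires and self.expires[key] <= now

    def _delete(self, key):
        self.data.pop(key, None)
        self.expires.pop(key, None)

    def get(self, key, now: float):
        # Lazy expiration: a key is checked only when a command touches it.
        if self._expired(key, now):
            self._delete(key)
            return None
        return self.data.get(key)

    def active_expire_cycle(self, now: float) -> int:
        # Active expiration: sample a few keys that carry a TTL and delete the
        # expired ones. Redis runs a cycle like this periodically (every 100 ms).
        candidates = list(self.expires)
        sample = random.sample(candidates, min(self.sample_size, len(candidates)))
        removed = 0
        for key in sample:
            if self._expired(key, now):
                self._delete(key)
                removed += 1
        return removed
```

A key set with ttl=1 at time 0 is still readable at 0.5, is removed lazily by a get() at time 2, and an untouched expired key is removed by active_expire_cycle() instead.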
The Redis latency documentation describes this under "Latency generated by expires":
Redis evicts expired keys in two ways:
The lazy way expires a key when a command requests it and finds it already expired.
The active way expires a few keys every 100 milliseconds.
The watchdog configuration was empty:
127.0.0.1:6381> config get watchdog
(empty list or set)
The problematic settings were therefore identified as timeout and tcp-keepalive, and the system's TCP settings were examined:
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 150
net.ipv4.tcp_max_tw_buckets = 20000
Solution
Adjusting the kernel parameters resolved the issue. The final settings were:
net.ipv4.tcp_fin_timeout = 60
The problem stemmed from a mismatch between application-level and kernel-level connection timeouts: the kernel closed connections that the application still expected to be alive, and the failed requests on those connections showed up indirectly as latency and "server went away" errors.
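When lining up application timeouts with the kernel's, it helps to know roughly how long kernel keepalive takes to give up on a silent peer for a socket with SO_KEEPALIVE enabled: about tcp_keepalive_time plus tcp_keepalive_intvl multiplied by tcp_keepalive_probes seconds. The Python sketch below computes that figure, reading the values from /proc/sys/net/ipv4 on Linux and falling back to the stock Linux defaults (7200, 75, 9) otherwise. The function names are hypothetical, and the article only reports tcp_keepalive_time, so the other two values in the usage note are assumed defaults.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/ipv4")

# Stock Linux defaults, used when /proc is unavailable (e.g. non-Linux hosts).
DEFAULTS = {
    "tcp_keepalive_time": 7200,
    "tcp_keepalive_intvl": 75,
    "tcp_keepalive_probes": 9,
}


def read_sysctl(name: str) -> int:
    """Read an integer net.ipv4 sysctl, falling back to the Linux default."""
    try:
        return int((SYSCTL_DIR / name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return DEFAULTS[name]


def keepalive_give_up_seconds(idle: int, interval: int, probes: int) -> int:
    """Approximate seconds before keepalive declares a silent peer dead:
    idle time before the first probe plus one interval per unanswered probe."""
    return idle + interval * probes


def current_keepalive_give_up() -> int:
    """Apply the formula to this host's current (or default) settings."""
    return keepalive_give_up_seconds(
        read_sysctl("tcp_keepalive_time"),
        read_sysctl("tcp_keepalive_intvl"),
        read_sysctl("tcp_keepalive_probes"),
    )
```

With the article's tcp_keepalive_time = 150 and the assumed defaults for the other two, keepalive_give_up_seconds(150, 75, 9) returns 825, far longer than PHP's 300-second default_socket_timeout; comparing numbers like these across the client, Redis and the kernel is one way to spot the kind of mismatch described above.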
Reference: Redis latency documentation
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations