Why Your IoT App Times Out: Understanding OkHttp Connection Pools and Stale Connections
An Android IoT client repeatedly timed out because OkHttp reused stale TCP connections, leading to EOF and socket reset errors, and the fix was to disable the connection pool after discovering the server's keep‑alive timeout was only a few seconds.
1. Connection Pool and Stale Connections
Reusing TCP connections via a pool can dramatically improve performance: establishing a new TCP+TLS connection takes dozens to hundreds of milliseconds of handshaking, while sending a request over an existing connection takes only a few milliseconds. Reuse also cuts server CPU load and network traffic, improving the user experience.
OkHttp’s default pool size is 5 connections with a keep‑alive duration of 5 minutes, as shown in the source code:
class ConnectionPool internal constructor(
  internal val delegate: RealConnectionPool
) {
  constructor(
    maxIdleConnections: Int,
    keepAliveDuration: Long,
    timeUnit: TimeUnit
  ) : this(RealConnectionPool(
    taskRunner = TaskRunner.INSTANCE,
    maxIdleConnections = maxIdleConnections,
    keepAliveDuration = keepAliveDuration,
    timeUnit = timeUnit
  ))

  // Default constructor: 5 idle connections, 5-minute keep-alive
  constructor() : this(5, 5, TimeUnit.MINUTES)
  ...
}

What is a stale (dirty) connection?
When a client reuses an idle TCP connection that the server has already closed—due to timeout, server‑initiated disconnect, or network glitches—the request fails with errors such as:
java.io.EOFException: unexpected end of stream
java.net.SocketException: Connection reset by peer

The connection pool cannot know that the server has closed the socket, so it mistakenly hands out a "stale" connection.
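The failure mode is easy to reproduce with plain sockets. In this minimal sketch (function and variable names are illustrative, not from the article), the "server" closes the connection immediately, the way a device with a short keep-alive window would, and the client's next read hits end-of-stream:

```kotlin
import java.net.ServerSocket
import java.net.Socket

// Simulate reusing a connection the peer has already closed.
fun reuseClosedConnection(): Int {
    val server = ServerSocket(0) // bind to any free port
    val acceptor = Thread {
        server.accept().close() // server hangs up right after accepting
    }.apply { start() }

    Socket("127.0.0.1", server.localPort).use { client ->
        acceptor.join() // wait until the server has closed its side
        // Reading from a connection the peer already closed returns -1
        // (EOF) -- the condition OkHttp surfaces as
        // "java.io.EOFException: unexpected end of stream".
        val result = client.inputStream.read()
        server.close()
        return result
    }
}
```

A pooled OkHttp connection in this state fails in exactly the same way; the pool only discovers the problem when the request is already in flight.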
2. Problem Encountered and Solution
In an IoT scenario, a low‑power camera exposed a web service. The Android client frequently experienced request timeouts, while the hardware logs showed no incoming request.
Initial guesses focused on the hardware's request queue and timeout settings, but increasing the client timeouts did not eliminate the issue.
The investigation shifted to stale connections. Enabling OkHttp’s automatic retry:
.retryOnConnectionFailure(true) // keep automatic retry

did not help (it is already OkHttp's default), because the underlying problem was a closed socket being reused.
Further discussion with the hardware team revealed that the embedded network library used a very short keep-alive window, often only 1–5 seconds. (General-purpose servers keep idle connections open far longer; Nginx's keepalive_timeout, for example, defaults to 75 seconds.) Against a 5-second server window, OkHttp's 5-minute pool keep-alive almost guarantees handing out stale connections, so pooling offers no benefit here.
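As a lighter-weight alternative to disabling the pool entirely (not the fix the team chose, sketched here as an option; the URL is illustrative), a request can carry a `Connection: close` header so that neither OkHttp nor the server keeps the socket alive after the response:

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request

val client = OkHttpClient() // default pool settings left in place

// "Connection: close" marks this exchange as non-persistent, so the
// pool never retains a socket that the device could silently close.
val request = Request.Builder()
    .url("http://192.168.1.10/status") // illustrative device endpoint
    .header("Connection", "close")
    .build()
```

This trades the per-request handshake cost for safety, which is the right trade when the server's keep-alive window is shorter than the gap between requests.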
Therefore, disabling the connection pool solved the timeout problem:
.connectionPool(ConnectionPool(0, 1, TimeUnit.SECONDS)) // do not reuse connections

3. Summary
Many puzzling issues stem from not fully understanding low‑level networking behavior. By recognizing the impact of stale connections and adjusting the connection‑pool configuration, the IoT client achieved reliable communication.
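Putting the pieces above together, a minimal client configuration might look like the following (OkHttp 4.x assumed; the device URL and timeout values are illustrative):

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import okhttp3.Request
import java.util.concurrent.TimeUnit

// A pool of 0 idle connections: every request opens a fresh socket,
// so a connection the camera has closed can never be reused.
val client = OkHttpClient.Builder()
    .connectionPool(ConnectionPool(0, 1, TimeUnit.SECONDS))
    .retryOnConnectionFailure(true)
    .connectTimeout(5, TimeUnit.SECONDS)
    .readTimeout(10, TimeUnit.SECONDS)
    .build()

// Each call pays the full handshake cost, which is acceptable here
// because the device closed idle connections within seconds anyway.
fun fetchStatus(): String? =
    client.newCall(Request.Builder().url("http://192.168.1.10/status").build())
        .execute()
        .use { response -> response.body?.string() }
```

Keep the shared `OkHttpClient` instance even with pooling disabled: the builder settings, dispatcher, and thread pools are still worth reusing across requests.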
Rare Earth Juejin Tech Community