Why Your IoT App Times Out: Understanding OkHttp Connection Pools and Stale Connections

An Android IoT client repeatedly timed out because OkHttp reused stale TCP connections that the server had already closed, producing EOF and connection-reset errors. The fix was to disable the connection pool after discovering that the server's keep-alive timeout was only a few seconds.

Rare Earth Juejin Tech Community

1. Connection Pool and Stale Connections

Reusing TCP connections via a pool can dramatically improve performance: establishing a new TCP + TLS connection takes dozens to hundreds of milliseconds, while sending a request over an existing connection takes only a few. Reuse also reduces server CPU load and network traffic, and improves the user experience.

OkHttp’s default pool size is 5 connections with a keep‑alive duration of 5 minutes, as shown in the source code:

class ConnectionPool internal constructor(
    internal val delegate: RealConnectionPool
) {
    constructor(
        maxIdleConnections: Int,
        keepAliveDuration: Long,
        timeUnit: TimeUnit
    ) : this(RealConnectionPool(
        taskRunner = TaskRunner.INSTANCE,
        maxIdleConnections = maxIdleConnections,
        keepAliveDuration = keepAliveDuration,
        timeUnit = timeUnit
    ))

    // Default constructor
    constructor() : this(5, 5, TimeUnit.MINUTES)
    ...
}
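To override these defaults, pass a custom pool to the client builder. This is a minimal sketch; the values shown simply restate OkHttp's defaults and are not a recommendation:

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import java.util.concurrent.TimeUnit

// Sketch: build a client with an explicit pool instead of relying on
// the default (5 idle connections, 5-minute keep-alive).
val client = OkHttpClient.Builder()
    .connectionPool(
        ConnectionPool(
            maxIdleConnections = 5,   // idle sockets kept for reuse
            keepAliveDuration = 5,    // how long an idle socket may live
            timeUnit = TimeUnit.MINUTES
        )
    )
    .build()
```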

What is a stale (dirty) connection?

When a client reuses an idle TCP connection that the server has already closed—due to timeout, server‑initiated disconnect, or network glitches—the request fails with errors such as:

java.io.EOFException: unexpected end of stream
java.net.SocketException: Connection reset by peer

The connection pool cannot know that the server closed the socket, so it mistakenly hands out a “stale” connection.
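One way to observe connection reuse in practice is OkHttp's EventListener API, which reports when a call acquires a connection. A diagnostic sketch (the log format is illustrative):

```kotlin
import okhttp3.Call
import okhttp3.Connection
import okhttp3.EventListener
import okhttp3.OkHttpClient

// Diagnostic sketch: log every connection acquisition so that reuse of a
// long-idle (possibly server-closed) socket shows up in the client logs.
class PoolLogger : EventListener() {
    override fun connectionAcquired(call: Call, connection: Connection) {
        println("acquired connection to ${connection.route().address.url.host}")
    }
}

val client = OkHttpClient.Builder()
    .eventListener(PoolLogger())
    .build()
```

Correlating these log lines with the failures makes it easy to see whether a timed-out request ran on a freshly created socket or on one pulled from the pool.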

2. Problem Encountered and Solution

In an IoT scenario, a low‑power camera exposed a web service. The Android client frequently experienced request timeouts, while the hardware logs showed no incoming request.

Initial guesses focused on the hardware's request queue and timeout settings, but increasing the timeouts did not eliminate the issue.

The investigation shifted to stale connections. Enabling OkHttp’s automatic retry:

.retryOnConnectionFailure(true) // keep automatic retry

did not help, because the underlying problem was a closed socket being reused.
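An alternative sometimes tried before disabling the pool is an application-level interceptor that retries once on the specific stale-connection errors. This is a hypothetical sketch, not part of the original fix, and it is only safe for idempotent requests:

```kotlin
import okhttp3.Interceptor
import okhttp3.Response
import java.io.EOFException
import java.net.SocketException

// Hypothetical workaround sketch: retry a request once when it fails with
// the classic symptoms of a stale pooled connection. Application
// interceptors (unlike network interceptors) may call proceed() again.
class StaleConnectionRetryInterceptor : Interceptor {
    override fun intercept(chain: Interceptor.Chain): Response =
        try {
            chain.proceed(chain.request())
        } catch (e: EOFException) {
            chain.proceed(chain.request())  // retry once on unexpected EOF
        } catch (e: SocketException) {
            chain.proceed(chain.request())  // retry once on connection reset
        }
}
```

In this case, though, retrying only masks the symptom; the root cause was the mismatch between the pool's keep-alive and the server's.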

Further discussion with the hardware team revealed that the embedded network library used a very short keep-alive window, on the order of 1–5 seconds, far shorter than the defaults of mainstream servers (Nginx's default keepalive_timeout, for instance, is 75 seconds). With the server closing idle sockets that quickly, OkHttp's five-minute pool offers no benefit and actively hands out already-closed connections.

Therefore, disabling the connection pool solved the timeout problem:

.connectionPool(ConnectionPool(0, 1, TimeUnit.SECONDS)) // do not reuse connections
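Putting it together, a client configured for this scenario might look like the sketch below. The Connection: close header is an additional measure not part of the original fix, and it assumes the device's HTTP stack honors it; the device URL is illustrative:

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import okhttp3.Request
import java.util.concurrent.TimeUnit

// Sketch: a client that keeps no idle sockets, for servers whose
// keep-alive window is shorter than any sensible pool duration.
val client = OkHttpClient.Builder()
    .connectionPool(ConnectionPool(0, 1, TimeUnit.SECONDS)) // no idle reuse
    .retryOnConnectionFailure(true)
    .build()

// Optionally also ask the server to close after each response, so neither
// side is left holding an idle socket.
val request = Request.Builder()
    .url("http://192.168.1.10/status")  // illustrative device address
    .header("Connection", "close")
    .build()
```

Each request now pays the full connection-setup cost, which is acceptable here because the device closes idle sockets almost immediately anyway.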

3. Summary

Many puzzling issues stem from not fully understanding low‑level networking behavior. By recognizing the impact of stale connections and adjusting the connection‑pool configuration, the IoT client achieved reliable communication.

Tags: Android, Connection Pool, IoT, OkHttp, network timeout, Stale Connection