How to Slash HttpClient Latency from 250ms to 80ms with Pooling and Keep‑Alive

This article walks through a real‑world performance overhaul of Apache HttpClient—using a singleton client, connection pooling, a custom keep‑alive strategy, timeout tuning, and idle‑connection monitoring—to reduce average request time from 250 ms to about 80 ms for a high‑throughput service.

IT Architects Alliance

Jan 2, 2024

Background

A service receives tens of millions of HTTP requests per day from another department. The original code instantiated a new CloseableHttpClient and a new HttpPost for every request, then manually closed the response and client. This caused an average latency of ~250 ms per call.

Analysis

Repeated HttpClient creation

HttpClient is thread‑safe; creating a new instance per request adds unnecessary object‑creation and garbage‑collection overhead. A single shared client should be used.

Repeated TCP connection establishment

Each request performed a full TCP three‑way handshake and four‑way termination. At high QPS this adds several milliseconds per request. Enabling HTTP keep‑alive allows connection reuse and eliminates most of this cost.

Redundant entity copying

The original code called EntityUtils.toString(response.getEntity()) while leaving the underlying HttpResponse open, causing an extra copy of the payload in memory and requiring explicit connection closure.

Implementation

Define a keep‑alive strategy

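import org.apache.http.HeaderElement;
import org.apache.http.HeaderElementIterator;
import org.apache.http.HttpResponse;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.message.BasicHeaderElementIterator;
import org.apache.http.protocol.HTTP;
import org.apache.http.protocol.HttpContext;

ConnectionKeepAliveStrategy myStrategy = new ConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
        // Honor the server's "Keep-Alive: timeout=N" header when it is present.
        HeaderElementIterator it = new BasicHeaderElementIterator(
                response.headerIterator(HTTP.CONN_KEEP_ALIVE));
        while (it.hasNext()) {
            HeaderElement he = it.nextElement();
            String param = he.getName();
            String value = he.getValue();
            if (value != null && param.equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(value) * 1000L; // seconds to ms
                } catch (NumberFormatException ignore) {
                    // malformed timeout value: fall through to the default
                }
            }
        }
        return 60L * 1000L; // default 60 seconds
    }
};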
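To see what the strategy computes without pulling in the HttpClient types, the same parsing can be sketched in plain Java. The class name KeepAliveTimeoutParser and its method are hypothetical helpers written for illustration; they are not part of HttpClient, and the 60-second fallback simply mirrors the default used above.

```java
/** Hypothetical helper: mirrors the strategy's parsing without any HttpClient types. */
final class KeepAliveTimeoutParser {
    static final long DEFAULT_MILLIS = 60_000L; // same 60 s fallback as the strategy

    /** Returns the keep-alive duration in ms for a raw Keep-Alive header value such as "timeout=5, max=100". */
    static long keepAliveMillis(String headerValue) {
        if (headerValue == null) {
            return DEFAULT_MILLIS; // server sent no Keep-Alive header
        }
        for (String element : headerValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    return Long.parseLong(pair[1].trim()) * 1000L; // seconds to ms
                } catch (NumberFormatException e) {
                    return DEFAULT_MILLIS; // malformed value: use the default
                }
            }
        }
        return DEFAULT_MILLIS; // header present but without a timeout parameter
    }
}
```

For a header value of "timeout=5, max=100" this yields 5000 ms, so the client would stop reusing the connection shortly before the server is expected to close it.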

Configure a pooling connection manager

PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(500);               // total max connections
connectionManager.setDefaultMaxPerRoute(50);      // per‑route max, adjust to workload
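The 500 and 50 limits above are starting points to adjust to the workload. One rough way to sanity-check them, offered here as an assumption and not as the author's method, is Little's law: the average number of connections in use is roughly the request rate multiplied by the time each request holds a connection. The PoolSizing class and the 400 requests-per-second figure below are hypothetical.

```java
/** Hypothetical sizing helper based on Little's law; not part of HttpClient. */
final class PoolSizing {
    /** Average connections in use = request rate (per second) x time each request holds a connection. */
    static int connectionsInUse(double requestsPerSecond, double latencyMillis) {
        return (int) Math.ceil(requestsPerSecond * latencyMillis / 1000.0);
    }

    public static void main(String[] args) {
        double assumedPeakRps = 400; // assumption for illustration, not a measured figure
        System.out.println(connectionsInUse(assumedPeakRps, 250)); // prints 100
        System.out.println(connectionsInUse(assumedPeakRps, 80));  // prints 32
    }
}
```

Note that when all traffic goes to a single target host, that host is one route, so the per-route limit (50 here) is the effective cap regardless of the total of 500.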

Build the shared HttpClient

CloseableHttpClient httpClient = HttpClients.custom()
    .setConnectionManager(connectionManager)
    .setKeepAliveStrategy(myStrategy)
    .setDefaultRequestConfig(RequestConfig.custom()
        .setStaleConnectionCheckEnabled(true) // deprecated, see note below
        .build())
    .build();

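Note: setStaleConnectionCheckEnabled is deprecated since HttpClient 4.4, and the check adds overhead because it probes the connection before every reuse. Its replacement is PoolingHttpClientConnectionManager.setValidateAfterInactivity(int), which re-validates only connections that have been idle longer than the given number of milliseconds. In addition, run a background thread that periodically calls closeExpiredConnections() and closeIdleConnections().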

Idle‑connection monitor thread

import java.util.concurrent.TimeUnit;
import org.apache.http.conn.HttpClientConnectionManager;

public class IdleConnectionMonitorThread extends Thread {
    private final HttpClientConnectionManager connMgr;
    private volatile boolean shutdown;

    public IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
        this.connMgr = connMgr;
    }

    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000); // 5 s
                    connMgr.closeExpiredConnections();
                    connMgr.closeIdleConnections(30, TimeUnit.SECONDS);
                }
            }
        } catch (InterruptedException ex) {
            // thread interrupted – exit
        }
    }

    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}
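The same periodic sweep can also be expressed with a ScheduledExecutorService in place of a hand-written wait/notify loop. The sketch below is an alternative under stated assumptions, not the article's implementation: EvictablePool is a hypothetical stand-in for the two eviction methods of HttpClientConnectionManager so the example compiles with the JDK alone, and IdleConnectionSweeper is a made-up name.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the two eviction methods of HttpClientConnectionManager. */
interface EvictablePool {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Runs the eviction sweep on a daemon thread at a fixed delay. */
final class IdleConnectionSweeper implements AutoCloseable {
    private final EvictablePool pool;
    private final long maxIdleSeconds;
    private final ScheduledExecutorService scheduler;

    IdleConnectionSweeper(EvictablePool pool, long maxIdleSeconds) {
        this.pool = pool;
        this.maxIdleSeconds = maxIdleSeconds;
        this.scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-sweeper");
            t.setDaemon(true); // do not keep the JVM alive just for the sweep
            return t;
        });
    }

    /** Schedules a sweep every periodSeconds, starting after one period. */
    void start(long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::sweepOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** One sweep: drop expired connections, then those idle longer than maxIdleSeconds. */
    void sweepOnce() {
        pool.closeExpiredConnections();
        pool.closeIdleConnections(maxIdleSeconds, TimeUnit.SECONDS);
    }

    /** Stops the background sweep. */
    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

With the real library, a small adapter from HttpClientConnectionManager to EvictablePool would be needed. HttpClient 4.4 and later also offer HttpClientBuilder.evictExpiredConnections() and evictIdleConnections(long, TimeUnit), which start a comparable background evictor without custom code.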

Efficient response handling

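Do not shut down the client or the connection manager after each request; fully consuming the response entity is what returns the connection to the pool. EntityUtils.toString reads the entity to the end and closes its stream, so the consume call that follows is only a safeguard and is normally a no-op: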

String body = EntityUtils.toString(response.getEntity(), "UTF-8");
EntityUtils.consume(response.getEntity());

Alternatively, pass a ResponseHandler to execute so that the client consumes the entity for you. A simplified view of what that execute overload does internally:

public <T> T execute(HttpHost target, HttpRequest request,
                     ResponseHandler<T> responseHandler,
                     HttpContext context) throws IOException {
    Args.notNull(responseHandler, "Response handler");
    HttpResponse response = execute(target, request, context);
    try {
        return responseHandler.handleResponse(response);
    } finally {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            EntityUtils.consume(entity);
        }
    }
}

Additional configuration

Timeout settings

// HttpParams and CoreConnectionPNames are deprecated since 4.3, and CloseableHttpClient
// has no setHttpRequestRetryHandler method. With httpclient 4.5.x the same settings
// go through RequestConfig and the client builder.
RequestConfig requestConfig = RequestConfig.custom()
    .setConnectTimeout(2 * 1000)          // 2 s – time to establish a connection
    .setSocketTimeout(2 * 1000)           // 2 s – socket read timeout
    .setConnectionRequestTimeout(500)     // 500 ms – time to get a connection from the pool
    .build();

// Pass both to the builder of the shared client:
//     .setDefaultRequestConfig(requestConfig)
//     .setRetryHandler(new DefaultHttpRequestRetryHandler(0, false)) // disable retries
// The stale-connection check is replaced by
// connectionManager.setValidateAfterInactivity(milliseconds).

Nginx keep‑alive (if used as reverse proxy)

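If Nginx sits in front of the service as a reverse proxy, set keepalive_timeout and keepalive_requests for the client-facing connections, and add the keepalive directive to the upstream block so that connections to the backend are reused as well. Upstream keep-alive also requires proxy_http_version 1.1 and an empty Connection header (proxy_set_header Connection "").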

Result

After applying the above changes the average request latency dropped from ~250 ms to ~80 ms, and container thread‑exhaustion alerts disappeared.

Maven dependency

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>

