How to Slash HttpClient Latency from 250ms to 80ms with Pooling and Keep‑Alive
This article walks through a real‑world performance overhaul of Apache HttpClient that reduced average request time from 250 ms to about 80 ms for a high‑throughput service. The changes were a singleton client, connection pooling, a custom keep‑alive strategy, timeout tuning, and idle‑connection monitoring.
Background
A service receives tens of millions of HTTP requests per day from another department. The original code instantiated a new CloseableHttpClient and a new HttpPost for every request, then manually closed the response and client. This caused an average latency of ~250 ms per call.
Analysis
Repeated HttpClient creation
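HttpClient is thread‑safe; creating a new instance per request adds unnecessary object‑creation and garbage‑collection overhead. Each new client also brings its own connection manager, so no connection can ever be reused across requests. A single shared client should be used.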
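As a minimal sketch of the sharing pattern, the initialization-on-demand holder idiom builds one instance lazily and hands the same object to every caller. `ExpensiveClient` and `SharedClient` are hypothetical names used only for illustration; in the real service the held object would be the CloseableHttpClient built later in this article.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a thread-safe client that is expensive to build. */
final class ExpensiveClient {
    static final AtomicInteger CREATED = new AtomicInteger();

    ExpensiveClient() {
        CREATED.incrementAndGet(); // count constructions so sharing can be verified
    }
}

/** Initialization-on-demand holder: the JVM creates INSTANCE once, on first access. */
final class SharedClient {
    private SharedClient() {}

    private static final class Holder {
        // Class initialization is thread-safe, so no explicit locking is needed.
        static final ExpensiveClient INSTANCE = new ExpensiveClient();
    }

    static ExpensiveClient get() {
        return Holder.INSTANCE;
    }
}
```

Every request path would then call `SharedClient.get()` instead of constructing and closing its own client.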
Repeated TCP connection establishment
Each request performed a full TCP three‑way handshake and four‑way termination. At high QPS this adds several milliseconds per request. Enabling HTTP keep‑alive allows connection reuse and eliminates most of this cost.
Redundant entity copying
The original code called EntityUtils.toString(response.getEntity()) while leaving the underlying HttpResponse open, causing an extra copy of the payload in memory and requiring explicit connection closure.
Implementation
Define a keep‑alive strategy
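The strategy in this section reads the timeout parameter of the server's Keep-Alive response header and falls back to 60 seconds when none is present. As a dependency-free illustration of that parsing rule, here is a sketch that also tolerates malformed values. `KeepAliveHeader` and `timeoutMillis` are hypothetical names, not part of HttpClient.

```java
/** Hypothetical helper: extracts the timeout from a Keep-Alive header value. */
final class KeepAliveHeader {
    private KeepAliveHeader() {}

    /**
     * @param headerValue   raw value such as "timeout=5, max=100"; may be null
     * @param defaultMillis returned when no usable timeout is present
     * @return keep-alive duration in milliseconds
     */
    static long timeoutMillis(String headerValue, long defaultMillis) {
        if (headerValue == null) {
            return defaultMillis;
        }
        for (String element : headerValue.split(",")) {
            String[] pair = element.trim().split("=", 2);
            if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                try {
                    long seconds = Long.parseLong(pair[1].trim());
                    if (seconds > 0) {
                        return seconds * 1000L; // seconds to ms
                    }
                } catch (NumberFormatException ignored) {
                    // Malformed number: keep scanning, then fall back to the default.
                }
            }
        }
        return defaultMillis;
    }
}
```

For example, `KeepAliveHeader.timeoutMillis("timeout=5, max=100", 60_000L)` yields 5000, while a missing or unparsable timeout yields the 60-second default.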
ConnectionKeepAliveStrategy myStrategy = new ConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(HttpResponse response, HttpContext context) {
        HeaderElementIterator it = new BasicHeaderElementIterator(
                response.headerIterator(HTTP.CONN_KEEP_ALIVE));
        while (it.hasNext()) {
            HeaderElement he = it.nextElement();
            String param = he.getName();
            String value = he.getValue();
            if (value != null && param.equalsIgnoreCase("timeout")) {
                return Long.parseLong(value) * 1000L; // seconds to ms
            }
        }
        return 60L * 1000L; // default 60 seconds
    }
};
Configure a pooling connection manager
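The pool limits set in this section depend on the workload. One rough way to sanity-check them is Little's law: the number of connections in use at once is approximately the request rate multiplied by the time each request holds a connection. The sketch below applies that estimate with a headroom factor for bursts. `PoolSizing` is a hypothetical helper, and the 1,000 requests per second used in the example afterward is an assumed peak, not a figure from this article.

```java
/** Hypothetical helper for a back-of-envelope pool size estimate (Little's law). */
final class PoolSizing {
    private PoolSizing() {}

    /**
     * @param requestsPerSecond expected peak request rate
     * @param avgLatencyMillis  average time a request holds a connection
     * @param headroom          safety multiplier for bursts, at least 1.0
     * @return estimated number of connections needed concurrently
     */
    static int estimateConnections(double requestsPerSecond,
                                   double avgLatencyMillis,
                                   double headroom) {
        if (requestsPerSecond < 0 || avgLatencyMillis < 0 || headroom < 1.0) {
            throw new IllegalArgumentException(
                    "rate and latency must be non-negative, headroom must be >= 1");
        }
        // in-flight requests = rate (1/s) * latency (s), scaled by headroom
        return (int) Math.ceil(requestsPerSecond * avgLatencyMillis * headroom / 1000.0);
    }
}
```

With the assumed 1,000 requests per second and a headroom of 2, the estimate is 500 connections at 250 ms per request and 160 at 80 ms, which shows how lower latency also reduces pressure on the pool.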
PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
connectionManager.setMaxTotal(500); // total max connections
connectionManager.setDefaultMaxPerRoute(50); // per‑route max, adjust to workload
Build the shared HttpClient
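CloseableHttpClient httpClient = HttpClients.custom()
        .setConnectionManager(connectionManager)
        .setKeepAliveStrategy(myStrategy)
        .setDefaultRequestConfig(RequestConfig.custom()
                .setStaleConnectionCheckEnabled(true) // deprecated, see note below
                .build())
        .build();
Note: setStaleConnectionCheckEnabled has been deprecated since HttpClient 4.4 in favor of PoolingHttpClientConnectionManager.setValidateAfterInactivity(int). A better approach than checking every connection before use is to run a background thread that periodically calls closeExpiredConnections() and closeIdleConnections(), so that connections already closed by the server are evicted from the pool.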
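The periodic eviction described in the note can also be scheduled with the JDK's ScheduledExecutorService instead of a hand-written loop. The sketch below is dependency-free: `EvictableConnections` is a hypothetical interface that mirrors the two connection-manager calls, and `ConnectionEvictor` is a made-up name. In real code the task would call the HttpClient connection manager directly.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the two connection-manager calls named in the note. */
interface EvictableConnections {
    void closeExpiredConnections();
    void closeIdleConnections(long idleTime, TimeUnit unit);
}

/** Runs eviction on a single daemon thread with a fixed delay between runs. */
final class ConnectionEvictor implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "connection-evictor");
                t.setDaemon(true); // housekeeping must not keep the JVM alive
                return t;
            });

    ConnectionEvictor(EvictableConnections connections, long periodMillis, long maxIdleMillis) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                connections.closeExpiredConnections();
                connections.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so swallow it.
            }
        }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop evicting when the client is shut down
    }
}
```

Matching the values used in this article, the evictor would be created with a 5,000 ms period and a 30,000 ms idle limit, and closed together with the shared client.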
Idle‑connection monitor thread
import java.util.concurrent.TimeUnit;
import org.apache.http.conn.HttpClientConnectionManager;
public class IdleConnectionMonitorThread extends Thread {
    private final HttpClientConnectionManager connMgr;
    private volatile boolean shutdown;
    public IdleConnectionMonitorThread(HttpClientConnectionManager connMgr) {
        this.connMgr = connMgr;
    }
    @Override
    public void run() {
        try {
            while (!shutdown) {
                synchronized (this) {
                    wait(5000); // wake up every 5 s, or earlier when shutdown() notifies
                    connMgr.closeExpiredConnections();
                    connMgr.closeIdleConnections(30, TimeUnit.SECONDS); // idle for more than 30 s
                }
            }
        } catch (InterruptedException ex) {
            // thread interrupted – exit
        }
    }
    public void shutdown() {
        shutdown = true;
        synchronized (this) {
            notifyAll();
        }
    }
}
Efficient response handling
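Do not close the connection yourself. Once the entity has been fully consumed, the client returns the connection to the pool for reuse. Read the entity into a string, then make sure it is consumed:
String body = EntityUtils.toString(response.getEntity(), "UTF-8"); // reads the content to the end
EntityUtils.consume(response.getEntity()); // safeguard; a no-op once the content has been read
Alternatively, pass a ResponseHandler to execute so the client consumes the entity for you. Internally, the client does roughly the following: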
public <T> T execute(HttpHost target, HttpRequest request,
                     ResponseHandler<T> responseHandler,
                     HttpContext context) throws IOException {
    Args.notNull(responseHandler, "Response handler");
    HttpResponse response = execute(target, request, context);
    try {
        return responseHandler.handleResponse(response);
    } finally {
        HttpEntity entity = response.getEntity();
        if (entity != null) {
            EntityUtils.consume(entity);
        }
    }
}
Additional configuration
Timeout settings
// HttpClient 4.3+ configures timeouts through RequestConfig; the older HttpParams API is deprecated.
RequestConfig requestConfig = RequestConfig.custom()
        .setConnectTimeout(2 * 1000) // 2 s – time to establish a connection
        .setSocketTimeout(2 * 1000) // 2 s – socket read timeout
        .setConnectionRequestTimeout(500) // 500 ms – time to get a connection from the pool
        .build();
// Apply both settings on the builder that creates the shared client:
//     .setDefaultRequestConfig(requestConfig)
//     .setRetryHandler(new DefaultHttpRequestRetryHandler(0, false)) // disable retries
// The stale‑connection check is configured on the connection manager (setValidateAfterInactivity).
Nginx keep‑alive (if used as reverse proxy)
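Set keepalive_timeout and keepalive_requests for client‑facing connections, and the keepalive directive in the upstream block, so that connections are reused on both sides. Upstream keep‑alive also requires proxy_http_version 1.1 and an empty Connection header (proxy_set_header Connection "").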
Result
After applying the above changes the average request latency dropped from ~250 ms to ~80 ms, and container thread‑exhaustion alerts disappeared.
Maven dependency
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.6</version>
</dependency>
IT Architects Alliance
