Why Did Our API Hang? Uncovering Redis Connection Pool Deadlocks in Spring
A Java Spring application repeatedly stalled due to Redis connection pool deadlocks. This investigation walks through system tools, JVM thread dumps, and Arthas to pinpoint the issue, then shows how proper pool configuration and safe Redis access patterns keep the API from freezing.
Problem Overview
For about a week, the internal sandbox environment kept becoming unresponsive, with every API hanging. Restarting the service fixed things temporarily, but the hangs grew more frequent, prompting a deeper investigation.
Initial Investigation
We SSHed into the server and ran top to check system load; the machine looked normal, so the next step was to examine the JVM's thread stacks.
Inspecting Threads
We ran top -H -p 12798 to list the busiest threads of the Java process, then took a thread dump with jstack 12798. Note that jstack prints native thread IDs in hexadecimal, so a thread ID such as 12799 has to be converted first (printf '%x' 12799 gives 31ff) before grepping the dump for nid=0x31ff to locate the thread's stack.
Discovering the Redis Blockage
The thread dump showed that many Tomcat http-nio worker threads were parked in a WAITING state while trying to obtain a Redis connection, which explained why every API request was hanging.
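The same check can be made from inside the JVM when shell access is limited. Here is a minimal sketch using the standard java.lang.management API (the class and method names are illustrative, meant to be invoked from a debug endpoint in the affected process):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class WaitingThreads {
    public static void logWaitingThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Dump all threads along with the monitors and synchronizers they hold
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            Thread.State state = info.getThreadState();
            if (state == Thread.State.WAITING || state == Thread.State.TIMED_WAITING) {
                // getLockName() identifies what the thread is parked on
                System.out.printf("%s (%s) on %s%n",
                        info.getThreadName(), state, info.getLockName());
            }
        }
    }
}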
Analyzing Redis Connection Code
/**
 * Returns a Jedis instance to be used as a Redis connection.
 * The instance can be newly created or retrieved from a pool.
 */
protected Jedis fetchJedisConnector() {
    try {
        if (usePool && pool != null) {
            return pool.getResource();
        }
        Jedis jedis = new Jedis(getShardInfo());
        jedis.connect();
        return jedis;
    } catch (Exception ex) {
        throw new RedisConnectionFailureException("Cannot get Jedis connection", ex);
    }
}
The pool.getResource() call is where threads were waiting indefinitely, because the pool configuration lacked a proper maxWaitMillis setting.
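In commons-pool2, maxWaitMillis defaults to -1, which means a borrower waits forever. A minimal repro sketch (assuming Jedis on the classpath and a Redis server at localhost:6379; the class name and pool size are illustrative) makes the hang easy to see:

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

public class PoolHangDemo {
    public static void main(String[] args) {
        JedisPoolConfig config = new JedisPoolConfig();
        config.setMaxTotal(1);             // a single connection makes the hang immediate
        // maxWaitMillis is left at its default of -1: block indefinitely

        JedisPool pool = new JedisPool(config, "localhost", 6379);
        Jedis first = pool.getResource();  // borrows the only connection and never returns it
        Jedis second = pool.getResource(); // parks forever, just like the http-nio threads
    }
}
Jedis's own getResource() is only a thin wrapper over that underlying pool: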
public T getResource() {
    try {
        return internalPool.borrowObject();
    } catch (Exception e) {
        throw new JedisConnectionException("Could not get a resource from the pool", e);
    }
}
Further inspection showed that when borrowMaxWaitMillis is negative, the borrow call blocks with no timeout at all, confirming the missing timeout configuration.
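The path from borrowObject to that endless wait is short; a simplified excerpt of the relevant branch in commons-pool2's GenericObjectPool.borrowObject (field names as in that class) looks like this:

// Simplified from org.apache.commons.pool2.impl.GenericObjectPool#borrowObject
if (blockWhenExhausted) {
    if (p == null) {
        if (borrowMaxWaitMillis < 0) {
            p = idleObjects.takeFirst();   // no timeout: wait until an object is returned
        } else {
            p = idleObjects.pollFirst(borrowMaxWaitMillis, TimeUnit.MILLISECONDS);
        }
    }
}
With no timeout configured, the call falls through to takeFirst():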
public E takeFirst() throws InterruptedException {
    this.lock.lock();
    try {
        Object x;
        while ((x = this.unlinkFirst()) == null) {
            this.notEmpty.await();
        }
        return (E) x;
    } finally {
        this.lock.unlock();
    }
}
Note that this.notEmpty.await() carries no timeout: a parked thread wakes only when another thread returns a connection to the pool. If connections leak, it never wakes at all.
Fixing the Pool Configuration
The first fix was adding an explicit borrow timeout to the Redis pool:
JedisConnectionFactory jedisConnectionFactory = new JedisConnectionFactory();
JedisPoolConfig config = new JedisPoolConfig();
config.setMaxWaitMillis(2000); // 2 seconds
jedisConnectionFactory.setPoolConfig(config);
jedisConnectionFactory.afterPropertiesSet();
After restarting the service, the problem reappeared, but now as errors rather than hangs: with the two-second borrow timeout in place, the Tomcat access log showed many 500 responses caused by the following exception:
org.springframework.data.redis.RedisConnectionFailureException: Cannot get Jedis connection; nested exception is redis.clients.jedis.exceptions.JedisConnectionException: Could not get a resource from the pool
at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:140)
...
Root Cause Analysis
The code stringRedisTemplate.getConnectionFactory().getConnection() borrowed a Redis connection from the pool but never released it, leaving the connection permanently checked out (non-idle) and gradually exhausting the pool:
Cursor<byte[]> c = stringRedisTemplate.getConnectionFactory().getConnection().scan(options);
while (c.hasNext()) {
    // processing -- the borrowed connection is never returned to the pool
}
Because the connection was never returned, every execution of this code leaked one connection. Once maxTotal connections (8 by default in commons-pool2) had leaked, every subsequent request parked in takeFirst, which is exactly the hang seen in the thread dumps.
Recommended Practices
Instead of directly using the connection, wrap Redis operations in a RedisCallback so that Spring manages the connection lifecycle:
stringRedisTemplate.execute(new RedisCallback<Void>() {
    @Override
    public Void doInRedis(RedisConnection connection) throws DataAccessException {
        // Spring borrows the connection for this callback and releases it afterwards
        try (Cursor<byte[]> cursor = connection.scan(options)) {
            while (cursor.hasNext()) {
                // process cursor.next()
            }
        }
        return null;
    }
});
The cursor is consumed and closed inside the callback; returning an open Cursor from doInRedis would hand back a cursor whose connection Spring has already released. Alternatively, release the connection explicitly after use:
RedisConnectionUtils.releaseConnection(conn, factory);
Finally, avoid the KEYS command in production (SCAN exists precisely so key spaces can be iterated without blocking the server), and give the pool explicit limits and timeouts so that exhaustion fails fast with an exception instead of deadlocking silently. Both patterns are sketched below.
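First, the manual borrow/release pattern in full. This is a minimal sketch, assuming the usual Spring Data Redis types (RedisConnectionUtils, Cursor, ScanOptions) and an injected StringRedisTemplate; the method name, match pattern, and count are illustrative:

public void scanSafely(StringRedisTemplate stringRedisTemplate) {
    RedisConnectionFactory factory = stringRedisTemplate.getConnectionFactory();
    RedisConnection conn = RedisConnectionUtils.getConnection(factory);
    try {
        ScanOptions options = ScanOptions.scanOptions().match("user:*").count(100).build();
        try (Cursor<byte[]> cursor = conn.scan(options)) {
            while (cursor.hasNext()) {
                // process cursor.next()
            }
        }
    } finally {
        // Always return the connection, even if scanning throws
        RedisConnectionUtils.releaseConnection(conn, factory);
    }
}
And on the configuration side, one illustrative set of pool limits (the values are assumptions to tune for your workload, not recommendations from this incident):

JedisPoolConfig config = new JedisPoolConfig();
config.setMaxTotal(16);          // hard cap on pooled connections
config.setMaxIdle(8);
config.setMinIdle(2);
config.setMaxWaitMillis(2000);   // fail fast instead of waiting forever
config.setTestOnBorrow(true);    // validate connections before handing them out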