Why Does HikariCP Hang After MySQL Failover? A Deep Dive into Connection Pool Blocking

This article recounts a real-world investigation of a MySQL high-availability outage in which HikariCP appeared to deadlock. It covers the architecture, the observed symptoms, the code analysis, the thread-dump findings, and the final fix: adding socket timeout parameters to the JDBC URL.

dbaplus Community

Background

Reliability testing repeatedly surfaced intermittent MySQL high-availability failures. Such failures are hard to reproduce, and diagnosing them often means tracing a long chain from the application layer down to the hardware.

Architecture

The system uses MySQL as the primary data store within a typical Spring Boot + Spring Cloud microservice setup. The persistence layer consists of MyBatis, HikariCP, and the MariaDB Java client. MySQL runs in a dual-master configuration, with Keepalived providing a floating VIP for HA.

Problem Symptoms

During a test that repeatedly restarts the master MySQL container while sending low‑load traffic, the service sometimes becomes completely inaccessible, despite the VIP switching correctly.

Initial Analysis

Developers first suspected Keepalived misconfiguration, but no issues were found. Attention then turned to the DB client layer.

Connection‑Pool Investigation

Logs showed:

SQLTransientConnectionException: HikariPool‑1 - Connection is not available, request timed out after 30000ms

The HikariCP configuration was:

spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.idle-timeout=60000
spring.datasource.hikari.max-lifetime=1800000
spring.datasource.hikari.connection-timeout=30000

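Because minimum-idle was 10, maximum-pool-size was 50, and the test traffic was light, a genuine shortage of connections was unlikely. The next hypothesis was "zombie connections": entries that the pool still holds but that no longer work. It was examined by reading the HikariCP source.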

Key HikariCP Code

public Connection getConnection(final long hardTimeout) throws SQLException {
    suspendResumeLock.acquire();
    final long startTime = currentTime();
    try {
        long timeout = hardTimeout;
        do {
            PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
            if (poolEntry == null) { break; }
            final long now = currentTime();
            if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > aliveBypassWindowMs && !isConnectionAlive(poolEntry.connection))) {
                closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
                timeout = hardTimeout - elapsedMillis(startTime);
            } else {
                metricsTracker.recordBorrowStats(poolEntry, startTime);
                return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
            }
        } while (timeout > 0L);
        metricsTracker.recordBorrowTimeoutStats(startTime);
        throw createTimeoutException(startTime);
    } finally {
        suspendResumeLock.release();
    }
}
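The borrow-validate-retry pattern in getConnection can be illustrated with a small stand-alone sketch. This is not HikariCP code: Entry, the alive flag, and the BlockingQueue standing in for the connection bag are hypothetical simplifications. It only shows how dead entries are discarded while the remaining time budget shrinks.

```java
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class BorrowLoopSketch {
    /** Hypothetical stand-in for a pooled connection; not a HikariCP type. */
    static final class Entry {
        final String name;
        final boolean alive;
        Entry(String name, boolean alive) { this.name = name; this.alive = alive; }
    }

    /**
     * Borrow until a live entry is found or the overall time budget runs out.
     * Dead entries are dropped and the remaining budget is recomputed.
     */
    static Optional<Entry> borrow(BlockingQueue<Entry> idle, long hardTimeoutMs)
            throws InterruptedException {
        final long start = System.nanoTime();
        long timeout = hardTimeoutMs;
        do {
            Entry e = idle.poll(timeout, TimeUnit.MILLISECONDS);
            if (e == null) {
                break; // nothing became available within the remaining budget
            }
            if (e.alive) {
                return Optional.of(e);
            }
            // Dead entry: discard it and shrink the budget by the time already spent.
            timeout = hardTimeoutMs - TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } while (timeout > 0L);
        return Optional.empty(); // the real pool throws a timeout exception at this point
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Entry> idle = new LinkedBlockingQueue<>();
        idle.add(new Entry("dead-1", false));
        idle.add(new Entry("dead-2", false));
        idle.add(new Entry("live-1", true));
        // Two dead entries are skipped, then the live one is returned.
        System.out.println(borrow(idle, 200).map(e -> e.name).orElse("timed out"));

        idle.add(new Entry("dead-3", false));
        // Only a dead entry is available and nothing replaces it, so the call times out.
        System.out.println(borrow(idle, 200).map(e -> e.name).orElse("timed out"));
    }
}
```

The second call mirrors the outage: once the dead entries are discarded and nothing refills the queue, every borrower waits out its full timeout.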

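The isConnectionAlive method first applies validationTimeout as the connection's network timeout. It then validates the connection either with the JDBC4 Connection.isValid call or, when JDBC4 validation is not used, by executing the configured test query. If validation fails, getConnection closes the connection and the pool attempts to create a new one.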

boolean isConnectionAlive(final Connection connection) {
    try {
        setNetworkTimeout(connection, validationTimeout);
        final int validationSeconds = (int) Math.max(1000L, validationTimeout) / 1000; // cast needed: Math.max returns long
        if (isUseJdbc4Validation) {
            return connection.isValid(validationSeconds);
        }
        try (Statement stmt = connection.createStatement()) {
            if (isNetworkTimeoutSupported != TRUE) { setQueryTimeout(stmt, validationSeconds); }
            stmt.execute(config.getConnectionTestQuery());
        }
        return true;
    } catch (Exception e) {
        logger.warn("{} - Failed to validate connection {} ({}).", poolName, connection, e.getMessage());
        return false;
    }
}

Log excerpts confirmed validation failures such as:

Connection.setNetworkTimeout cannot be called on a closed connection

Thread‑Dump Insight

A thread dump showed the HikariCP "connection adder" thread inside socketRead0 while the MariaDB driver was reading the initial handshake packet. The JVM reports the thread as runnable, but it is blocked in a native socket read:

"HikariPool-1 connection adder" #121 daemon prio=5 os_prio=0 nid=0xad runnable
   at java.net.SocketInputStream.socketRead0(Native Method)
   ...
   at org.mariadb.jdbc.internal.io.input.ReadAheadBufferedStream.fillBuffer(...)
   at org.mariadb.jdbc.internal.com.read.ReadInitialHandShakePacket.<init>(…)

This indicated that the driver was stuck waiting for the MySQL handshake response after the master container had been stopped, leaving the TCP connection in ESTABLISHED state but unable to complete the protocol.

Impact on the Pool

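The AddConnectionExecutor is single-threaded. When the handshake read blocks, no further connections can be created. Once the existing connections have been discarded as dead, the pool has nothing usable left, every request waits out the 30000 ms connection-timeout seen in the logs, and the application appears to hang.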
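Why one blocked task stalls all later ones can be shown with a plain single-threaded executor. The sketch below does not use HikariCP; the CountDownLatch is a hypothetical stand-in for the handshake reply that never arrives, and the task names are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class SingleAdderSketch {
    /**
     * Returns true if a second task is starved while the first one blocks
     * on a single-threaded executor, and runs once the first is released.
     */
    static boolean secondTaskStarvesUntilRelease() throws Exception {
        ExecutorService adder = Executors.newSingleThreadExecutor();
        CountDownLatch handshake = new CountDownLatch(1); // stands in for the server's reply
        try {
            // First "add connection" task blocks, like the read of the initial handshake.
            adder.submit(() -> { handshake.await(); return "conn-1"; });
            // Second task is queued behind it and cannot start.
            Future<String> second = adder.submit(() -> "conn-2");

            boolean starved;
            try {
                second.get(300, TimeUnit.MILLISECONDS);
                starved = false;
            } catch (TimeoutException expected) {
                starved = true;
            }

            handshake.countDown(); // unblock the first task
            String result = second.get(2, TimeUnit.SECONDS);
            return starved && "conn-2".equals(result);
        } finally {
            adder.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second task starved until release: " + secondTaskStarvesUntilRelease());
    }
}
```

In the outage nothing ever released the first task, so the queued work never ran.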

Solution

Two possible mitigations were considered:

Increase the number of threads in AddConnectionExecutor. This was deemed ineffective: additional threads can block in the same handshake read, and every blocked thread still consumes resources.

Set a socket read timeout to prevent indefinite blocking.

Setting the MariaDB driver's socketTimeout parameter in the JDBC URL solved the issue (both timeout values are in milliseconds):

spring.datasource.url=jdbc:mysql://10.0.71.13:33052/appdb?socketTimeout=60000&connectTimeout=30000&serverTimezone=UTC

After adding the timeout, repeated reliability tests showed no more connection hangs.
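The effect of a socket read timeout can be reproduced with the JDK alone. The following is a minimal sketch, not the MariaDB driver's code: a local ServerSocket plays a server that completes the TCP connection but never sends a handshake, and Socket.setSoTimeout bounds the client's read. The class name, method name, and the 500 ms value are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {
    /**
     * Connects to a local server that completes the TCP handshake but never
     * sends a byte, then reads with SO_TIMEOUT set. Returns true if the read
     * gave up with SocketTimeoutException instead of blocking forever.
     */
    static boolean readTimesOut(int soTimeoutMs) throws IOException {
        // The OS accepts the TCP connection into the backlog even though
        // accept() is never called, so the client sees an established socket.
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(InetAddress.getLoopbackAddress(), silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMs); // 0 would mean "block indefinitely"
            InputStream in = client.getInputStream();
            try {
                in.read(); // waits for a "handshake" byte that never arrives
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

With a timeout in place the blocked read fails with an exception, which frees the calling thread to try again instead of waiting forever.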

Conclusion

The root cause was not a HikariCP bug but a missing socket timeout in the MariaDB JDBC driver configuration. Without it, the connection-adder thread blocked indefinitely on the handshake read during MySQL failover. Setting socketTimeout, which applies SO_TIMEOUT to the underlying socket, bounds that read and keeps the pool from stalling, improving overall system reliability.
