
Mastering HikariCP: 7 Essential Metrics and Real-World Troubleshooting

This in‑depth guide walks through HikariCP’s seven most useful monitoring metrics, explains their types, shows real‑world examples of connection storms and slow SQL, and provides step‑by‑step troubleshooting methods, code insights, and configuration tips to keep your database connection pool healthy.

dbaplus Community

Background

HikariCP is a high‑performance JDBC connection pool widely used in Java applications. This article, authored by senior architect Zhu Zhengke, introduces the design ideas, functional usage, implementation principles, engineering practice, and extensibility of HikariCP, focusing on monitoring and troubleshooting.

Seven Common Monitoring Metrics

Starting from Druid’s comparison of connection pools, the author adds HikariCP‑specific metrics. The seven key metrics are:

hikaricp_pending_threads – Gauge; number of threads waiting for a connection.

hikaricp_connection_acquired_nanos – Summary; time spent acquiring a connection.

hikaricp_idle_connections – Gauge; current idle connections.

hikaricp_active_connections – Gauge; connections currently in use.

hikaricp_connection_usage_millis – Summary; how long a connection is held between being borrowed from the pool and returned to it.

hikaricp_connection_timeout_total – Counter; cumulative number of connection‑acquire timeouts (usually charted as a per‑minute rate).

hikaricp_connection_creation_millis – Summary; time to create a new connection.

Each metric is illustrated with screenshots:

Druid connection pool comparison
HikariCP Metrics focus
Connection wait time and active connections
Idle connections and creation time
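
To make the gauges listed above more concrete, here is a minimal sketch of how three of them might be combined into a single saturation check. The PoolSnapshot class, its fields, and the saturation rule are hypothetical and not part of HikariCP; only the metric meanings come from the list above.

```java
/** Hypothetical point-in-time view of three HikariCP gauges; not a HikariCP type. */
final class PoolSnapshot {
    final int active;       // hikaricp_active_connections
    final int idle;         // hikaricp_idle_connections
    final int pending;      // hikaricp_pending_threads
    final int maxPoolSize;  // configured maximumPoolSize, supplied by the caller

    PoolSnapshot(int active, int idle, int pending, int maxPoolSize) {
        this.active = active;
        this.idle = idle;
        this.pending = pending;
        this.maxPoolSize = maxPoolSize;
    }

    /** Fraction of the pool currently checked out, from 0.0 to 1.0. */
    double utilization() {
        return maxPoolSize <= 0 ? 0.0 : (double) active / maxPoolSize;
    }

    /** Assumed rule: saturated when every connection is in use and callers are queueing. */
    boolean isSaturated() {
        return pending > 0 && idle == 0 && active >= maxPoolSize;
    }

    public static void main(String[] args) {
        PoolSnapshot s = new PoolSnapshot(20, 0, 3, 20);
        System.out.println("utilization=" + s.utilization() + " saturated=" + s.isSaturated());
    }
}
```

A non-zero pending count alongside zero idle connections is the combination worth alerting on; high utilization alone may simply mean the pool is sized tightly.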

Metrics Types

HikariCP can record its metrics through Dropwizard Metrics, whose instrument types are Gauge, Counter, Meter, Histogram, and Timer. When exported to Prometheus, they appear as Counter, Gauge, Histogram, and Summary. The article explains the meaning of each type and how they map to monitoring tools.
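
The difference between a Counter and a Gauge matters when reading a metric such as hikaricp_connection_timeout_total: the stored value only ever grows, and a "per minute" figure has to be derived from two samples. The sketch below shows that derivation in simplified form. CounterRate and its method are hypothetical names, and real monitoring systems apply more careful handling than this.

```java
/** Sketch: turning two samples of a monotonically increasing counter into a per-minute rate. */
final class CounterRate {

    /**
     * @param earlier        counter value at the first sample
     * @param later          counter value at the second sample
     * @param elapsedSeconds time between the two samples
     */
    static double perMinute(long earlier, long later, double elapsedSeconds) {
        if (elapsedSeconds <= 0) {
            throw new IllegalArgumentException("elapsedSeconds must be positive");
        }
        // A counter only goes up. A smaller later value is treated here as a
        // process restart, after which the counter began again from zero.
        long delta = later >= earlier ? later - earlier : later;
        return delta * 60.0 / elapsedSeconds;
    }

    public static void main(String[] args) {
        // 30 new timeouts observed over a 30-second window.
        System.out.println(perMinute(100, 130, 30.0));
    }
}
```

A Gauge, by contrast, is read as-is: the sampled value of hikaricp_active_connections is already the quantity of interest.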

Practical Cases

Two typical failure scenarios are examined:

Connection storm – sudden surge of connections during service startup, leading to resource exhaustion.

Slow SQL – long‑running queries that fill the pool and trigger alerts.

For each case the author shows real monitoring graphs, describes the impact, and suggests mitigation strategies such as adjusting pool size, optimizing queries, or using proxy layers.
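
Both scenarios can be reasoned about with simple arithmetic. The sketch below is an illustration, not the author's method: it assumes each instance fills its pool up to minimumIdle shortly after startup, and it uses Little's Law (average concurrency equals arrival rate times time in the system) to estimate how many connections a workload holds at once. PoolDemand and its method names are hypothetical.

```java
/** Back-of-the-envelope estimates for the two failure scenarios; all names are hypothetical. */
final class PoolDemand {

    /** Connections opened at startup, assuming every instance fills its pool to minimumIdle. */
    static int startupConnections(int instances, int minimumIdle) {
        return instances * minimumIdle;
    }

    /** Little's Law: average connections in use = requests per second x seconds each is held. */
    static double connectionsInUse(double requestsPerSecond, double holdMillis) {
        return requestsPerSecond * holdMillis / 1000.0;
    }

    /** True when the steady-state demand is larger than the pool can serve. */
    static boolean exceedsPool(double requestsPerSecond, double holdMillis, int maxPoolSize) {
        return connectionsInUse(requestsPerSecond, holdMillis) > maxPoolSize;
    }

    public static void main(String[] args) {
        System.out.println("startup connections: " + startupConnections(50, 10));
        System.out.println("fast queries in use: " + connectionsInUse(200, 50));
        System.out.println("slow queries in use: " + connectionsInUse(200, 500));
    }
}
```

The same request rate that needs about ten connections at 50 ms per query needs about a hundred at 500 ms, which is why a single slow statement can exhaust a pool that looked comfortably sized.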

Troubleshooting Workflow

The article presents a systematic approach: describe the problem, collect environment data, analyze logs, inspect HikariCP source, and verify configuration parameters. Key code excerpts are shown, for example the connection‑acquire method:

public Connection getConnection(final long hardTimeout) throws SQLException {
    suspendResumeLock.acquire();
    final long startTime = currentTime();
    try {
        long timeout = hardTimeout;
        do {
            PoolEntry poolEntry = connectionBag.borrow(timeout, MILLISECONDS);
            if (poolEntry == null) {
                break; // timed out waiting for a connection: leave the loop and throw the timeout exception
            }
            final long now = currentTime();
            if (poolEntry.isMarkedEvicted() || (elapsedMillis(poolEntry.lastAccessed, now) > ALIVE_BYPASS_WINDOW_MS && !isConnectionAlive(poolEntry.connection))) {
                closeConnection(poolEntry, poolEntry.isMarkedEvicted() ? EVICTED_CONNECTION_MESSAGE : DEAD_CONNECTION_MESSAGE);
                timeout = hardTimeout - elapsedMillis(startTime);
            } else {
                metricsTracker.recordBorrowStats(poolEntry, startTime);
                return poolEntry.createProxyConnection(leakTaskFactory.schedule(poolEntry), now);
            }
        } while (timeout > 0L);
        metricsTracker.recordBorrowTimeoutStats(startTime);
        throw createTimeoutException(startTime);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new SQLException(poolName + " - Interrupted during connection acquisition", e);
    } finally {
        suspendResumeLock.release();
    }
}
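
The part of getConnection that is easiest to misread is the timeout handling: a dead or evicted entry does not restart the clock, it only shrinks the remaining budget. The following sketch isolates that pattern using only JDK classes. TinyPool and Entry are hypothetical stand-ins, not HikariCP types, and the liveness check is reduced to a boolean flag.

```java
import java.sql.SQLTransientConnectionException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Simplified stand-in for the borrow loop; not HikariCP code. */
final class TinyPool {

    /** Hypothetical pool entry whose health is a fixed flag. */
    static final class Entry {
        final String name;
        final boolean alive;
        Entry(String name, boolean alive) { this.name = name; this.alive = alive; }
    }

    private final BlockingQueue<Entry> idle = new LinkedBlockingQueue<>();

    void add(Entry entry) { idle.add(entry); }

    Entry borrow(long hardTimeoutMs) throws SQLTransientConnectionException, InterruptedException {
        final long start = System.nanoTime();
        long timeout = hardTimeoutMs;
        do {
            Entry entry = idle.poll(timeout, TimeUnit.MILLISECONDS);
            if (entry == null) {
                break; // nothing became available within the remaining budget
            }
            if (entry.alive) {
                return entry;
            }
            // Dead entry: discard it and retry with what is left of the original budget.
            timeout = hardTimeoutMs - TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } while (timeout > 0L);
        throw new SQLTransientConnectionException(
            "No connection available within " + hardTimeoutMs + " ms");
    }

    public static void main(String[] args) throws Exception {
        TinyPool pool = new TinyPool();
        pool.add(new Entry("dead", false));
        pool.add(new Entry("live", true));
        System.out.println("borrowed: " + pool.borrow(100).name);
    }
}
```

Because the budget is shared across retries, a pool full of stale connections can consume the whole connectionTimeout on validation alone, which shows up as acquire timeouts even though connections nominally exist.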

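Another snippet shows the JDBC driver’s isValid implementation (the excerpt appears to come from MySQL Connector/J), which HikariCP calls during health checks when no connectionTestQuery is configured: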

public boolean isValid(int timeout) throws SQLException {
    synchronized (getConnectionMutex()) {
        if (isClosed()) {
            return false;
        }
        try {
            pingInternal(false, timeout * 1000);
        } catch (Throwable t) {
            try { abortInternal(); } catch (Throwable ignore) {}
            return false;
        }
        return true;
    }
}

These snippets help differentiate between validation failures, idle‑timeout evictions, max‑lifetime evictions, manual evictions, and fatal SQLExceptions.
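
One detail in the isValid excerpt is worth spelling out: java.sql.Connection.isValid takes its timeout in seconds, which is why the driver multiplies by 1000 before pinging. The sketch below shows a caller-side liveness check built on that JDBC method. AliveCheck and its stub helper are hypothetical; the stub exists only so the example runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

/** Caller-side view of a JDBC4 liveness check; names are hypothetical. */
final class AliveCheck {

    /** Returns false instead of propagating the failure, as a pool health check would. */
    static boolean isAlive(Connection connection, int timeoutSeconds) {
        try {
            return connection.isValid(timeoutSeconds); // JDBC timeout is in seconds
        } catch (SQLException e) {
            return false;
        }
    }

    /** The seconds-to-milliseconds conversion seen in the driver excerpt. */
    static long toMillis(int timeoutSeconds) {
        return timeoutSeconds * 1000L;
    }

    /** Demo-only fake Connection that answers isValid and rejects every other call. */
    static Connection stub(boolean valid) {
        return (Connection) Proxy.newProxyInstance(
            AliveCheck.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, args) -> {
                if (method.getName().equals("isValid")) {
                    return valid;
                }
                throw new UnsupportedOperationException(method.getName());
            });
    }

    public static void main(String[] args) {
        System.out.println("alive: " + isAlive(stub(true), 5));
        System.out.println("timeout ms: " + toMillis(5));
    }
}
```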

Configuration Recommendations

Typical production settings are suggested (time values are in milliseconds):

maximumPoolSize: 20
minimumIdle: 10
connectionTimeout: 30000
idleTimeout: 600000
maxLifetime: 1800000
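
As a rough illustration, the same values can be held in a java.util.Properties object, which HikariConfig accepts through its Properties constructor. The sketch below uses only JDK classes so it runs without the library. PoolSettings and its sanityCheck rules are hypothetical: they are simple consistency checks chosen for this example, not HikariCP's own validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Holds the suggested settings and applies two illustrative consistency checks. */
final class PoolSettings {

    /** The values listed above; time values are milliseconds. */
    static Properties suggested() {
        Properties p = new Properties();
        p.setProperty("maximumPoolSize", "20");
        p.setProperty("minimumIdle", "10");
        p.setProperty("connectionTimeout", "30000"); // 30 seconds
        p.setProperty("idleTimeout", "600000");      // 10 minutes
        p.setProperty("maxLifetime", "1800000");     // 30 minutes
        return p;
    }

    /** Hypothetical checks; returns one message per problem found. */
    static List<String> sanityCheck(Properties p) {
        List<String> problems = new ArrayList<>();
        long maxPool = Long.parseLong(p.getProperty("maximumPoolSize"));
        long minIdle = Long.parseLong(p.getProperty("minimumIdle"));
        long idleTimeout = Long.parseLong(p.getProperty("idleTimeout"));
        long maxLifetime = Long.parseLong(p.getProperty("maxLifetime"));
        if (minIdle > maxPool) {
            problems.add("minimumIdle exceeds maximumPoolSize");
        }
        if (idleTimeout >= maxLifetime) {
            problems.add("idleTimeout should be shorter than maxLifetime");
        }
        return problems;
    }

    public static void main(String[] args) {
        // With HikariCP on the classpath, the Properties could be passed to
        // new HikariConfig(properties); that step is omitted here.
        System.out.println("problems: " + sanityCheck(suggested()));
    }
}
```

Setting minimumIdle equal to maximumPoolSize turns this into the static pool discussed in the next paragraph.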

The author warns against enabling autoReconnect in MySQL, explains the role of connectionTestQuery, and discusses dynamic vs. static pool sizing. Dynamic sizing can cause unnecessary connection churn under low load, while a well‑tuned static size balances memory usage and throughput.

Root‑Cause Analysis of a Real Incident

A concrete incident from April 2018 is dissected: intermittent “Failed to validate connection” errors appeared every 20 minutes. By adding debug logs to HikariPool#getConnection and softEvictConnection, the author discovered that the first connection was evicted by the house‑keeping timer, causing a cascade of warnings. Debug output examples:

Debug log example

Further netstat checks revealed that a legacy proxy caused premature TCP CLOSE_WAIT states:

netstat output

After reconfiguring the five affected services to connect directly to the database (removing the obsolete proxy), the errors disappeared and the system stabilized.

Conclusion

The guide consolidates HikariCP monitoring metrics, practical case studies, and a step‑by‑step debugging methodology. It serves as a reference for engineers who need to keep their connection pools performant and reliable.
