Databases 35 min read

Why MySQL Connections Spike: When Traffic Isn’t the Real Culprit

This article walks through a systematic, step‑by‑step troubleshooting guide for MySQL "Too many connections" errors, showing how to verify the symptom, inspect server variables, analyze connection status, identify common root causes such as connection‑pool misconfiguration, leaked connections, and long‑running queries, and apply safe fixes and preventive measures.

MaGe Linux Operations
MaGe Linux Operations
MaGe Linux Operations
Why MySQL Connections Spike: When Traffic Isn’t the Real Culprit

Problem Overview

When a production MySQL instance throws ERROR 1040 (HY000): Too many connections, the immediate reaction is to increase max_connections. In practice, only about 30% of such incidents are caused by a genuine traffic surge; the majority stem from connection leaks, inappropriate timeout settings, slow‑query accumulation, or mis‑configured connection pools.

Applicable Scenarios

Any application reports Too many connections errors. Threads_connected is high while Threads_running remains low.

Only a subset of instances can connect, while others fail.

Connection count spikes during low‑traffic periods.

Connection count rises after a new deployment.

Core MySQL Variables

max_connections

– maximum allowed concurrent client connections (default 151 for MySQL 5.7/8.0). Threads_connected – current number of established connections (including idle). Threads_running – connections actively executing queries. Max_used_connections – historical peak of concurrent connections. Aborted_connects / Aborted_clients – failed connection attempts. wait_timeout and interactive_timeout – idle‑connection timeout (default 8 hours).

Diagnostic Process

Confirm the symptom : try connecting with mysql -u root -p -h 127.0.0.1 and observe the error.

Check configuration :

SHOW VARIABLES LIKE 'max_connections';
SHOW VARIABLES LIKE 'wait_timeout';
SHOW VARIABLES LIKE 'interactive_timeout';

Inspect current status :

SHOW STATUS LIKE 'Threads_connected';
SHOW STATUS LIKE 'Threads_running';
SHOW STATUS LIKE 'Max_used_connections';
SHOW STATUS LIKE 'Aborted_connects';
SHOW STATUS LIKE 'Aborted_clients';

Analyze the process list : SHOW FULL PROCESSLIST; Aggregate by user, host, and database to spot the biggest consumers:

SELECT user, COUNT(*) AS conn_count FROM information_schema.processlist GROUP BY user ORDER BY conn_count DESC;
SELECT SUBSTRING_INDEX(host, ':', 1) AS client_host, COUNT(*) AS conn_count FROM information_schema.processlist GROUP BY client_host ORDER BY conn_count DESC;
SELECT db, COUNT(*) AS conn_count FROM information_schema.processlist GROUP BY db ORDER BY conn_count DESC;

Identify idle connections (state = 'Sleep') and their time values:

SELECT id, user, host, db, time FROM information_schema.processlist WHERE command='Sleep' ORDER BY time DESC LIMIT 30;

Detect long‑running queries (e.g., time > 5 and not 'Sleep'):

SELECT id, user, host, db, time, info FROM information_schema.processlist WHERE command!='Sleep' AND time>5 ORDER BY time DESC;

Check InnoDB lock waits for possible deadlocks:

SELECT * FROM sys.innodb_lock_waits;

Common Root Causes

Connection‑pool misconfiguration : overly large maxActive / maximumPoolSize or minIdle creates thousands of idle connections that stay open for hours.

Connection leaks : code paths that acquire a connection but never close it (missing finally or try‑with‑resources).

Excessive wait_timeout : default 8 hours keeps idle connections alive, inflating the count.

Slow queries or lock contention : many queries run for seconds or minutes, holding connections busy.

Improper max_user_connections : per‑user limits can cause unexpected failures when a single account is shared.

Mitigation Steps

Immediate relief : connect as root (has an extra reserved connection) and temporarily raise max_connections: SET GLOBAL max_connections = 2000; Kill idle connections (preferably the longest‑idle ones):

# Using pt‑kill (dry‑run first)
pt‑kill --host=127.0.0.1 --user=root --ask-pass \
  --idle-time=300 --match-command='Sleep' --print
# After verification, add --kill

Adjust pool parameters to realistic values (e.g., maximumPoolSize = 20, minimumIdle = 5 for HikariCP; similar values for Druid).

Enable leak detection :

# HikariCP
spring.datasource.hikari.leak-detection-threshold=60000
# Druid
spring.datasource.druid.remove-abandoned=true
spring.datasource.druid.remove-abandoned-timeout=180
spring.datasource.druid.log-abandoned=true

Reduce idle timeout to 5–10 minutes:

SET GLOBAL wait_timeout = 600;
SET GLOBAL interactive_timeout = 600;

Investigate slow queries with EXPLAIN, the slow‑query log, or sys.innodb_lock_waits, and optimise indexes or rewrite queries.

Review OS limits (file descriptors) and kernel TCP parameters if the connection count approaches the OS limit.

# Check limits
cat /proc/$(pidof mysqld)/limits | grep 'open files'
# sysctl settings (recommended values)
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30

Verification Checklist

Run SHOW STATUS LIKE 'Threads_connected'; – the value should fall back to a comfortable 30‑60% of max_connections.

Confirm no new "Too many connections" messages appear in MySQL error logs or application logs.

Observe monitoring dashboards (e.g., Prometheus mysql_global_status_threads_connected) for stable usage.

Perform a short load test (e.g., mysqlslap) to ensure the pool can handle expected concurrency without exhausting connections.

Validate that connection‑leak detectors no longer report stack traces.

Long‑Term Practices

Standardise connection‑pool configuration across services (keep maxPoolSize between 10‑30, minIdle equal to maxPoolSize only when necessary).

Enable health‑check queries ( validationQuery or test‑while‑idle) so stale connections are recycled automatically.

Set up alerts when connection usage exceeds 80% of the limit.

Consider a proxy layer (e.g., ProxySQL) to multiplex client connections and reduce the number of backend MySQL sockets.

Document and review any configuration changes during deployments to avoid accidental pool size increases.

Automation Scripts (Optional)

Connection‑usage monitor (runs every 30 s) :

#!/bin/bash
set -euo pipefail
MYSQL_USER="monitor"
MYSQL_HOST="127.0.0.1"
MYSQL_PORT="3306"
THRESHOLD=80   # percent
LOG_FILE="/var/log/mysql_conn_monitor.log"
if [[ -z "${MYSQL_PASSWORD:-}" ]]; then
  echo "Error: MYSQL_PASSWORD not set" >&2
  exit 1
fi
max_conn=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -N -e "SHOW VARIABLES LIKE 'max_connections'" | awk '{print $2}')
cur_conn=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -N -e "SHOW STATUS LIKE 'Threads_connected'" | awk '{print $2}')
usage=$((cur_conn*100/max_conn))
echo "$(date '+%Y-%m-%d %H:%M:%S') INFO: Connections $cur_conn/$max_conn ($usage%)" >> "$LOG_FILE"
if (( usage >= THRESHOLD )); then
  echo "$(date '+%Y-%m-%d %H:%M:%S') WARNING: Connection usage $usage% exceeds $THRESHOLD%" >> "$LOG_FILE"
  # Add alerting command here (e.g., webhook)
fi

Idle‑connection cleanup script (dry‑run by default) :

#!/bin/bash
set -euo pipefail
MYSQL_USER="admin"
MYSQL_HOST="127.0.0.1"
MYSQL_PORT="3306"
IDLE_THRESHOLD=600   # seconds
DRY_RUN=${1:-true}
LOG_FILE="/var/log/mysql_kill_idle.log"
if [[ -z "${MYSQL_PASSWORD:-}" ]]; then
  echo "Error: MYSQL_PASSWORD not set" >&2
  exit 1
fi
idle=$(mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -N -e "SELECT id, user, host, db, time FROM information_schema.processlist WHERE command='Sleep' AND time>$IDLE_THRESHOLD AND user NOT IN ('system user','repl_user') ORDER BY time DESC;")
if [[ -z "$idle" ]]; then
  echo "$(date '+%Y-%m-%d %H:%M:%S') No idle connections > $IDLE_THRESHOLD s" >> "$LOG_FILE"
  exit 0
fi
while IFS=$'\t' read -r id user host db t; do
  if [[ "$DRY_RUN" == "true" ]]; then
    echo "$(date '+%Y-%m-%d %H:%M:%S') [DRY‑RUN] Would kill id=$id user=$user host=$host db=$db idle=${t}s" >> "$LOG_FILE"
  else
    mysql -u"$MYSQL_USER" -p"$MYSQL_PASSWORD" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -e "KILL $id;"
    echo "$(date '+%Y-%m-%d %H:%M:%S') Killed id=$id user=$user host=$host db=$db idle=${t}s" >> "$LOG_FILE"
  fi
done <<< "$idle"

Case Studies

Case 1 – New service launch : A newly deployed microservice used the same pool settings as a high‑traffic service ( maxActive=200, minIdle=200). The service created 200 idle connections immediately, pushing Threads_connected close to the limit. Reducing maxActive to 20 and minIdle to 5 resolved the issue.

Case 2 – Connection leak in Java code : A method returned early without closing the acquired connection. Enabling Druid’s remove-abandoned logged the stack trace, pinpointing the missing finally block. Refactoring to try‑with‑resources eliminated the leak.

Case 3 – Slow query caused a backlog : An execution plan change turned an indexed query into a full table scan, causing many connections to stay in Query state for minutes. Adding FORCE INDEX and updating statistics restored the original plan and freed connections.

Final Takeaway

MySQL connection‑exhaustion is rarely a simple capacity problem. The real work is to answer three questions: who is holding the connections? , what are they doing? , and why aren’t they released? By following the systematic diagnostic flow, adjusting pool settings, enabling leak detection, and tuning timeout and kernel parameters, you can both restore service quickly and prevent future outages.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

PerformanceDatabaseConnection PoolMySQLTroubleshooting
MaGe Linux Operations
Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.