
Mastering Database Performance: From Slow Queries to Deadlock Resolution

This guide presents a systematic, end-to-end diagnostic workflow for high-concurrency database environments: slow-SQL detection, execution-plan analysis, lock-conflict monitoring, deadlock investigation, and practical optimization techniques. Each step is illustrated with real-world cases, commands, and tool integrations so you can resolve performance bottlenecks quickly.

Architecture & Thinking

1 Execution Plan Analysis: Root Cause of Slow SQL

1.1 Detecting and Capturing Slow SQL

Production environments should first establish a slow‑SQL monitoring system. MySQL's default slow‑query threshold (long_query_time) is 10 seconds, which is far too lax for most workloads; in practice it is usually lowered to 200‑500 ms based on business needs.

-- enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 0.2; -- 200ms threshold
SET GLOBAL log_queries_not_using_indexes = 'ON'; -- log queries not using indexes

For captured slow SQL, use pt-query-digest or mysqldumpslow to aggregate and identify high‑frequency patterns. An e‑commerce platform found that over 85% of slow queries targeted 12 core tables, with orders, users, and products accounting for more than 70%.
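The aggregation these tools perform is essentially SQL fingerprinting: literals are replaced with placeholders so structurally identical queries collapse into one pattern whose frequency can be counted. A minimal Python sketch of the idea (the regexes here are simplified assumptions, not pt-query-digest's actual normalization rules):

```python
import re
from collections import Counter

def fingerprint(sql: str) -> str:
    """Collapse literals so structurally identical queries share one pattern."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)   # string literals -> ?
    s = re.sub(r"\b\d+\b", "?", s)   # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)       # normalize whitespace
    return s

# Aggregate a batch of captured slow queries by pattern
slow_log = [
    "SELECT * FROM orders WHERE user_id = 1001",
    "SELECT * FROM orders WHERE user_id = 2002",
    "SELECT * FROM users WHERE name = 'alice'",
]
patterns = Counter(fingerprint(q) for q in slow_log)
# the two orders queries collapse into one high-frequency pattern
```

Ranking the resulting counter by frequency is exactly how the "12 core tables" finding above falls out of a raw slow log.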

1.2 Deep EXPLAIN Analysis

Key fields to examine in the execution plan:

type (access type) – ordered from best to worst: const/system, eq_ref/ref, range, index, ALL (full table scan, a performance disaster).

Extra – Using index (covering index), Using where (server‑side filter), Using temporary (temp table for GROUP BY/DISTINCT), Using filesort (cannot use index for sorting).
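These checks can be automated once EXPLAIN output is fetched programmatically (e.g., from a driver cursor). A hedged sketch that triages one EXPLAIN row represented as a dict — the field names follow MySQL's tabular EXPLAIN output, but the severity rules are illustrative, not exhaustive:

```python
def triage_explain_row(row: dict) -> list:
    """Return warnings for the red flags described above."""
    warnings = []
    if row.get("type") == "ALL":
        warnings.append("full table scan: add or fix an index")
    elif row.get("type") == "index":
        warnings.append("full index scan: check whether a range/covering index is possible")
    extra = row.get("Extra") or ""
    if "Using temporary" in extra:
        warnings.append("temp table built for GROUP BY/DISTINCT")
    if "Using filesort" in extra:
        warnings.append("sort cannot use an index")
    return warnings

row = {"type": "ALL", "Extra": "Using where; Using filesort"}
# flags both the full scan and the filesort
```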

1.3 Statistics and Plan Drift

The optimizer relies on statistics; stale statistics can cause plan drift. In one logistics system, a table shrank from 3 M to 800 k rows after bulk deletions without a statistics refresh, and queries slowed 15× because the optimizer still chose a full scan based on the old row counts.

-- MySQL
ANALYZE TABLE orders;
-- PostgreSQL
VACUUM ANALYZE orders;

2 Lock Conflict Diagnosis: Real-time Bottleneck Detection

2.1 Real-time Lock Monitoring

When requests appear stuck without errors, lock contention is a primary suspect. From MySQL 8.0, performance_schema is the recommended place for real‑time lock monitoring (data_locks and data_lock_waits replaced the information_schema.innodb_locks and innodb_lock_waits views of 5.7).

Core monitoring tables:

data_locks – current lock information.

data_lock_waits – lock‑wait relationships.

threads – thread‑to‑connection mapping.

events_statements_current – currently executing SQL.

Four‑step lock‑wait investigation:

-- Step 1: confirm lock waits
SELECT * FROM performance_schema.data_lock_waits;

-- Step 2: identify who waits for whom
SELECT r.trx_id AS waiting_trx, r.trx_mysql_thread_id AS waiting_thread,
       b.trx_id AS blocking_trx, b.trx_mysql_thread_id AS blocking_thread
FROM performance_schema.data_lock_waits w
JOIN information_schema.innodb_trx r ON w.requesting_engine_transaction_id = r.trx_id
JOIN information_schema.innodb_trx b ON w.blocking_engine_transaction_id = b.trx_id;

-- Step 3: view the blocking SQL (filter by the blocking thread found in step 2)
SELECT t.THREAD_ID, t.PROCESSLIST_ID, es.SQL_TEXT
FROM performance_schema.events_statements_current es
JOIN performance_schema.threads t ON es.THREAD_ID = t.THREAD_ID
WHERE t.PROCESSLIST_ID = <blocking_thread_id>; -- value from step 2

-- Step 4: analyze lock objects and scope
SELECT OBJECT_SCHEMA, OBJECT_NAME, INDEX_NAME, LOCK_TYPE, LOCK_MODE
FROM performance_schema.data_locks;

2.2 Common Lock‑Conflict Scenarios and Solutions

Scenario 1: Hot‑row update contention

During a promotion, many UPDATE inventory SET stock = stock - 1 WHERE item_id = 1001 statements compete for the same row, causing a surge in row‑lock waits.

Solutions:

Bucket the stock into multiple rows to disperse lock contention.

Apply optimistic locking with a version column.

Queue the requests to serialize updates.
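The optimistic-locking solution hinges on a version column: the update only applies if the version has not moved since the read, and the caller retries on conflict. A minimal in-memory illustration — a real system would issue UPDATE ... SET stock = stock - 1, version = version + 1 WHERE item_id = ? AND version = ? and check the affected-row count instead of mutating a dict:

```python
# In-memory stand-in for the inventory row; a real system would use SQL.
inventory = {1001: {"stock": 100, "version": 0}}

def decrement_stock(item_id: int, max_retries: int = 3) -> bool:
    """Optimistic locking: re-read and retry if the version moved under us."""
    for _ in range(max_retries):
        row = inventory[item_id]
        read_version, read_stock = row["version"], row["stock"]
        if read_stock <= 0:
            return False  # sold out
        # Simulates: UPDATE ... WHERE item_id=? AND version=? (affected rows == 1?)
        if row["version"] == read_version:
            row["stock"] = read_stock - 1
            row["version"] = read_version + 1
            return True
    return False  # conflicted too many times; caller may queue or fail

decrement_stock(1001)
```

Because no row lock is held between the read and the conditional write, hot rows stop serializing every request; the trade-off is that losers must retry or fall back to the queueing approach.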

Scenario 2: Metadata lock (MDL) blocking

Online DDL operations (e.g., ALTER TABLE) executed together with long queries can cause MDL waits, as observed on a social platform during peak traffic.

Solutions:

Use online DDL tools such as pt-online-schema-change.

Set lock_wait_timeout to bound wait time.

Run DDL during low‑traffic windows.

2.3 Tencent Cloud DBbrain Lock Diagnosis Practice

DBbrain offers automated lock‑conflict detection with visual lock‑wait graphs, intelligent analysis, optimization suggestions, and one‑click termination of blocking sessions.

3 Deadlock Analysis: Ultimate Concurrency Diagnosis

3.1 Detecting and Logging Deadlocks

Enable persistent deadlock logging in MySQL:

# my.cnf configuration
innodb_print_all_deadlocks = 1
log_error = /var/log/mysql/error.log

View the latest deadlock details:

SHOW ENGINE INNODB STATUS\G

Key information includes transaction IDs, involved SQL, held and waiting locks, and timestamps.

3.2 Typical Deadlock Scenarios

Scenario 1: Inconsistent update order

Payment callback updates orders then inventory, while a cancellation updates inventory then orders, creating a circular wait.

-- Transaction A (payment)
BEGIN;
UPDATE orders SET status='paid' WHERE order_id=1001;
UPDATE inventory SET stock=stock-1 WHERE item_id=2001;
-- not committed, holds lock on order_id=1001

-- Transaction B (cancellation)
BEGIN;
UPDATE inventory SET stock=stock+1 WHERE item_id=2001;
UPDATE orders SET status='cancelled' WHERE order_id=1001;
-- waits for A's lock on order_id=1001 while A waits for B's lock on item_id=2001

Solution: enforce a consistent lock order, e.g., always lock (order_id, item_id) in lexical order.
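A consistent lock order can be enforced in application code by sorting the resource keys before acquiring anything, so every transaction takes its locks in the same global sequence and no circular wait can form. A sketch with threading locks standing in for row locks (the (table, id) key scheme is an assumption for illustration):

```python
import threading
from contextlib import ExitStack

# One lock per resource key, e.g. ("orders", 1001) or ("inventory", 2001).
_row_locks = {}
_registry_guard = threading.Lock()

def lock_for(key):
    with _registry_guard:
        return _row_locks.setdefault(key, threading.Lock())

def update_rows(keys, apply_fn):
    """Acquire all row locks in one canonical (sorted) order, then apply."""
    with ExitStack() as stack:
        for key in sorted(keys):  # the crucial step: a single global order
            stack.enter_context(lock_for(key))
        apply_fn()

# Payment and cancellation both end up locking inventory before orders here,
# regardless of the order in which the caller lists the keys.
update_rows([("orders", 1001), ("inventory", 2001)], lambda: None)
```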

Scenario 2: Gap lock causing “innocent” blocking

In REPEATABLE READ, a range SELECT ... FOR UPDATE creates a gap lock that blocks an INSERT into the same range.

-- Transaction A
SELECT * FROM orders WHERE amount BETWEEN 1000 AND 2000 FOR UPDATE;
-- next-key/gap locks on the amount index range 1000-2000
-- (assumes an index on amount; without one, far more rows are locked)

-- Transaction B
INSERT INTO orders (amount) VALUES (1500);
-- blocked because 1500 falls within A's gap lock range

Solution: use READ COMMITTED isolation when feasible, or narrow the range condition to reduce lock scope.
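The blocking above is easiest to reason about as an interval check: the range lock covers the scanned interval plus its surrounding gaps, and any insert whose key falls inside must wait. A toy model of that check — a deliberate simplification of InnoDB's actual next-key locking, which locks index records and the gaps between them:

```python
def insert_blocked(locked_ranges, value):
    """Toy model: an INSERT waits if its key falls inside any locked range."""
    return any(lo <= value <= hi for lo, hi in locked_ranges)

# Transaction A's SELECT ... BETWEEN 1000 AND 2000 FOR UPDATE
locked = [(1000, 2000)]

insert_blocked(locked, 1500)  # True: B's INSERT of amount=1500 must wait
insert_blocked(locked, 2500)  # False: outside the locked range
```

Narrowing the WHERE range shrinks `locked_ranges`, which is exactly why the second solution above reduces "innocent" blocking.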

3.3 Deadlock Investigation Toolchain

Native tools:

MySQL – SHOW ENGINE INNODB STATUS, performance_schema.

SQL Server – Extended Events, sys.dm_tran_locks.

Oracle – V$LOCK, V$SESSION, and the ORA-00060 deadlock trace files referenced from the alert log.

Third‑party tools:

Percona Toolkit – pt-deadlock-logger for continuous deadlock monitoring.

PMM (Percona Monitoring and Management) – visual deadlock dashboards.

Tencent Cloud DBbrain – automatic detection and intelligent analysis.

Graphical analysis tools (e.g., SSMS deadlock XML visualizer, Oracle Enterprise Manager) help visualize complex deadlock cycles involving many transactions.

4 Complete Diagnostic Chain and Tool Integration

4.1 Five‑Step Diagnosis: From Symptom to Root Cause

Step 1 – Quick status check (within 30 seconds)

# Check MySQL process
ps aux | grep mysqld
top -p $(pidof mysqld)

# System load
uptime
df -h

Step 2 – Connection and thread analysis

SHOW FULL PROCESSLIST;
SELECT STATE, COUNT(*) AS count FROM information_schema.PROCESSLIST GROUP BY STATE ORDER BY count DESC;

Focus on threads in Locked, Waiting for table metadata lock, or Sending data states.

Step 3 – Lock wait and deadlock location

-- MySQL 8.0+ lock wait query
SELECT * FROM performance_schema.data_lock_waits;
-- Deadlock log
SHOW ENGINE INNODB STATUS\G

Step 4 – Slow query and execution‑plan analysis

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 1001; -- EXPLAIN ANALYZE requires MySQL 8.0.18+
SHOW INDEX FROM orders;

Step 5 – System resource bottleneck check

# I/O performance
iostat -x 1 10
# Memory usage
free -h
cat /proc/$(pidof mysqld)/status | grep -E '(VmSize|VmRSS)'

4.2 Tool Integration and Automation

Monitoring stack: Prometheus + Grafana + mysqld_exporter for QPS, TPS, connections, and slow‑query count.

Log analysis: ELK/EFK to collect slow‑query logs and build an SQL fingerprint database.

Real‑time alerts: multi‑level thresholds (e.g., slow‑query > 50/min, lock wait > 30 s).

Intelligent diagnosis: cloud services such as Tencent DBbrain, Alibaba DAS.

Automation pseudo‑code for handling alerts:

def handle_sql_issue(alert):
    if alert.type == "slow_query":
        explain_result = analyze_explain_plan(alert.sql)
        if explain_result.type == "ALL":
            suggest_index = generate_index_suggestion(alert.sql)
            notify_dba(suggest_index)
        elif explain_result.extra == "Using filesort":
            optimization = suggest_order_by_optimization(alert.sql)
            notify_developer(optimization)
    elif alert.type == "deadlock":
        deadlock_info = parse_deadlock_log(alert.log)
        if is_cross_update_pattern(deadlock_info):
            suggest_lock_order(deadlock_info)
        if alert.severity == "critical":
            kill_transaction(deadlock_info.victim_trx_id)

4.3 Preventive Optimization Strategies

Index design principles

Prefer covering indexes to avoid extra lookups back to the clustered index for row data.

Apply the left‑most prefix rule; order composite index columns by selectivity.

Avoid over‑indexing; each index adds write overhead.

Maintain indexes regularly with ANALYZE TABLE.

Transaction design best practices

Keep transactions short; avoid RPC or file I/O inside.

Use a unified lock acquisition order across resources.

Prefer row‑level locks; use optimistic locking when feasible.

Set appropriate innodb_lock_wait_timeout and implement retry logic.
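The retry logic mentioned above is usually a small wrapper that re-runs the whole transaction when the server reports a deadlock or lock-wait timeout (MySQL errors 1213 and 1205). A driver-agnostic sketch — the DBError class is a stand-in for whatever exception your driver raises with the server error code attached:

```python
import random
import time

RETRYABLE = {1213, 1205}  # ER_LOCK_DEADLOCK, ER_LOCK_WAIT_TIMEOUT

class DBError(Exception):
    """Stand-in for a driver exception that carries a MySQL error code."""
    def __init__(self, code):
        super().__init__("mysql error %d" % code)
        self.code = code

def run_with_retry(txn_fn, attempts=3, base_delay=0.05):
    """Run txn_fn; on deadlock/timeout (rollback implied) retry with backoff."""
    for attempt in range(attempts):
        try:
            return txn_fn()
        except DBError as e:
            if e.code not in RETRYABLE or attempt == attempts - 1:
                raise
            # jittered exponential backoff so retries don't collide again
            time.sleep(base_delay * (2 ** attempt) * random.random())
```

The backoff jitter matters: deadlock victims that retry in lockstep tend to deadlock with each other again.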

Architectural level optimizations

Read‑write splitting; route analytical queries to replicas.

Cache hot data with Redis to reduce database pressure.

Offload non‑real‑time work to asynchronous queues.

Shard tables when row count exceeds tens of millions.

5 Summary

Database performance troubleshooting is a systematic engineering effort that requires a complete closed‑loop from monitoring and alerting to root‑cause analysis and optimization. Execution‑plan analysis answers “why slow”, lock‑conflict diagnosis answers “why blocked”, and deadlock analysis answers “why dead”. Together they form a comprehensive diagnostic chain.

In practice, teams should establish SQL review standards, conduct regular performance stress tests, and build automated monitoring. When incidents occur, follow the principle “restore service first, locate root cause second, optimize third”, leveraging both tooling and deep understanding of database internals.

Although the examples use MySQL, the methodology applies to most relational databases; only specific commands differ, while the core “plan‑lock‑deadlock” workflow remains universal.

Tags: Deadlock, Performance Tuning, MySQL, SQL Optimization, Slow Query, Database Diagnostics, Lock Monitoring
Written by Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.
