Master PostgreSQL Health Checks with Open‑Source SKILL Scripts
This guide introduces a collection of PostgreSQL SKILL scripts for automated health checks and operational analysis, explains the repository layout, shows how to run individual or full daily checks, lists all available skills, and provides a sample report with actionable recommendations.
Overview
The article presents a set of PostgreSQL SKILL scripts designed for health monitoring and business‑level analysis. The code is hosted at https://github.com/digoal/postgres_skill/ and can be extended for custom checks.
Repository Layout
postgres-daily-check/
├── assets/
│ └── db_config.env # PostgreSQL connection variables (PGHOST, PGPORT, PGUSER, PGDATABASE)
├── scripts/
│ ├── run_postgres_check.sh # Core script that executes each skill's SQL and returns JSON
│ └── postgres_agent.py # Agent that orchestrates skills, parses JSON, and generates a Markdown report
└── SKILL.md # Documentation of each skill, its usage, and output formatRunning Skills
Single skill execution :
./run_postgres_check.sh <skill_name> [parameters]Examples: ./run_postgres_check.sh get_long_running_queries 5 – find queries running longer than 5 minutes. ./run_postgres_check.sh get_idle_in_transaction_sessions 2 – find idle‑in‑transaction sessions longer than 2 minutes.
./run_postgres_check.sh get_cache_hit_rate ./run_postgres_check.sh get_table_bloatFull daily check (recommended) :
cd postgres-daily-check
python3 scripts/postgres_agent.pyThe postgres_agent.py script uses only the Python standard library ( json, subprocess, datetime, os) and requires no external dependencies.
Available Skills
The repository defines 40 skills grouped into six categories:
Core Health & Availability : get_invalid_indexes, get_xid_wraparound_risk, get_blocking_locks, get_deadlock_detection, get_critical_settings
Session & Connection Monitoring : get_long_running_queries, get_idle_in_transaction_sessions, get_long_running_transactions, get_long_running_prepared_transactions, get_connection_usage, get_lock_waiters, get_wait_events
Performance & Activity : get_cache_hit_rate, get_rollback_rate, get_top_sql_by_time, get_table_hotspots, get_bgwriter_stats, get_temp_file_usage, get_io_statistics, get_io_statistics_v2, get_analyze_progress, get_create_index_progress, get_cluster_progress, get_wal_statistics, get_checkpointer_stats, get_slru_stats, get_user_function_stats
Replication & Archiving : get_replication_slots, get_replication_status, get_logical_replication_status, get_wal_archiver_status
Maintenance & Storage : get_autovacuum_status, get_table_bloat, get_index_bloat, get_top_objects_by_size, get_large_unused_indexes, get_stale_statistics, get_database_sizes, get_sequence_exhaustion, get_database_conflict_stats
Freeze & Wraparound Protection : get_freeze_prediction
Sample Report
巡检完成!报告时间:2026-02-04 14:53:46
---
PostgreSQL 巡检报告详细分析
整体状态: ❌ ERROR
---
🔴 严重问题
1. 阻塞锁 (Blocking Locks)
- PID 23716 (truncate test) 持有 AccessExclusiveLock,阻塞所有会话
- 多个 SELECT 正在等待该锁
建议:
SELECT pid, usename, state, query, wait_event_type, wait_event FROM pg_stat_activity WHERE pid IN (23716, …);
SELECT pg_terminate_backend(23716);
2. 长事务 (Idle‑in‑Transaction)
- PID 23716 idle in transaction 30:45
建议:
SELECT pid, usename, state, age(clock_timestamp(), xact_start) AS xact_age, query FROM pg_stat_activity WHERE state='idle in transaction' AND age(clock_timestamp(), xact_start) > interval '5 minutes';
SELECT pg_terminate_backend(23716);
设置 idle_in_transaction_session_timeout = 60000;
3. 长查询 (Long‑Running Queries)
- 5 查询运行超过 5 分钟
建议:
SELECT pid, now() - query_start AS duration, usename, query FROM pg_stat_activity WHERE state='active' ORDER BY duration DESC LIMIT 10;
4. 缓存命中率低
- 当前 51.4% (目标 >99%)
建议: 增加 shared_buffers、effective_cache_size、work_mem 等参数。
5. 背景写入器 (bgwriter) 达到 max pages 限制
- 建议调低 bgwriter_lru_maxpages、提高 bgwriter_lru_multiplier 等。
6. 表膨胀 (Table Bloat)
- pgbench_accounts 膨胀 88.65%(浪费 2.03 GB)
建议: VACUUM (FULL) 或使用 pg_repack。
7. 索引膨胀 (Index Bloat)
- pgbench_tellers_pkey 膨胀 99.94%
建议: REINDEX INDEX 或 REINDEX INDEX CONCURRENTLY。
---
✅ 正常项目: 无效索引、XID 回绕、死锁、连接数、回滚率、WAL 归档均 OK。
📊 存储使用统计: postgres 3.5 GB, test 7 MB 等。
🔧 行动建议:
- 立即: 终止阻塞会话、重建膨胀索引。
- 本周: 执行表 VACUUM FULL、调优 bgwriter 参数。
- 长期: 调整 shared_buffers、effective_cache_size、work_mem、maintenance_work_mem、idle_in_transaction_session_timeout 等。Usage Notes
Ensure the pg_stat_statements extension is installed on the target database.
If the PostgreSQL instance hosts multiple databases, run the agent separately for each one.
The AI agent requires the psql client to be available on the host machine.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
