Databases 11 min read

Master PostgreSQL Health Checks with Open‑Source SKILL Scripts

This guide introduces a collection of PostgreSQL SKILL scripts for automated health checks and operational analysis, explains the repository layout, shows how to run individual or full daily checks, lists all available skills, and provides a sample report with actionable recommendations.

ITPUB
ITPUB
ITPUB
Master PostgreSQL Health Checks with Open‑Source SKILL Scripts

Overview

The article presents a set of PostgreSQL SKILL scripts designed for health monitoring and business‑level analysis. The code is hosted at https://github.com/digoal/postgres_skill/ and can be extended for custom checks.

Repository Layout

postgres-daily-check/
├── assets/
│   └── db_config.env          # PostgreSQL connection variables (PGHOST, PGPORT, PGUSER, PGDATABASE)
├── scripts/
│   ├── run_postgres_check.sh  # Core script that executes each skill's SQL and returns JSON
│   └── postgres_agent.py       # Agent that orchestrates skills, parses JSON, and generates a Markdown report
└── SKILL.md                   # Documentation of each skill, its usage, and output format

Running Skills

Single skill execution :

./run_postgres_check.sh <skill_name> [parameters]

Examples: ./run_postgres_check.sh get_long_running_queries 5 – find queries running longer than 5 minutes. ./run_postgres_check.sh get_idle_in_transaction_sessions 2 – find idle‑in‑transaction sessions longer than 2 minutes.

./run_postgres_check.sh get_cache_hit_rate
./run_postgres_check.sh get_table_bloat

Full daily check (recommended) :

cd postgres-daily-check
python3 scripts/postgres_agent.py

The postgres_agent.py script uses only the Python standard library ( json, subprocess, datetime, os) and requires no external dependencies.

Available Skills

The repository defines 40 skills grouped into six categories:

Core Health & Availability : get_invalid_indexes, get_xid_wraparound_risk, get_blocking_locks, get_deadlock_detection, get_critical_settings

Session & Connection Monitoring : get_long_running_queries, get_idle_in_transaction_sessions, get_long_running_transactions, get_long_running_prepared_transactions, get_connection_usage, get_lock_waiters, get_wait_events

Performance & Activity : get_cache_hit_rate, get_rollback_rate, get_top_sql_by_time, get_table_hotspots, get_bgwriter_stats, get_temp_file_usage, get_io_statistics, get_io_statistics_v2, get_analyze_progress, get_create_index_progress, get_cluster_progress, get_wal_statistics, get_checkpointer_stats, get_slru_stats, get_user_function_stats

Replication & Archiving : get_replication_slots, get_replication_status, get_logical_replication_status, get_wal_archiver_status

Maintenance & Storage : get_autovacuum_status, get_table_bloat, get_index_bloat, get_top_objects_by_size, get_large_unused_indexes, get_stale_statistics, get_database_sizes, get_sequence_exhaustion, get_database_conflict_stats

Freeze & Wraparound Protection : get_freeze_prediction

Sample Report

巡检完成!报告时间:2026-02-04 14:53:46
---
PostgreSQL 巡检报告详细分析
整体状态: ❌ ERROR
---
🔴 严重问题
1. 阻塞锁 (Blocking Locks)
   - PID 23716 (truncate test) 持有 AccessExclusiveLock,阻塞所有会话
   - 多个 SELECT 正在等待该锁
   建议:
   SELECT pid, usename, state, query, wait_event_type, wait_event FROM pg_stat_activity WHERE pid IN (23716, …);
   SELECT pg_terminate_backend(23716);
2. 长事务 (Idle‑in‑Transaction)
   - PID 23716 idle in transaction 30:45
   建议:
   SELECT pid, usename, state, age(clock_timestamp(), xact_start) AS xact_age, query FROM pg_stat_activity WHERE state='idle in transaction' AND age(clock_timestamp(), xact_start) > interval '5 minutes';
   SELECT pg_terminate_backend(23716);
   设置 idle_in_transaction_session_timeout = 60000;
3. 长查询 (Long‑Running Queries)
   - 5 查询运行超过 5 分钟
   建议:
   SELECT pid, now() - query_start AS duration, usename, query FROM pg_stat_activity WHERE state='active' ORDER BY duration DESC LIMIT 10;
4. 缓存命中率低
   - 当前 51.4% (目标 >99%)
   建议: 增加 shared_buffers、effective_cache_size、work_mem 等参数。
5. 背景写入器 (bgwriter) 达到 max pages 限制
   - 建议调低 bgwriter_lru_maxpages、提高 bgwriter_lru_multiplier 等。
6. 表膨胀 (Table Bloat)
   - pgbench_accounts 膨胀 88.65%(浪费 2.03 GB)
   建议: VACUUM (FULL) 或使用 pg_repack。
7. 索引膨胀 (Index Bloat)
   - pgbench_tellers_pkey 膨胀 99.94%
   建议: REINDEX INDEX 或 REINDEX INDEX CONCURRENTLY。
---
✅ 正常项目: 无效索引、XID 回绕、死锁、连接数、回滚率、WAL 归档均 OK。
📊 存储使用统计: postgres 3.5 GB, test 7 MB 等。
🔧 行动建议:
   - 立即: 终止阻塞会话、重建膨胀索引。
   - 本周: 执行表 VACUUM FULL、调优 bgwriter 参数。
   - 长期: 调整 shared_buffers、effective_cache_size、work_mem、maintenance_work_mem、idle_in_transaction_session_timeout 等。

Usage Notes

Ensure the pg_stat_statements extension is installed on the target database.

If the PostgreSQL instance hosts multiple databases, run the agent separately for each one.

The AI agent requires the psql client to be available on the host machine.

SQLPostgreSQLHealth CheckDatabase Monitoring
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.