7 Ready‑to‑Use Shell Scripts for Automated Server Health Checks
This article presents seven production‑grade Bash scripts that automate server health inspections—including system resource monitoring, disk analysis, network diagnostics, process validation, security baseline checks, MySQL health assessment, and a batch scheduler—plus best‑practice guidance for integrating them into an operations workflow.
Introduction
The author shares a set of seven production‑level Shell scripts that automate server health inspections, enabling proactive detection of issues before they cause service interruptions.
Why automation is needed
Traditional monitoring tools such as Zabbix or Prometheus have blind spots; they may miss configuration omissions, gradual metric drifts, fine‑grained system details, business‑logic failures, or monitoring system outages. Automated scripts complement monitoring by performing systematic checks.
Core benefits of automated inspection
Proactive discovery : Detect problems without relying on alert rules.
Comprehensiveness : Cover system, network, application, and log dimensions.
Customizability : Tailor checks to specific business needs.
Traceability : Generate detailed reports for root‑cause analysis.
Low cost : Pure Shell scripts require no extra agents.
Offline capability : Continue operating even if the monitoring system fails.
Enterprise‑level inspection scenarios
Typical daily operations include scheduled health checks at 8 am, pre‑event comprehensive checks (e.g., before Double‑11 sales), post‑deployment verification, emergency manual diagnostics, and audit/compliance inspections. A real‑world example shows a financial company avoiding a 5 million‑yuan outage by expanding disk space after a batch check.
Core content: 7 production‑grade inspection scripts
Script 1: System health check
This foundational script examines CPU, memory, disk, network, and process usage, logs warnings, and produces a colored, timestamped report.
#!/bin/bash
################################################################################
# Script name: system_health_check.sh
# Description: Comprehensive system health inspection
# Version: v2.0
# Author: DevOps Team
# Last modified: 2025-01-15
################################################################################
# Configuration area
HOSTNAME=$(hostname)
DATE=$(date +"%Y-%m-%d %H:%M:%S")
REPORT_FILE="/var/log/system_check_$(date +%Y%m%d_%H%M%S).log"
# Alert thresholds
CPU_WARNING=80 # CPU usage warning threshold (%)
MEM_WARNING=85 # Memory usage warning threshold (%)
DISK_WARNING=85 # Disk usage warning threshold (%)
LOAD_WARNING=4 # System load warning threshold (adjust per CPU cores)
INODE_WARNING=80 # Inode usage warning threshold (%)
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# Logging functions
log() {
echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"
}
log_section() {
echo "" | tee -a "$REPORT_FILE"
echo "============================================================" | tee -a "$REPORT_FILE"
echo " $1" | tee -a "$REPORT_FILE"
echo "============================================================" | tee -a "$REPORT_FILE"
}
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
# 1. Basic information collection
check_basic_info() {
log_section "1. System basic information"
log "Hostname: $HOSTNAME"
log "Check time: $DATE"
log "OS version: $(cat /etc/redhat-release 2>/dev/null || cat /etc/issue | head -1)"
log "Kernel version: $(uname -r)"
log "Architecture: $(uname -m)"
log "Uptime: $(uptime | awk -F'up ' '{print $2}' | awk -F',' '{print $1}')"
log "Current user: $(whoami)"
log "Logged‑in users: $(who | wc -l)"
}
# 2. CPU usage check
check_cpu() {
log_section "2. CPU usage check"
CPU_CORES=$(grep -c ^processor /proc/cpuinfo)
log "CPU cores: $CPU_CORES"
CPU_IDLE=$(top -bn2 -d 1 | grep "Cpu(s)" | tail -1 | awk '{print $8}' | cut -d'%' -f1)
CPU_USAGE=$(echo "scale=2; 100 - $CPU_IDLE" | bc)
log "CPU usage: ${CPU_USAGE}%"
if (( $(echo "$CPU_USAGE > $CPU_WARNING" | bc -l) )); then
log_warning "CPU usage exceeds $CPU_WARNING%, current $CPU_USAGE%"
log "Top 5 CPU‑hungry processes:"
ps aux | sort -rn -k3 | head -5 | awk '{printf " PID: %-8s User: %-10s CPU: %-6s CMD: %s
", $2,$1,$3,$11}' | tee -a "$REPORT_FILE"
else
log_ok "CPU usage normal: $CPU_USAGE%"
fi
LOAD_1=$(uptime | awk -F'load average:' '{print $2}' | awk -F',' '{print $1}' | xargs)
log "Load (1 min): $LOAD_1"
LOAD_THRESHOLD=$(echo "$CPU_CORES * 2" | bc)
if (( $(echo "$LOAD_1 > $LOAD_THRESHOLD" | bc -l) )); then
log_warning "Load too high! 1‑min load $LOAD_1 exceeds threshold $LOAD_THRESHOLD"
fi
}
# 3. Memory usage check
check_memory() {
log_section "3. Memory usage check"
MEM_TOTAL=$(free -m | awk 'NR==2{print $2}')
MEM_USED=$(free -m | awk 'NR==2{print $3}')
MEM_AVAILABLE=$(free -m | awk 'NR==2{print $7}')
MEM_USAGE=$(echo "scale=2; $MEM_USED / $MEM_TOTAL * 100" | bc)
log "Total memory: ${MEM_TOTAL}MB"
log "Used memory: ${MEM_USED}MB"
log "Available memory: ${MEM_AVAILABLE}MB"
log "Memory usage: ${MEM_USAGE}%"
if (( $(echo "$MEM_USAGE > $MEM_WARNING" | bc -l) )); then
log_warning "Memory usage exceeds $MEM_WARNING%, current $MEM_USAGE%"
log "Top 5 memory‑hungry processes:"
ps aux | sort -rn -k4 | head -5 | awk '{printf " PID: %-8s User: %-10s MEM: %-6s CMD: %s
", $2,$1,$4,$11}' | tee -a "$REPORT_FILE"
else
log_ok "Memory usage normal: $MEM_USAGE%"
fi
SWAP_TOTAL=$(free -m | awk 'NR==3{print $2}')
SWAP_USED=$(free -m | awk 'NR==3{print $3}')
log "Swap total: ${SWAP_TOTAL}MB"
log "Swap used: ${SWAP_USED}MB"
if [ "$SWAP_TOTAL" -gt 0 ] && [ "$SWAP_USED" -gt 100 ]; then
log_warning "High swap usage: $SWAP_USEDMB, possible memory pressure"
fi
}
# 4. Disk usage check
check_disk() {
log_section "4. Disk usage check"
log "Disk partition usage:"
df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop' | awk '{print " " $0}' | tee -a "$REPORT_FILE"
HAS_DISK_WARNING=0
while read line; do
USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
MOUNT=$(echo "$line" | awk '{print $6}')
if [ "$USAGE" -gt $DISK_WARNING ]; then
log_warning "Partition $MOUNT usage $USAGE% exceeds $DISK_WARNING%"
HAS_DISK_WARNING=1
log " $MOUNT top 5 directories:"
du -sh $MOUNT/* 2>/dev/null | sort -rh | head -5 | awk '{print " " $0}' | tee -a "$REPORT_FILE"
fi
done < <(df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop')
if [ $HAS_DISK_WARNING -eq 0 ]; then
log_ok "All disk partitions within limits"
fi
log "Inode usage:"
df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop' | awk '{print " " $0}' | tee -a "$REPORT_FILE"
while read line; do
INODE_USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
MOUNT=$(echo "$line" | awk '{print $6}')
if [ "$INODE_USAGE" -gt $INODE_WARNING ]; then
log_warning "Partition $MOUNT inode usage $INODE_USAGE% exceeds $INODE_WARNING%"
fi
done < <(df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop')
}
# 5. Network status check
check_network() {
log_section "5. Network status check"
log "Network interfaces:"
ip -br addr | awk '{print " " $0}' | tee -a "$REPORT_FILE"
log "TCP connection states:"
netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn | tee -a "$REPORT_FILE"
TIME_WAIT_COUNT=$(netstat -an | grep TIME_WAIT | wc -l)
log "TIME_WAIT connections: $TIME_WAIT_COUNT"
if [ $TIME_WAIT_COUNT -gt 5000 ]; then
log_warning "Too many TIME_WAIT connections"
log "Optimization suggestions: echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"
else
log_ok "TIME_WAIT connections normal"
fi
log "Listening ports:"
netstat -tuln | grep LISTEN | tee -a "$REPORT_FILE"
}
# 6. Generate summary
generate_summary() {
log_section "6. Inspection summary"
WARNING_COUNT=$(grep -c "\[WARNING\]" "$REPORT_FILE")
ERROR_COUNT=$(grep -c "\[ERROR\]" "$REPORT_FILE")
log "Inspection completed at $(date +"%Y-%m-%d %H:%M:%S")"
log "Warnings: $WARNING_COUNT"
log "Errors: $ERROR_COUNT"
if [ $ERROR_COUNT -gt 0 ]; then
log_error "Found $ERROR_COUNT critical issues, please address immediately!"
elif [ $WARNING_COUNT -gt 0 ]; then
log_warning "Found $WARNING_COUNT warnings, please review"
else
log_ok "System status good, no anomalies"
fi
log "Full report saved to: $REPORT_FILE"
}
# Main function
main() {
echo "=========================================="
echo " Server health inspection script v2.0"
echo "=========================================="
echo ""
if [ "$(id -u)" -ne 0 ]; then
echo "Warning: not running as root, some checks may be limited"
fi
check_basic_info
check_cpu
check_memory
check_disk
check_network
generate_summary
echo ""
echo "=========================================="
echo " Inspection finished!"
echo "=========================================="
}
main "$@"Script 2: Disk space deep check
This script performs a detailed analysis of disk usage, identifies the largest directories, finds large files, predicts when a disk will fill up, and offers cleanup suggestions.
#!/bin/bash
################################################################################
# Script name: disk_space_check.sh
# Description: Deep disk space analysis and reporting
# Version: v1.5
################################################################################
REPORT_FILE="/var/log/disk_check_$(date +%Y%m%d_%H%M%S).log"
DISK_WARNING=80
INODE_WARNING=80
LARGE_FILE_SIZE=1G # Large file threshold
# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
log "============================================================"
log "1. Disk usage check"
log "============================================================"
# Check each partition
while read line; do
USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
MOUNT=$(echo "$line" | awk '{print $6}')
AVAIL=$(echo "$line" | awk '{print $4}')
log "Partition: $MOUNT"
log " Usage: ${USAGE}%"
log " Available: $AVAIL"
if [ "$USAGE" -gt $DISK_WARNING ]; then
log_error "Disk usage exceeds $DISK_WARNING% on $MOUNT"
log " Top 10 directories in $MOUNT:"
du -sh $MOUNT/* 2>/dev/null | sort -rh | head -10 | awk '{print " " $0}' | tee -a "$REPORT_FILE"
log " Large files (> $LARGE_FILE_SIZE) in $MOUNT:"
find $MOUNT -type f -size +${LARGE_FILE_SIZE} -exec ls -lh {} \; 2>/dev/null | awk '{print " " $9 " (" $5 ")"}' | head -10 | tee -a "$REPORT_FILE"
else
log_ok "Disk usage normal on $MOUNT"
fi
done < <(df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop')
# Inode usage check
log ""
log "Inode usage check:"
while read line; do
INODE_USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
MOUNT=$(echo "$line" | awk '{print $6}')
log "Partition: $MOUNT"
log " Inode usage: ${INODE_USAGE}%"
if [ "$INODE_USAGE" -gt $INODE_WARNING ]; then
log_error "Inode usage exceeds $INODE_WARNING% on $MOUNT"
fi
done < <(df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop')
log ""
log "Cleanup suggestions:"
log "1. Compress or delete old log files: find /var/log -type f -name '*.log' -mtime +30 -exec gzip {} \;"
log "2. Remove temporary files older than 7 days: find /tmp -type f -atime +7 -delete"
log "3. Clean yum cache: yum clean all"
log "4. Clean apt cache: apt-get clean"
log "5. Prune unused Docker images/containers: docker system prune -a"
log "6. Vacuum journal logs: journalctl --vacuum-time=7d"Script 3: Network connection and port check
Focuses on network interface status, TCP connection statistics, listening ports, DNS and gateway reachability, and kernel network parameters.
#!/bin/bash
################################################################################
# Script name: network_check.sh
# Description: Deep network connection and port inspection
# Version: v1.0
################################################################################
REPORT_FILE="/var/log/network_check_$(date +%Y%m%d_%H%M%S).log"
TIME_WAIT_WARNING=5000
ESTABLISHED_WARNING=10000
CRITICAL_PORTS=(22 80 443 3306 6379 8080)
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
# 1. Interface status
check_network_interfaces() {
log "============================================================"
log "1. Network interface status"
log "============================================================"
log "Interface list:"
ip -br addr | tee -a "$REPORT_FILE"
for interface in $(ip -br link | awk '{print $1}' | grep -v lo); do
log "Interface: $interface"
STATUS=$(ip link show $interface | grep -i "state" | awk '{print $9}')
log " Status: $STATUS"
RX_BYTES=$(cat /sys/class/net/$interface/statistics/rx_bytes 2>/dev/null)
TX_BYTES=$(cat /sys/class/net/$interface/statistics/tx_bytes 2>/dev/null)
log " RX: $(numfmt --to=iec $RX_BYTES 2>/dev/null || echo "${RX_BYTES}B")"
log " TX: $(numfmt --to=iec $TX_BYTES 2>/dev/null || echo "${TX_BYTES}B")"
RX_DROP=$(cat /sys/class/net/$interface/statistics/rx_dropped 2>/dev/null)
TX_DROP=$(cat /sys/class/net/$interface/statistics/tx_dropped 2>/dev/null)
if [ "$RX_DROP" -gt 0 ] || [ "$TX_DROP" -gt 0 ]; then
log_warning "Packet drops on $interface: RX=$RX_DROP TX=$TX_DROP"
fi
RX_ERR=$(cat /sys/class/net/$interface/statistics/rx_errors 2>/dev/null)
TX_ERR=$(cat /sys/class/net/$interface/statistics/tx_errors 2>/dev/null)
if [ "$RX_ERR" -gt 0 ] || [ "$TX_ERR" -gt 0 ]; then
log_error "Errors on $interface: RX=$RX_ERR TX=$TX_ERR"
fi
done
}
# 2. TCP connection statistics
check_tcp_connections() {
log "============================================================"
log "2. TCP connection statistics"
log "============================================================"
log "Connection state distribution:"
netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn | tee -a "$REPORT_FILE"
TIME_WAIT_COUNT=$(netstat -an | grep TIME_WAIT | wc -l)
log "TIME_WAIT connections: $TIME_WAIT_COUNT"
if [ $TIME_WAIT_COUNT -gt $TIME_WAIT_WARNING ]; then
log_warning "Too many TIME_WAIT connections, consider tuning TCP parameters"
log "Optimization suggestions:"
log " echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"
log " echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle # not recommended in NAT environments"
else
log_ok "TIME_WAIT connections normal"
fi
ESTABLISHED_COUNT=$(netstat -an | grep ESTABLISHED | wc -l)
log "ESTABLISHED connections: $ESTABLISHED_COUNT"
if [ $ESTABLISHED_COUNT -gt $ESTABLISHED_WARNING ]; then
log_warning "High number of ESTABLISHED connections"
fi
log "Top 10 remote IPs by ESTABLISHED connections:"
netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -10 | tee -a "$REPORT_FILE"
}
# 3. Listening ports check
check_listening_ports() {
log "============================================================"
log "3. Listening ports check"
log "============================================================"
log "Current listening ports:"
netstat -tuln | grep LISTEN | tee -a "$REPORT_FILE"
log "Critical port status check:"
for port in "${CRITICAL_PORTS[@]}"; do
if netstat -tuln | grep -q ":$port "; then
PID=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f1)
PROC=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f2)
log_ok "Port $port listening (process: $PROC PID: $PID)"
else
log_warning "Port $port not listening"
fi
done
}
# 4. Network connectivity check
check_network_connectivity() {
log "============================================================"
log "4. Network connectivity check"
log "============================================================"
log "DNS resolution test:"
if nslookup www.baidu.com >/dev/null 2>&1; then
log_ok "DNS resolution normal"
else
log_error "DNS resolution failed"
fi
GATEWAY=$(ip route | grep default | awk '{print $3}' | head -1)
if [ -n "$GATEWAY" ]; then
log "Default gateway: $GATEWAY"
if ping -c 3 -W 2 $GATEWAY >/dev/null 2>&1; then
log_ok "Gateway reachable"
else
log_error "Gateway unreachable"
fi
else
log_warning "No default gateway found"
fi
if ping -c 3 -W 2 8.8.8.8 >/dev/null 2>&1; then
log_ok "External network reachable"
else
log_error "External network unreachable"
fi
}
# 5. Network kernel parameters
check_network_parameters() {
log "============================================================"
log "5. Network kernel parameters"
log "============================================================"
log "tcp_tw_reuse: $(cat /proc/sys/net/ipv4/tcp_tw_reuse)"
log "tcp_tw_recycle: $(cat /proc/sys/net/ipv4/tcp_tw_recycle 2>/dev/null || echo 'N/A')"
log "tcp_fin_timeout: $(cat /proc/sys/net/ipv4/tcp_fin_timeout)"
log "tcp_keepalive_time: $(cat /proc/sys/net/ipv4/tcp_keepalive_time)"
log "tcp_max_syn_backlog: $(cat /proc/sys/net/ipv4/tcp_max_syn_backlog)"
log "somaxconn: $(cat /proc/sys/net/core/somaxconn)"
log "File descriptor limits: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}
# Main function
main() {
echo "=========================================="
echo " Network status inspection script v1.0"
echo "=========================================="
echo ""
check_network_interfaces
check_tcp_connections
check_listening_ports
check_network_connectivity
check_network_parameters
echo ""
log "============================================================"
log "Inspection completed!"
log "Report saved to: $REPORT_FILE"
log "============================================================"
}
main "$@"Script 4: Application process health check
Validates the running state, resource consumption, and interface responsiveness of critical business processes.
#!/bin/bash
################################################################################
# Script name: process_health_check.sh
# Description: Application process health inspection
# Version: v2.0
################################################################################
REPORT_FILE="/var/log/process_check_$(date +%Y%m%d_%H%M%S).log"
# Define critical processes (adjust per business)
declare -A CRITICAL_PROCESSES=(
["nginx"]="nginx|80,443"
["mysql"]="mysqld|3306"
["redis"]="redis-server|6379"
["java"]="java|8080"
)
# Define HTTP endpoints to test
declare -A HTTP_ENDPOINTS=(
["Home"]="http://localhost/"
["Health"]="http://localhost:8080/health"
["API Status"]="http://localhost:8080/api/status"
)
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
# 1. Process check
check_process() {
local name="$1"
local pattern="$2"
log "Checking process: $name"
PIDS=$(pgrep -f "$pattern")
if [ -z "$PIDS" ]; then
log_error "Process not running!"
return 1
fi
PID_COUNT=$(echo "$PIDS" | wc -l)
log_ok "Process running (instances: $PID_COUNT)"
for pid in $PIDS; do
STATUS=$(ps -p $pid -o stat --no-headers)
if [[ $STATUS == *Z* ]]; then
log_error "PID $pid: zombie state"
continue
elif [[ $STATUS == *T* ]]; then
log_warning "PID $pid: stopped state"
continue
fi
CPU=$(ps -p $pid -o %cpu --no-headers | xargs)
MEM=$(ps -p $pid -o %mem --no-headers | xargs)
RSS=$(ps -p $pid -o rss --no-headers | xargs)
VSZ=$(ps -p $pid -o vsz --no-headers | xargs)
UPTIME=$(ps -p $pid -o etime --no-headers | xargs)
THREADS=$(ps -p $pid -o nlwp --no-headers | xargs)
log "PID $pid:"
log " CPU: ${CPU}%"
log " Memory: ${MEM}% (RSS:${RSS}KB VSZ:${VSZ}KB)"
log " Threads: $THREADS"
log " Uptime: $UPTIME"
if (( $(echo "$CPU > 80" | bc -l) )); then
log_warning "CPU usage high!"
fi
if (( $(echo "$MEM > 80" | bc -l) )); then
log_warning "Memory usage high!"
fi
if [ $THREADS -gt 1000 ]; then
log_warning "Thread count high!"
fi
if [ -d "/proc/$pid/fd" ]; then
FD_COUNT=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
log " File descriptors: $FD_COUNT"
if [ $FD_COUNT -gt 10000 ]; then
log_warning "Too many file descriptors!"
fi
fi
if [ -d "/proc/$pid" ]; then
CORE_PATTERN=$(cat /proc/sys/kernel/core_pattern)
log " Core dump config: $CORE_PATTERN"
fi
done
return 0
}
# 2. Port check for a service
check_ports() {
local name="$1"
local ports="$2"
log "Checking ports for: $name"
IFS=',' read -ra PORT_ARRAY <<< "$ports"
for port in "${PORT_ARRAY[@]}"; do
if netstat -tuln | grep -q ":$port "; then
PID=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f1)
PROC=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f2)
log_ok "Port $port listening (process: $PROC PID: $PID)"
ESTABLISHED=$(netstat -an | grep ":$port " | grep ESTABLISHED | wc -l)
TIME_WAIT=$(netstat -an | grep ":$port " | grep TIME_WAIT | wc -l)
CLOSE_WAIT=$(netstat -an | grep ":$port " | grep CLOSE_WAIT | wc -l)
log " ESTABLISHED: $ESTABLISHED"
log " TIME_WAIT: $TIME_WAIT"
log " CLOSE_WAIT: $CLOSE_WAIT"
if [ $CLOSE_WAIT -gt 100 ]; then
log_warning "CLOSE_WAIT connections high, possible leak!"
fi
else
log_error "Port $port not listening"
fi
done
}
# 3. HTTP endpoint health check
check_http_endpoints() {
log ""
log "============================================================"
log "HTTP endpoint health check"
log "============================================================"
for name in "${!HTTP_ENDPOINTS[@]}"; do
url="${HTTP_ENDPOINTS[$name]}"
log "Checking: $name ($url)"
START_TIME=$(date +%s%N)
RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "$url" 2>&1)
END_TIME=$(date +%s%N)
if [ $? -eq 0 ]; then
HTTP_CODE=$RESPONSE
RESPONSE_TIME=$(echo "scale=3; ($END_TIME - $START_TIME) / 1000000000" | bc)
if [ "$HTTP_CODE" == "200" ]; then
log_ok "HTTP $HTTP_CODE response time: ${RESPONSE_TIME}s"
elif [ "$HTTP_CODE" == "301" ] || [ "$HTTP_CODE" == "302" ]; then
log_warning "HTTP $HTTP_CODE (redirect)"
else
log_error "HTTP $HTTP_CODE (abnormal)"
fi
if (( $(echo "$RESPONSE_TIME > 3" | bc -l) )); then
log_warning "Response time too long: ${RESPONSE_TIME}s"
fi
else
log_error "Request failed – timeout or network error"
fi
done
}
# 4. Process dependency check (shared libraries)
check_process_dependencies() {
log ""
log "============================================================"
log "Process dependency check"
log "============================================================"
log "Shared library dependencies:"
for proc in "${!CRITICAL_PROCESSES[@]}"; do
pattern=$(echo "${CRITICAL_PROCESSES[$proc]}" | cut -d'|' -f1)
PID=$(pgrep -f "$pattern" | head -1)
if [ -n "$PID" ]; then
EXE=$(readlink /proc/$PID/exe 2>/dev/null)
if [ -n "$EXE" ]; then
log " $proc ($EXE):"
MISSING=$(ldd "$EXE" 2>/dev/null | grep "not found")
if [ -n "$MISSING" ]; then
log_error " Missing dependencies:"
echo "$MISSING" | awk '{print " " $0}' | tee -a "$REPORT_FILE"
else
log_ok " Dependencies complete"
fi
fi
fi
done
}
# Main function
main() {
echo "=========================================="
echo " Application process health check v2.0"
echo "=========================================="
echo ""
log "============================================================"
log "Process and port check"
log "============================================================"
for proc in "${!CRITICAL_PROCESSES[@]}"; do
IFS='|' read -r pattern ports <<< "${CRITICAL_PROCESSES[$proc]}"
check_process "$proc" "$pattern"
check_ports "$proc" "$ports"
log ""
done
check_http_endpoints
check_process_dependencies
log ""
log "============================================================"
log "Inspection completed!"
log "Report saved to: $REPORT_FILE"
log "============================================================"
}
main "$@"Script 5: Security baseline check
Inspects account security, SSH configuration, critical file permissions, SUID binaries, firewall status, SELinux state, system updates, and sensitive command history.
#!/bin/bash
################################################################################
# Script name: security_baseline_check.sh
# Description: Server security baseline inspection
# Version: v1.0
################################################################################
REPORT_FILE="/var/log/security_check_$(date +%Y%m%d_%H%M%S).log"
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
# 1. Account security check
check_account_security() {
log "============================================================"
log "1. Account security check"
log "============================================================"
log "Empty password accounts:"
EMPTY_PASS=$(awk -F: '($2 == "") {print $1}' /etc/shadow 2>/dev/null)
if [ -n "$EMPTY_PASS" ]; then
log_error "Found empty‑password accounts:"
echo "$EMPTY_PASS" | awk '{print " "$0}' | tee -a "$REPORT_FILE"
else
log_ok "No empty‑password accounts"
fi
log "UID 0 accounts (root privileges):"
ROOT_USERS=$(awk -F: '($3 == 0) {print $1}' /etc/passwd)
echo "$ROOT_USERS" | awk '{print " "$0}' | tee -a "$REPORT_FILE"
if [ $(echo "$ROOT_USERS" | wc -l) -gt 1 ]; then
log_warning "Multiple UID 0 accounts detected!"
fi
log "Login‑capable system accounts (UID<1000, shell not nologin):"
awk -F: '($3 < 1000 && $7 !~ /nologin|false/) {print " "$1" (UID:"$3" Shell:"$7")"}' /etc/passwd | tee -a "$REPORT_FILE"
log "Users with sudo privileges:"
grep -E '^%wheel|^%sudo|^[^#].*ALL=.*ALL' /etc/sudoers /etc/sudoers.d/* 2>/dev/null | awk '{print " "$0}' | tee -a "$REPORT_FILE"
log "Recent login failures (last 10):"
lastb | head -10 | awk '{print " "$0}' | tee -a "$REPORT_FILE"
}
# 2. SSH security configuration check
check_ssh_security() {
log ""
log "============================================================"
log "2. SSH security configuration check"
log "============================================================"
SSHD_CONFIG="/etc/ssh/sshd_config"
if [ ! -f "$SSHD_CONFIG" ]; then
log_error "SSH config file missing"
return 1
fi
ROOT_LOGIN=$(grep "^PermitRootLogin" $SSHD_CONFIG | awk '{print $2}')
log "PermitRootLogin: $ROOT_LOGIN"
if [ "$ROOT_LOGIN" == "yes" ]; then
log_warning "SSH permits root login (insecure)"
else
log_ok "SSH root login disabled"
fi
PASS_AUTH=$(grep "^PasswordAuthentication" $SSHD_CONFIG | awk '{print $2}')
log "PasswordAuthentication: $PASS_AUTH"
if [ "$PASS_AUTH" == "yes" ]; then
log_warning "Password authentication enabled (use keys)"
else
log_ok "Password authentication disabled"
fi
EMPTY_PASS=$(grep "^PermitEmptyPasswords" $SSHD_CONFIG | awk '{print $2}')
log "PermitEmptyPasswords: ${EMPTY_PASS:-no}"
if [ "$EMPTY_PASS" == "yes" ]; then
log_error "Empty passwords allowed (high risk)"
else
log_ok "Empty passwords disabled"
fi
SSH_PORT=$(grep "^Port" $SSHD_CONFIG | awk '{print $2}')
log "SSH port: ${SSH_PORT:-22}"
if [ "$SSH_PORT" == "22" ] || [ -z "$SSH_PORT" ]; then
log_warning "Default SSH port 22 in use (consider changing)"
else
log_ok "Custom SSH port configured"
fi
MAX_AUTH=$(grep "^MaxAuthTries" $SSHD_CONFIG | awk '{print $2}')
log "MaxAuthTries: ${MAX_AUTH:-6}"
ALLOW_USERS=$(grep "^AllowUsers" $SSHD_CONFIG | cut -d' ' -f2-)
if [ -n "$ALLOW_USERS" ]; then
log "AllowUsers: $ALLOW_USERS"
else
log_warning "AllowUsers not set (all users can attempt login)"
fi
}
# 3. Critical file permission check
check_file_permissions() {
log ""
log "============================================================"
log "3. Critical file permission check"
log "============================================================"
declare -A CRITICAL_FILES=(
["/etc/passwd"]="644"
["/etc/shadow"]="000"
["/etc/group"]="644"
["/etc/gshadow"]="000"
["/etc/ssh/sshd_config"]="600"
)
for file in "${!CRITICAL_FILES[@]}"; do
if [ -f "$file" ]; then
PERM=$(stat -c "%a" "$file")
EXPECTED="${CRITICAL_FILES[$file]}"
log "$file: $PERM"
if [ "$PERM" != "$EXPECTED" ]; then
log_warning "Incorrect permission (expected: $EXPECTED)"
else
log_ok "Permission correct"
fi
else
log_warning "$file does not exist"
fi
done
log "SUID files (potential privilege escalation):"
find / -type f -perm -4000 2>/dev/null | head -20 | awk '{print " "$0}' | tee -a "$REPORT_FILE"
}
# 4. Firewall check
check_firewall() {
log ""
log "============================================================"
log "4. Firewall status check"
log "============================================================"
if command -v iptables >/dev/null 2>&1; then
RULE_COUNT=$(iptables -L -n | wc -l)
log "iptables rule count: $RULE_COUNT"
if [ $RULE_COUNT -lt 10 ]; then
log_warning "Few iptables rules, firewall may be misconfigured"
fi
fi
if systemctl is-active --quiet firewalld 2>/dev/null; then
log_ok "firewalld: running"
else
log_warning "firewalld: not running"
fi
if command -v getenforce >/dev/null 2>&1; then
SELINUX=$(getenforce)
log "SELinux: $SELINUX"
if [ "$SELINUX" == "Disabled" ]; then
log_warning "SELinux disabled (reduced security)"
fi
fi
}
# 5. System update check
check_system_updates() {
log ""
log "============================================================"
log "5. System update check"
log "============================================================"
if command -v yum >/dev/null 2>&1; then
UPDATE_COUNT=$(yum check-update 2>/dev/null | grep -v "^$" | wc -l)
log "Available updates (yum): $UPDATE_COUNT"
if [ $UPDATE_COUNT -gt 50 ]; then
log_warning "Many pending updates, schedule regular patching"
fi
fi
if command -v apt-get >/dev/null 2>&1; then
apt-get update >/dev/null 2>&1
UPDATE_COUNT=$(apt list --upgradable 2>/dev/null | wc -l)
log "Available updates (apt): $UPDATE_COUNT"
fi
KERNEL=$(uname -r)
log "Current kernel version: $KERNEL"
}
# 6. Sensitive command history check
check_history() {
log ""
log "============================================================"
log "6. Sensitive command history check"
log "============================================================"
if [ -f ~/.bash_history ]; then
log "Dangerous commands found:"
grep -E "rm -rf|mkfs|dd|:/dev/|wget.*sh|curl.*sh|chmod 777" ~/.bash_history 2>/dev/null | tail -10 | awk '{print " "$0}' | tee -a "$REPORT_FILE"
fi
log "Possible password leaks in history:"
grep -E "password=|passwd|mysql.*-p" ~/.bash_history 2>/dev/null | tail -5 | awk '{print " "$0}' | tee -a "$REPORT_FILE"
}
# Main function
main() {
echo "=========================================="
echo " Security baseline check script v1.0"
echo "=========================================="
echo ""
if [ "$(id -u)" -ne 0 ]; then
log_warning "Non‑root execution, some checks may be limited"
fi
check_account_security
check_ssh_security
check_file_permissions
check_firewall
check_system_updates
check_history
log ""
log "============================================================"
log "Inspection completed!"
log "Report saved to: $REPORT_FILE"
log "============================================================"
}
main "$@"Script 6: MySQL health check
Evaluates MySQL connection, connection usage, query performance, InnoDB status, replication health, and table space consumption.
#!/bin/bash
################################################################################
# Script name: mysql_health_check.sh
# Description: MySQL database health inspection
# Version: v1.0
################################################################################
REPORT_FILE="/var/log/mysql_check_$(date +%Y%m%d_%H%M%S).log"
# MySQL connection configuration (adjust as needed)
MYSQL_USER="monitor"
MYSQL_PASS="your_password"
MYSQL_HOST="localhost"
MYSQL_PORT="3306"
# Alert thresholds
CONN_WARNING=100
SLOW_QUERY_WARNING=100
THREAD_RUNNING_WARNING=10
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'
log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }
mysql_exec() { mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -e "$1" 2>/dev/null; }
# 1. Connection check
check_mysql_connection() {
log "============================================================"
log "1. MySQL connection check"
log "============================================================"
if mysql_exec "SELECT 1" >/dev/null 2>&1; then
log_ok "MySQL connection OK"
VERSION=$(mysql_exec "SELECT VERSION()" | tail -1)
log "MySQL version: $VERSION"
UPTIME=$(mysql_exec "SHOW STATUS LIKE 'Uptime'" | tail -1 | awk '{print $2}')
UPTIME_DAYS=$(echo "scale=2; $UPTIME / 86400" | bc)
log "Uptime: ${UPTIME_DAYS} days"
return 0
else
log_error "MySQL connection failed"
return 1
fi
}
# 2. Connection usage check
check_mysql_connections() {
log ""
log "============================================================"
log "2. Connection usage check"
log "============================================================"
THREADS_CONNECTED=$(mysql_exec "SHOW STATUS LIKE 'Threads_connected'" | tail -1 | awk '{print $2}')
log "Current connections: $THREADS_CONNECTED"
MAX_CONNECTIONS=$(mysql_exec "SHOW VARIABLES LIKE 'max_connections'" | tail -1 | awk '{print $2}')
log "Max connections: $MAX_CONNECTIONS"
CONN_USAGE=$(echo "scale=2; $THREADS_CONNECTED / $MAX_CONNECTIONS * 100" | bc)
log "Connection usage: ${CONN_USAGE}%"
if (( $(echo "$CONN_USAGE > 80" | bc -l) )); then
log_warning "Connection usage high!"
else
log_ok "Connection usage normal"
fi
MAX_USED=$(mysql_exec "SHOW STATUS LIKE 'Max_used_connections'" | tail -1 | awk '{print $2}')
log "Historical max connections: $MAX_USED"
THREADS_RUNNING=$(mysql_exec "SHOW STATUS LIKE 'Threads_running'" | tail -1 | awk '{print $2}')
log "Running threads: $THREADS_RUNNING"
if [ "$THREADS_RUNNING" -gt $THREAD_RUNNING_WARNING ]; then
log_warning "Many running threads, possible slow queries"
fi
}
# 3. Query performance check
check_query_performance() {
log ""
log "============================================================"
log "3. Query performance check"
log "============================================================"
SLOW_QUERIES=$(mysql_exec "SHOW STATUS LIKE 'Slow_queries'" | tail -1 | awk '{print $2}')
log "Slow queries: $SLOW_QUERIES"
if [ "$SLOW_QUERIES" -gt $SLOW_QUERY_WARNING ]; then
log_warning "Many slow queries!"
SLOW_QUERY_TIME=$(mysql_exec "SHOW VARIABLES LIKE 'long_query_time'" | tail -1 | awk '{print $2}')
log "Slow query threshold: ${SLOW_QUERY_TIME}s"
fi
QUESTIONS=$(mysql_exec "SHOW STATUS LIKE 'Questions'" | tail -1 | awk '{print $2}')
UPTIME=$(mysql_exec "SHOW STATUS LIKE 'Uptime'" | tail -1 | awk '{print $2}')
QPS=$(echo "scale=2; $QUESTIONS / $UPTIME" | bc)
log "Average QPS: $QPS"
COM_COMMIT=$(mysql_exec "SHOW STATUS LIKE 'Com_commit'" | tail -1 | awk '{print $2}')
COM_ROLLBACK=$(mysql_exec "SHOW STATUS LIKE 'Com_rollback'" | tail -1 | awk '{print $2}')
TPS=$(echo "scale=2; ($COM_COMMIT + $COM_ROLLBACK) / $UPTIME" | bc)
log "Average TPS: $TPS"
}
# 4. InnoDB status check
check_innodb_status() {
log ""
log "============================================================"
log "4. InnoDB storage engine check"
log "============================================================"
INNODB_BUFFER_POOL=$(mysql_exec "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'" | tail -1 | awk '{print $2}')
INNODB_BUFFER_GB=$(echo "scale=2; $INNODB_BUFFER_POOL / 1024 / 1024 / 1024" | bc)
log "InnoDB buffer pool size: ${INNODB_BUFFER_GB}GB"
BUFFER_READS=$(mysql_exec "SHOW STATUS LIKE 'Innodb_buffer_pool_reads'" | tail -1 | awk '{print $2}')
BUFFER_READ_REQUESTS=$(mysql_exec "SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests'" | tail -1 | awk '{print $2}')
if [ "$BUFFER_READ_REQUESTS" -gt 0 ]; then
HIT_RATIO=$(echo "scale=4; (1 - $BUFFER_READS / $BUFFER_READ_REQUESTS) * 100" | bc)
log "Buffer pool hit ratio: ${HIT_RATIO}%"
if (( $(echo "$HIT_RATIO < 99" | bc -l) )); then
log_warning "Low buffer pool hit ratio, consider increasing innodb_buffer_pool_size"
else
log_ok "Buffer pool hit ratio good"
fi
fi
INNODB_LOG_SIZE=$(mysql_exec "SHOW VARIABLES LIKE 'innodb_log_file_size'" | tail -1 | awk '{print $2}')
INNODB_LOG_MB=$(echo "scale=2; $INNODB_LOG_SIZE / 1024 / 1024" | bc)
log "InnoDB log file size: ${INNODB_LOG_MB}MB"
}
# 5. Replication status check
check_replication_status() {
log ""
log "============================================================"
log "5. Master‑slave replication status check"
log "============================================================"
SLAVE_STATUS=$(mysql_exec "SHOW SLAVE STATUS\G" 2>/dev/null)
if [ -n "$SLAVE_STATUS" ]; then
log "Slave replication status:"
SLAVE_IO_RUNNING=$(echo "$SLAVE_STATUS" | grep "Slave_IO_Running" | awk '{print $2}')
log " Slave_IO_Running: $SLAVE_IO_RUNNING"
SLAVE_SQL_RUNNING=$(echo "$SLAVE_STATUS" | grep "Slave_SQL_Running" | awk '{print $2}' | head -1)
log " Slave_SQL_Running: $SLAVE_SQL_RUNNING"
if [ "$SLAVE_IO_RUNNING" != "Yes" ] || [ "$SLAVE_SQL_Running" != "Yes" ]; then
log_error "Replication abnormal!"
LAST_IO_ERROR=$(echo "$SLAVE_STATUS" | grep "Last_IO_Error:" | cut -d':' -f2-)
LAST_SQL_ERROR=$(echo "$SLAVE_STATUS" | grep "Last_SQL_Error:" | cut -d':' -f2-)
[ -n "$LAST_IO_ERROR" ] && log " IO error: $LAST_IO_ERROR"
[ -n "$LAST_SQL_ERROR" ] && log " SQL error: $LAST_SQL_ERROR"
else
log_ok "Replication normal"
fi
SECONDS_BEHIND=$(echo "$SLAVE_STATUS" | grep "Seconds_Behind_Master" | awk '{print $2}')
log " Replication lag: ${SECONDS_BEHIND}s"
if [ "$SECONDS_BEHIND" != "NULL" ] && [ "$SECONDS_BEHIND" -gt 60 ]; then
log_warning "Replication lag large!"
fi
else
log "Not a slave or replication not configured"
fi
}
# 6. Tablespace check
check_tablespace() {
log ""
log "============================================================"
log "6. Tablespace check"
log "============================================================"
log "Top 10 tables by size:"
mysql_exec "SELECT table_schema AS 'Database', table_name AS 'Table', ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size(MB)' FROM information_schema.tables WHERE table_schema NOT IN ('mysql','information_schema','performance_schema','sys') ORDER BY (data_length + index_length) DESC LIMIT 10;" | tee -a "$REPORT_FILE"
}
# Main function
main() {
echo "=========================================="
echo " MySQL health check script v1.0"
echo "=========================================="
echo ""
if ! check_mysql_connection; then
log_error "Cannot connect to MySQL, aborting"
exit 1
fi
check_mysql_connections
check_query_performance
check_innodb_status
check_replication_status
check_tablespace
log ""
log "============================================================"
log "Inspection completed!"
log "Report saved to: $REPORT_FILE"
log "============================================================"
}
main "$@"Script 7: Batch server inspection scheduler
Runs the previous scripts concurrently on multiple servers, aggregates results, and generates an HTML summary report.
#!/bin/bash
################################################################################
# Script name: batch_check_scheduler.sh
# Description: Batch server inspection scheduler
# Version: v2.0
################################################################################
# Configuration area
SERVER_LIST="servers.txt" # Server list file
SSH_USER="root" # SSH user
SSH_KEY="~/.ssh/id_rsa" # SSH key
CONCURRENT=10 # Number of concurrent jobs
CHECK_SCRIPT="system_health_check.sh" # Local inspection script
RESULT_DIR="/var/log/batch_check_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$RESULT_DIR"
GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m'
log_info() { echo -e "[$(date +"%H:%M:%S")] ${GREEN}[INFO]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }
log_error() { echo -e "[$(date +"%H:%M:%S")] ${RED}[ERROR]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }
log_warning() { echo -e "[$(date +"%H:%M:%S")] ${YELLOW}[WARNING]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }
# Single server inspection
check_single_server() {
local server=$1
local result_file="$RESULT_DIR/${server}_result.log"
local status_file="$RESULT_DIR/${server}_status.txt"
log_info "Starting check: $server"
# SSH connectivity test
if ! ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "echo ok" >/dev/null 2>&1; then
log_error "$server - SSH connection failed"
echo "FAILED|SSH connection failed" > "$status_file"
return 1
fi
# Upload inspection script
scp -o StrictHostKeyChecking=no -i "$SSH_KEY" "$CHECK_SCRIPT" "$SSH_USER@$server:/tmp/" >/dev/null 2>&1
if [ $? -ne 0 ]; then
log_error "$server - Script upload failed"
echo "FAILED|Script upload failed" > "$status_file"
return 1
fi
# Execute inspection
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "bash /tmp/$CHECK_SCRIPT" > "$result_file" 2>&1
if [ $? -eq 0 ]; then
WARNING_COUNT=$(grep -c "\[WARNING\]" "$result_file" || echo 0)
ERROR_COUNT=$(grep -c "\[ERROR\]" "$result_file" || echo 0)
if [ $ERROR_COUNT -gt 0 ]; then
echo "ERROR|Warnings:$WARNING_COUNT Errors:$ERROR_COUNT" > "$status_file"
log_error "$server - Detected $ERROR_COUNT errors"
elif [ $WARNING_COUNT -gt 0 ]; then
echo "WARNING|Warnings:$WARNING_COUNT Errors:$ERROR_COUNT" > "$status_file"
log_warning "$server - Detected $WARNING_COUNT warnings"
else
echo "OK|Normal" > "$status_file"
log_info "$server - Check completed, status normal"
fi
# Clean remote script
ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "rm -f /tmp/$CHECK_SCRIPT" >/dev/null 2>&1
return 0
else
log_error "$server - Inspection execution failed"
echo "FAILED|Execution failed" > "$status_file"
return 1
fi
}
# Concurrent batch check
batch_check() {
if [ ! -f "$SERVER_LIST" ]; then
log_error "Server list file not found: $SERVER_LIST"
exit 1
fi
if [ ! -f "$CHECK_SCRIPT" ]; then
log_error "Inspection script not found: $CHECK_SCRIPT"
exit 1
fi
total=$(grep -v "^#" "$SERVER_LIST" | grep -v "^$" | wc -l)
log_info "Starting batch inspection of $total servers with concurrency $CONCURRENT"
while read server; do
[[ -z "$server" || "$server" =~ ^# ]] && continue
while [ $(jobs -r | wc -l) -ge $CONCURRENT ]; do sleep 1; done
check_single_server "$server" &
done < "$SERVER_LIST"
wait
log_info "All server inspections completed!"
}
# Generate HTML summary report
generate_html_report() {
local summary_file="$RESULT_DIR/summary.html"
log_info "Generating summary report..."
cat > "$summary_file" <<'EOF'
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>Server Inspection Summary Report</title>
<style>
* { margin: 0; padding: 0; box-sizing: border-box; }
body { font-family: 'Arial', 'Microsoft YaHei', sans-serif; background: #f5f5f5; padding: 20px; }
.container { max-width: 1200px; margin: 0 auto; background: white; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
.header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 30px; border-radius: 8px 8px 0 0; }
.header h1 { font-size: 28px; margin-bottom: 10px; }
.header p { opacity: 0.9; font-size: 14px; }
.stats { display: flex; padding: 20px; background: #f8f9fa; border-bottom: 1px solid #e9ecef; }
.stat-item { flex: 1; text-align: center; padding: 10px; }
.stat-item .number { font-size: 32px; font-weight: bold; margin-bottom: 5px; }
.stat-item .label { color: #6c757d; font-size: 14px; }
.stat-ok .number { color: #28a745; }
.stat-warning .number { color: #ffc107; }
.stat-error .number { color: #dc3545; }
.content { padding: 20px; }
table { width: 100%; border-collapse: collapse; margin-top: 20px; }
th, td { padding: 12px; text-align: left; border-bottom: 1px solid #e9ecef; }
th { background: #f8f9fa; font-weight: 600; color: #495057; position: sticky; top: 0; }
tr:hover { background: #f8f9fa; }
.status-ok { background: #d4edda; color: #155724; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
.status-warning { background: #fff3cd; color: #856404; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
.status-error { background: #f8d7da; color: #721c24; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
.status-failed { background: #6c757d; color: white; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
a { color: #007bff; text-decoration: none; }
a:hover { text-decoration: underline; }
.footer { text-align: center; padding: 20px; color: #6c757d; font-size: 12px; border-top: 1px solid #e9ecef; }
</style>
</head>
<body>
<div class="container">
<div class="header">
<h1>🖥️ Server Inspection Summary Report</h1>
<p>Inspection time: REPORT_TIME</p>
</div>
<div class="stats">
<div class="stat-item stat-ok"><div class="number" id="ok-count">0</div><div class="label">Normal</div></div>
<div class="stat-item stat-warning"><div class="number" id="warning-count">0</div><div class="label">Warnings</div></div>
<div class="stat-item stat-error"><div class="number" id="error-count">0</div><div class="label">Errors</div></div>
<div class="stat-item stat-error"><div class="number" id="failed-count">0</div><div class="label">Failed</div></div>
</div>
<div class="content">
<table>
<thead>
<tr><th>#</th><th>Server</th><th>Status</th><th>Details</th><th>Full report</th></tr>
</thead>
<tbody id="server-list">
</tbody>
</table>
</div>
<div class="footer">Generated by Batch Check Scheduler v2.0</div>
</div>
<script>
let okCount = 0, warningCount = 0, errorCount = 0, failedCount = 0;
const serverData = SERVER_DATA_PLACEHOLDER;
const tbody = document.getElementById('server-list');
serverData.forEach((item, index) => {
const row = tbody.insertRow();
row.innerHTML = `
<td>${index + 1}</td>
<td>${item.server}</td>
<td><span class="status-${item.status.toLowerCase()}">${item.status}</span></td>
<td>${item.detail}</td>
<td><a href="${item.report}" target="_blank">View details</a></td>
`;
if (item.status === 'OK') okCount++;
else if (item.status === 'WARNING') warningCount++;
else if (item.status === 'ERROR') errorCount++;
else failedCount++;
});
document.getElementById('ok-count').textContent = okCount;
document.getElementById('warning-count').textContent = warningCount;
document.getElementById('error-count').textContent = errorCount;
document.getElementById('failed-count').textContent = failedCount;
</script>
</body>
</html>
EOF
# Collect data from status files
local server_data="["
local first=true
for status_file in "$RESULT_DIR"/*_status.txt; do
[ -f "$status_file" ] || continue
server=$(basename "$status_file" _status.txt)
IFS='|' read status detail < "$status_file"
if $first; then first=false; else server_data+=","; fi
server_data+="{\"server\":\"$server\",\"status\":\"$status\",\"detail\":\"$detail\",\"report\":\"${server}_result.log\"}"
done
server_data+="]"
sed -i "s/REPORT_TIME/$(date +"%Y-%m-%d %H:%M:%S")/g" "$summary_file"
sed -i "s/SERVER_DATA_PLACEHOLDER/$server_data/g" "$summary_file"
log_info "Summary report generated: $summary_file"
}
# Main entry point
main() {
echo "=========================================="
echo " Batch server inspection scheduler v2.0"
echo "=========================================="
echo ""
batch_check
generate_html_report
echo ""
echo "=========================================="
echo " Inspection finished!"
echo " Result directory: $RESULT_DIR"
echo " Summary report: $RESULT_DIR/summary.html"
echo "=========================================="
}
main "$@"Conclusion
The seven scripts together form a comprehensive, low‑cost automation framework for proactive server health monitoring, security compliance, and operational stability. By integrating them into scheduled tasks and the batch scheduler, organizations can shift from reactive firefighting to proactive operations, reduce downtime, and improve overall system reliability.
Future enhancements may include JSON output for API integration, AI‑driven anomaly detection, container‑native adaptations, and tighter coupling with existing monitoring platforms such as Zabbix or Prometheus.
Ops Community
A leading IT operations community where professionals share and grow together.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
