Operations 72 min read

7 Ready‑to‑Use Shell Scripts for Automated Server Health Checks

This article presents seven production‑grade Bash scripts that automate server health inspections—including system resource monitoring, disk analysis, network diagnostics, process validation, security baseline checks, MySQL health assessment, and a batch scheduler—plus best‑practice guidance for integrating them into an operations workflow.

Ops Community
Ops Community
Ops Community
7 Ready‑to‑Use Shell Scripts for Automated Server Health Checks

Introduction

The author shares a set of seven production‑level Shell scripts that automate server health inspections, enabling proactive detection of issues before they cause service interruptions.

Why automation is needed

Traditional monitoring tools such as Zabbix or Prometheus have blind spots; they may miss configuration omissions, gradual metric drifts, fine‑grained system details, business‑logic failures, or monitoring system outages. Automated scripts complement monitoring by performing systematic checks.

Core benefits of automated inspection

Proactive discovery : Detect problems without relying on alert rules.

Comprehensiveness : Cover system, network, application, and log dimensions.

Customizability : Tailor checks to specific business needs.

Traceability : Generate detailed reports for root‑cause analysis.

Low cost : Pure Shell scripts require no extra agents.

Offline capability : Continue operating even if the monitoring system fails.

Enterprise‑level inspection scenarios

Typical daily operations include scheduled health checks at 8 am, pre‑event comprehensive checks (e.g., before Double‑11 sales), post‑deployment verification, emergency manual diagnostics, and audit/compliance inspections. A real‑world example shows a financial company avoiding a 5 million‑yuan outage by expanding disk space after a batch check.

Core content: 7 production‑grade inspection scripts

Script 1: System health check

This foundational script examines CPU, memory, disk, network, and process usage, logs warnings, and produces a colored, timestamped report.

#!/bin/bash
################################################################################
# Script name: system_health_check.sh
# Description: Comprehensive system health inspection
# Version: v2.0
# Author: DevOps Team
# Last modified: 2025-01-15
################################################################################

# Configuration area
HOSTNAME=$(hostname)
DATE=$(date +"%Y-%m-%d %H:%M:%S")
REPORT_FILE="/var/log/system_check_$(date +%Y%m%d_%H%M%S).log"

# Alert thresholds
CPU_WARNING=80        # CPU usage warning threshold (%)
MEM_WARNING=85        # Memory usage warning threshold (%)
DISK_WARNING=85       # Disk usage warning threshold (%)
LOAD_WARNING=4        # System load warning threshold (adjust per CPU cores)
INODE_WARNING=80      # Inode usage warning threshold (%)

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Logging functions
log() {
  echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"
}
log_section() {
  echo "" | tee -a "$REPORT_FILE"
  echo "============================================================" | tee -a "$REPORT_FILE"
  echo " $1" | tee -a "$REPORT_FILE"
  echo "============================================================" | tee -a "$REPORT_FILE"
}
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error()   { echo -e "${RED}[ERROR] $1${NC}"   | tee -a "$REPORT_FILE"; }
log_ok()      { echo -e "${GREEN}[OK] $1${NC}"    | tee -a "$REPORT_FILE"; }

# 1. Basic information collection
check_basic_info() {
    log_section "1. System basic information"
    log "Hostname: $HOSTNAME"
    log "Check time: $DATE"
    log "OS version: $(cat /etc/redhat-release 2>/dev/null || cat /etc/issue | head -1)"
    log "Kernel version: $(uname -r)"
    log "Architecture: $(uname -m)"
    log "Uptime: $(uptime | awk -F'up ' '{print $2}' | awk -F',' '{print $1}')"
    log "Current user: $(whoami)"
    log "Logged‑in users: $(who | wc -l)"
}

# 2. CPU usage check
check_cpu() {
    log_section "2. CPU usage check"
    CPU_CORES=$(grep -c ^processor /proc/cpuinfo)
    log "CPU cores: $CPU_CORES"
    CPU_IDLE=$(top -bn2 -d 1 | grep "Cpu(s)" | tail -1 | awk '{print $8}' | cut -d'%' -f1)
    CPU_USAGE=$(echo "scale=2; 100 - $CPU_IDLE" | bc)
    log "CPU usage: ${CPU_USAGE}%"
    if (( $(echo "$CPU_USAGE > $CPU_WARNING" | bc -l) )); then
        log_warning "CPU usage exceeds $CPU_WARNING%, current $CPU_USAGE%"
        log "Top 5 CPU‑hungry processes:"
        ps aux | sort -rn -k3 | head -5 | awk '{printf "  PID: %-8s User: %-10s CPU: %-6s CMD: %s
", $2,$1,$3,$11}' | tee -a "$REPORT_FILE"
    else
        log_ok "CPU usage normal: $CPU_USAGE%"
    fi
    LOAD_1=$(uptime | awk -F'load average:' '{print $2}' | awk -F',' '{print $1}' | xargs)
    log "Load (1 min): $LOAD_1"
    LOAD_THRESHOLD=$(echo "$CPU_CORES * 2" | bc)
    if (( $(echo "$LOAD_1 > $LOAD_THRESHOLD" | bc -l) )); then
        log_warning "Load too high! 1‑min load $LOAD_1 exceeds threshold $LOAD_THRESHOLD"
    fi
}

# 3. Memory usage check
check_memory() {
    log_section "3. Memory usage check"
    MEM_TOTAL=$(free -m | awk 'NR==2{print $2}')
    MEM_USED=$(free -m | awk 'NR==2{print $3}')
    MEM_AVAILABLE=$(free -m | awk 'NR==2{print $7}')
    MEM_USAGE=$(echo "scale=2; $MEM_USED / $MEM_TOTAL * 100" | bc)
    log "Total memory: ${MEM_TOTAL}MB"
    log "Used memory: ${MEM_USED}MB"
    log "Available memory: ${MEM_AVAILABLE}MB"
    log "Memory usage: ${MEM_USAGE}%"
    if (( $(echo "$MEM_USAGE > $MEM_WARNING" | bc -l) )); then
        log_warning "Memory usage exceeds $MEM_WARNING%, current $MEM_USAGE%"
        log "Top 5 memory‑hungry processes:"
        ps aux | sort -rn -k4 | head -5 | awk '{printf "  PID: %-8s User: %-10s MEM: %-6s CMD: %s
", $2,$1,$4,$11}' | tee -a "$REPORT_FILE"
    else
        log_ok "Memory usage normal: $MEM_USAGE%"
    fi
    SWAP_TOTAL=$(free -m | awk 'NR==3{print $2}')
    SWAP_USED=$(free -m | awk 'NR==3{print $3}')
    log "Swap total: ${SWAP_TOTAL}MB"
    log "Swap used: ${SWAP_USED}MB"
    if [ "$SWAP_TOTAL" -gt 0 ] && [ "$SWAP_USED" -gt 100 ]; then
        log_warning "High swap usage: $SWAP_USEDMB, possible memory pressure"
    fi
}

# 4. Disk usage check
check_disk() {
    log_section "4. Disk usage check"
    log "Disk partition usage:"
    df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop' | awk '{print "  " $0}' | tee -a "$REPORT_FILE"
    HAS_DISK_WARNING=0
    while read line; do
        USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
        MOUNT=$(echo "$line" | awk '{print $6}')
        if [ "$USAGE" -gt $DISK_WARNING ]; then
            log_warning "Partition $MOUNT usage $USAGE% exceeds $DISK_WARNING%"
            HAS_DISK_WARNING=1
            log "  $MOUNT top 5 directories:" 
            du -sh $MOUNT/* 2>/dev/null | sort -rh | head -5 | awk '{print "    " $0}' | tee -a "$REPORT_FILE"
        fi
    done < <(df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop')
    if [ $HAS_DISK_WARNING -eq 0 ]; then
        log_ok "All disk partitions within limits"
    fi
    log "Inode usage:"
    df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop' | awk '{print "  " $0}' | tee -a "$REPORT_FILE"
    while read line; do
        INODE_USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
        MOUNT=$(echo "$line" | awk '{print $6}')
        if [ "$INODE_USAGE" -gt $INODE_WARNING ]; then
            log_warning "Partition $MOUNT inode usage $INODE_USAGE% exceeds $INODE_WARNING%"
        fi
    done < <(df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop')
}

# 5. Network status check
check_network() {
    log_section "5. Network status check"
    log "Network interfaces:"
    ip -br addr | awk '{print "  " $0}' | tee -a "$REPORT_FILE"
    log "TCP connection states:"
    netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn | tee -a "$REPORT_FILE"
    TIME_WAIT_COUNT=$(netstat -an | grep TIME_WAIT | wc -l)
    log "TIME_WAIT connections: $TIME_WAIT_COUNT"
    if [ $TIME_WAIT_COUNT -gt 5000 ]; then
        log_warning "Too many TIME_WAIT connections"
        log "Optimization suggestions: echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"
    else
        log_ok "TIME_WAIT connections normal"
    fi
    log "Listening ports:"
    netstat -tuln | grep LISTEN | tee -a "$REPORT_FILE"
}

# 6. Generate summary
generate_summary() {
    log_section "6. Inspection summary"
    WARNING_COUNT=$(grep -c "\[WARNING\]" "$REPORT_FILE")
    ERROR_COUNT=$(grep -c "\[ERROR\]" "$REPORT_FILE")
    log "Inspection completed at $(date +"%Y-%m-%d %H:%M:%S")"
    log "Warnings: $WARNING_COUNT"
    log "Errors: $ERROR_COUNT"
    if [ $ERROR_COUNT -gt 0 ]; then
        log_error "Found $ERROR_COUNT critical issues, please address immediately!"
    elif [ $WARNING_COUNT -gt 0 ]; then
        log_warning "Found $WARNING_COUNT warnings, please review"
    else
        log_ok "System status good, no anomalies"
    fi
    log "Full report saved to: $REPORT_FILE"
}

# Main function
main() {
    echo "=========================================="
    echo "    Server health inspection script v2.0"
    echo "=========================================="
    echo ""
    if [ "$(id -u)" -ne 0 ]; then
        echo "Warning: not running as root, some checks may be limited"
    fi
    check_basic_info
    check_cpu
    check_memory
    check_disk
    check_network
    generate_summary
    echo ""
    echo "=========================================="
    echo "    Inspection finished!"
    echo "=========================================="
}

main "$@"

Script 2: Disk space deep check

This script performs a detailed analysis of disk usage, identifies the largest directories, finds large files, predicts when a disk will fill up, and offers cleanup suggestions.

#!/bin/bash
################################################################################
# Script name: disk_space_check.sh
# Description: Deep disk space analysis and reporting
# Version: v1.5
################################################################################

REPORT_FILE="/var/log/disk_check_$(date +%Y%m%d_%H%M%S).log"
DISK_WARNING=80
INODE_WARNING=80
LARGE_FILE_SIZE=1G  # Large file threshold

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }

log "============================================================"
log "1. Disk usage check"
log "============================================================"

# Check each partition
while read line; do
    USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    MOUNT=$(echo "$line" | awk '{print $6}')
    AVAIL=$(echo "$line" | awk '{print $4}')
    log "Partition: $MOUNT"
    log "  Usage: ${USAGE}%"
    log "  Available: $AVAIL"
    if [ "$USAGE" -gt $DISK_WARNING ]; then
        log_error "Disk usage exceeds $DISK_WARNING% on $MOUNT"
        log "  Top 10 directories in $MOUNT:"
        du -sh $MOUNT/* 2>/dev/null | sort -rh | head -10 | awk '{print "    " $0}' | tee -a "$REPORT_FILE"
        log "  Large files (> $LARGE_FILE_SIZE) in $MOUNT:"
        find $MOUNT -type f -size +${LARGE_FILE_SIZE} -exec ls -lh {} \; 2>/dev/null | awk '{print "    " $9 " (" $5 ")"}' | head -10 | tee -a "$REPORT_FILE"
    else
        log_ok "Disk usage normal on $MOUNT"
    fi
done < <(df -h | grep -vE '^Filesystem|tmpfs|cdrom|loop')

# Inode usage check
log ""
log "Inode usage check:"
while read line; do
    INODE_USAGE=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    MOUNT=$(echo "$line" | awk '{print $6}')
    log "Partition: $MOUNT"
    log "  Inode usage: ${INODE_USAGE}%"
    if [ "$INODE_USAGE" -gt $INODE_WARNING ]; then
        log_error "Inode usage exceeds $INODE_WARNING% on $MOUNT"
    fi
done < <(df -i | grep -vE '^Filesystem|tmpfs|cdrom|loop')

log ""
log "Cleanup suggestions:"
log "1. Compress or delete old log files: find /var/log -type f -name '*.log' -mtime +30 -exec gzip {} \;"
log "2. Remove temporary files older than 7 days: find /tmp -type f -atime +7 -delete"
log "3. Clean yum cache: yum clean all"
log "4. Clean apt cache: apt-get clean"
log "5. Prune unused Docker images/containers: docker system prune -a"
log "6. Vacuum journal logs: journalctl --vacuum-time=7d"

Script 3: Network connection and port check

Focuses on network interface status, TCP connection statistics, listening ports, DNS and gateway reachability, and kernel network parameters.

#!/bin/bash
################################################################################
# Script name: network_check.sh
# Description: Deep network connection and port inspection
# Version: v1.0
################################################################################

REPORT_FILE="/var/log/network_check_$(date +%Y%m%d_%H%M%S).log"
TIME_WAIT_WARNING=5000
ESTABLISHED_WARNING=10000
CRITICAL_PORTS=(22 80 443 3306 6379 8080)

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }

# 1. Interface status
check_network_interfaces() {
    log "============================================================"
    log "1. Network interface status"
    log "============================================================"
    log "Interface list:"
    ip -br addr | tee -a "$REPORT_FILE"
    for interface in $(ip -br link | awk '{print $1}' | grep -v lo); do
        log "Interface: $interface"
        STATUS=$(ip link show $interface | grep -i "state" | awk '{print $9}')
        log "  Status: $STATUS"
        RX_BYTES=$(cat /sys/class/net/$interface/statistics/rx_bytes 2>/dev/null)
        TX_BYTES=$(cat /sys/class/net/$interface/statistics/tx_bytes 2>/dev/null)
        log "  RX: $(numfmt --to=iec $RX_BYTES 2>/dev/null || echo "${RX_BYTES}B")"
        log "  TX: $(numfmt --to=iec $TX_BYTES 2>/dev/null || echo "${TX_BYTES}B")"
        RX_DROP=$(cat /sys/class/net/$interface/statistics/rx_dropped 2>/dev/null)
        TX_DROP=$(cat /sys/class/net/$interface/statistics/tx_dropped 2>/dev/null)
        if [ "$RX_DROP" -gt 0 ] || [ "$TX_DROP" -gt 0 ]; then
            log_warning "Packet drops on $interface: RX=$RX_DROP TX=$TX_DROP"
        fi
        RX_ERR=$(cat /sys/class/net/$interface/statistics/rx_errors 2>/dev/null)
        TX_ERR=$(cat /sys/class/net/$interface/statistics/tx_errors 2>/dev/null)
        if [ "$RX_ERR" -gt 0 ] || [ "$TX_ERR" -gt 0 ]; then
            log_error "Errors on $interface: RX=$RX_ERR TX=$TX_ERR"
        fi
    done
}

# 2. TCP connection statistics
check_tcp_connections() {
    log "============================================================"
    log "2. TCP connection statistics"
    log "============================================================"
    log "Connection state distribution:"
    netstat -an | awk '/^tcp/ {print $6}' | sort | uniq -c | sort -rn | tee -a "$REPORT_FILE"
    TIME_WAIT_COUNT=$(netstat -an | grep TIME_WAIT | wc -l)
    log "TIME_WAIT connections: $TIME_WAIT_COUNT"
    if [ $TIME_WAIT_COUNT -gt $TIME_WAIT_WARNING ]; then
        log_warning "Too many TIME_WAIT connections, consider tuning TCP parameters"
        log "Optimization suggestions:"
        log "  echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse"
        log "  echo 1 > /proc/sys/net/ipv4/tcp_tw_recycle  # not recommended in NAT environments"
    else
        log_ok "TIME_WAIT connections normal"
    fi
    ESTABLISHED_COUNT=$(netstat -an | grep ESTABLISHED | wc -l)
    log "ESTABLISHED connections: $ESTABLISHED_COUNT"
    if [ $ESTABLISHED_COUNT -gt $ESTABLISHED_WARNING ]; then
        log_warning "High number of ESTABLISHED connections"
    fi
    log "Top 10 remote IPs by ESTABLISHED connections:"
    netstat -an | grep ESTABLISHED | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head -10 | tee -a "$REPORT_FILE"
}

# 3. Listening ports check
check_listening_ports() {
    log "============================================================"
    log "3. Listening ports check"
    log "============================================================"
    log "Current listening ports:"
    netstat -tuln | grep LISTEN | tee -a "$REPORT_FILE"
    log "Critical port status check:"
    for port in "${CRITICAL_PORTS[@]}"; do
        if netstat -tuln | grep -q ":$port "; then
            PID=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f1)
            PROC=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f2)
            log_ok "Port $port listening (process: $PROC PID: $PID)"
        else
            log_warning "Port $port not listening"
        fi
    done
}

# 4. Network connectivity check
check_network_connectivity() {
    log "============================================================"
    log "4. Network connectivity check"
    log "============================================================"
    log "DNS resolution test:"
    if nslookup www.baidu.com >/dev/null 2>&1; then
        log_ok "DNS resolution normal"
    else
        log_error "DNS resolution failed"
    fi
    GATEWAY=$(ip route | grep default | awk '{print $3}' | head -1)
    if [ -n "$GATEWAY" ]; then
        log "Default gateway: $GATEWAY"
        if ping -c 3 -W 2 $GATEWAY >/dev/null 2>&1; then
            log_ok "Gateway reachable"
        else
            log_error "Gateway unreachable"
        fi
    else
        log_warning "No default gateway found"
    fi
    if ping -c 3 -W 2 8.8.8.8 >/dev/null 2>&1; then
        log_ok "External network reachable"
    else
        log_error "External network unreachable"
    fi
}

# 5. Network kernel parameters
check_network_parameters() {
    log "============================================================"
    log "5. Network kernel parameters"
    log "============================================================"
    log "tcp_tw_reuse: $(cat /proc/sys/net/ipv4/tcp_tw_reuse)"
    log "tcp_tw_recycle: $(cat /proc/sys/net/ipv4/tcp_tw_recycle 2>/dev/null || echo 'N/A')"
    log "tcp_fin_timeout: $(cat /proc/sys/net/ipv4/tcp_fin_timeout)"
    log "tcp_keepalive_time: $(cat /proc/sys/net/ipv4/tcp_keepalive_time)"
    log "tcp_max_syn_backlog: $(cat /proc/sys/net/ipv4/tcp_max_syn_backlog)"
    log "somaxconn: $(cat /proc/sys/net/core/somaxconn)"
    log "File descriptor limits: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
}

# Main function
main() {
    echo "=========================================="
    echo "    Network status inspection script v1.0"
    echo "=========================================="
    echo ""
    check_network_interfaces
    check_tcp_connections
    check_listening_ports
    check_network_connectivity
    check_network_parameters
    echo ""
    log "============================================================"
    log "Inspection completed!"
    log "Report saved to: $REPORT_FILE"
    log "============================================================"
}

main "$@"

Script 4: Application process health check

Validates the running state, resource consumption, and interface responsiveness of critical business processes.

#!/bin/bash
################################################################################
# Script name: process_health_check.sh
# Description: Application process health inspection
# Version: v2.0
################################################################################

REPORT_FILE="/var/log/process_check_$(date +%Y%m%d_%H%M%S).log"

# Define critical processes (adjust per business)
declare -A CRITICAL_PROCESSES=(
    ["nginx"]="nginx|80,443"
    ["mysql"]="mysqld|3306"
    ["redis"]="redis-server|6379"
    ["java"]="java|8080"
)

# Define HTTP endpoints to test
declare -A HTTP_ENDPOINTS=(
    ["Home"]="http://localhost/"
    ["Health"]="http://localhost:8080/health"
    ["API Status"]="http://localhost:8080/api/status"
)

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }

# 1. Process check
check_process() {
    local name="$1"
    local pattern="$2"
    log "Checking process: $name"
    PIDS=$(pgrep -f "$pattern")
    if [ -z "$PIDS" ]; then
        log_error "Process not running!"
        return 1
    fi
    PID_COUNT=$(echo "$PIDS" | wc -l)
    log_ok "Process running (instances: $PID_COUNT)"
    for pid in $PIDS; do
        STATUS=$(ps -p $pid -o stat --no-headers)
        if [[ $STATUS == *Z* ]]; then
            log_error "PID $pid: zombie state"
            continue
        elif [[ $STATUS == *T* ]]; then
            log_warning "PID $pid: stopped state"
            continue
        fi
        CPU=$(ps -p $pid -o %cpu --no-headers | xargs)
        MEM=$(ps -p $pid -o %mem --no-headers | xargs)
        RSS=$(ps -p $pid -o rss --no-headers | xargs)
        VSZ=$(ps -p $pid -o vsz --no-headers | xargs)
        UPTIME=$(ps -p $pid -o etime --no-headers | xargs)
        THREADS=$(ps -p $pid -o nlwp --no-headers | xargs)
        log "PID $pid:"
        log "  CPU: ${CPU}%"
        log "  Memory: ${MEM}% (RSS:${RSS}KB VSZ:${VSZ}KB)"
        log "  Threads: $THREADS"
        log "  Uptime: $UPTIME"
        if (( $(echo "$CPU > 80" | bc -l) )); then
            log_warning "CPU usage high!"
        fi
        if (( $(echo "$MEM > 80" | bc -l) )); then
            log_warning "Memory usage high!"
        fi
        if [ $THREADS -gt 1000 ]; then
            log_warning "Thread count high!"
        fi
        if [ -d "/proc/$pid/fd" ]; then
            FD_COUNT=$(ls /proc/$pid/fd 2>/dev/null | wc -l)
            log "  File descriptors: $FD_COUNT"
            if [ $FD_COUNT -gt 10000 ]; then
                log_warning "Too many file descriptors!"
            fi
        fi
        if [ -d "/proc/$pid" ]; then
            CORE_PATTERN=$(cat /proc/sys/kernel/core_pattern)
            log "  Core dump config: $CORE_PATTERN"
        fi
    done
    return 0
}

# 2. Port check for a service
check_ports() {
    local name="$1"
    local ports="$2"
    log "Checking ports for: $name"
    IFS=',' read -ra PORT_ARRAY <<< "$ports"
    for port in "${PORT_ARRAY[@]}"; do
        if netstat -tuln | grep -q ":$port "; then
            PID=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f1)
            PROC=$(netstat -tulnp 2>/dev/null | grep ":$port " | awk '{print $7}' | cut -d'/' -f2)
            log_ok "Port $port listening (process: $PROC PID: $PID)"
            ESTABLISHED=$(netstat -an | grep ":$port " | grep ESTABLISHED | wc -l)
            TIME_WAIT=$(netstat -an | grep ":$port " | grep TIME_WAIT | wc -l)
            CLOSE_WAIT=$(netstat -an | grep ":$port " | grep CLOSE_WAIT | wc -l)
            log "  ESTABLISHED: $ESTABLISHED"
            log "  TIME_WAIT: $TIME_WAIT"
            log "  CLOSE_WAIT: $CLOSE_WAIT"
            if [ $CLOSE_WAIT -gt 100 ]; then
                log_warning "CLOSE_WAIT connections high, possible leak!"
            fi
        else
            log_error "Port $port not listening"
        fi
    done
}

# 3. HTTP endpoint health check
check_http_endpoints() {
    log ""
    log "============================================================"
    log "HTTP endpoint health check"
    log "============================================================"
    for name in "${!HTTP_ENDPOINTS[@]}"; do
        url="${HTTP_ENDPOINTS[$name]}"
        log "Checking: $name ($url)"
        START_TIME=$(date +%s%N)
        RESPONSE=$(curl -s -o /dev/null -w "%{http_code}" --connect-timeout 5 --max-time 10 "$url" 2>&1)
        END_TIME=$(date +%s%N)
        if [ $? -eq 0 ]; then
            HTTP_CODE=$RESPONSE
            RESPONSE_TIME=$(echo "scale=3; ($END_TIME - $START_TIME) / 1000000000" | bc)
            if [ "$HTTP_CODE" == "200" ]; then
                log_ok "HTTP $HTTP_CODE response time: ${RESPONSE_TIME}s"
            elif [ "$HTTP_CODE" == "301" ] || [ "$HTTP_CODE" == "302" ]; then
                log_warning "HTTP $HTTP_CODE (redirect)"
            else
                log_error "HTTP $HTTP_CODE (abnormal)"
            fi
            if (( $(echo "$RESPONSE_TIME > 3" | bc -l) )); then
                log_warning "Response time too long: ${RESPONSE_TIME}s"
            fi
        else
            log_error "Request failed – timeout or network error"
        fi
    done
}

# 4. Process dependency check (shared libraries)
check_process_dependencies() {
    log ""
    log "============================================================"
    log "Process dependency check"
    log "============================================================"
    log "Shared library dependencies:"
    for proc in "${!CRITICAL_PROCESSES[@]}"; do
        pattern=$(echo "${CRITICAL_PROCESSES[$proc]}" | cut -d'|' -f1)
        PID=$(pgrep -f "$pattern" | head -1)
        if [ -n "$PID" ]; then
            EXE=$(readlink /proc/$PID/exe 2>/dev/null)
            if [ -n "$EXE" ]; then
                log " $proc ($EXE):"
                MISSING=$(ldd "$EXE" 2>/dev/null | grep "not found")
                if [ -n "$MISSING" ]; then
                    log_error "  Missing dependencies:"
                    echo "$MISSING" | awk '{print "      " $0}' | tee -a "$REPORT_FILE"
                else
                    log_ok "  Dependencies complete"
                fi
            fi
        fi
    done
}

# Main function
main() {
    echo "=========================================="
    echo "    Application process health check v2.0"
    echo "=========================================="
    echo ""
    log "============================================================"
    log "Process and port check"
    log "============================================================"
    for proc in "${!CRITICAL_PROCESSES[@]}"; do
        IFS='|' read -r pattern ports <<< "${CRITICAL_PROCESSES[$proc]}"
        check_process "$proc" "$pattern"
        check_ports "$proc" "$ports"
        log ""
    done
    check_http_endpoints
    check_process_dependencies
    log ""
    log "============================================================"
    log "Inspection completed!"
    log "Report saved to: $REPORT_FILE"
    log "============================================================"
}

main "$@"

Script 5: Security baseline check

Inspects account security, SSH configuration, critical file permissions, SUID binaries, firewall status, SELinux state, system updates, and sensitive command history.

#!/bin/bash
################################################################################
# Script name: security_baseline_check.sh
# Description: Server security baseline inspection
# Version: v1.0
################################################################################

REPORT_FILE="/var/log/security_check_$(date +%Y%m%d_%H%M%S).log"

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }

# 1. Account security check
check_account_security() {
    log "============================================================"
    log "1. Account security check"
    log "============================================================"
    log "Empty password accounts:"
    EMPTY_PASS=$(awk -F: '($2 == "") {print $1}' /etc/shadow 2>/dev/null)
    if [ -n "$EMPTY_PASS" ]; then
        log_error "Found empty‑password accounts:"
        echo "$EMPTY_PASS" | awk '{print "    "$0}' | tee -a "$REPORT_FILE"
    else
        log_ok "No empty‑password accounts"
    fi
    log "UID 0 accounts (root privileges):"
    ROOT_USERS=$(awk -F: '($3 == 0) {print $1}' /etc/passwd)
    echo "$ROOT_USERS" | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
    if [ $(echo "$ROOT_USERS" | wc -l) -gt 1 ]; then
        log_warning "Multiple UID 0 accounts detected!"
    fi
    log "Login‑capable system accounts (UID<1000, shell not nologin):"
    awk -F: '($3 < 1000 && $7 !~ /nologin|false/) {print "  "$1" (UID:"$3" Shell:"$7")"}' /etc/passwd | tee -a "$REPORT_FILE"
    log "Users with sudo privileges:"
    grep -E '^%wheel|^%sudo|^[^#].*ALL=.*ALL' /etc/sudoers /etc/sudoers.d/* 2>/dev/null | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
    log "Recent login failures (last 10):"
    lastb | head -10 | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
}

# 2. SSH security configuration check
check_ssh_security() {
    log ""
    log "============================================================"
    log "2. SSH security configuration check"
    log "============================================================"
    SSHD_CONFIG="/etc/ssh/sshd_config"
    if [ ! -f "$SSHD_CONFIG" ]; then
        log_error "SSH config file missing"
        return 1
    fi
    ROOT_LOGIN=$(grep "^PermitRootLogin" $SSHD_CONFIG | awk '{print $2}')
    log "PermitRootLogin: $ROOT_LOGIN"
    if [ "$ROOT_LOGIN" == "yes" ]; then
        log_warning "SSH permits root login (insecure)"
    else
        log_ok "SSH root login disabled"
    fi
    PASS_AUTH=$(grep "^PasswordAuthentication" $SSHD_CONFIG | awk '{print $2}')
    log "PasswordAuthentication: $PASS_AUTH"
    if [ "$PASS_AUTH" == "yes" ]; then
        log_warning "Password authentication enabled (use keys)"
    else
        log_ok "Password authentication disabled"
    fi
    EMPTY_PASS=$(grep "^PermitEmptyPasswords" $SSHD_CONFIG | awk '{print $2}')
    log "PermitEmptyPasswords: ${EMPTY_PASS:-no}"
    if [ "$EMPTY_PASS" == "yes" ]; then
        log_error "Empty passwords allowed (high risk)"
    else
        log_ok "Empty passwords disabled"
    fi
    SSH_PORT=$(grep "^Port" $SSHD_CONFIG | awk '{print $2}')
    log "SSH port: ${SSH_PORT:-22}"
    if [ "$SSH_PORT" == "22" ] || [ -z "$SSH_PORT" ]; then
        log_warning "Default SSH port 22 in use (consider changing)"
    else
        log_ok "Custom SSH port configured"
    fi
    MAX_AUTH=$(grep "^MaxAuthTries" $SSHD_CONFIG | awk '{print $2}')
    log "MaxAuthTries: ${MAX_AUTH:-6}"
    ALLOW_USERS=$(grep "^AllowUsers" $SSHD_CONFIG | cut -d' ' -f2-)
    if [ -n "$ALLOW_USERS" ]; then
        log "AllowUsers: $ALLOW_USERS"
    else
        log_warning "AllowUsers not set (all users can attempt login)"
    fi
}

# 3. Critical file permission check
check_file_permissions() {
    log ""
    log "============================================================"
    log "3. Critical file permission check"
    log "============================================================"
    declare -A CRITICAL_FILES=(
        ["/etc/passwd"]="644"
        ["/etc/shadow"]="000"
        ["/etc/group"]="644"
        ["/etc/gshadow"]="000"
        ["/etc/ssh/sshd_config"]="600"
    )
    for file in "${!CRITICAL_FILES[@]}"; do
        if [ -f "$file" ]; then
            PERM=$(stat -c "%a" "$file")
            EXPECTED="${CRITICAL_FILES[$file]}"
            log "$file: $PERM"
            if [ "$PERM" != "$EXPECTED" ]; then
                log_warning "Incorrect permission (expected: $EXPECTED)"
            else
                log_ok "Permission correct"
            fi
        else
            log_warning "$file does not exist"
        fi
    done
    log "SUID files (potential privilege escalation):"
    find / -type f -perm -4000 2>/dev/null | head -20 | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
}

# 4. Firewall check
check_firewall() {
    log ""
    log "============================================================"
    log "4. Firewall status check"
    log "============================================================"
    if command -v iptables >/dev/null 2>&1; then
        RULE_COUNT=$(iptables -L -n | wc -l)
        log "iptables rule count: $RULE_COUNT"
        if [ $RULE_COUNT -lt 10 ]; then
            log_warning "Few iptables rules, firewall may be misconfigured"
        fi
    fi
    if systemctl is-active --quiet firewalld 2>/dev/null; then
        log_ok "firewalld: running"
    else
        log_warning "firewalld: not running"
    fi
    if command -v getenforce >/dev/null 2>&1; then
        SELINUX=$(getenforce)
        log "SELinux: $SELINUX"
        if [ "$SELINUX" == "Disabled" ]; then
            log_warning "SELinux disabled (reduced security)"
        fi
    fi
}

# 5. System update check
check_system_updates() {
    log ""
    log "============================================================"
    log "5. System update check"
    log "============================================================"
    if command -v yum >/dev/null 2>&1; then
        UPDATE_COUNT=$(yum check-update 2>/dev/null | grep -v "^$" | wc -l)
        log "Available updates (yum): $UPDATE_COUNT"
        if [ $UPDATE_COUNT -gt 50 ]; then
            log_warning "Many pending updates, schedule regular patching"
        fi
    fi
    if command -v apt-get >/dev/null 2>&1; then
        apt-get update >/dev/null 2>&1
        UPDATE_COUNT=$(apt list --upgradable 2>/dev/null | wc -l)
        log "Available updates (apt): $UPDATE_COUNT"
    fi
    KERNEL=$(uname -r)
    log "Current kernel version: $KERNEL"
}

# 6. Sensitive command history check
check_history() {
    log ""
    log "============================================================"
    log "6. Sensitive command history check"
    log "============================================================"
    if [ -f ~/.bash_history ]; then
        log "Dangerous commands found:"
        grep -E "rm -rf|mkfs|dd|:/dev/|wget.*sh|curl.*sh|chmod 777" ~/.bash_history 2>/dev/null | tail -10 | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
    fi
    log "Possible password leaks in history:"
    grep -E "password=|passwd|mysql.*-p" ~/.bash_history 2>/dev/null | tail -5 | awk '{print "  "$0}' | tee -a "$REPORT_FILE"
}

# Main function
main() {
    echo "=========================================="
    echo "    Security baseline check script v1.0"
    echo "=========================================="
    echo ""
    if [ "$(id -u)" -ne 0 ]; then
        log_warning "Non‑root execution, some checks may be limited"
    fi
    check_account_security
    check_ssh_security
    check_file_permissions
    check_firewall
    check_system_updates
    check_history
    log ""
    log "============================================================"
    log "Inspection completed!"
    log "Report saved to: $REPORT_FILE"
    log "============================================================"
}

main "$@"

Script 6: MySQL health check

Evaluates MySQL connection, connection usage, query performance, InnoDB status, replication health, and table space consumption.

#!/bin/bash
################################################################################
# Script name: mysql_health_check.sh
# Description: MySQL database health inspection
# Version: v1.0
################################################################################

REPORT_FILE="/var/log/mysql_check_$(date +%Y%m%d_%H%M%S).log"

# MySQL connection configuration (adjust as needed)
MYSQL_USER="monitor"
MYSQL_PASS="your_password"
MYSQL_HOST="localhost"
MYSQL_PORT="3306"

# Alert thresholds
CONN_WARNING=100
SLOW_QUERY_WARNING=100
THREAD_RUNNING_WARNING=10

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() { echo "[$(date +"%Y-%m-%d %H:%M:%S")] $1" | tee -a "$REPORT_FILE"; }
log_warning() { echo -e "${YELLOW}[WARNING] $1${NC}" | tee -a "$REPORT_FILE"; }
log_error() { echo -e "${RED}[ERROR] $1${NC}" | tee -a "$REPORT_FILE"; }
log_ok() { echo -e "${GREEN}[OK] $1${NC}" | tee -a "$REPORT_FILE"; }

mysql_exec() { mysql -u"$MYSQL_USER" -p"$MYSQL_PASS" -h"$MYSQL_HOST" -P"$MYSQL_PORT" -e "$1" 2>/dev/null; }

# 1. Connection check
check_mysql_connection() {
    log "============================================================"
    log "1. MySQL connection check"
    log "============================================================"
    if mysql_exec "SELECT 1" >/dev/null 2>&1; then
        log_ok "MySQL connection OK"
        VERSION=$(mysql_exec "SELECT VERSION()" | tail -1)
        log "MySQL version: $VERSION"
        UPTIME=$(mysql_exec "SHOW STATUS LIKE 'Uptime'" | tail -1 | awk '{print $2}')
        UPTIME_DAYS=$(echo "scale=2; $UPTIME / 86400" | bc)
        log "Uptime: ${UPTIME_DAYS} days"
        return 0
    else
        log_error "MySQL connection failed"
        return 1
    fi
}

# 2. Connection usage check
check_mysql_connections() {
    log ""
    log "============================================================"
    log "2. Connection usage check"
    log "============================================================"
    THREADS_CONNECTED=$(mysql_exec "SHOW STATUS LIKE 'Threads_connected'" | tail -1 | awk '{print $2}')
    log "Current connections: $THREADS_CONNECTED"
    MAX_CONNECTIONS=$(mysql_exec "SHOW VARIABLES LIKE 'max_connections'" | tail -1 | awk '{print $2}')
    log "Max connections: $MAX_CONNECTIONS"
    CONN_USAGE=$(echo "scale=2; $THREADS_CONNECTED / $MAX_CONNECTIONS * 100" | bc)
    log "Connection usage: ${CONN_USAGE}%"
    if (( $(echo "$CONN_USAGE > 80" | bc -l) )); then
        log_warning "Connection usage high!"
    else
        log_ok "Connection usage normal"
    fi
    MAX_USED=$(mysql_exec "SHOW STATUS LIKE 'Max_used_connections'" | tail -1 | awk '{print $2}')
    log "Historical max connections: $MAX_USED"
    THREADS_RUNNING=$(mysql_exec "SHOW STATUS LIKE 'Threads_running'" | tail -1 | awk '{print $2}')
    log "Running threads: $THREADS_RUNNING"
    if [ "$THREADS_RUNNING" -gt $THREAD_RUNNING_WARNING ]; then
        log_warning "Many running threads, possible slow queries"
    fi
}

# 3. Query performance check
check_query_performance() {
    log ""
    log "============================================================"
    log "3. Query performance check"
    log "============================================================"
    SLOW_QUERIES=$(mysql_exec "SHOW STATUS LIKE 'Slow_queries'" | tail -1 | awk '{print $2}')
    log "Slow queries: $SLOW_QUERIES"
    if [ "$SLOW_QUERIES" -gt $SLOW_QUERY_WARNING ]; then
        log_warning "Many slow queries!"
        SLOW_QUERY_TIME=$(mysql_exec "SHOW VARIABLES LIKE 'long_query_time'" | tail -1 | awk '{print $2}')
        log "Slow query threshold: ${SLOW_QUERY_TIME}s"
    fi
    QUESTIONS=$(mysql_exec "SHOW STATUS LIKE 'Questions'" | tail -1 | awk '{print $2}')
    UPTIME=$(mysql_exec "SHOW STATUS LIKE 'Uptime'" | tail -1 | awk '{print $2}')
    QPS=$(echo "scale=2; $QUESTIONS / $UPTIME" | bc)
    log "Average QPS: $QPS"
    COM_COMMIT=$(mysql_exec "SHOW STATUS LIKE 'Com_commit'" | tail -1 | awk '{print $2}')
    COM_ROLLBACK=$(mysql_exec "SHOW STATUS LIKE 'Com_rollback'" | tail -1 | awk '{print $2}')
    TPS=$(echo "scale=2; ($COM_COMMIT + $COM_ROLLBACK) / $UPTIME" | bc)
    log "Average TPS: $TPS"
}

# 4. InnoDB status check
check_innodb_status() {
    log ""
    log "============================================================"
    log "4. InnoDB storage engine check"
    log "============================================================"
    INNODB_BUFFER_POOL=$(mysql_exec "SHOW VARIABLES LIKE 'innodb_buffer_pool_size'" | tail -1 | awk '{print $2}')
    INNODB_BUFFER_GB=$(echo "scale=2; $INNODB_BUFFER_POOL / 1024 / 1024 / 1024" | bc)
    log "InnoDB buffer pool size: ${INNODB_BUFFER_GB}GB"
    BUFFER_READS=$(mysql_exec "SHOW STATUS LIKE 'Innodb_buffer_pool_reads'" | tail -1 | awk '{print $2}')
    BUFFER_READ_REQUESTS=$(mysql_exec "SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests'" | tail -1 | awk '{print $2}')
    if [ "$BUFFER_READ_REQUESTS" -gt 0 ]; then
        HIT_RATIO=$(echo "scale=4; (1 - $BUFFER_READS / $BUFFER_READ_REQUESTS) * 100" | bc)
        log "Buffer pool hit ratio: ${HIT_RATIO}%"
        if (( $(echo "$HIT_RATIO < 99" | bc -l) )); then
            log_warning "Low buffer pool hit ratio, consider increasing innodb_buffer_pool_size"
        else
            log_ok "Buffer pool hit ratio good"
        fi
    fi
    INNODB_LOG_SIZE=$(mysql_exec "SHOW VARIABLES LIKE 'innodb_log_file_size'" | tail -1 | awk '{print $2}')
    INNODB_LOG_MB=$(echo "scale=2; $INNODB_LOG_SIZE / 1024 / 1024" | bc)
    log "InnoDB log file size: ${INNODB_LOG_MB}MB"
}

# 5. Replication status check
check_replication_status() {
    log ""
    log "============================================================"
    log "5. Master‑slave replication status check"
    log "============================================================"
    SLAVE_STATUS=$(mysql_exec "SHOW SLAVE STATUS\G" 2>/dev/null)
    if [ -n "$SLAVE_STATUS" ]; then
        log "Slave replication status:"
        SLAVE_IO_RUNNING=$(echo "$SLAVE_STATUS" | grep "Slave_IO_Running" | awk '{print $2}')
        log "  Slave_IO_Running: $SLAVE_IO_RUNNING"
        SLAVE_SQL_RUNNING=$(echo "$SLAVE_STATUS" | grep "Slave_SQL_Running" | awk '{print $2}' | head -1)
        log "  Slave_SQL_Running: $SLAVE_SQL_RUNNING"
        if [ "$SLAVE_IO_RUNNING" != "Yes" ] || [ "$SLAVE_SQL_Running" != "Yes" ]; then
            log_error "Replication abnormal!"
            LAST_IO_ERROR=$(echo "$SLAVE_STATUS" | grep "Last_IO_Error:" | cut -d':' -f2-)
            LAST_SQL_ERROR=$(echo "$SLAVE_STATUS" | grep "Last_SQL_Error:" | cut -d':' -f2-)
            [ -n "$LAST_IO_ERROR" ] && log "  IO error: $LAST_IO_ERROR"
            [ -n "$LAST_SQL_ERROR" ] && log "  SQL error: $LAST_SQL_ERROR"
        else
            log_ok "Replication normal"
        fi
        SECONDS_BEHIND=$(echo "$SLAVE_STATUS" | grep "Seconds_Behind_Master" | awk '{print $2}')
        log "  Replication lag: ${SECONDS_BEHIND}s"
        if [ "$SECONDS_BEHIND" != "NULL" ] && [ "$SECONDS_BEHIND" -gt 60 ]; then
            log_warning "Replication lag large!"
        fi
    else
        log "Not a slave or replication not configured"
    fi
}

# 6. Tablespace check
check_tablespace() {
    log ""
    log "============================================================"
    log "6. Tablespace check"
    log "============================================================"
    log "Top 10 tables by size:"
    mysql_exec "SELECT table_schema AS 'Database', table_name AS 'Table', ROUND((data_length + index_length) / 1024 / 1024, 2) AS 'Size(MB)' FROM information_schema.tables WHERE table_schema NOT IN ('mysql','information_schema','performance_schema','sys') ORDER BY (data_length + index_length) DESC LIMIT 10;" | tee -a "$REPORT_FILE"
}

# Main function
main() {
    echo "=========================================="
    echo "    MySQL health check script v1.0"
    echo "=========================================="
    echo ""
    if ! check_mysql_connection; then
        log_error "Cannot connect to MySQL, aborting"
        exit 1
    fi
    check_mysql_connections
    check_query_performance
    check_innodb_status
    check_replication_status
    check_tablespace
    log ""
    log "============================================================"
    log "Inspection completed!"
    log "Report saved to: $REPORT_FILE"
    log "============================================================"
}

main "$@"

Script 7: Batch server inspection scheduler

Runs the previous scripts concurrently on multiple servers, aggregates results, and generates an HTML summary report.

#!/bin/bash
################################################################################
# Script name: batch_check_scheduler.sh
# Description: Batch server inspection scheduler
# Version: v2.0
################################################################################

# Configuration area
SERVER_LIST="servers.txt"   # Server list file
SSH_USER="root"            # SSH user
SSH_KEY="~/.ssh/id_rsa"    # SSH key
CONCURRENT=10              # Number of concurrent jobs
CHECK_SCRIPT="system_health_check.sh"  # Local inspection script
RESULT_DIR="/var/log/batch_check_$(date +%Y%m%d_%H%M%S)"

mkdir -p "$RESULT_DIR"

GREEN='\033[0;32m'
RED='\033[0;31m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info() { echo -e "[$(date +"%H:%M:%S")] ${GREEN}[INFO]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }
log_error() { echo -e "[$(date +"%H:%M:%S")] ${RED}[ERROR]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }
log_warning() { echo -e "[$(date +"%H:%M:%S")] ${YELLOW}[WARNING]${NC}$1" | tee -a "$RESULT_DIR/scheduler.log"; }

# Single server inspection
check_single_server() {
    local server=$1
    local result_file="$RESULT_DIR/${server}_result.log"
    local status_file="$RESULT_DIR/${server}_status.txt"
    log_info "Starting check: $server"
    # SSH connectivity test
    if ! ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "echo ok" >/dev/null 2>&1; then
        log_error "$server - SSH connection failed"
        echo "FAILED|SSH connection failed" > "$status_file"
        return 1
    fi
    # Upload inspection script
    scp -o StrictHostKeyChecking=no -i "$SSH_KEY" "$CHECK_SCRIPT" "$SSH_USER@$server:/tmp/" >/dev/null 2>&1
    if [ $? -ne 0 ]; then
        log_error "$server - Script upload failed"
        echo "FAILED|Script upload failed" > "$status_file"
        return 1
    fi
    # Execute inspection
    ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "bash /tmp/$CHECK_SCRIPT" > "$result_file" 2>&1
    if [ $? -eq 0 ]; then
        WARNING_COUNT=$(grep -c "\[WARNING\]" "$result_file" || echo 0)
        ERROR_COUNT=$(grep -c "\[ERROR\]" "$result_file" || echo 0)
        if [ $ERROR_COUNT -gt 0 ]; then
            echo "ERROR|Warnings:$WARNING_COUNT Errors:$ERROR_COUNT" > "$status_file"
            log_error "$server - Detected $ERROR_COUNT errors"
        elif [ $WARNING_COUNT -gt 0 ]; then
            echo "WARNING|Warnings:$WARNING_COUNT Errors:$ERROR_COUNT" > "$status_file"
            log_warning "$server - Detected $WARNING_COUNT warnings"
        else
            echo "OK|Normal" > "$status_file"
            log_info "$server - Check completed, status normal"
        fi
        # Clean remote script
        ssh -o StrictHostKeyChecking=no -i "$SSH_KEY" "$SSH_USER@$server" "rm -f /tmp/$CHECK_SCRIPT" >/dev/null 2>&1
        return 0
    else
        log_error "$server - Inspection execution failed"
        echo "FAILED|Execution failed" > "$status_file"
        return 1
    fi
}

# Concurrent batch check
batch_check() {
    if [ ! -f "$SERVER_LIST" ]; then
        log_error "Server list file not found: $SERVER_LIST"
        exit 1
    fi
    if [ ! -f "$CHECK_SCRIPT" ]; then
        log_error "Inspection script not found: $CHECK_SCRIPT"
        exit 1
    fi
    total=$(grep -v "^#" "$SERVER_LIST" | grep -v "^$" | wc -l)
    log_info "Starting batch inspection of $total servers with concurrency $CONCURRENT"
    while read server; do
        [[ -z "$server" || "$server" =~ ^# ]] && continue
        while [ $(jobs -r | wc -l) -ge $CONCURRENT ]; do sleep 1; done
        check_single_server "$server" &
    done < "$SERVER_LIST"
    wait
    log_info "All server inspections completed!"
}

# Generate HTML summary report
generate_html_report() {
    local summary_file="$RESULT_DIR/summary.html"
    log_info "Generating summary report..."
    cat > "$summary_file" <<'EOF'
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Server Inspection Summary Report</title>
    <style>
        * { margin: 0; padding: 0; box-sizing: border-box; }
        body { font-family: 'Arial', 'Microsoft YaHei', sans-serif; background: #f5f5f5; padding: 20px; }
        .container { max-width: 1200px; margin: 0 auto; background: white; border-radius: 8px; box-shadow: 0 2px 10px rgba(0,0,0,0.1); }
        .header { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 30px; border-radius: 8px 8px 0 0; }
        .header h1 { font-size: 28px; margin-bottom: 10px; }
        .header p { opacity: 0.9; font-size: 14px; }
        .stats { display: flex; padding: 20px; background: #f8f9fa; border-bottom: 1px solid #e9ecef; }
        .stat-item { flex: 1; text-align: center; padding: 10px; }
        .stat-item .number { font-size: 32px; font-weight: bold; margin-bottom: 5px; }
        .stat-item .label { color: #6c757d; font-size: 14px; }
        .stat-ok .number { color: #28a745; }
        .stat-warning .number { color: #ffc107; }
        .stat-error .number { color: #dc3545; }
        .content { padding: 20px; }
        table { width: 100%; border-collapse: collapse; margin-top: 20px; }
        th, td { padding: 12px; text-align: left; border-bottom: 1px solid #e9ecef; }
        th { background: #f8f9fa; font-weight: 600; color: #495057; position: sticky; top: 0; }
        tr:hover { background: #f8f9fa; }
        .status-ok { background: #d4edda; color: #155724; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
        .status-warning { background: #fff3cd; color: #856404; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
        .status-error { background: #f8d7da; color: #721c24; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
        .status-failed { background: #6c757d; color: white; padding: 4px 8px; border-radius: 4px; display: inline-block; font-size: 12px; }
        a { color: #007bff; text-decoration: none; }
        a:hover { text-decoration: underline; }
        .footer { text-align: center; padding: 20px; color: #6c757d; font-size: 12px; border-top: 1px solid #e9ecef; }
    </style>
</head>
<body>
    <div class="container">
        <div class="header">
            <h1>🖥️ Server Inspection Summary Report</h1>
            <p>Inspection time: REPORT_TIME</p>
        </div>
        <div class="stats">
            <div class="stat-item stat-ok"><div class="number" id="ok-count">0</div><div class="label">Normal</div></div>
            <div class="stat-item stat-warning"><div class="number" id="warning-count">0</div><div class="label">Warnings</div></div>
            <div class="stat-item stat-error"><div class="number" id="error-count">0</div><div class="label">Errors</div></div>
            <div class="stat-item stat-error"><div class="number" id="failed-count">0</div><div class="label">Failed</div></div>
        </div>
        <div class="content">
            <table>
                <thead>
                    <tr><th>#</th><th>Server</th><th>Status</th><th>Details</th><th>Full report</th></tr>
                </thead>
                <tbody id="server-list">
                </tbody>
            </table>
        </div>
        <div class="footer">Generated by Batch Check Scheduler v2.0</div>
    </div>
    <script>
let okCount = 0, warningCount = 0, errorCount = 0, failedCount = 0;
const serverData = SERVER_DATA_PLACEHOLDER;
const tbody = document.getElementById('server-list');
serverData.forEach((item, index) => {
    const row = tbody.insertRow();
    row.innerHTML = `
        <td>${index + 1}</td>
        <td>${item.server}</td>
        <td><span class="status-${item.status.toLowerCase()}">${item.status}</span></td>
        <td>${item.detail}</td>
        <td><a href="${item.report}" target="_blank">View details</a></td>
    `;
    if (item.status === 'OK') okCount++;
    else if (item.status === 'WARNING') warningCount++;
    else if (item.status === 'ERROR') errorCount++;
    else failedCount++;
});
document.getElementById('ok-count').textContent = okCount;
document.getElementById('warning-count').textContent = warningCount;
document.getElementById('error-count').textContent = errorCount;
document.getElementById('failed-count').textContent = failedCount;
    </script>
</body>
</html>
EOF
    # Collect data from status files
    local server_data="["
    local first=true
    for status_file in "$RESULT_DIR"/*_status.txt; do
        [ -f "$status_file" ] || continue
        server=$(basename "$status_file" _status.txt)
        IFS='|' read status detail < "$status_file"
        if $first; then first=false; else server_data+=","; fi
        server_data+="{\"server\":\"$server\",\"status\":\"$status\",\"detail\":\"$detail\",\"report\":\"${server}_result.log\"}"
    done
    server_data+="]"
    sed -i "s/REPORT_TIME/$(date +"%Y-%m-%d %H:%M:%S")/g" "$summary_file"
    sed -i "s/SERVER_DATA_PLACEHOLDER/$server_data/g" "$summary_file"
    log_info "Summary report generated: $summary_file"
}

# Main entry point
main() {
    echo "=========================================="
    echo "    Batch server inspection scheduler v2.0"
    echo "=========================================="
    echo ""
    batch_check
    generate_html_report
    echo ""
    echo "=========================================="
    echo "    Inspection finished!"
    echo "    Result directory: $RESULT_DIR"
    echo "    Summary report: $RESULT_DIR/summary.html"
    echo "=========================================="
}

main "$@"

Conclusion

The seven scripts together form a comprehensive, low‑cost automation framework for proactive server health monitoring, security compliance, and operational stability. By integrating them into scheduled tasks and the batch scheduler, organizations can shift from reactive firefighting to proactive operations, reduce downtime, and improve overall system reliability.

Future enhancements may include JSON output for API integration, AI‑driven anomaly detection, container‑native adaptations, and tighter coupling with existing monitoring platforms such as Zabbix or Prometheus.

shell scriptingServer Automation
Ops Community
Written by

Ops Community

A leading IT operations community where professionals share and grow together.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.