
10 Essential Shell Scripts to Halve Your Ops Workload

These ten practical Bash scripts automate common sysadmin tasks—disk space checks, log rotation, resource monitoring, backup validation, process guarding, port probing, and more—providing reusable, idempotent solutions with logging, alerting, dry‑run support, and cron integration to streamline operations.

Ops Community

Problem Background

System administration includes many repetitive tasks such as disk inspection, log cleanup, process guarding, backup verification, and batch operations. Performing these tasks manually is slow and error‑prone. Using Bash scripts to automate them provides consistent, repeatable execution.

Applicable Scenarios

Daily health inspection

Cron‑driven scheduled jobs

Automatic alert notifications

Batch operations across many hosts

Simple fault‑tolerance scenarios

Backup verification automation

Core Guidelines

Enable strict mode: set -euo pipefail

Parameterize scripts via command‑line options or environment variables.

Write log entries with timestamps (e.g., $(date '+%Y-%m-%d %H:%M:%S')).

Provide a --dry-run mode that only prints actions without side effects.

Design scripts to be idempotent so repeated runs do not cause data loss.
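As a minimal sketch of how these guidelines fit together (not the original scripts' code), the skeleton below combines strict mode, a timestamped logger, option parsing, and a dry-run wrapper. The names log_info, parse_args, run_cmd, and DRY_RUN are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail   # exit on errors, unset variables, and failed pipeline stages

# DRY_RUN is a hypothetical switch: "true" means print actions only.
DRY_RUN="${DRY_RUN:-false}"

# Write a timestamped log line to stderr.
log_info() {
    printf '[%s] [INFO] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >&2
}

# Parse command-line options; only --dry-run is recognized in this sketch.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --dry-run) DRY_RUN=true ;;
            *) log_info "ignoring unknown option: $1" ;;
        esac
        shift
    done
}

# Run a command, or only log it when DRY_RUN=true.
run_cmd() {
    if [[ "$DRY_RUN" == "true" ]]; then
        log_info "DRY-RUN: $*"
    else
        "$@"
    fi
}
```

Routing every side effect through run_cmd keeps the dry-run path honest, and commands such as mkdir -p are idempotent by construction: a second run succeeds without changing anything.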

Overall Implementation Approach

Feature description (what the script does).

Usage syntax (command‑line options and defaults).

Full script source (self‑contained Bash script).

Key logic explanation (important functions, thresholds, alert handling).

Cron configuration example.

Risk reminders (permissions, directory existence, timeout handling, etc.).

Script 1 – Disk Space Inspection & Alert

Function : Checks usage of all mounted partitions; triggers an alert when usage exceeds a threshold.

Usage :

# Default threshold 80%
bash disk_check.sh
# Custom threshold
bash disk_check.sh --threshold 90
# Dry‑run (only output, no alert)
bash disk_check.sh --dry-run

Key Logic :

Uses df -h --output=source,size,used,avail,pcent,target -x tmpfs -x devtmpfs -x squashfs to list real filesystems.

Parses the percentage column; if it meets or exceeds THRESHOLD, builds an alert message and sends it via a WeChat webhook (environment variable WECHAT_WEBHOOK_URL).

Logs each step with log_info and log_warn functions.
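The threshold comparison can be sketched as a small filter over df output. This is an illustrative reduction rather than the full script: the function name over_threshold is an assumption, it reads only the pcent and target columns, and it assumes mount points without spaces.

```bash
# Print "target pcent" for every filesystem at or above the threshold.
# Reads df-style lines ("pcent target") from stdin so it can be tested offline.
over_threshold() {
    local threshold="$1"
    awk -v t="$threshold" 'NR > 1 {          # skip the header line
        pct = $1; sub(/%/, "", pct)          # "85%" -> "85"
        if (pct + 0 >= t + 0) print $2, pct
    }'
}

# Typical call against real filesystems (GNU df):
# timeout 10 df --output=pcent,target -x tmpfs -x devtmpfs -x squashfs | over_threshold 80
```

Each output line can then be turned into an alert message; an empty result means no partition crossed the threshold.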

Cron configuration (run every 10 minutes):

*/10 * * * * /opt/scripts/disk_check.sh >> /var/log/disk_check_cron.log 2>&1

Risk reminders :

Ensure the directory for LOG_FILE exists and is writable.

Pass WECHAT_WEBHOOK_URL via environment variable; do not hard‑code it.

Wrap the df call with timeout 10 df -h to avoid hangs on NFS mounts.

Script 2 – Memory / CPU / Load Inspection

Function : Collects memory usage, CPU usage, and system load; sends alerts when thresholds are exceeded.

Usage :

# Default thresholds (memory 85%, CPU 90%, load factor 2)
bash sys_resource_check.sh
# Custom thresholds
bash sys_resource_check.sh --mem-threshold 80 --cpu-threshold 85 --load-factor 1.5
# Dry‑run
bash sys_resource_check.sh --dry-run

Key Logic :

Memory: parses free output, calculates usage percentage, compares to MEM_THRESHOLD.

CPU: extracts idle percentage from top (fallback to mpstat) and computes usage.

Load: compares 1‑minute load average to CPU_COUNT * LOAD_THRESHOLD_FACTOR.

All checks log their results and invoke send_alert when a threshold is crossed.
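A hedged sketch of the memory and load comparisons follows. The exact formulas in the original script are not shown, so this assumes memory usage is (total - available) / total, taken from the Mem: line of a procps-ng free; the function names mem_used_pct and load_exceeded are illustrative.

```bash
# Memory usage percentage from `free -m` style output on stdin.
# Assumes column 7 of the Mem: line is "available" (procps-ng layout).
mem_used_pct() {
    awk '/^Mem:/ { printf "%d\n", ($2 - $7) * 100 / $2 }'
}

# Succeed (exit 0) when the 1-minute load exceeds cpu_count * factor.
# awk does the comparison because Bash arithmetic is integer-only.
load_exceeded() {
    local load1="$1" cpu_count="$2" factor="$3"
    awk -v l="$load1" -v c="$cpu_count" -v f="$factor" 'BEGIN { exit !(l > c * f) }'
}

# Typical calls on Linux:
# free -m | mem_used_pct
# load_exceeded "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 2 && echo "load high"
```

Scaling the load limit by the CPU count keeps one threshold factor meaningful across hosts of different sizes.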

Cron configuration (run every 5 minutes):

*/5 * * * * /opt/scripts/sys_resource_check.sh >> /var/log/sys_resource_cron.log 2>&1

Risk reminders :

Set thresholds appropriate for the target host.

Dry‑run mode prevents accidental alerts during testing.

Script 3 – Large File Search & Cleanup

Function : Finds files larger than a specified size, supports dry‑run and interactive deletion.

Usage :

# Default dry‑run, search /var/log for files >500M
bash bigfile_cleanup.sh
# Delete files >100M with confirmation
bash bigfile_cleanup.sh /var/log 100M false

Key Logic :

Accepts SEARCH_DIR, SIZE_THRESHOLD, and DRY_RUN parameters.

Uses find … -size +${SIZE_THRESHOLD} -print0 and reads entries safely with while IFS= read -r -d ''.

For each file, logs path, size, owner, and modification time.

If DRY_RUN=true, only reports; otherwise prompts the user (read -rp) before rm -f.
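The null-delimited loop described above might look like the following sketch. The function name cleanup_big_files is an assumption, only path and size are reported here, and the confirmation is read from /dev/tty because stdin of the loop is occupied by the find output.

```bash
# List (and optionally delete) files larger than size_threshold under search_dir.
# Nothing is removed unless dry_run is explicitly "false".
cleanup_big_files() {
    local search_dir="$1" size_threshold="$2" dry_run="${3:-true}"
    local file_path answer
    while IFS= read -r -d '' file_path; do
        printf 'FOUND %s (%s bytes)\n' "$file_path" "$(( $(wc -c < "$file_path") ))"
        if [[ "$dry_run" != "false" ]]; then
            continue                      # report only
        fi
        # Ask on the terminal; stdin of this loop carries the file list.
        read -rp "Delete ${file_path}? [y/N] " answer < /dev/tty
        if [[ "$answer" == "y" ]]; then
            rm -f -- "$file_path"
        fi
    done < <(find "$search_dir" -type f -size "+${size_threshold}" -print0)
}
```

Using -print0 with read -d '' keeps file names containing spaces or newlines intact, which a plain for loop over find output would not.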

Cron configuration (run daily at 2 am):

0 2 * * * /opt/scripts/bigfile_cleanup.sh /var/log 500M true

Risk reminders :

Default to dry‑run; no files are removed unless DRY_RUN=false is explicitly set.

Avoid running on system directories such as / or /usr.

Use lsof "$file_path" to verify that a log file is not currently in use before deletion.

Script 4 – Log Rotation & Archiving

Function : Rotates log files by date, compresses them, and removes archives older than a retention period.

Usage :

# Rotate logs in /var/log/app into /var/log/app/archive, keep 30 days, no dry‑run
bash log_rotate.sh /var/log/app /var/log/app/archive 30 false

Key Logic :

Parameters: LOG_DIR, ARCHIVE_DIR, RETAIN_DAYS, DRY_RUN.

Iterates over ${LOG_DIR}/*.log; skips empty files.

If an archive for today already exists, logs and skips.

Copy‑truncate method: copies the current log to ${ARCHIVE_PATH}, truncates the original file, then compresses with gzip.

Cleanup: find ${ARCHIVE_DIR} -name "*.gz" -mtime +${RETAIN_DAYS} deletes old archives (dry‑run mode logs actions only).
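The copy-truncate steps can be sketched as below. This is a simplified illustration, not the original source: the function names rotate_log and purge_archives and the archive naming (file name plus date suffix) are assumptions.

```bash
# Rotate one log file with copy-truncate; skip empty files and existing archives.
rotate_log() {
    local log_file="$1" archive_dir="$2"
    local archive_path
    archive_path="${archive_dir}/$(basename "$log_file").$(date '+%Y%m%d')"

    if [[ ! -s "$log_file" ]]; then
        return 0                          # empty or missing: nothing to rotate
    fi
    if [[ -e "${archive_path}.gz" ]]; then
        return 0                          # already rotated today (idempotent)
    fi

    mkdir -p "$archive_dir"
    cp "$log_file" "$archive_path"        # copy first ...
    : > "$log_file"                       # ... then truncate in place
    gzip "$archive_path"
}

# Remove compressed archives older than retain_days (by modification time).
purge_archives() {
    local archive_dir="$1" retain_days="$2"
    find "$archive_dir" -type f -name '*.gz' -mtime "+${retain_days}" -delete
}
```

Truncating in place keeps the writer's open file descriptor valid, which is why no signal to the application is needed; the cost is the small window between cp and truncation.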

Cron configuration (run daily at 2 am):

0 2 * * * /opt/scripts/log_rotate.sh /var/log/app /var/log/app/archive 30 false

Risk reminders :

The : > "$log_file" operation empties the file; use with caution.

Copy‑truncate may lose lines written during the copy; for critical logs consider logrotate with create and appropriate signal handling.

Deletion uses find -name "*.gz" -mtime inside ARCHIVE_DIR only: -mtime selects by file modification time, and the name filter limits removal to compressed archives so unrelated files are not touched.

Script 5 – Process Guard & Auto‑Restart

Function : Checks whether a given process is running; if not, attempts to start it, limits restarts per day, and sends alerts.

Usage :

# Guard nginx, restart with systemctl
bash process_guard.sh nginx 'systemctl start nginx'
# Guard a Java service
bash process_guard.sh myapp '/opt/app/start.sh'

Key Logic :

Accepts PROCESS_NAME and START_CMD as arguments.

Uses pgrep -x (or pgrep -f) to detect the process.

Restart count is stored per day in /var/run/process_guard/${PROCESS_NAME}.$(date '+%Y%m%d').count. When the count reaches MAX_RESTART, the script stops attempting restarts and sends a critical alert.

After a successful restart, the count is incremented and an informational alert is sent.

All actions are logged via log_info, log_warn, and log_error.
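A minimal sketch of the per-day restart counter follows. The function name guard_process, the state_dir parameter, and the return codes are assumptions for illustration; alerts are reduced to messages on stderr.

```bash
# Restart process_name with start_cmd unless it is running or the daily limit is hit.
# Returns 0 if running or restarted, 1 if the start failed, 2 if the limit is reached.
guard_process() {
    local process_name="$1" start_cmd="$2" state_dir="$3" max_restart="${4:-3}"
    local count_file count next
    count_file="${state_dir}/${process_name}.$(date '+%Y%m%d').count"

    if pgrep -x "$process_name" > /dev/null; then
        return 0                                   # running, nothing to do
    fi

    mkdir -p "$state_dir"
    count="$(cat "$count_file" 2>/dev/null || echo 0)"
    if (( count >= max_restart )); then
        echo "CRITICAL: ${process_name} hit the restart limit (${max_restart})" >&2
        return 2
    fi

    if bash -c "$start_cmd"; then
        next=$(( count + 1 ))
        echo "$next" > "$count_file"               # count only successful restarts
        echo "INFO: restarted ${process_name} (restart ${next} today)" >&2
    else
        echo "ERROR: failed to start ${process_name}" >&2
        return 1
    fi
}
```

Because the date is part of the file name, the counter resets automatically at midnight without any cleanup logic.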

Cron configuration (run every minute):

* * * * * /opt/scripts/process_guard.sh nginx 'systemctl start nginx'
* * * * * /opt/scripts/process_guard.sh redis-server 'systemctl start redis'

Risk reminders :

Set MAX_RESTART to avoid endless restart loops.

Java services may need a larger RESTART_INTERVAL (30‑60 seconds) because startup is slower.

Prefer native systemd Restart=on-failure when possible; this script is for environments where systemd cannot be used.

Script 6 – Service Port Availability Check

Function : Checks a list of IP:PORT:NAME entries; alerts when a port is unreachable.

Usage :

# Use default config file
bash port_check.sh /opt/scripts/port_check.conf

Key Logic :

Reads the configuration file, skips empty lines and comments.

For each entry, attempts a TCP connection with timeout ${TIMEOUT} bash -c "echo > /dev/tcp/${host}/${port}".

Successful connections are logged as OK; failures trigger log_warn and send_alert.

Exits with status 1 if any check fails, so a wrapper or monitoring job that runs the script can detect failures.
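The probe loop can be sketched as follows, assuming GNU timeout is available. The function names port_open and check_ports are illustrative, and alerting is reduced to an OK or FAIL line per entry.

```bash
# Return 0 if a TCP connection to host:port succeeds within timeout_s seconds.
# /dev/tcp is a Bash feature, so the probe must run under bash, not sh.
port_open() {
    local host="$1" port="$2" timeout_s="${3:-3}"
    timeout "$timeout_s" bash -c "echo > /dev/tcp/${host}/${port}" 2>/dev/null
}

# Check every IP:PORT:NAME line of a config file; return 1 if any probe fails.
check_ports() {
    local conf="$1" failed=0 host port name
    while IFS=: read -r host port name; do
        if [[ -z "$host" || "$host" == \#* ]]; then
            continue                               # skip blank lines and comments
        fi
        if port_open "$host" "$port"; then
            echo "OK   ${name} ${host}:${port}"
        else
            echo "FAIL ${name} ${host}:${port}"
            failed=1
        fi
    done < "$conf"
    return "$failed"
}
```

A successful connect only proves that something accepts connections on the port; it says nothing about whether the service behind it is healthy.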

Cron configuration (run every 2 minutes):

*/2 * * * * /opt/scripts/port_check.sh /opt/scripts/port_check.conf

Risk reminders :

Ensure the config file exists and follows the IP:PORT:NAME format.

The script itself is read‑only; no files are modified.

Script 7 – Backup & Verification

Function : Creates a tar.gz backup of a source directory, generates a SHA‑256 checksum, optionally syncs to a remote host, and removes backups older than a retention period.

Usage :

# Simple backup of /etc (default 7‑day retention)
bash backup.sh /etc
# MySQL data backup, keep 14 days
bash backup.sh /var/lib/mysql /data/backup 14
# Backup with remote sync
bash backup.sh /opt/app /data/backup 7 backup@backup-server:/backup/

Key Logic :

Parameters: SOURCE_DIR, BACKUP_DIR (default /data/backup), RETAIN_DAYS (default 7), REMOTE_HOST (optional user@host:/path).

Creates a timestamped archive name ${SOURCE_DIR##*/}_YYYYMMDD_HHMMSS.tar.gz.

Runs tar -czf and logs the archive size.

Generates sha256sum file and verifies the checksum; aborts on verification failure.

If REMOTE_HOST is set, syncs the archive and checksum with rsync -avz.

Deletes archives older than RETAIN_DAYS using find -mtime +${RETAIN_DAYS}.
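The archive, checksum, and retention steps might be sketched like this, assuming GNU coreutils (sha256sum) and tar. The function names are assumptions, the checksum file name (archive name plus .sha256) is illustrative, and the optional rsync step is left out.

```bash
# Create a timestamped tar.gz of source_dir in backup_dir and verify its checksum.
# Prints the archive path on success; returns 1 on any failure.
backup_dir_verified() {
    local source_dir="${1%/}" backup_dir="$2"
    local name archive
    name="${source_dir##*/}_$(date '+%Y%m%d_%H%M%S').tar.gz"
    archive="${backup_dir}/${name}"

    mkdir -p "$backup_dir" || return 1
    # -C keeps paths in the archive relative to the parent of source_dir.
    tar -czf "$archive" -C "$(dirname "$source_dir")" "$(basename "$source_dir")" || return 1
    (
        cd "$backup_dir" || exit 1
        sha256sum "$name" > "${name}.sha256" || exit 1
        sha256sum -c --quiet "${name}.sha256"      # re-read the archive and compare
    ) || return 1
    echo "$archive"
}

# Delete archives and their checksum files older than retain_days.
purge_old_backups() {
    local backup_dir="$1" retain_days="$2"
    find "$backup_dir" -type f \
        \( -name '*.tar.gz' -o -name '*.tar.gz.sha256' \) \
        -mtime "+${retain_days}" -delete
}
```

Storing the checksum with a relative file name lets the same .sha256 file be verified on the remote host after the sync, regardless of the directory it lands in.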

Cron configuration (daily at 3 am for /etc, 4 am for /opt/app):

0 3 * * * /opt/scripts/backup.sh /etc /data/backup 30
0 4 * * * /opt/scripts/backup.sh /opt/app /data/backup 7 [email protected]:/remote_backup/

Risk reminders :

If the source directory is being written to, the tar archive may be inconsistent; use dedicated dump tools for databases.

Ensure sufficient space in BACKUP_DIR.

Remote sync relies on SSH key authentication; configure keys beforehand.

Periodically test restore by extracting a backup.

Script 8 – Batch Host Command Execution

Function : Executes a command on multiple hosts listed in a file, with configurable concurrency and result aggregation.

Usage :

# Run uptime on all hosts
bash batch_exec.sh hosts.txt 'uptime'
# Run disk usage with 10 parallel jobs
bash batch_exec.sh hosts.txt 'df -h' 10

Key Logic :

Reads host list, ignoring empty lines and comments.

Exports helper function exec_on_host and uses xargs -P ${CONCURRENCY} to run SSH commands in parallel.

Each SSH invocation uses StrictHostKeyChecking=no, a connection timeout, and BatchMode=yes to avoid password prompts.

Results are written to ${RESULT_DIR}/${host}.out and summarized (success/failed counts).
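A sketch of the exported-function pattern with xargs -P follows. It is not the original source: SSH_BIN is a hypothetical override (defaulting to ssh) added only so the sketch can be exercised without real hosts, the 5-second ConnectTimeout is an assumed value, and the default RESULT_DIR of /tmp/batch_exec is likewise an assumption.

```bash
# Run remote_cmd on one host and store its output in result_dir/host.out.
exec_on_host() {
    local host="$1" remote_cmd="$2" result_dir="$3"
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 \
            -o StrictHostKeyChecking=no "$host" "$remote_cmd" \
            > "${result_dir}/${host}.out" 2>&1 < /dev/null; then
        echo "OK ${host}"
    else
        echo "FAILED ${host}"
    fi
}

# Run remote_cmd on every host in hosts_file, at most `concurrency` at a time.
batch_exec() {
    local hosts_file="$1" remote_cmd="$2" concurrency="${3:-5}"
    local result_dir="${RESULT_DIR:-/tmp/batch_exec}"
    mkdir -p "$result_dir"
    export -f exec_on_host                 # make the function visible to child bash
    grep -Ev '^[[:space:]]*(#|$)' "$hosts_file" \
        | xargs -P "$concurrency" -I{} \
            bash -c 'exec_on_host "$1" "$2" "$3"' _ {} "$remote_cmd" "$result_dir"
}
```

Passing the host and command as positional arguments to bash -c, instead of splicing them into the command string, avoids quoting problems when the remote command contains spaces or quotes. Counting the OK and FAILED lines gives the summary.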

Cron configuration (run every minute):

* * * * * /opt/scripts/batch_exec.sh /opt/scripts/hosts.txt 'uptime'

Risk reminders :

Validate the command on a single host before batch execution.

Avoid destructive commands such as rm or reboot without additional safeguards.

Ensure SSH keys are set up to prevent interactive password prompts.

Set CONCURRENCY appropriately; too high may exceed SSH connection limits.

Script 9 – Log Keyword Alert

Function : Monitors a log file for specified keywords (e.g., ERROR, OOM) and sends an alert; includes a cooldown period to avoid alert storms.

Usage :

# Default keywords
bash log_alert.sh /var/log/app/error.log
# Custom keyword pattern
bash log_alert.sh /var/log/nginx/error.log 'error|crit|alert|emerg'

Key Logic :

Tracks the last read offset in /var/run/log_alert/${file_hash}.offset. On first run, starts from the end of the file to avoid historic alerts.

If the file is rotated (size smaller than previous offset), resets the offset to zero.

Reads new content with tail -c +$((last_offset+1)), filters with grep -E, and limits the alert sample to the first three matches.

Cooldown file ${file_hash}.cooldown stores the timestamp of the last alert; if the current time minus the stored time is less than COOLDOWN (default 300 seconds), the alert is suppressed.

When an alert is sent, updates the cooldown file and the offset file.
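The offset and cooldown bookkeeping can be sketched as below. The function names are assumptions, the file hash is derived with cksum purely for illustration (the original hashing method is not shown), and in this sketch the offset is advanced on every scan.

```bash
# Print up to three new lines of log_file that match pattern, tracking the
# byte offset of the last read in state_dir.
scan_new_lines() {
    local log_file="$1" pattern="$2" state_dir="$3"
    local offset_file size last_offset
    mkdir -p "$state_dir"
    offset_file="${state_dir}/$(printf '%s' "$log_file" | cksum | cut -d' ' -f1).offset"
    size=$(( $(wc -c < "$log_file") ))

    if [[ ! -f "$offset_file" ]]; then
        echo "$size" > "$offset_file"          # first run: start at end of file
        return 0
    fi
    last_offset="$(cat "$offset_file")"
    if (( size < last_offset )); then
        last_offset=0                          # file was rotated or truncated
    fi

    # Read exactly the bytes between the old offset and the size measured above.
    tail -c "+$(( last_offset + 1 ))" "$log_file" \
        | head -c "$(( size - last_offset ))" \
        | grep -E "$pattern" | head -n 3
    echo "$size" > "$offset_file"
    return 0
}

# Succeed only when at least cooldown_s seconds have passed since the last alert.
cooldown_passed() {
    local cooldown_file="$1" cooldown_s="${2:-300}" now last
    now="$(date +%s)"
    last="$(cat "$cooldown_file" 2>/dev/null || echo 0)"
    (( now - last >= cooldown_s ))
}
```

Bounding the read with head -c means lines appended while the scan is running are picked up by the next run instead of being reported twice.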

Cron configuration (run every minute):

* * * * * /opt/scripts/log_alert.sh /var/log/app/error.log
* * * * * /opt/scripts/log_alert.sh /var/log/nginx/error.log 'error|crit|alert|emerg'

Risk reminders :

Only monitors new log entries; historic lines are ignored.

Do not use this script for destructive actions (e.g., rm).

Ensure the log file exists and is readable by the script user.

Script 10 – Database Inspection (MySQL + Redis)

Function : Performs basic health checks on MySQL and Redis instances, reports connection status, resource usage, replication health, and persistence status.

Usage :

# Provide passwords via environment variables
MYSQL_PASSWORD='xxx' REDIS_PASSWORD='yyy' bash db_inspect.sh

Key Logic :

MySQL checks (executed only if MYSQL_PASSWORD is set):

Connection test (SELECT 1).

Current vs. max connections and usage percentage; warns if usage ≥ 80 %.

InnoDB buffer‑pool hit rate calculated from Innodb_buffer_pool_reads and Innodb_buffer_pool_read_requests.

Slow query count (Slow_queries).

Uptime in days.

If the server is a replica, reports Slave_IO_Running, Slave_SQL_Running, and Seconds_Behind_Master.

Redis checks (executed only if REDIS_PASSWORD is set):

Ping test.

Memory usage (used_memory_human) vs. maxmemory_human.

Connected clients.

Key‑space hit rate (keyspace_hits / (keyspace_hits+keyspace_misses)).

Number of evicted keys; warns if > 0.

Slowlog length.

Persistence status (RDB and AOF last‑save results).

Role (master/slave) and, for slaves, replication link status.

All results are written to /var/log/db_inspect.log and a human‑readable report file /tmp/db_inspect_report_YYYYMMDD.txt.
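The two hit-rate calculations reduce to the same arithmetic, sketched here with awk because Bash has no floating-point math. The function names are illustrative, and the InnoDB formula assumed is the common one: read requests not served from disk divided by all read requests.

```bash
# Hit rate in percent with two decimals; prints "n/a" when there is no traffic.
hit_rate_pct() {
    local hits="$1" misses="$2"
    awk -v h="$hits" -v m="$misses" 'BEGIN {
        if (h + m == 0) { print "n/a"; exit }
        printf "%.2f\n", h * 100 / (h + m)
    }'
}

# InnoDB buffer pool hit rate.
# disk_reads = Innodb_buffer_pool_reads, read_requests = Innodb_buffer_pool_read_requests.
innodb_hit_rate_pct() {
    local disk_reads="$1" read_requests="$2"
    hit_rate_pct "$(( read_requests - disk_reads ))" "$disk_reads"
}

# Extract one field from `redis-cli INFO` output on stdin (lines end with CRLF).
redis_info_field() {
    local field="$1"
    tr -d '\r' | awk -F: -v f="$field" '$1 == f { print $2 }'
}
```

Guarding against a zero denominator matters on freshly started instances, where both counters are still 0.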

Cron configuration (run daily at 8 am):

0 8 * * * MYSQL_PASSWORD='xxx' REDIS_PASSWORD='yyy' /opt/scripts/db_inspect.sh

Risk reminders :

Passwords are passed via environment variables; never hard‑code them.

The script performs read‑only queries only; it does not modify data.

If Redis has no password, omit REDIS_PASSWORD.

Common Risk Mitigations

Never hard‑code credentials; use environment variables or encrypted files with permissions 600.

All scripts write to dedicated log files under /var/log/; rotate those logs with logrotate to prevent unbounded log growth.

Use file locks (e.g., exec 200>/var/run/script.lock; flock -n 200 || exit 0) for scripts that may be triggered concurrently.

Wrap network‑dependent commands with timeout to avoid hanging cron jobs.

Run scripts with the least privileged user required; only use root when absolutely necessary.
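The locking advice can be wrapped in a small helper, sketched here under the assumption that flock from util-linux is installed. The function name run_locked is illustrative; file descriptor 200 follows the example above.

```bash
# Run a command under an exclusive, non-blocking lock; skip quietly if another
# run already holds the lock.
run_locked() {
    local lock_file="$1"; shift
    (
        flock -n 200 || { echo "already running, skipping" >&2; exit 0; }
        "$@"
    ) 200> "$lock_file"
}

# Combined with timeout for a network-dependent job:
# run_locked /var/run/script.lock timeout 30 /opt/scripts/port_check.sh /opt/scripts/port_check.conf
```

The lock is tied to the open file descriptor, so it is released automatically when the subshell exits, even if the command crashes.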

