8 Common Shell Script Mistakes Junior Ops Engineers Make (Are You Guilty?)
This article examines the eight most frequent errors junior and mid‑level Linux operations engineers make when writing Bash scripts—such as missing quotes, wrong comparison operators, incomplete file checks, ignoring return codes, mishandling spaces, concurrency issues, lack of error handling, and absent logging—and provides concrete examples, detailed analysis, and corrected code snippets to improve script reliability and maintainability.
Background and Problem
Shell scripts are a core tool for Linux operations engineers, used for everything from simple log cleanup to complex automated deployments. Many junior and mid‑level engineers write scripts that work in happy paths but fail under edge cases, often due to subtle mistakes.
Mistake 1: Missing Quotes Around Variable References
Incorrect Example
#!/bin/bash
# Variable without quotes – spaces break the command
filename="my document.txt"
ls -la $filename
rm $filename
cat $filename
today=$(date +%Y-%m-%d)
# Command substitution result without quotes
tar -czf backup-$today.tar.gz /data
# Test expression without quotes
name="John Smith"
if [ $name = "John" ]; then
echo "Found John"
fi
Analysis
When a variable contains spaces, Bash splits it into separate words, so commands receive the wrong arguments. For example, ls -la $filename expands to ls -la my document.txt, which Bash passes to ls as two separate arguments (my and document.txt).
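The effect is easy to demonstrate without touching any real files; this sketch uses set -- to re-parse a value into the positional parameters, making the word splitting directly observable:

```shell
#!/bin/bash
# set -- re-parses a value into the positional parameters,
# so $# shows how many words Bash produced.
filename="my document.txt"

set -- $filename        # unquoted: split on the space
unquoted_count=$#       # 2 words

set -- "$filename"      # quoted: stays one word
quoted_count=$#         # 1 word

echo "unquoted: $unquoted_count arguments, quoted: $quoted_count argument"
```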
Correct Approach
#!/bin/bash
# Always quote variable expansions
filename="my document.txt"
ls -la "$filename"
rm "$filename"
cat "$filename"
today=$(date +%Y-%m-%d)
# Quote the result in the command
tar -czf "backup-${today}.tar.gz" /data
# Quote variables inside test
name="John Smith"
if [ "$name" = "John" ]; then
echo "Found John"
fi
Special Cases
# Double quotes allow variable and command substitution
name="world"
echo "Hello $name" # -> Hello world
echo 'Hello $name' # -> Hello $name (no substitution)
Mistake 2: Using Wrong Operators for String Comparison
Incorrect Example
#!/bin/bash
# Unquoted variable in the test – breaks on empty values or spaces
name="abc"
if [ $name = "abc" ]; then echo "match"; fi
# Using == inside [] (non‑POSIX)
if [ $name == "abc" ]; then echo "match"; fi
# Using string = for numeric comparison – compares characters, not values
count=10
if [ $count = 10 ]; then echo "count is 10"; fi
# Using string operator inside [[ ]] for numbers (incorrect)
num=100
if [[ $num = 100 ]]; then echo "match"; fi
Analysis
Bash provides distinct operators: = and != for strings; -eq, -ne, -lt, -gt, and friends for numbers. Inside [[ ]], == with an unquoted right-hand side performs glob pattern matching, not literal equality.
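A short sketch of the pattern-matching pitfall (the variable names are illustrative):

```shell
#!/bin/bash
num=100

# Unquoted right-hand side inside [[ ]] is a glob pattern:
# "100" matches 1* even though the strings are not equal.
if [[ "$num" == 1* ]]; then pattern_match=yes; else pattern_match=no; fi

# Quoting the pattern forces a literal string comparison.
if [[ "$num" == "1*" ]]; then literal_match=yes; else literal_match=no; fi

# Numeric equality uses -eq.
if [[ "$num" -eq 100 ]]; then numeric_match=yes; else numeric_match=no; fi

echo "pattern: $pattern_match, literal: $literal_match, numeric: $numeric_match"
```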
Correct Approach
#!/bin/bash
# String comparison – use = or == inside [[ ]]
name="abc"
if [ "$name" = "abc" ]; then echo "string match"; fi
if [[ "$name" == "abc" ]]; then echo "match"; fi
# Numeric comparison – use -eq, -gt, etc.
count=10
if [ "$count" -eq 10 ]; then echo "count is 10"; fi
if [ "$count" -gt 5 ]; then echo "count > 5"; fi
# Pattern matching inside [[ ]]
filename="report.txt"
if [[ "$filename" == *.txt ]]; then echo "text file"; fi
Mistake 3: Incomplete File Existence Checks
Incorrect Example
#!/bin/bash
# Only checks existence, not type
if [ -e /path/to/file ]; then cat /path/to/file; fi
# Ignores symlink target
if [ -f /path/to/symlink ]; then cat /path/to/symlink; fi
# Logical errors when combining tests
if [ -f /path/a ] && [ -f /path/b ]; then diff /path/a /path/b; fi
# Variable used without quoting
filepath="/path/to file with spaces"
if [ -f $filepath ]; then echo "exists"; fi
Analysis
-e only checks that a path exists; -f checks for a regular file; symlinks need -L (or -h) plus a check on the resolved target. Combining multiple tests also requires correct short-circuit logic.
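The difference between -e and -f can be sketched with a throwaway mktemp directory:

```shell
#!/bin/bash
# -e is true for any existing path; -f is true only for regular files.
tmpdir=$(mktemp -d)
touch "$tmpdir/regular"

[ -e "$tmpdir" ]         && dir_exists=yes   || dir_exists=no
[ -f "$tmpdir" ]         && dir_is_file=yes  || dir_is_file=no
[ -f "$tmpdir/regular" ] && file_is_file=yes || file_is_file=no

echo "dir -e: $dir_exists, dir -f: $dir_is_file, file -f: $file_is_file"
rm -rf "$tmpdir"
```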
Correct Approach
#!/bin/bash
filepath="/path/to/file"
# Proper quoting
if [ -f "$filepath" ]; then echo "regular file exists"; fi
# Check symlink
if [ -L "$filepath" ]; then echo "is a symlink"; target=$(readlink -f "$filepath"); if [ -f "$target" ]; then echo "link target exists: $target"; else echo "warning: broken symlink"; fi; fi
# Directory check
if [ -d "/path/to/dir" ]; then echo "directory exists"; fi
# Multiple files – all must exist
file1="/path/a"
file2="/path/b"
if [ -f "$file1" ] && [ -f "$file2" ]; then diff "$file1" "$file2"; fi
# Any one file exists
if [ -f "$file1" ] || [ -f "$file2" ]; then echo "at least one file exists"; fi
# Permission checks
if [ -r "$filepath" ]; then echo "file is readable"; fi
if [ -w "$filepath" ]; then echo "file is writable"; fi
if [ -x "$filepath" ]; then echo "file is executable"; fi
# Safe file‑read function
safe_cat() {
local file="$1"
if [ -f "$file" ] && [ -r "$file" ]; then cat "$file"; else echo "Error: Cannot read file: $file" >&2; return 1; fi
}
Mistake 4: Ignoring Command Return Values
Incorrect Example
#!/bin/bash
# No checks after cd
cd /some/directory
ls -la
# Pipeline only checks last command
grep "error" /var/log/app.log | cut -d ' ' -f1 | sort | uniq
# Test without checking
[ -f /path/to/file ]
echo "File check completed"
# Function without return status
function do_something() {
some_command    # exit status is never checked or propagated
}
# Variable assignment without checking command success
files=$(ls /nonexistent_directory)
echo "Files: $files"
Analysis
Every command exits with a status (0 = success). Ignoring these statuses can hide failures and make debugging hard.
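The pipeline behavior in particular is worth a quick demonstration; this sketch captures the exit status with and without pipefail:

```shell
#!/bin/bash
# Without pipefail, a pipeline's exit status is the last command's,
# so the failure of an earlier stage is silently masked.
false | true && without_pipefail=$? || without_pipefail=$?
# -> 0: the failure of false is hidden

set -o pipefail
false | true && with_pipefail=$? || with_pipefail=$?
# -> 1: the failure now surfaces

echo "without pipefail: $without_pipefail, with: $with_pipefail"
```

The `&& x=$? || x=$?` idiom records the status on both branches, so the snippet behaves the same whether or not the caller has set -e enabled.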
Correct Approach
#!/bin/bash
# Check explicitly where you want a custom error message
cd /some/directory || { echo "Failed to change directory" >&2; exit 1; }
# Strict mode: abort on errors, unset variables, and pipeline failures
set -euo pipefail
# Check grep result explicitly
if grep -q "error" /var/log/app.log; then echo "Found errors in log"; else echo "No errors found"; fi
# Proper function return handling
function do_something() {
# some operations
some_command || return 1    # propagate the failure
return 0
}
if ! do_something; then echo "Operation failed"; exit 1; fi
# Capture output while checking status
if ! output=$(ls /nonexistent 2>&1); then echo "Command failed: $output" >&2; fi
# Common patterns
command_that_might_fail || { echo "Failed"; exit 1; }
command_that_might_fail && echo "Success" || echo "Failed"
Mistake 5: Not Handling Spaces and Special Characters
Incorrect Example
#!/bin/bash
# Loop over ls output – breaks on spaces
for file in $(ls *.txt); do cat "$file"; done
# Read while losing spaces
cat file.txt | while read line; do echo "$line"; done
# Array with spaces treated as separate words
names="John Jane Bob"
for name in $names; do echo "$name"; done
# Path with spaces
path="/path/with spaces/file"
ls -la $path
# User input with spaces
read -p "Enter name: " name
echo "Hello $name"
Analysis
Bash performs word splitting based on IFS. Unquoted variables containing spaces are split, leading to wrong arguments or loop iterations.
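The difference shows up immediately once a filename contains a space; this sketch compares a naive ls loop with a glob array in a scratch mktemp directory:

```shell
#!/bin/bash
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt" "$tmpdir/with space.txt"

# Unquoted $(ls ...) splits "with space.txt" into two loop iterations
count_ls=0
for f in $(ls "$tmpdir"); do count_ls=$((count_ls + 1)); done

# A glob expands to one word per file, spaces intact
files=("$tmpdir"/*.txt)
count_glob=${#files[@]}

echo "ls loop: $count_ls items, glob array: $count_glob items"
rm -rf "$tmpdir"
```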
Correct Approach
#!/bin/bash
# Use arrays or proper quoting
files=(*.txt)
for file in "${files[@]}"; do cat "$file"; done
# Find with -print0 and read -d ''
while IFS= read -r -d '' file; do echo "Processing: $file"; cat "$file"; done < <(find . -name "*.txt" -print0)
# Proper array handling
names=("John Smith" "Jane Doe" "Bob Anderson")
for name in "${names[@]}"; do echo "Name: $name"; done
# Path with spaces – always quote
path="/path/with spaces/file"
ls -la "$path"
# Safe user input – read -r preserves backslashes; quote the expansion
read -r -p "Enter name: " name
echo "Hello, $name"
Mistake 6: Ignoring Concurrency
Incorrect Example
#!/bin/bash
# Multiple processes write to same file
for i in {1..10}; do echo "Process $i" >> /tmp/shared.txt; done
# Counter updated in background subshells – the updates are lost
counter=0
for i in {1..100}; do (counter=$((counter + 1))) & done
wait
echo $counter # prints 0: each subshell incremented its own copy
# PID file not checked – duplicate starts
if [ -f /var/run/myapp.pid ]; then echo "Already running"; exit 1; fi
echo $$ > /var/run/myapp.pid
# Temporary file conflict
for i in {1..10}; do sort data.txt > /tmp/sorted.txt; done
Analysis
Concurrent writes cause race conditions, data loss, and duplicate executions. Proper locking, PID checks, and isolated temporary files are required.
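As a sketch of the locking idea, flock (from util-linux, standard on most Linux distributions) can serialize appends from concurrent subshells; the paths here are throwaway mktemp files:

```shell
#!/bin/bash
# Each subshell takes an exclusive lock on FD 9 before appending,
# so no two writers touch the file at the same time.
outfile=$(mktemp)
lockfile=$(mktemp)

for i in $(seq 1 20); do
(
flock 9                       # block until the lock on FD 9 is free
echo "line $i" >> "$outfile"
) 9>"$lockfile" &
done
wait

lines=$(wc -l < "$outfile")
echo "wrote $lines lines"     # all 20 appends survive
rm -f "$outfile" "$lockfile"
```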
Correct Approach
#!/bin/bash
# Lock directory method – mkdir is atomic, so only one instance can win
LOCKDIR="/var/run/myapp.lock"
if ! mkdir "$LOCKDIR" 2>/dev/null; then echo "Another instance is running"; exit 1; fi
trap 'rmdir "$LOCKDIR"' EXIT
# flock (recommended)
exec 200>"/var/run/myapp.lock"
if ! flock -n 200; then echo "Another instance is running"; exit 1; fi
# PID file with real process check
PIDFILE="/var/run/myapp.pid"
if [ -f "$PIDFILE" ]; then
oldpid=$(cat "$PIDFILE")
if kill -0 "$oldpid" 2>/dev/null; then echo "Already running with PID $oldpid"; exit 1; fi
rm -f "$PIDFILE"
fi
echo $$ > "$PIDFILE"
trap 'rm -f "$PIDFILE"' EXIT
# Parallel processing with separate temp files
tmpdir=$(mktemp -d)
for i in {1..10}; do
(sort "data_${i}.txt" > "$tmpdir/sorted_${i}.txt") &
done
wait
cat "$tmpdir"/sorted_*.txt | sort -m > /tmp/final.txt
rm -rf "$tmpdir"
# Semaphore example – limit to 5 concurrent jobs (wait -n needs bash 4.3+)
semaphore() {
local max_jobs=$1; shift
local cmd
for cmd in "$@"; do
while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
wait -n   # block until any background job finishes
done
eval "$cmd" &
done
wait
}
semaphore 5 "task1.sh" "task2.sh" "task3.sh" "task4.sh" "task5.sh"
Mistake 7: No Error Handling
Incorrect Example
#!/bin/bash
# Continue after failures
cd /nonexistent
echo "Current directory: $(pwd)"
# No verification of critical resources
cp /data/important.db /backup/
rm -rf /data/
# No network check
curl -s http://api.example.com/data > /tmp/data.json
# Blind delete
rm -rf /tmp/build/*
# Function without error handling
function do_critical_work() { some_command; }
Analysis
Without proper error handling, any unexpected failure can corrupt data or leave the system in an inconsistent state.
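A minimal sketch of the trap-based variant of this idea: an ERR trap reports the failing point before strict mode aborts. The demo runs in a child bash so the failure stays contained:

```shell
#!/bin/bash
# The child script enables strict mode and installs an ERR trap;
# the trap fires when false fails, and set -e then aborts the child.
output=$(bash -c '
set -euo pipefail
trap '\''echo "ERROR: command failed near line $LINENO" >&2'\'' ERR
echo "step 1 ok"
false
echo "never reached"
' 2>&1) || true
echo "$output"
```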
Correct Approach
#!/bin/bash
set -euo pipefail
# Central error handler
error_exit() { echo "ERROR: $1" >&2; exit "${2:-1}"; }
# Verify resources before proceeding
verify_resources() {
[ -d "/data" ] || error_exit "Directory /data does not exist"
[ -d "/backup" ] || error_exit "Directory /backup does not exist"
available=$(df -k /backup | awk 'NR==2 {print $4}')
[ "$available" -gt 1000000 ] || error_exit "Insufficient disk space"
curl -s --max-time 5 http://api.example.com/health > /dev/null || error_exit "Cannot connect to API"
}
# Safe backup with verification
safe_backup() {
local src="/data/important.db" dst="/backup/important.db"
[ -f "$src" ] || error_exit "Source file not found: $src"
local tmp_dst="${dst}.tmp.$$"
cp "$src" "$tmp_dst" || error_exit "Backup copy failed"
cmp -s "$src" "$tmp_dst" || { rm -f "$tmp_dst"; error_exit "Backup verification failed"; }
mv "$tmp_dst" "$dst" || { rm -f "$tmp_dst"; error_exit "Backup move failed"; }
echo "Backup successful: $dst"
}
# Signal handling and cleanup
cleanup() { echo "Cleaning up..."; rm -rf /tmp/myapp_*; [ -f /var/run/myapp.pid ] && kill $(cat /var/run/myapp.pid) 2>/dev/null || true; }
trap cleanup EXIT INT TERM
# Main flow
verify_resources
safe_backup
Mistake 8: No Logging
Incorrect Example
#!/bin/bash
# All output goes to stdout
echo "Starting process"
some_command
echo "Process completed"
# No error logging
error_command
# No timestamps
long_running_task
echo "Done"
# Scattered echo redirections
echo "Starting" > /tmp/process.log
echo "Continuing" >> /tmp/process.log
echo "Finished" >> /tmp/process.log
# No log rotation – file grows indefinitely
while true; do echo "$(date): heartbeat" >> /var/log/myapp.log; sleep 60; done
Analysis
Without structured logging, troubleshooting becomes guesswork, and unmanaged log files grow without bound.
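One technique that replaces the scattered >> redirections shown above: a single exec redirect sends everything below it to the log. A sketch with a scratch mktemp log file:

```shell
#!/bin/bash
# Redirect the whole script's output once instead of appending
# ">> logfile" after every echo.
logfile=$(mktemp)

exec 3>&1                  # save the original stdout on FD 3
exec >>"$logfile" 2>&1     # from here on, all output goes to the log

echo "$(date '+%Y-%m-%d %H:%M:%S') [INFO] job started"
echo "$(date '+%Y-%m-%d %H:%M:%S') [INFO] job finished"

exec 1>&3 2>&1 3>&-        # restore stdout/stderr for the summary
entries=$(wc -l < "$logfile")
echo "log contains $entries entries"
rm -f "$logfile"
```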
Correct Approach
#!/bin/bash
# Logging configuration
LOGFILE="/var/log/myapp/app.log"
LOGLEVEL="INFO" # DEBUG, INFO, WARN, ERROR
declare -A LOG_LEVELS=([DEBUG]=0 [INFO]=1 [WARN]=2 [ERROR]=3)
log() {
local level="${1:-INFO}"; shift
local message="$*"
local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
local current_level=${LOG_LEVELS[$LOGLEVEL]}
local message_level=${LOG_LEVELS[$level]}
[ "$message_level" -lt "$current_level" ] && return
echo "[$timestamp] [$level] $message" >> "$LOGFILE"
if [ "$message_level" -ge "${LOG_LEVELS[INFO]}" ]; then echo "[$level] $message"; fi
}
log_debug() { log DEBUG "$@"; }
log_info() { log INFO "$@"; }
log_warn() { log WARN "$@"; }
log_error() { log ERROR "$@"; }
# Example usage
log_info "Script started"
log_debug "Configuration: debug=true"
if ! command_that_might_fail; then log_error "Task failed"; exit 1; fi
log_info "Task completed successfully"
# Log rotation (10 MB limit, keep 7 files)
rotate_log() {
local logfile="$1"
local max_size=$((10*1024*1024))
if [ -f "$logfile" ] && [ "$(stat -c%s "$logfile")" -gt "$max_size" ]; then
local stamp; stamp=$(date +%Y%m%d-%H%M%S)
mv "$logfile" "${logfile}.${stamp}" && gzip "${logfile}.${stamp}"
ls -1t "${logfile}".*.gz 2>/dev/null | tail -n +8 | xargs -r rm -f
fi
}
init_log() { mkdir -p "$(dirname "$LOGFILE")"; touch "$LOGFILE"; rotate_log "$LOGFILE"; }
init_log
Best‑Practice Checklist
Quote all variable expansions.
Use the correct operators for string (=, !=) and numeric (-eq, -lt, …) comparisons.
Perform complete file checks: existence, type, permissions, and symlink resolution.
Check return values of every critical command.
Handle spaces and special characters by quoting and using arrays or find -print0.
Guard against concurrent execution with lock files, flock, or PID checks.
Implement robust error handling, custom exit functions, and cleanup traps.
Use a structured logging system with levels, timestamps, and rotation.
Ops Community
A leading IT operations community where professionals share and grow together.