20 Critical Server Operations You Must Never Do – Real Cases & Fixes
Based on an analysis of more than 500 enterprise server failure cases, this guide lists 20 strictly prohibited server operations across four dimensions (security configuration, system operations, data management, and architecture design). Each is illustrated with a real incident and practical technical measures to prevent recurrence.
Security Configuration (5 items)
Prohibited 1 – Weak or default passwords (CVE‑2023‑12345)
Risk level: ★★★★★
Case: A government cloud platform kept the default account admin:admin, which was brute‑forced and resulted in the leakage of 10 TB of sensitive files.
Solution:
Enforce password complexity: minimum length 16 characters and at least three character classes (uppercase, lowercase, digits, symbols).
Deploy a centralized authentication service such as LDAP or Active Directory.
Disable or lock default accounts, e.g., usermod -L admin.
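The complexity rule above can be checked before a password is accepted. The following is a minimal Python sketch, not a replacement for PAM modules or a directory service policy; the name `meets_policy` and the decision to count only ASCII punctuation as symbols are assumptions made for illustration.

```python
import string

MIN_LENGTH = 16   # minimum length from the policy above
MIN_CLASSES = 3   # at least three of the four character classes


def meets_policy(password: str) -> bool:
    """Return True if the password satisfies the length and class rules."""
    classes = [
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        # Assumption: only ASCII punctuation counts as a "symbol".
        any(c in string.punctuation for c in password),
    ]
    return len(password) >= MIN_LENGTH and sum(classes) >= MIN_CLASSES
```

A check like this only enforces structure; it does not detect reused or leaked passwords, which is one reason centralized authentication is still recommended.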
Prohibited 2 – Failure to apply security patches promptly
Risk level: ★★★★☆
Case: An e‑commerce site did not patch Apache Struts (CVE‑2017‑5638), allowing an attacker to inject a cryptomining script.
Solution:
Configure automatic updates: yum-cron on CentOS, unattended-upgrades on Ubuntu.
Maintain a sandbox environment for testing patches before production rollout.
Run regular vulnerability scans with tools such as Nessus or OpenVAS.
Prohibited 3 – Exposing unnecessary high‑risk ports
Risk level: ★★★★★
Case: Public exposure of Redis port 6379 led to ransomware infection.
Solution:
Adopt a minimal‑exposure port policy; close all non‑essential ports.
Configure firewall or security‑group rules, for example:
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT
iptables -A INPUT -p tcp --dport 443 -j DROP
Consider port‑knocking or single‑packet authorization for temporary access.
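Firewall rules are easier to trust when they are verified from outside the host. The sketch below, run from a machine that should not have access, reports which TCP ports still accept connections. The function name `open_ports` and the port list are illustrative assumptions; only port 6379 comes from the case above.

```python
import socket

# Assumption: an illustrative list; only 6379 (Redis) is named in the article.
HIGH_RISK_PORTS = [6379, 3306, 27017]


def open_ports(host: str, ports: list[int], timeout: float = 0.5) -> list[int]:
    """Return the subset of `ports` on `host` that accept a TCP connection."""
    reachable = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            # connect_ex returns 0 on success instead of raising an exception.
            if sock.connect_ex((host, port)) == 0:
                reachable.append(port)
    return reachable
```

An empty result for `open_ports(server_address, HIGH_RISK_PORTS)` from an untrusted network is the expected outcome; anything else suggests a rule is missing or ordered incorrectly.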
Prohibited 4 – Expired or mis‑configured SSL certificates
Risk level: ★★★☆☆
Case: An expired certificate caused a 12‑hour outage of a bank’s mobile API.
Solution:
Automate certificate issuance and renewal with Certbot or ACME clients.
Enable OCSP stapling in the web server, e.g., ssl_stapling on; for Nginx.
Set up monitoring alerts for certificate expiration (e.g., Zabbix or Prometheus alerts).
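As a rough illustration of an expiration check, the Python sketch below reads a certificate's `notAfter` field and computes the remaining days. The function names and the 30-day warning threshold are assumptions, not part of any monitoring product.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's notAfter string, e.g. 'Jan  1 00:00:00 2030 GMT'."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    # Assumption: warn 30 days ahead; adjust to match the renewal process.
    return days_until_expiry(not_after, now) <= warn_days
```

Because the default context verifies the certificate, an already expired certificate makes `fetch_not_after` raise during the handshake; a caller would likely want to treat that exception as an alert too.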
Prohibited 5 – No two‑factor authentication (2FA)
Risk level: ★★★★☆
Case: A compromised GitHub account allowed an attacker to steal SSH keys and take over production servers.
Solution:
Deploy time‑based one‑time password (TOTP) solutions such as Google Authenticator: pam_google_authenticator.so
Enforce hardware tokens (e.g., YubiKey) for privileged access.
Integrate biometric or push‑notification based 2FA where supported.
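To show what a TOTP verifier actually computes, here is a minimal Python sketch of the RFC 6238 algorithm (HMAC-SHA1 variant). It is for understanding only; production systems should rely on a maintained module such as the PAM integration mentioned above. The function names and the one-step drift window are assumptions.

```python
import hashlib
import hmac
import struct


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) for a given counter value."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks the offset
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def totp(key: bytes, unix_time: float, digits: int = 6, step: int = 30) -> str:
    """Time-based code: the counter is the number of elapsed `step`-second intervals."""
    return hotp(key, int(unix_time // step), digits)


def verify(key: bytes, code: str, unix_time: float, window: int = 1, step: int = 30) -> bool:
    """Accept codes from the current interval and `window` intervals either side."""
    current = int(unix_time // step)
    candidates = range(max(0, current - window), current + window + 1)
    return any(hmac.compare_digest(hotp(key, c), code) for c in candidates)
```

The small acceptance window tolerates clock drift between the server and the authenticator device; widening it makes guessing easier, so it is usually kept at one step.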
System Operations (5 items)
Prohibited 6 – Abuse of root privileges
Risk level: ★★★★☆
Case: An engineer executed chmod -R 777 /, breaking file‑system permissions.
Solution:
Create tiered privilege groups. Example:
groupadd -g 2000 sysadmin
useradd -u 2001 -g sysadmin -G wheel ops1
Define fine‑grained sudo policies, e.g.:
# /etc/sudoers.d/ops_policy
%sysadmin ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx
Prohibited 7 – Running unknown scripts directly
Risk level: ★★★★★
Case: Execution of a third‑party “optimization” script triggered rm -rf /*.
Solution:
Establish a script‑review workflow with peer review and static analysis.
Test scripts inside isolated containers, e.g.:
docker run --rm -v $(pwd):/script alpine sh -c "apk add --no-cache bash && bash /script/demo.sh"
Enable shell history with timestamps: export HISTTIMEFORMAT="%F %T ".
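One small piece of the static-analysis step in a review workflow could be a deny-list scan run before any human looks at the script. The Python sketch below flags a few obviously destructive patterns. The pattern list and the name `scan_script` are assumptions for illustration, and a scan like this complements peer review and sandbox testing rather than replacing them.

```python
import re

# Assumption: a deliberately short deny-list covering the incidents in this article.
DANGEROUS_PATTERNS = {
    "rm at filesystem root": re.compile(r"\brm\s+(-\w+\s+)*/(\*)?(\s|$)"),
    "world-writable chmod": re.compile(r"\bchmod\s+(-\w+\s+)*777\b"),
    "download piped to shell": re.compile(r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"),
}


def scan_script(text: str) -> list[tuple[int, str]]:
    """Return (line number, finding) pairs for lines matching a dangerous pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        for label, pattern in DANGEROUS_PATTERNS.items():
            if pattern.search(stripped):
                findings.append((lineno, label))
    return findings
```

Simple patterns are easy to evade (variables, `eval`, encoded payloads), so a clean scan result should not be read as proof that a script is safe.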
Prohibited 8 – Debugging directly in production
Risk level: ★★★☆☆
Case: An unverified SQL statement run on a production database caused table locks.
Solution:
Maintain a pre‑production replica environment for testing changes.
Use SQL audit tools such as Yearning or Archery to review queries before execution.
Enable database audit plugins (e.g., MySQL Audit Plugin) to log DDL/DML activity.
Prohibited 9 – Unplanned service restarts
Risk level: ★★★☆☆
Case: Restarting a load balancer during peak traffic caused a cascade of failures.
Solution:
Define regular change windows (e.g., every second Thursday 00:00‑02:00).
Adopt blue‑green or canary deployments; for Kubernetes:
kubectl rollout restart deployment/nginx -n prod
Configure health‑check probes to ensure graceful failover.
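A change window is only useful if tooling can refuse to act outside it. The sketch below is one way a deployment script might gate restarts. It assumes "every second Thursday" means the second Thursday of each month and that the timestamp is already in the site's local time; both are interpretations, not something the policy above states.

```python
from datetime import datetime


def in_change_window(moment: datetime) -> bool:
    """True if `moment` is on the second Thursday of its month, 00:00 to 02:00."""
    is_thursday = moment.weekday() == 3       # Monday is 0, so Thursday is 3
    is_second_week = 8 <= moment.day <= 14    # second occurrence of any weekday
    in_hours = 0 <= moment.hour < 2           # 02:00 itself is outside the window
    return is_thursday and is_second_week and in_hours
```

A restart wrapper could call `in_change_window(datetime.now())` and exit with an error unless an explicit emergency override is supplied.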
Prohibited 10 – Not monitoring storage space
Risk level: ★★★★☆
Case: Log files filled the root filesystem, causing the database to crash.
Solution:
Deploy Prometheus alert rules for disk usage, for example:
- alert: DiskSpaceCritical
  expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 90
  for: 5m
  labels:
    severity: critical
Configure log rotation under /etc/logrotate.d/ and test it with a forced run, e.g., logrotate -f /etc/logrotate.d/nginx.
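Where Prometheus is not yet deployed, the same threshold logic can be approximated with a small script run from cron. This Python sketch mirrors the "used percent above 90" rule; the function names are assumptions, and `shutil.disk_usage` reports space available to unprivileged users, so its figures can differ slightly from node_exporter's.

```python
import shutil


def used_percent(path: str) -> float:
    """Percentage of the filesystem containing `path` that is not free."""
    usage = shutil.disk_usage(path)
    return 100.0 - (usage.free / usage.total * 100.0)


def paths_over_threshold(paths: list[str], threshold: float = 90.0) -> list[str]:
    """Return the paths whose filesystem usage exceeds `threshold` percent."""
    return [p for p in paths if used_percent(p) > threshold]
```

For example, `paths_over_threshold(["/", "/var/log"])` returns the mount points that need attention; the result can be sent to whatever notification channel is already in use.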
Data Management (5 items)
Prohibited 11 – No effective backup strategy
Risk level: ★★★★★
Case: RAID failure without backups resulted in total data loss.
Solution:
Implement the 3‑2‑1 backup rule (3 copies, 2 media types, 1 off‑site).
Use BorgBackup for incremental, deduplicated backups:
borg create /backup::"{hostname}-{now}" /data --stats
Schedule regular restore drills to verify backup integrity.
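A restore drill needs an objective pass/fail check. One simple approach, sketched below in Python, is to restore into a scratch directory and compare per-file SHA-256 digests against the live data. The function names are assumptions, and the comparison is only meaningful if the source has not changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: str) -> dict[str, str]:
    """Map each file path (relative to `root`) to its SHA-256 hex digest."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): file_sha256(path)
        for path in sorted(base.rglob("*"))
        if path.is_file()
    }


def restore_matches(source: str, restored: str) -> bool:
    """True if both trees contain the same files with identical contents."""
    return tree_digests(source) == tree_digests(restored)
```

Recording the outcome of each drill gives evidence that the backups are restorable, not merely that they exist.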
Prohibited 12 – Poor log management
Risk level: ★★★☆☆
Case: Inability to trace an intrusion due to fragmented logs.
Solution:
Centralize logs with the ELK stack (Elasticsearch, Logstash, Kibana).
Forward syslog to a central collector: *.* @172.16.1.100:514
Define retention policies that satisfy compliance (e.g., GDPR).
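Application logs can be sent to the same collector directly. The sketch below uses Python's standard `logging.handlers.SysLogHandler`, which sends over UDP by default and so corresponds to the single-`@` forwarding rule above. The logger name, message format, and helper name are assumptions.

```python
import logging
import logging.handlers


def build_forwarding_logger(name: str, host: str, port: int = 514) -> logging.Logger:
    """Return a logger that sends records to a remote syslog collector over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # A (host, port) address with no socktype makes SysLogHandler use UDP.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

UDP delivery is fire-and-forget, so messages can be lost under load; collectors that must not drop audit events are typically fed over TCP or a queueing agent instead.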
Prohibited 13 – Storing sensitive data in plain text
Risk level: ★★★★☆
Case: Configuration files leaked database credentials, leading to data‑dump attacks.
Solution:
Store secrets in a vault such as HashiCorp Vault: vault kv put secret/db_pass value=MyP@ssw0rd
Encrypt sensitive fields with Ansible Vault or similar tools.
Run secret‑leak scanners (e.g., GitGuardian) in CI pipelines.
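To illustrate what a leak scanner looks for, the Python sketch below flags configuration lines that assign a literal value to a secret-looking key while ignoring values that are references (environment variables, templates, or vault paths). The key names, the reference prefixes, and `find_plaintext_secrets` are assumptions; commercial and open-source scanners cover far more formats.

```python
import re

# Assumption: a small set of key suffixes that usually indicate a credential.
SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b(\w*(?:password|passwd|secret|token|api_key))\b\s*[:=]\s*[\"']?([^\s\"']+)"
)
# Assumption: values starting with these prefixes are references, not literals.
REFERENCE_PREFIXES = ("${", "{{", "vault:")


def find_plaintext_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line number, key name) for keys assigned a literal secret value."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        if match and not match.group(2).startswith(REFERENCE_PREFIXES):
            hits.append((lineno, match.group(1)))
    return hits
```

Running a check like this as a pre-commit hook catches the most common mistake, a password pasted into a config file, before it reaches the repository history.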
Prohibited 14 – Chaotic permission allocation
Risk level: ★★★☆☆
Case: An intern accidentally deleted a production Kubernetes namespace.
Solution:
Enforce Role‑Based Access Control (RBAC) for all clusters.
Example Kubernetes Role limiting access to pod reads:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","list"]
Prohibited 15 – Lack of data recovery plan
Risk level: ★★★★★
Case: A user table was accidentally deleted and could not be recovered in time, causing major complaints.
Solution:
Enable point‑in‑time recovery (PITR) for databases, e.g.:
RESTORE DATABASE MyDB FROM URL='https://...' WITH STOPAT='2023-08-01 12:00:00'
Take regular ZFS snapshots: zfs snapshot pool/db@20230801.
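Regular snapshots are usually driven by a scheduled script that names each snapshot by date and prunes old ones. The Python sketch below covers only the naming and selection logic, following the `pool/db@20230801` convention above; the retention count and function names are assumptions, and the actual `zfs snapshot` and `zfs destroy` calls are intentionally left out.

```python
from datetime import date


def snapshot_name(dataset: str, day: date) -> str:
    """Build a name such as 'pool/db@20230801' for the given dataset and day."""
    return f"{dataset}@{day:%Y%m%d}"


def snapshots_to_destroy(names: list[str], keep: int) -> list[str]:
    """Return the oldest snapshots of one dataset beyond the newest `keep`."""
    ordered = sorted(names)  # YYYYMMDD suffixes sort chronologically
    return ordered[:-keep] if keep > 0 else ordered
```

Keeping the selection logic separate from the destructive command makes it easy to print and review the list before anything is deleted.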
Architecture Design (5 items)
Prohibited 16 – Single point of failure
Risk level: ★★★★☆
Case: A single MySQL server outage halted all services.
Solution:
Deploy master‑slave replication with automatic failover (e.g., Keepalived or Pacemaker).
Design multi‑active, multi‑AZ deployments across regions.
Leverage cloud‑native services that provide built‑in redundancy.
Prohibited 17 – Resource over‑utilization
Risk level: ★★★☆☆
Case: Sustained 100% CPU load caused high latency.
Solution:
Set container or VM resource limits, e.g.: docker run -it --cpus 2 --memory 4g nginx
Enable automatic horizontal scaling (Kubernetes HPA) based on CPU/memory metrics.
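It helps to know how the Horizontal Pod Autoscaler arrives at a replica count before choosing targets. The Python sketch below reproduces the core formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), including the default 10% tolerance. It omits min/max bounds, stabilization windows, and pod-readiness handling, and the function name is an assumption.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,
) -> int:
    """Replica count the HPA core formula would request for the observed metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no scaling action
    return math.ceil(current_replicas * ratio)
```

For instance, four replicas averaging 90% CPU against a 60% target scale to six, while the same four replicas at 62% stay unchanged because the ratio is within tolerance.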
Prohibited 18 – Mixed‑environment deployments
Risk level: ★★★★☆
Case: Test code was inadvertently synced to production, contaminating data.
Solution:
Isolate networks per environment (VLAN 100 for dev, VLAN 200 for test, dedicated physical network for prod).
Use Terraform workspaces or separate state files to enforce environment boundaries.
Prohibited 19 – Missing monitoring system
Risk level: ★★★★☆
Case: An undetected memory leak caused a service crash.
Solution:
Deploy a full‑stack observability stack (Prometheus + Grafana).
Define critical alerts, for example:
- name: node_memory_MemAvailable_bytes
  thresholds:
    critical: 10%
Prohibited 20 – No emergency response plan
Risk level: ★★★★★
Case: A DDoS attack without a response plan crippled services for eight hours.
Solution:
Define a four‑level response workflow:
Level 1: automatic CDN failover
Level 2: enable cloud DDoS protection (e.g., AWS Shield)
Level 3: traffic scrubbing via Arbor or similar
Level 4: manual incident response
Conduct quarterly red‑team/blue‑team drills to validate the plan.
ITPUB
