20 Critical Server Operations You Must Never Do – Real Cases & Fixes

Drawing on an analysis of more than 500 enterprise server failure cases, this guide lists 20 prohibited server operations across four dimensions (security configuration, system operations, data management, and architecture design). Each is illustrated with a real incident and practical technical measures to prevent recurrence.

ITPUB

May 3, 2025

Security Configuration (5 items)

Prohibited 1 – Weak or default passwords (CVE‑2023‑12345)

Risk level: ★★★★★

Case: A government cloud platform kept the default account admin:admin, which was brute‑forced and resulted in the leakage of 10 TB of sensitive files.

Solution:

Enforce password complexity: minimum length 16 characters and at least three character classes (uppercase, lowercase, digits, symbols).

Deploy a centralized authentication service such as LDAP or Active Directory.

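Disable or lock default accounts, e.g., usermod -L admin. This locks the password only; if key-based login is possible, also expire the account (usermod -e 1 admin) or remove its SSH keys.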
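The complexity rule above can be expressed as a small check. The following is a minimal sketch, not a replacement for PAM or directory-level enforcement; the function names and the use of string.punctuation as the "symbols" class are assumptions made for illustration.

```python
import string

MIN_LENGTH = 16        # minimum length from the policy above
REQUIRED_CLASSES = 3   # at least three of the four character classes


def count_character_classes(password: str) -> int:
    """Count how many of the four classes appear in the password."""
    checks = (
        any(c.isupper() for c in password),
        any(c.islower() for c in password),
        any(c.isdigit() for c in password),
        # Assumption: "symbols" means ASCII punctuation
        any(c in string.punctuation for c in password),
    )
    return sum(checks)


def meets_policy(password: str) -> bool:
    """Return True if the password satisfies the length and class rules."""
    return (
        len(password) >= MIN_LENGTH
        and count_character_classes(password) >= REQUIRED_CLASSES
    )
```

A check like this could run in a provisioning script before an account is created; it does not detect reused or breached passwords.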

Prohibited 2 – Failure to apply security patches promptly

Risk level: ★★★★☆

Case: An e‑commerce site did not patch Apache Struts (CVE‑2017‑5638), allowing an attacker to inject a cryptomining script.

Solution:

Configure automatic updates: yum-cron on CentOS 7 (dnf-automatic on CentOS/RHEL 8 and later), unattended-upgrades on Ubuntu.

Maintain a sandbox environment for testing patches before production rollout.

Run regular vulnerability scans with tools such as Nessus or OpenVAS.

Prohibited 3 – Exposing unnecessary high‑risk ports

Risk level: ★★★★★

Case: Public exposure of Redis port 6379 led to ransomware infection.

Solution:

Adopt a minimal‑exposure port policy; close all non‑essential ports.

Configure firewall or security‑group rules, for example:

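# Allow SSH only from the internal management subnet
iptables -A INPUT -p tcp --dport 22 -s 192.168.1.0/24 -j ACCEPT
# Drop inbound traffic to a port that must not be reachable (443 here; substitute the port you need to close, such as 6379 for Redis)
iptables -A INPUT -p tcp --dport 443 -j DROP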

Consider port‑knocking or single‑packet authorization for temporary access.
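To verify that a port policy actually took effect, it can help to probe the host from outside. This is a rough sketch using only the standard library; the function names are hypothetical, and a dedicated scanner such as nmap would be more thorough.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat as not exposed
        return False


def exposed_ports(host: str, ports: list[int]) -> list[int]:
    """Return the subset of ports that accept TCP connections."""
    return [p for p in ports if is_port_open(host, p)]
```

Calling exposed_ports with a list such as [22, 443, 6379] from a machine outside the trusted subnet shows which of those ports are still reachable.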

Prohibited 4 – Expired or mis‑configured SSL certificates

Risk level: ★★★☆☆

Case: An expired certificate caused a 12‑hour outage of a bank’s mobile API.

Solution:

Automate certificate issuance and renewal with Certbot or ACME clients.

Enable OCSP stapling in the web server, e.g., ssl_stapling on; for Nginx.

Set up monitoring alerts for certificate expiration (e.g., Zabbix or Prometheus alerts).
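An expiration alert boils down to comparing the certificate's notAfter timestamp with the current time. The sketch below assumes a 30-day warning threshold and hypothetical function names; it uses only Python's ssl and socket modules.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the notAfter string of the certificate presented by the server."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = 30) -> bool:
    """True when fewer than warn_days remain (30 days is an assumed threshold)."""
    return days_until_expiry(not_after, now) < warn_days
```

Because fetch_not_after uses the default verification context, a certificate that has already expired raises ssl.SSLCertVerificationError, which is itself a condition worth alerting on.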

Prohibited 5 – No two‑factor authentication (2FA)

Risk level: ★★★★☆

Case: A compromised GitHub account allowed an attacker to steal SSH keys and take over production servers.

Solution:

Deploy time‑based one‑time password (TOTP) solutions such as Google Authenticator, e.g., via the PAM module pam_google_authenticator.so.

Enforce hardware tokens (e.g., YubiKey) for privileged access.

Integrate biometric or push‑notification based 2FA where supported.
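For readers curious about what a TOTP authenticator computes, here is a compact sketch of the algorithm described in RFC 4226 (HOTP) and RFC 6238 (TOTP) with HMAC-SHA1. It is for understanding only; the function names are illustrative, and production systems should rely on a maintained library or the PAM module.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password for a given counter value."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based OTP: HOTP over the number of elapsed time steps."""
    if at_time is None:
        at_time = time.time()
    return hotp(key, int(at_time) // step, digits)
```

The server and the token share the key and derive the same code from the current 30-second window, which is why clock drift between the two causes login failures.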

System Operations (5 items)

Prohibited 6 – Abuse of root privileges

Risk level: ★★★★☆

Case: An engineer executed chmod -R 777 /, breaking file‑system permissions.

Solution:

Create tiered privilege groups. Example:

groupadd -g 2000 sysadmin
useradd -u 2001 -g sysadmin -G wheel ops1

Define fine‑grained sudo policies, e.g.:

# /etc/sudoers.d/ops_policy
%sysadmin ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart nginx

Prohibited 7 – Running unknown scripts directly

Risk level: ★★★★★

Case: Execution of a third‑party “optimization” script triggered rm -rf /*.

Solution:

Establish a script‑review workflow with peer review and static analysis.

Test scripts inside isolated containers, e.g.:

docker run --rm -v "$(pwd)":/script:ro alpine sh -c "apk add --no-cache bash && bash /script/demo.sh"

Enable shell history with timestamps: export HISTTIMEFORMAT="%F %T ".
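As a first pass before human review, a simple pattern scan can flag obviously destructive commands in a script. This is a minimal sketch with an assumed, deliberately short pattern list; it will miss obfuscated commands and is not a substitute for a real static analyzer such as ShellCheck or for sandboxed testing.

```python
import re

RISKY_PATTERNS = [
    # rm with any short flags aimed at the filesystem root ("/" or "/*")
    re.compile(r"\brm\s+(-\w+\s+)*/(\*|\s|$)"),
    # world-writable permissions
    re.compile(r"\bchmod\s+(-\w+\s+)*777\b"),
    # piping a download straight into a shell
    re.compile(r"\b(curl|wget)\b[^|\n]*\|\s*(sudo\s+)?(ba)?sh\b"),
]


def find_risky_lines(script_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs that match a risky pattern."""
    findings = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        if any(p.search(stripped) for p in RISKY_PATTERNS):
            findings.append((number, stripped))
    return findings
```

A review pipeline could reject any script for which find_risky_lines returns a non-empty list until a second engineer signs off.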

Prohibited 8 – Debugging directly in production

Risk level: ★★★☆☆

Case: An unverified SQL statement run on a production database caused table locks.

Solution:

Maintain a pre‑production replica environment for testing changes.

Use SQL audit tools such as Yearning or Archery to review queries before execution.

Enable database audit plugins (e.g., MySQL Audit Plugin) to log DDL/DML activity.

Prohibited 9 – Unplanned service restarts

Risk level: ★★★☆☆

Case: Restarting a load balancer during peak traffic caused a cascade of failures.

Solution:

Define regular change windows (e.g., every second Thursday 00:00‑02:00).

Adopt blue‑green or canary deployments. On Kubernetes, a rolling restart replaces pods gradually instead of all at once:

kubectl rollout restart deployment/nginx -n prod

Configure health‑check probes to ensure graceful failover.
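A change window can also be enforced in tooling so that restart scripts refuse to run outside it. The sketch below interprets "every second Thursday" as the second Thursday of each month, which is an assumption, and uses a hypothetical function name.

```python
from datetime import datetime

THURSDAY = 3  # datetime.weekday(): Monday is 0


def in_change_window(moment: datetime) -> bool:
    """True if `moment` falls on the second Thursday of the month, 00:00-02:00."""
    # The second Thursday always lands on day 8 through 14 of the month
    is_second_thursday = moment.weekday() == THURSDAY and 8 <= moment.day <= 14
    return is_second_thursday and 0 <= moment.hour < 2
```

A deployment wrapper could call in_change_window(datetime.now()) and abort unless an explicit emergency override is supplied.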

Prohibited 10 – Not monitoring storage space

Risk level: ★★★★☆

Case: Log files filled the root filesystem, causing the database to crash.

Solution:

Deploy Prometheus alert rules for disk usage, for example:

- alert: DiskSpaceCritical
  expr: 100 - (node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 90
  for: 5m
  labels:
    severity: critical

Configure log rotation with a policy file under /etc/logrotate.d/, and force a test run with logrotate -f /etc/logrotate.d/nginx.
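The same used-space calculation as in the alert expression can be run locally, for example in a cron job on hosts that are not yet scraped by Prometheus. This is a small sketch with hypothetical function names; the 90 % threshold mirrors the rule above.

```python
import shutil


def used_percent(free_bytes: int, size_bytes: int) -> float:
    """Mirror the alert expression: 100 - free / size * 100."""
    return 100 - (free_bytes / size_bytes * 100)


def disk_is_critical(path: str = "/", threshold: float = 90.0) -> bool:
    """True if the filesystem containing `path` is above the threshold."""
    usage = shutil.disk_usage(path)
    return used_percent(usage.free, usage.total) > threshold
```

Unlike the Prometheus rule, this check has no "for: 5m" damping, so a caller may want to require several consecutive positive results before paging anyone.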

Data Management (5 items)

Prohibited 11 – No effective backup strategy

Risk level: ★★★★★

Case: RAID failure without backups resulted in total data loss.

Solution:

Implement the 3‑2‑1 backup rule (3 copies, 2 media types, 1 off‑site).

Use BorgBackup for incremental, deduplicated backups:

borg create /backup::"{hostname}-{now}" /data --stats

Schedule regular restore drills to verify backup integrity.
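A restore drill is only meaningful if the restored data is compared with the source. One simple way to do that, sketched here under the assumption that both trees are mounted locally and small enough to read file by file, is to hash every file path and its contents; the function names are illustrative.

```python
import hashlib
from pathlib import Path


def tree_digest(root: str) -> str:
    """SHA-256 over the relative path and contents of every file under root."""
    base = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())  # for very large files, read in chunks
        digest.update(b"\0")
    return digest.hexdigest()


def restore_matches(source_dir: str, restored_dir: str) -> bool:
    """True if the restored tree is byte-identical to the source tree."""
    return tree_digest(source_dir) == tree_digest(restored_dir)
```

This comparison ignores permissions, ownership, and timestamps, so a drill that depends on those attributes needs an additional check.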

Prohibited 12 – Poor log management

Risk level: ★★★☆☆

Case: Inability to trace an intrusion due to fragmented logs.

Solution:

Centralize logs with the ELK stack (Elasticsearch, Logstash, Kibana).

Forward syslog to a central collector, e.g., with the rsyslog rule *.* @172.16.1.100:514 (a single @ sends over UDP; @@ sends over TCP).

Define retention policies that satisfy compliance requirements (e.g., GDPR).

Prohibited 13 – Storing sensitive data in plain text

Risk level: ★★★★☆

Case: Configuration files leaked database credentials, leading to data‑dump attacks.

Solution:

Store secrets in a vault such as HashiCorp Vault:

vault kv put secret/db_pass value=MyP@ssw0rd

Encrypt sensitive fields with Ansible Vault or similar tools.

Run secret‑leak scanners (e.g., GitGuardian) in CI pipelines.
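To illustrate what a secret-leak scanner looks for, here is a deliberately simple sketch that flags configuration lines assigning a literal value to a secret-looking key. The key names and the rule that values starting with "$" are variable references are assumptions; commercial scanners use far richer detection.

```python
import re

# Assumption: keys containing these words hold secrets
SECRET_ASSIGNMENT = re.compile(
    r"(?i)(password|passwd|secret|token|api_key)\w*\s*[:=]\s*['\"]?([^'\"\s]+)"
)


def find_plaintext_secrets(config_text: str) -> list[int]:
    """Return 1-based line numbers that appear to assign a literal secret."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = SECRET_ASSIGNMENT.search(line)
        # Values such as ${DB_PASS} are treated as references, not literals
        if match and not match.group(2).startswith("$"):
            hits.append(number)
    return hits
```

Running a check like this in a pre-commit hook catches the most obvious cases before they reach the repository.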

Prohibited 14 – Chaotic permission allocation

Risk level: ★★★☆☆

Case: An intern accidentally deleted a production Kubernetes namespace.

Solution:

Enforce Role‑Based Access Control (RBAC) for all clusters.

Example Kubernetes Role limiting access to pod reads:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get","list"]
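The effect of the Role above can be reasoned about with a toy model of rule matching. This sketch is a simplification for illustration only: it ignores wildcards, resourceNames, ClusterRoles, and bindings, and the function name is hypothetical rather than part of any Kubernetes client.

```python
# The rules of the pod-reader Role, transcribed as Python data
POD_READER_RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "list"]},
]


def is_allowed(rules: list[dict], api_group: str, resource: str, verb: str) -> bool:
    """Simplified RBAC evaluation: any matching rule grants access."""
    return any(
        api_group in rule["apiGroups"]
        and resource in rule["resources"]
        and verb in rule["verbs"]
        for rule in rules
    )
```

Under this model a subject bound only to pod-reader can list pods but cannot delete them, and has no access to namespaces at all, which is the property that would have prevented the incident described in the case.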

Prohibited 15 – Lack of data recovery plan

Risk level: ★★★★★

Case: Accidental deletion of a user table prevented timely recovery, causing major complaints.

Solution:

Enable point‑in‑time recovery (PITR) for databases, e.g., in SQL Server syntax:

RESTORE DATABASE MyDB FROM URL='https://...' WITH STOPAT='2023-08-01 12:00:00'

Take regular ZFS snapshots: zfs snapshot pool/db@20230801.

Architecture Design (5 items)

Prohibited 16 – Single point of failure

Risk level: ★★★★☆

Case: A single MySQL server outage halted all services.

Solution:

Deploy master‑slave replication with automatic failover (e.g., Keepalived or Pacemaker).

Design multi‑active, multi‑AZ deployments across regions.

Leverage cloud‑native services that provide built‑in redundancy.

Prohibited 17 – Resource over‑utilization

Risk level: ★★★☆☆

Case: CPU sustained 100 % load, causing high latency.

Solution:

Set container or VM resource limits, e.g.:

docker run -it --cpus 2 --memory 4g nginx

Enable automatic horizontal scaling (Kubernetes HPA) based on CPU/memory metrics.
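The Kubernetes documentation describes the HPA scaling decision as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below reproduces that arithmetic; the min/max defaults are assumptions, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Replica count from the HPA ratio formula, clamped to the bounds."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))
```

For example, four replicas running at 100 % CPU against a 50 % target yield eight replicas, unless max_replicas caps the result lower.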

Prohibited 18 – Mixed‑environment deployments

Risk level: ★★★★☆

Case: Test code was inadvertently synced to production, contaminating data.

Solution:

Isolate networks per environment (VLAN 100 for dev, VLAN 200 for test, dedicated physical network for prod).

Use Terraform workspaces or separate state files to enforce environment boundaries.

Prohibited 19 – Missing monitoring system

Risk level: ★★★★☆

Case: An undetected memory leak caused a service crash.

Solution:

Deploy a full‑stack observability platform (Prometheus + Grafana).

Define critical alerts, for example:

- alert: MemoryAvailableCritical
  expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100 < 10
  for: 5m
  labels:
    severity: critical
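The 10 % available-memory threshold above can also be evaluated directly on a Linux host by reading /proc/meminfo, which is the source node_exporter draws on for this metric. This is a minimal sketch with hypothetical function names; it takes the file contents as a string so it can be tested without a live system.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def mem_available_percent(text: str) -> float:
    """Available memory as a percentage of total memory."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"] * 100


def memory_is_critical(text: str, threshold: float = 10.0) -> bool:
    """True when available memory drops below the threshold percentage."""
    return mem_available_percent(text) < threshold
```

On a real host the input would come from open("/proc/meminfo").read(); a slow, steady decline in this percentage over days is the typical signature of the memory leak described in the case.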

Prohibited 20 – No emergency response plan

Risk level: ★★★★★

Case: A DDoS attack without a response plan crippled services for eight hours.

Solution:

Define a four‑level response workflow:

Level 1: automatic CDN failover
Level 2: enable cloud DDoS protection (e.g., AWS Shield)
Level 3: traffic scrubbing via Arbor or similar
Level 4: manual incident response

Conduct quarterly red‑team/blue‑team drills to validate the plan.
