Essential Ops Playbook: Avoid Costly Mistakes in Server Management

This guide shares practical Linux server operation rules, emphasizing thorough testing, careful use of destructive commands, strict access control, regular backups, security hardening, continuous monitoring, and disciplined performance tuning to prevent costly outages and data loss.

Open Source Linux

Mar 19, 2020

Online Operation Standards

1. Test Before Use

Many people learn Linux on virtual machines, where careless experiments cost nothing. Carrying that habit onto real servers, where you hold root access, can cause serious damage, so test every change before applying it.

On my first day at work I switched from PuTTY to Xshell and changed the SSH configuration without testing, which locked me out of the server until the original sshd_config was restored.

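Another example is rsync: if the source and destination are reversed, the data you meant to protect is overwritten by the other side, and with the --delete option anything missing from the wrong "source" is removed as well. Run untested, this can destroy production data.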
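One way to test an rsync invocation before trusting it is a dry run. The following is a minimal sketch, assuming Python as a wrapper; the helper names build_rsync_cmd and preview_then_sync are hypothetical and not part of the original article. It relies on rsync's --dry-run option, which reports what would change without changing anything.

```python
import subprocess

def build_rsync_cmd(src, dst, delete=False, dry_run=True):
    """Build an rsync argument list; dry_run defaults to True so nothing changes by accident."""
    cmd = ["rsync", "-a", "--itemize-changes"]
    if delete:
        cmd.append("--delete")    # removes files in dst that are absent from src
    if dry_run:
        cmd.append("--dry-run")   # report planned changes only
    cmd += [src, dst]
    return cmd

def preview_then_sync(src, dst, delete=False):
    """Show the dry-run output, then run for real only after an explicit 'yes'."""
    preview = subprocess.run(build_rsync_cmd(src, dst, delete, dry_run=True),
                             capture_output=True, text=True, check=True)
    print(preview.stdout)
    if input(f"Apply these changes from {src} to {dst}? Type 'yes': ") == "yes":
        subprocess.run(build_rsync_cmd(src, dst, delete, dry_run=False), check=True)
```

Printing the source and destination in the prompt gives one more chance to notice that the two are reversed.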

2. Confirm Before Pressing Enter

A command such as rm -rf /var is easy to mistype or to submit too early, especially when you are working quickly or over a slow connection.

By the time you realize the command has run, it is too late, and your heart sinks.

Even if you have never made a mistake, a single slip can cause a disaster; never assume that operational incidents happen only to others.
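The "confirm first" rule can also be enforced by tooling. This is a hedged sketch, not a complete safeguard: the PROTECTED deny-list and both function names are assumptions made for illustration. The idea is to refuse well-known system directories outright and to require the operator to retype the exact path for everything else.

```python
import os

# Hypothetical deny-list; extend it to match your own systems.
PROTECTED = {"/", "/bin", "/boot", "/etc", "/home", "/lib", "/root", "/usr", "/var"}

def is_protected(path):
    """Return True if the normalized absolute path is on the deny-list."""
    return os.path.abspath(path) in PROTECTED

def confirm_delete(path, answer):
    """Allow deletion only for unprotected paths, and only if the operator
    retyped the exact absolute path as confirmation."""
    target = os.path.abspath(path)
    return not is_protected(target) and answer == target
```

A wrapper script would call confirm_delete(path, input("Retype the full path: ")) and proceed only when it returns True.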

3. Avoid Multiple Operators

In a chaotic environment where many people know the root password, simultaneous changes can overwrite each other's work, making troubleshooting extremely frustrating.

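4. Back Up Before Changing

Always back up configuration files (e.g., .conf files) before editing them. When changing an option, comment out the original line and edit a copy of it, so the old value stays visible. The same applies to data: with regular database backups, the rsync mishap described earlier would have been recoverable.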
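Backing up a configuration file before an edit takes only a few lines. A minimal sketch, assuming a timestamped ".bak" naming scheme (the naming convention and the function name backup_file are illustrative choices, not something the article prescribes):

```python
import shutil
import time
from pathlib import Path

def backup_file(path):
    """Copy path to a timestamped sibling such as app.conf.20200319-101500.bak
    and return the path of the copy."""
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dst)   # copy2 also preserves timestamps and permission bits
    return dst
```

Keeping the copy next to the original makes a rollback a single copy in the other direction.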

Data‑Related Guidelines

1. Use rm -rf with Extreme Caution

There are plenty of published accounts of disastrous deletions; a tiny mistake can cause massive loss.

2. Backup Is Paramount

At my previous company, the third-party payment services were backed up every two hours and the loan platform every 20 minutes.
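A backup schedule only helps if the backups actually arrive. As a sketch under stated assumptions (backups land as files in one directory, and file modification time is a fair proxy for backup time), the check below reports whether the newest backup is within the expected interval, for example 7200 seconds for a two-hour schedule or 1200 seconds for a 20-minute one. The function names are hypothetical.

```python
import os
import time

def newest_backup_age(backup_dir, now=None):
    """Return the age in seconds of the most recently modified file in
    backup_dir, or None if the directory contains no files."""
    now = time.time() if now is None else now
    mtimes = [e.stat().st_mtime for e in os.scandir(backup_dir) if e.is_file()]
    return (now - max(mtimes)) if mtimes else None

def backup_is_fresh(backup_dir, max_age_seconds, now=None):
    """True only if a backup exists and is no older than the allowed interval."""
    age = newest_backup_age(backup_dir, now)
    return age is not None and age <= max_age_seconds
```

A freshness check does not prove the backup can be restored; periodic restore tests are still needed.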

3. Stability Over Speed

Prioritize stability and availability over raw performance; avoid deploying untested software in production.

4. Confidentiality Is Critical

With frequent data leaks, protecting sensitive data is non-negotiable.

Security Practices

1. SSH Hardening

Change the default port.

Disable root login.

Log in as a regular user with key authentication, and combine this with sudo rules and IP restrictions.

Deploy brute-force protection tools (e.g., DenyHosts).

Audit /etc/passwd for valid login users.
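The /etc/passwd audit in the last item can be scripted. A minimal sketch, assuming the standard seven-field passwd format and a small set of non-login shells (NOLOGIN_SHELLS is an assumption; adjust it for your distribution). It lists every account whose shell would permit an interactive login.

```python
# Shells that deny interactive login; an assumed list, adjust per distribution.
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false"}

def login_users(passwd_text):
    """Return (name, uid, shell) for every account whose shell permits login."""
    users = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) != 7:
            continue                      # skip blank or malformed entries
        name, _pw, uid, _gid, _gecos, _home, shell = fields
        if shell not in NOLOGIN_SHELLS:
            users.append((name, int(uid), shell))
    return users
```

On a server this could be run as login_users(open("/etc/passwd").read()), with any unexpected name in the result treated as something to investigate.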

2. Firewall

Enable the firewall in production and follow the principle of least privilege: drop all traffic by default and allow only necessary ports.
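To make "drop by default, allow only what is needed" concrete, here is a hedged sketch that generates iptables commands without executing them. It assumes iptables is the firewall in use (nftables or firewalld setups would look different), and the function name default_drop_rules is hypothetical. Review the output before applying it on a real host.

```python
def default_drop_rules(allowed_tcp_ports):
    """Return iptables commands (as argument lists) for a default-drop INPUT chain."""
    rules = [
        # Always allow loopback traffic and replies to established connections
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    for port in sorted(set(allowed_tcp_ports)):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                      "--dport", str(port), "-j", "ACCEPT"])
    # Set the DROP policy last, so the ACCEPT rules are already in place
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    return rules
```

Setting the policy last is deliberate: applying DROP before the SSH rule exists is an easy way to lock yourself out of a remote machine.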

3. Fine-Grained Permissions

Run each service as a dedicated, least-privileged user; never run services as root.

4. Intrusion Detection and Log Monitoring

Use third‑party tools to monitor critical system and service configuration files for changes.

Centralize log monitoring for /var/log/secure, /var/log/messages, FTP activity, etc.

Detect port scans and block offending IPs via /etc/hosts.deny.
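The idea behind monitoring configuration files for changes can be illustrated with checksums. This is only a sketch of the principle, assuming SHA-256 digests compared against a stored baseline; dedicated integrity tools do considerably more, and the function names here are hypothetical.

```python
import hashlib

def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def snapshot(paths):
    """Map each path to its current digest."""
    return {p: sha256_of(p) for p in paths}

def changed_files(baseline, current):
    """Return sorted paths whose digest differs from, or is missing in, the baseline."""
    return sorted(p for p, digest in current.items() if baseline.get(p) != digest)
```

For the comparison to mean anything, the baseline has to be stored somewhere an intruder on the monitored host cannot rewrite it.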

Effective security starts with solid fundamentals; once basics are covered, advanced measures become easier to implement.

Daily Monitoring

1. System Monitoring

Track hardware utilization (CPU, memory, disk, network) and OS metrics such as login activity and critical file changes.
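As one small example of a hardware utilization check, the sketch below reads disk usage with Python's standard library and compares it to an alert limit. The 90 percent limit in the usage note is an arbitrary illustration, and a real deployment would normally feed such values into a monitoring system rather than a standalone script.

```python
import shutil

def disk_used_percent(path="/"):
    """Return used space on the filesystem containing path, as a percentage of total."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

def over_threshold(value, limit):
    """True when a metric has reached or exceeded its alert limit."""
    return value >= limit
```

For example, over_threshold(disk_used_percent("/"), 90) could trigger an alert. The figure may differ slightly from what df reports, since df accounts for reserved blocks.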

2. Service Monitoring

Monitor web, database, load balancer, and other application metrics to quickly detect performance bottlenecks.

3. Log Monitoring

Watch hardware, OS, and application error logs. Without monitoring, you only learn about problems after they have already caused damage.
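A very small version of error-log monitoring is counting severity keywords per line. The sketch below assumes logs are plain text and that severities appear as words such as ERROR or WARNING; real log formats vary, so the pattern is an assumption to adapt rather than a standard.

```python
import re
from collections import Counter

# Assumed keyword set; real log formats vary by application.
LEVEL_RE = re.compile(r"\b(ERROR|CRITICAL|FATAL|WARN(?:ING)?)\b", re.IGNORECASE)

def count_levels(lines):
    """Count lines by the first severity keyword found on each line."""
    counts = Counter()
    for line in lines:
        m = LEVEL_RE.search(line)
        if m:
            level = m.group(1).upper()
            counts["WARNING" if level == "WARN" else level] += 1
    return counts
```

Comparing these counts between intervals is a simple way to notice that an error rate has started climbing before users report it.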

Performance Tuning

1. Understand Underlying Mechanisms

Before tuning, understand how the software works internally (for example, how Nginx differs from Apache); otherwise, tuning is guesswork.

2. Tuning Framework and Order

Identify bottlenecks from logs, decide on a tuning direction, and then tune in order: hardware and operating system first, database configuration after.

3. Change One Parameter at a Time

If several parameters change at once, you cannot tell which one caused the effect.

4. Benchmark Testing

Use benchmarks to verify the impact of changes and to assess new software versions.
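Measuring before and after a single change can be sketched in a few lines. This is a simplified illustration, assuming the workload can be wrapped in a Python callable; real benchmarks usually rely on dedicated load tools, and the function names here are hypothetical. The median of several runs is used because it is less sensitive to a single slow outlier than the mean.

```python
import statistics
import time

def median_runtime(func, repeats=5):
    """Run func() several times and return the median wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

def relative_change(before, after):
    """Fractional change from before to after; negative means the new setting is faster."""
    return (after - before) / before
```

Record the baseline, change exactly one parameter, measure again, and keep the change only if relative_change shows an improvement larger than the run-to-run noise.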

Operational Mindset

1. Control Your Emotions

Avoid making critical changes when stressed; if possible, defer to a calmer time.

2. Take Responsibility for Data

Production data is not a toy; always ensure backups exist.

3. Investigate Root Causes

When recurring issues arise, dig deeper rather than applying quick fixes.

4. Test Before Production

Verify operations on test machines and avoid opening multiple terminals for critical tasks.

Source: http://www.cnblogs.com/yihr/p/9593795.html

