
Essential Ops Lessons: Avoid Disasters with Backups, Monitoring, and Secure Practices

This guide shares hard‑earned lessons from real‑world server administration, emphasizing careful testing, confirming commands before execution, limiting simultaneous operators, always backing up configurations, protecting data, tightening SSH and firewall security, implementing comprehensive monitoring, and applying disciplined performance‑tuning practices to maintain stable, reliable services.

Senior Brother's Insights

1. Test Before Use

When learning Linux, the author moved from virtual machines to a real server and, after receiving the root password, eagerly started working on it through Xshell. Changing the SSH configuration without testing it first locked the author out, and regaining access required restoring sshd_config from a backup.
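
A minimal sketch of "test before use" for configuration changes: write the new file to a temporary path, let the program validate it, and only then swap it in. OpenSSH's `sshd -t -f <file>` runs sshd in test mode against a given config file; the helper name `install_config` and the `{}` placeholder convention are our own assumptions. Even when a config passes validation, keep the current SSH session open and try a fresh login before logging out.

```python
import os
import shutil
import subprocess
import tempfile


def install_config(path, new_text, validate_cmd):
    """Install new_text at path only if validate_cmd accepts it.

    validate_cmd is an argv list in which the string "{}" is replaced by
    the candidate file, e.g. ["sshd", "-t", "-f", "{}"] for OpenSSH.
    Returns True if the new configuration was installed.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write the candidate next to the live file so the final rename is atomic.
    fd, candidate = tempfile.mkstemp(dir=directory, prefix=".candidate-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(new_text)
        cmd = [candidate if arg == "{}" else arg for arg in validate_cmd]
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print("validation failed, live config untouched:", result.stderr.strip())
            return False
        if os.path.exists(path):
            shutil.copy2(path, path + ".bak")  # keep the last known-good copy
            shutil.copymode(path, candidate)   # keep permission bits (not ownership)
        os.replace(candidate, path)  # atomic swap on the same filesystem
        candidate = None
        return True
    finally:
        if candidate is not None and os.path.exists(candidate):
            os.remove(candidate)
```

For sshd you would call it as root with `["sshd", "-t", "-f", "{}"]` as the validator and then reload the service through your init system.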

In a second incident, the source and destination arguments of an rsync command were reversed. The command rapidly deleted production data, and there was no backup to restore from: a single swapped argument was enough to cause severe damage.
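
Reversed rsync arguments can be caught by always previewing a sync first. The hypothetical helper below only builds the command line; it refuses a missing or empty source (a common sign that the arguments are swapped) and adds rsync's `--dry-run` and `--itemize-changes` options by default, so every change, including deletions made by `--delete`, can be reviewed before the real run.

```python
import os


def build_rsync(src, dst, delete=False, dry_run=True):
    """Build an rsync argv list, refusing arguments that look swapped."""
    if not os.path.isdir(src):
        raise ValueError(f"source {src!r} is not an existing directory")
    if not os.listdir(src):
        # An empty source combined with --delete would wipe the destination.
        raise ValueError(f"source {src!r} is empty; are src and dst reversed?")
    if os.path.realpath(src) == os.path.realpath(dst):
        raise ValueError("source and destination are the same path")
    cmd = ["rsync", "-a", "--itemize-changes"]
    if delete:
        cmd.append("--delete")
    if dry_run:
        cmd.append("--dry-run")  # report what would change, touch nothing
    # A trailing slash on src copies the directory's contents, not the directory.
    cmd += [src.rstrip("/") + "/", dst]
    return cmd
```

Run the dry-run command (for example with `subprocess.run(cmd, check=True)`), read its output, and only then rebuild it with `dry_run=False`.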

2. Confirm Before Execution

Commands such as rm -rf /var are easy to mistype, especially under pressure or over a slow, laggy connection. Always confirm a command before executing it: anyone who has made such a mistake once knows it can happen to anyone.
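
One way to build the confirm-first habit into scripts is to print the exact command and require the operator to retype its target. The function below is a hypothetical sketch; `ask` defaults to `input()` and can be swapped out in tests. It also catches a truncated path: if the intended target is, say, `/var/tmp/old` (an example path) but the command only contains `/var`, it refuses.

```python
import shlex


def confirm_destructive(argv, target, ask=input):
    """Return True only if argv contains target and the operator retypes it."""
    print("About to run:", shlex.join(argv))
    if target not in argv:
        print(f"Command does not contain the intended target {target!r}; aborting.")
        return False
    typed = ask(f"Retype the target {target!r} to confirm: ").strip()
    if typed != target:
        print("Confirmation did not match; aborting.")
        return False
    return True
```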

3. Avoid Multiple Operators

In one chaotic environment, the root password was known to many people, including several operators who had since left. Multiple people often debugged the same server at the same time, and their conflicting changes made it hard to pinpoint the true cause of an issue.
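
Clear ownership is the real fix, but a simple advisory lock can make the one-operator rule visible. This sketch (Unix only, using the standard `fcntl.flock`) records who holds the lock; the class name and lock path are hypothetical, and it only helps if every operator's tooling takes the lock.

```python
import fcntl
import os


class MaintenanceLock:
    """Advisory lock: only one operator may hold it at a time."""

    def __init__(self, path, owner):
        self.path = path    # e.g. a hypothetical /var/lock/maintenance.lock
        self.owner = owner
        self.fd = None

    def __enter__(self):
        fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            holder = os.read(fd, 200).decode(errors="replace").strip()
            os.close(fd)
            raise RuntimeError(f"server is locked by: {holder or 'unknown'}")
        # Record who holds the lock so others know whom to talk to.
        os.ftruncate(fd, 0)
        os.write(fd, self.owner.encode())
        self.fd = fd
        return self

    def __exit__(self, *exc):
        os.ftruncate(self.fd, 0)
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None
        return False
```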

4. Backup Before Changes

Always back up configuration files (e.g., *.conf) before editing them. When changing an option, comment out the original line and edit a copy of it, so the old value stays visible. Likewise, had a database backup existed, the rsync mistake described above would have been far less damaging.
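
A minimal sketch of the backup-before-edit habit: copy the file to a timestamped `.bak` before touching it. The naming scheme is an assumption; `shutil.copy2` is the standard-library call that also preserves modification time and permission bits.

```python
import shutil
import time
from pathlib import Path


def backup_config(path, backup_dir=None):
    """Copy a config file to a timestamped backup and return the backup path."""
    src = Path(path)
    dest_dir = Path(backup_dir) if backup_dir else src.parent
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = dest_dir / f"{src.name}.{stamp}.bak"
    n = 1
    while dest.exists():  # two backups in the same second get distinct names
        dest = dest_dir / f"{src.name}.{stamp}.{n}.bak"
        n += 1
    shutil.copy2(src, dest)
    return dest
```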

Data Handling

1. Use rm -rf Carefully

There are many stories online of catastrophic accidental deletions with rm -rf. If deletion is truly required, proceed with extreme caution.
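
When something really must go, moving it aside first leaves a way back. This hypothetical `safe_remove` resolves symlinks, refuses a small illustrative list of system directories, and moves the target into a trash directory instead of deleting it; the trash still has to be emptied deliberately later. Keeping the trash on the same filesystem makes the move a cheap rename.

```python
import os
import shutil
import time

# Illustrative, not exhaustive: extend for your own systems.
PROTECTED = {"/", "/bin", "/boot", "/dev", "/etc", "/home", "/lib", "/proc",
             "/root", "/sbin", "/sys", "/usr", "/var"}


def safe_remove(path, trash_dir):
    """Move path into trash_dir instead of deleting it; return the new location."""
    real = os.path.realpath(path)
    trash = os.path.realpath(trash_dir)
    if real in PROTECTED:
        raise ValueError(f"refusing to remove protected path {real!r}")
    if trash == real or trash.startswith(real + os.sep):
        raise ValueError("trash directory lies inside the path being removed")
    if not os.path.lexists(path):
        raise FileNotFoundError(path)
    os.makedirs(trash, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = os.path.join(trash, f"{os.path.basename(real)}.{stamp}")
    n = 1
    while os.path.lexists(target):  # never overwrite an earlier trashed copy
        target = os.path.join(trash, f"{os.path.basename(real)}.{stamp}.{n}")
        n += 1
    shutil.move(path, target)
    return target
```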

2. Backup Is Paramount

Frequent backups are essential. The author’s former company, handling third‑party payments and loan platforms, backed up payment data every two hours and loan data every twenty minutes.
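
The two schedules translate directly into backup counts and a worst-case loss window: with backups every interval, a failure just before the next run loses up to one interval of data. The sketch below does that arithmetic and adds a staleness check that monitoring could call; the five-minute grace period is an assumption.

```python
from datetime import timedelta


def backups_per_day(interval):
    """How many backups a fixed interval produces in 24 hours."""
    return timedelta(days=1) // interval


def backup_is_stale(last_backup, now, interval, grace=timedelta(minutes=5)):
    """True if the newest backup is older than one interval plus a grace period.

    With a fixed interval, the worst-case data loss is roughly one interval.
    """
    return now - last_backup > interval + grace


print(backups_per_day(timedelta(hours=2)))     # 12 (every two hours)
print(backups_per_day(timedelta(minutes=20)))  # 72 (every twenty minutes)
```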

3. Stability Over Speed

Prioritize stability over raw speed, and avoid putting software you have not thoroughly tested into production. In one case, an Nginx + PHP‑FPM stack suffered repeated PHP crashes; switching to Apache mitigated them, trading some raw speed for stability.

4. Confidentiality Is Critical

Data leaks and router backdoors underscore the necessity of keeping sensitive data confidential.

Security Practices

1. SSH Hardening

Change the default port.

Disable root login.

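Log in as a regular user with key-based authentication, and restrict access further with sudo rules, source-IP restrictions, and limits on which users may log in.

Deploy a tool such as DenyHosts (which adds offending addresses to /etc/hosts.deny) to block brute‑force attempts.

Review /etc/passwd and remove login shells from accounts that do not need to log in.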
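
Several of these settings can be checked automatically. The sketch below audits `sshd_config` text for the options named above (`Port`, `PermitRootLogin`, `PasswordAuthentication`, and `PubkeyAuthentication` are OpenSSH keywords; the expected values are our assumptions) and lists `/etc/passwd` accounts that still have a login shell. It reads text rather than files so it can run against a copy, and it stops at the first `Match` block instead of evaluating conditional settings.

```python
import re

# Expected values are assumptions based on the hardening list above.
EXPECTED = {
    "permitrootlogin": "no",
    "passwordauthentication": "no",
    "pubkeyauthentication": "yes",
}
NOLOGIN_SHELLS = {"/sbin/nologin", "/usr/sbin/nologin", "/bin/false", "/usr/bin/false"}


def audit_sshd(config_text):
    """Return a list of problems found in sshd_config text (global section only)."""
    seen = {}
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = re.split(r"[\s=]+", line, maxsplit=1)
        key = parts[0].lower()
        if key == "match":
            break  # settings after Match apply conditionally; not evaluated here
        value = parts[1].strip().lower() if len(parts) > 1 else ""
        # For most keywords sshd uses the first value it reads.
        seen.setdefault(key, value)
    problems = []
    if seen.get("port", "22") == "22":
        problems.append("Port is still the default 22")
    for key, want in EXPECTED.items():
        if seen.get(key) != want:
            problems.append(f"{key} should be {want!r}, found {seen.get(key)!r}")
    return problems


def interactive_accounts(passwd_text):
    """Return account names in /etc/passwd-format text that have a login shell."""
    users = []
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) == 7 and fields[6] not in NOLOGIN_SHELLS:
            users.append(fields[0])
    return users
```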

2. Firewall

Enable the firewall in production and follow the principle of least privilege: drop all traffic by default, then allow only required service ports.
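
As a sketch of default-deny with iptables, the function below only builds the commands; it does not run them. The port list, the optional admin CIDR for SSH, and the assumption that the INPUT chain starts empty are ours. Note the ordering: the allow rules, including one for already-established connections, go in before the policy switches to DROP, so the SSH session you are working in is not cut off halfway through.

```python
def iptables_rules(tcp_ports, admin_cidr=None, ssh_port=22):
    """Return iptables argv lists for a default-deny INPUT chain."""
    rules = [
        # Loopback traffic and replies to existing connections stay allowed.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
    ]
    ssh = ["iptables", "-A", "INPUT", "-p", "tcp", "--dport", str(ssh_port)]
    if admin_cidr:
        ssh += ["-s", admin_cidr]  # only the admin network may reach SSH
    rules.append(ssh + ["-j", "ACCEPT"])
    for port in tcp_ports:
        rules.append(["iptables", "-A", "INPUT", "-p", "tcp",
                      "--dport", str(port), "-j", "ACCEPT"])
    # Set the policy last: everything not matched above is dropped.
    rules.append(["iptables", "-P", "INPUT", "DROP"])
    return rules
```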

3. Fine‑grained Permissions

Run services as non‑root whenever possible and restrict permissions to the minimum necessary.
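
To find files that grant more access than they need, a small audit can compare each path's permission bits against a ceiling. The 0o750 default below (owner full access, group read/execute, nothing for others) is an illustrative assumption; symlinks are skipped because their own mode bits do not control access on Linux.

```python
import os
import stat


def too_permissive(root, max_mode=0o750):
    """Return (path, octal mode) for entries under root whose bits exceed max_mode."""
    flagged = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if stat.S_ISLNK(st.st_mode):
                continue  # a symlink's own mode is not what controls access
            mode = stat.S_IMODE(st.st_mode)
            if mode & ~max_mode:  # any bit set that the ceiling does not allow
                flagged.append((path, oct(mode)))
    return flagged
```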

4. Intrusion Detection & Log Monitoring

Use third‑party tools to monitor critical files (e.g., /etc/passwd, /etc/my.cnf, /etc/httpd/conf/httpd.conf), and centralize log monitoring for /var/log/secure, system messages, FTP activity, and port scans. Detected scans can trigger automatic blocking, for example through /etc/hosts.deny. Proper logging also greatly aids post‑incident analysis.
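
Dedicated intrusion-detection tools do this more thoroughly, but the core idea of file-integrity monitoring fits in a few lines: record a SHA-256 hash of each critical file, then compare later. This is a sketch; in practice the baseline must be stored somewhere an intruder cannot rewrite, such as another host.

```python
import hashlib


def file_digest(path):
    """SHA-256 of a file, or None if it no longer exists."""
    h = hashlib.sha256()
    try:
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
    except FileNotFoundError:
        return None
    return h.hexdigest()


def snapshot(paths):
    """Baseline: map each path to its current digest."""
    return {p: file_digest(p) for p in paths}


def changed_files(baseline, paths):
    """Paths whose digest differs from the baseline (modified, added, or deleted)."""
    current = snapshot(paths)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

A baseline would be taken with something like `snapshot(["/etc/passwd", "/etc/my.cnf", "/etc/httpd/conf/httpd.conf"])` and checked periodically with `changed_files`.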

Daily Monitoring

1. System Monitoring

Monitor hardware usage (memory, disk, CPU, NIC) as well as OS login activity and changes to critical files. Regular monitoring helps anticipate hardware failures and provides the data that performance tuning depends on.
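
A minimal host check might look like the sketch below. The thresholds are assumptions, `os.getloadavg` is Unix-only, and the memory helper parses Linux `/proc/meminfo` text, where `MemAvailable` is the kernel's estimate of memory available without swapping.

```python
import os
import shutil


def check_host(mounts=("/",), disk_pct=85.0, load_per_cpu=1.5):
    """Return warnings for nearly full filesystems or high load (Unix only)."""
    warnings = []
    for mount in mounts:
        usage = shutil.disk_usage(mount)
        pct = usage.used / usage.total * 100
        if pct >= disk_pct:
            warnings.append(f"{mount} is {pct:.1f}% full")
    load1, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    if load1 / cpus >= load_per_cpu:
        warnings.append(f"1-minute load {load1:.2f} on {cpus} CPUs")
    return warnings


def mem_available_pct(meminfo_text):
    """Percentage of RAM available, from Linux /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if rest.split():
            fields[key.strip()] = int(rest.split()[0])
    return fields["MemAvailable"] / fields["MemTotal"] * 100
```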

2. Service Monitoring

Track metrics for web servers, databases, load balancers, etc., to quickly identify performance bottlenecks.
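
Full service monitoring tracks application metrics such as request latency or slow queries, but the most basic check is whether a service port accepts connections, and how quickly. This sketch uses only the standard `socket` module; host, port, and timeout are placeholders.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Try a TCP connect; return (True, latency in ms) or (False, error text)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return False, str(exc)
    return True, (time.perf_counter() - start) * 1000
```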

3. Log Monitoring

Similar to security log monitoring, but focused on hardware, OS, and application error messages. These logs are essential when something goes wrong.
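
A minimal sketch of scanning logs for error messages: count the lines matching a few patterns, attributing each line to the first pattern it matches. The patterns are illustrative assumptions; real log formats differ between kernels, distributions, and applications.

```python
import re
from collections import Counter

# Illustrative patterns, most specific first; tune them to your own logs.
PATTERNS = {
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "segfault": re.compile(r"segfault"),
    "error": re.compile(r"\berror\b", re.IGNORECASE),
}


def scan(lines):
    """Count lines per pattern; each line is counted once."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
    return counts
```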

Performance Tuning

1. Understand Runtime Mechanisms

Deeply understand how software such as Nginx and Apache works internally. Without this knowledge, tuning is merely guesswork.

2. Tuning Framework & Order

Analyze bottlenecks, review logs, define a tuning direction, then act. Prioritize hardware and OS adjustments before tweaking database configurations.

3. Change One Parameter at a Time

Modify a single setting per test to avoid confusion.
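
One way to enforce one change per test is to generate every candidate configuration from the same baseline, so that each differs in exactly one setting. The helper is hypothetical; `worker_processes` and `keepalive_timeout` are nginx directives used here only as example keys.

```python
def one_at_a_time(baseline, candidates):
    """Yield (name, value, config) where config differs from baseline in one key."""
    for name, values in candidates.items():
        for value in values:
            if baseline.get(name) == value:
                continue  # identical to baseline, nothing to test
            config = dict(baseline)
            config[name] = value
            yield name, value, config


base = {"worker_processes": 4, "keepalive_timeout": 65}
for name, value, cfg in one_at_a_time(base, {"worker_processes": [4, 8],
                                             "keepalive_timeout": [30]}):
    print(name, value, cfg)
```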

4. Benchmark Testing

Validate the effectiveness of tuning and the stability of new software versions with thorough benchmark tests. Refer to “High Performance MySQL” for methodology.
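
Whatever generates the load, the measurement side follows the same pattern: warm up, repeat, and compare medians and spread rather than single numbers. This is a small, generic timing harness, not a database benchmark; the run counts are assumptions, and a baseline and a tuned configuration should be measured under identical conditions.

```python
import statistics
import time


def benchmark(fn, runs=5, warmup=1):
    """Time fn repeatedly; report median, min, max, and standard deviation."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(samples),
        "min_s": min(samples),
        "max_s": max(samples),
        "stdev_s": statistics.stdev(samples) if runs > 1 else 0.0,
    }
```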

Ops Mindset

1. Control Emotions

Avoid rushing commands near the end of a shift; staying calm prevents costly mistakes.

2. Responsibility for Data

Production data is not a toy; lack of backup leads to severe consequences.

3. Root‑Cause Investigation

Do not ignore recurring issues. Investigate underlying causes such as MyISAM bugs, MySQL bugs, OOM kills, or insufficient memory. In one case, upgrading physical memory resolved an OOM‑induced MySQL crash.
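
When MySQL keeps dying, the kernel log often says why. The sketch below extracts OOM-killer victims from kernel log lines (for example, `dmesg` output). The regular expression assumes the "Kill process" and "Killed process" wordings used by different kernel versions, so check it against your own logs.

```python
import re

# Matches e.g. "Out of memory: Kill process 123 (mysqld)" and the newer
# "Out of memory: Killed process 123 (mysqld)"; case-insensitive so that
# "Memory cgroup out of memory: ..." lines match too.
OOM_RE = re.compile(r"out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)",
                    re.IGNORECASE)


def oom_victims(lines):
    """Return (pid, process name) for every OOM kill found in the lines."""
    victims = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            victims.append((int(m.group(1)), m.group(2)))
    return victims
```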

4. Test Before Production

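Before a critical operation, verify which machine you are logged into, and keep as few terminal windows open as possible so a command meant for a test server is not run on production.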
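
Scripts can take part of this verification off your hands by refusing to run on the wrong host. In this sketch the expected hostname is a parameter you supply (for example, a hypothetical "db-test-01"); note that `socket.gethostname()` may return a short or fully qualified name depending on system configuration.

```python
import socket
import sys


def require_host(expected):
    """Exit immediately unless this is running on the expected machine."""
    actual = socket.gethostname()
    if actual != expected:
        sys.exit(f"refusing to run: this is {actual!r}, expected {expected!r}")
    return actual
```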

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Senior Brother's Insights

A public account focused on workplace, career growth, team management, and self-improvement. The author is the writer of books including 'SpringBoot Technology Insider' and 'Drools 8 Rule Engine: Core Technology and Practice'.
