Essential Ops Practices: Backup, Security, Monitoring, and Performance Tuning for Linux Servers
This guide outlines practical Linux operations standards, covering safe testing, backup habits, multi‑user coordination, security hardening, continuous monitoring, performance tuning methods, and the right mindset to avoid costly mistakes in production environments.
Online Operation Standards
1. Testing Before Execution
Learning Linux on virtual machines can breed risky habits, such as relying on snapshots to undo mistakes, a safety net that production servers often lack. The author recounts being locked out of a server after switching from PuTTY to XShell without testing first. The lesson: back up configuration files such as sshd_config before changing them, and test every change before relying on it.
2. Double‑Check Before Entering Commands
A command like rm -rf /var causes severe damage if it is run hastily. The risk is higher on a slow connection, where the screen lags behind the keyboard and it is easy to press Enter before the full command is visible. One such mistake can mean irreversible data loss, and it can happen to anyone.
3. Avoid Multiple People Editing Simultaneously
When several operators modify the same server concurrently, it becomes difficult to trace the true cause of an issue. The author describes a chaotic scenario where multiple team members changed configuration files, leading to confusion and duplicated effort.
4. Backup Before Modifying
Always back up configuration files (e.g., .conf) before editing.
Duplicate the file first. When editing, comment out the original option instead of deleting it, so the old value stays visible.
The author cites a past rsync mistake in which a prior backup would have prevented the data loss.
Regular backups prevent catastrophic data loss.
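The backup habit above can be reduced to a one-line helper. The following is a minimal sketch, not the author's own tooling: the function name backup_file and the ".bak.<timestamp>" suffix are illustrative assumptions.

```shell
# backup_file: copy a file next to itself with a timestamp suffix before
# editing it. Prints the path of the copy on success.
# The name and the ".bak.<timestamp>" convention are assumptions for this sketch.
backup_file() (
    src=$1
    [ -f "$src" ] || { echo "backup_file: no such file: $src" >&2; exit 1; }
    dest="$src.bak.$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode and timestamps (and ownership when permitted).
    # Two calls within the same second would overwrite the same copy.
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
)
```

A typical use would be to run backup_file /etc/ssh/sshd_config and only then open the file in an editor. If the change goes wrong, copying the printed path back over the original restores the previous state.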
Data‑Related Practices
1. Use rm -rf with Extreme Caution
Accidental deletions of critical directories or databases can cause massive damage; always verify the target before executing.
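One way to enforce "verify the target" is to wrap the deletion in a guard. The sketch below is an illustration under stated assumptions: the function names and the list of protected directories are hypothetical, and the check compares the literal path only (it does not resolve symlinks or relative paths).

```shell
# is_protected_path: succeed (exit 0) if the argument names a directory that
# should never be removed recursively. The list is illustrative; extend it
# for your own systems. An empty argument counts as protected, which catches
# the classic unset-variable mistake.
is_protected_path() (
    p=$1
    # Strip trailing slashes so "/var/" and "/var" compare equal.
    while [ "${p%/}" != "$p" ] && [ "$p" != "/" ]; do p=${p%/}; done
    case $p in
        ""|/|/bin|/boot|/dev|/etc|/home|/lib|/lib64|/proc|/root|/sbin|/sys|/usr|/var)
            exit 0 ;;
    esac
    exit 1
)

# guarded_rm: refuse protected targets, otherwise show the target and ask
# for an explicit "yes" before running rm -rf.
guarded_rm() (
    target=$1
    if is_protected_path "$target"; then
        echo "guarded_rm: refusing to remove protected path: '$target'" >&2
        exit 1
    fi
    printf 'About to delete: %s\n' "$target"
    printf 'Type "yes" to continue: '
    read -r answer
    [ "$answer" = "yes" ] || { echo "Aborted."; exit 1; }
    rm -rf -- "$target"
)
```

The guard does not replace a backup. It only adds a pause between typing the command and losing the data.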
2. Backup Is Paramount
The author’s current employer backs up a third‑party payment site every two hours and a loan platform every 20 minutes, illustrating industry‑level backup frequency.
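The two intervals mentioned above map directly onto cron schedules. As a rough illustration, the snippet below prints what such crontab entries could look like. The script paths are hypothetical placeholders, and the article does not say how the employer actually schedules its backups.

```shell
# Print example crontab entries for a two-hour and a 20-minute backup interval.
# The script paths are hypothetical placeholders, not real tools.
print_backup_crontab() {
    cat <<'EOF'
# m    h    dom mon dow  command
0      */2  *   *   *    /usr/local/bin/backup-payment-site.sh
*/20   *    *   *   *    /usr/local/bin/backup-loan-platform.sh
EOF
}

print_backup_crontab
```

The first entry fires at minute 0 of every second hour, the second every 20 minutes. After review, the lines can be added with crontab -e.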
3. Prioritize Stability Over Speed
Stability and availability are more important than raw performance. New software (e.g., Nginx + PHP‑FPM) should be thoroughly tested before production deployment.
4. Keep Data Confidential
Given the prevalence of data leaks and back‑door exploits, confidentiality measures are essential for any data‑handling system.
Security Measures
1. SSH Hardening
Change the default port.
Disable root login.
Log in as a regular user with key authentication, and restrict access with sudo rules, source-IP restrictions, and limits on which users may log in.
Deploy tools like DenyHosts to block brute-force attempts.
Audit /etc/passwd for unauthorized users.
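Part of the checklist above can be verified mechanically. The sketch below assumes that "key authentication" implies PasswordAuthentication no, which the article does not state explicitly. The function names are illustrative, and the parser is simplified: it ignores Include and Match blocks.

```shell
# sshd_option: print the value of a keyword in an sshd_config-style file.
# The first uncommented occurrence wins and keywords are case-insensitive,
# as in sshd itself. Simplified: Include and Match blocks are ignored.
sshd_option() (
    key=$1 file=$2
    awk -v key="$key" '
        /^[ \t]*#/ { next }
        tolower($1) == tolower(key) { print $2; exit }
    ' "$file"
)

# audit_sshd: warn about settings that contradict the hardening list.
# Exit status is non-zero if any warning was printed.
audit_sshd() (
    file=$1 status=0
    [ "$(sshd_option PermitRootLogin "$file")" = "no" ] ||
        { echo "WARN: PermitRootLogin is not 'no'"; status=1; }
    # Assumption: key-only login means password authentication is disabled.
    [ "$(sshd_option PasswordAuthentication "$file")" = "no" ] ||
        { echo "WARN: PasswordAuthentication is not 'no'"; status=1; }
    port=$(sshd_option Port "$file")
    [ -n "$port" ] && [ "$port" != "22" ] ||
        { echo "WARN: sshd still listens on the default port 22"; status=1; }
    exit "$status"
)

# uid0_accounts: list every account with UID 0 in a passwd-format file.
# Anything other than "root" deserves a close look.
uid0_accounts() {
    awk -F: '$3 == 0 { print $1 }' "${1:-/etc/passwd}"
}
```

For example, audit_sshd /etc/ssh/sshd_config prints one warning per weak setting, and uid0_accounts with no argument reads /etc/passwd.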
2. Firewall Configuration
Enable the firewall in production and follow the principle of least privilege: drop all traffic by default and explicitly allow required service ports.
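A default-deny policy can be sketched with iptables as follows. This is one possible rule set, not the author's configuration: it covers only the INPUT chain and IPv4, the port numbers in the example call are assumed values, and the function only prints the commands so they can be reviewed before anything is applied.

```shell
# print_firewall_rules: emit iptables commands for a default-deny inbound
# policy that only admits the TCP ports given as arguments.
# Nothing is applied; review the output, then run it as root.
print_firewall_rules() {
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # Set the DROP policy last so the current SSH session is not cut off
    # before its ACCEPT rule exists.
    echo "iptables -P INPUT DROP"
}

# Example: a custom SSH port 2222 (an assumed value) plus HTTP and HTTPS.
print_firewall_rules 2222 80 443
```

Printing first and applying second follows the testing rule from the start of this guide: a firewall mistake on a remote server is a common way to lock yourself out.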
3. Fine‑Grained Permissions
Run services with non‑root users whenever possible and limit permissions to the minimum necessary.
4. Intrusion Detection and Log Monitoring
Use third‑party tools to monitor critical system and service configuration files for changes.
Centralize log collection (e.g., /var/log/secure, /var/log/messages) and set up alerts for abnormal activity.
Block repeated port scans by adding offending IPs to /etc/hosts.deny.
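The article recommends third-party tools for watching configuration files. As a stand-in that shows the underlying idea, the sketch below records checksums with sha256sum from GNU coreutils and later reports any file that no longer matches. The function names and the baseline file location are illustrative assumptions.

```shell
# make_baseline: record SHA-256 checksums of the given files.
# Usage: make_baseline BASELINE_FILE FILE...
make_baseline() (
    out=$1; shift
    sha256sum -- "$@" > "$out"
)

# check_baseline: print every file whose content no longer matches the
# recorded checksum. Exits non-zero if any file changed or is missing.
check_baseline() {
    sha256sum --check --quiet "$1"
}
```

A baseline could be taken with make_baseline /root/conf.sha256 /etc/ssh/sshd_config /etc/passwd and checked from cron with check_baseline /root/conf.sha256. The baseline file itself should live somewhere an intruder cannot easily rewrite it.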
Daily Monitoring
1. System Health Monitoring
Track hardware utilization such as memory, disk, CPU, and network interfaces, as well as OS login activity and critical file changes.
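As a small example of such a check, the sketch below flags filesystems above a usage threshold. It parses the POSIX output format of df -P; the function name and the 90% threshold are assumptions, and mount points containing spaces are not handled.

```shell
# disk_over_threshold: read `df -P` output on stdin and print every mount
# point whose usage is at or above the given percentage.
disk_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)
            if (use + 0 >= limit + 0) print $6 " " use "%"
        }
    '
}

# Example: list filesystems that are at least 90% full.
df -P | disk_over_threshold 90
```

A real monitoring system would collect the same figure continuously and alert on it, but the parsing step is the same.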
2. Service Monitoring
Monitor web servers, databases, load balancers, and other applications to quickly detect performance bottlenecks.
3. Log Monitoring
Collect and analyze logs from hardware, OS, and applications to aid troubleshooting when issues arise.
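To make log analysis concrete, here is a minimal sketch that counts failed SSH password attempts per source address. It assumes the usual OpenSSH message form ("Failed password for ... from ADDRESS port ..."); the function name and the default threshold of 5 are illustrative.

```shell
# failed_logins: read sshd log lines on stdin and print "COUNT IP" for every
# source address with at least MIN failed password attempts, busiest first.
# Usage: failed_logins [MIN] < logfile
failed_logins() {
    awk -v min="${1:-5}" '
        /Failed password for/ {
            # Scan from the end so a user literally named "from" cannot
            # be mistaken for the keyword.
            for (i = NF - 1; i >= 1; i--)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END {
            for (ip in count)
                if (count[ip] >= min) print count[ip], ip
        }
    ' | sort -rn
}
```

On Red Hat style systems the input would be /var/log/secure, for example failed_logins 10 < /var/log/secure; Debian style systems write the same messages to /var/log/auth.log.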
Performance Tuning
1. Understand Underlying Mechanisms
Before tweaking parameters, study how software like Nginx or Apache processes requests, and be able to read relevant source code.
2. Follow a Tuning Framework
Identify bottlenecks, analyze logs, define a tuning direction, and address issues in order: hardware/OS first, then database, and finally application configuration.
3. Change One Parameter at a Time
Isolating each change prevents confusion about which adjustment produced the observed effect.
4. Conduct Benchmark Tests
Use realistic benchmark workloads to verify that tuning improves performance and remains stable under production‑like conditions.
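A repeatable measurement is what makes a before-and-after comparison meaningful. The helper below is a rough sketch rather than a substitute for a dedicated load-testing tool: it times a command over several runs using date +%s, which has whole-second resolution, so the number of runs must be large enough for the total to be meaningful. The function name is an assumption.

```shell
# bench: run a command RUNS times and print total and mean wall-clock seconds.
# Uses `date +%s` (whole seconds), so choose enough runs for a useful total.
# Usage: bench RUNS COMMAND [ARG...]
bench() (
    runs=$1; shift
    [ "$runs" -gt 0 ] || { echo "bench: RUNS must be positive" >&2; exit 1; }
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 ||
            { echo "bench: command failed on run $((i + 1))" >&2; exit 1; }
        i=$((i + 1))
    done
    end=$(date +%s)
    awk -v t="$((end - start))" -v n="$runs" \
        'BEGIN { printf "runs=%d total=%ds mean=%.3fs\n", n, t, t / n }'
)
```

Running the same bench command before and after changing a single parameter, on a test machine, gives two numbers that can be compared directly.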
Ops Mindset
1. Control Your Emotions
When under pressure (e.g., near the end of a shift), stay calm to avoid reckless commands that could delete critical data.
2. Take Responsibility for Data
Never treat production databases as a playground; lack of backups can lead to irreversible loss.
3. Investigate Root Causes
After fixing an issue, continue probing to ensure the underlying problem (e.g., memory shortage causing OOM kills) is fully resolved.
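For the OOM example, the kernel log shows which processes were killed and how often, which helps tell a one-off event from a recurring shortage. The sketch below assumes the usual kernel message form ("Out of memory: Killed process PID (name)", or "Kill process" on older kernels); the function name is illustrative.

```shell
# oom_victims: read kernel log text on stdin and print "COUNT PROCESS" for
# each process name the OOM killer has terminated, most frequent first.
oom_victims() {
    # Kernel messages look like:
    #   Out of memory: Killed process 1234 (mysqld) ...
    # Older kernels print "Kill process" instead of "Killed process".
    sed -n 's/.*Out of memory: Kill[ed]* process [0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}
```

Typical inputs are dmesg | oom_victims or journalctl -k | oom_victims. A process that keeps reappearing in the output suggests that restarting the service treated the symptom and not the memory shortage behind it.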
4. Separate Test and Production Environments
Always verify operations on a test machine before applying them to production. Avoid keeping many terminal windows open at once, since it is easy to type a command into the wrong one.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, and tools, plus Git, databases, Raspberry Pi, and more.
