Essential Linux Ops: 10 Hard‑Earned Rules for Safe Server Management
This article shares practical Linux operations guidelines—including thorough testing, cautious use of destructive commands, strict backup policies, security hardening, continuous monitoring, performance tuning, and a disciplined mindset—to help avoid costly incidents and maintain stable production environments.
1. Online Operation Practices
Test every change before applying it to a live server. The author recounts two incidents: an untested SSH configuration change that locked them out of a server, and an rsync command with the source and destination swapped that deleted production data.
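Both incidents are the kind a rehearsal step can catch. As a minimal sketch, assuming rsync is installed and using a hypothetical helper name (`rsync_preview`), a dry run lists what a sync would change, including deletions, without touching either side:

```shell
# Hypothetical helper: preview an rsync run without changing anything.
# --dry-run only reports what would happen; --itemize-changes lists each
# affected file. --delete is included on the assumption that the real
# command uses it, since that is what makes a swapped source and
# destination destructive.
rsync_preview() {
    src=$1
    dst=$2
    rsync --archive --delete --dry-run --itemize-changes "$src" "$dst"
}

# Example (paths are placeholders):
#   rsync_preview /srv/app/ /mnt/backup/app/
#
# For an SSH change, `sshd -t` checks the configuration for errors before
# a restart, and keeping a second session open guards against lockout.
```

Lines marked `*deleting` in the preview are files the real run would remove from the destination, so an unexpectedly long list is a strong hint that the arguments are reversed.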
Always double‑check commands, especially destructive ones like rm -rf, to prevent accidental data loss.
Avoid simultaneous edits by multiple operators on the same server, as conflicting changes can obscure the true cause of problems.
Back up configuration files (e.g., *.conf) before modifying them; a missing backup can turn a simple mistake into a disaster.
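One way to make the backup step habitual is a small wrapper. The sketch below is illustrative only: `backup_config` and the `.bak.<timestamp>` naming are assumptions, not something the author prescribes.

```shell
# Hypothetical helper: copy a config file to a timestamped backup next to
# the original and print the backup's path.
backup_config() {
    file=$1
    [ -f "$file" ] || { echo "backup_config: $file not found" >&2; return 1; }
    backup="${file}.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode, ownership and timestamps where permitted
    cp -p -- "$file" "$backup" && printf '%s\n' "$backup"
}

# Example (path is a placeholder):
#   backup_config /etc/nginx/nginx.conf && vi /etc/nginx/nginx.conf
```

Chaining the edit after the backup with `&&` means the editor never opens if the copy failed.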
2. Data Handling
Use rm -rf only when absolutely necessary and with full awareness of the target path.
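"Full awareness of the target path" can be partly mechanized. The following is a rough sketch of a pre-flight check, with a hypothetical function name and deliberately conservative rules (it rejects some harmless paths too); it only validates the path and never deletes anything itself.

```shell
# Hypothetical guard: succeed only if the path looks safe to pass to rm -rf.
# Rejects empty paths, "/", relative paths, non-normalized paths
# (containing ".." or "//") and top-level directories such as /etc.
check_rm_target() {
    target=$1
    case "$target" in
        '')        echo "refusing: empty path" >&2; return 1 ;;
        /)         echo "refusing: filesystem root" >&2; return 1 ;;
        *..*|*//*) echo "refusing: non-normalized path '$target'" >&2; return 1 ;;
        /*)        ;;  # absolute path: keep checking
        *)         echo "refusing: relative path '$target'" >&2; return 1 ;;
    esac
    trimmed=${target%/}              # drop a single trailing slash
    case "${trimmed#/}" in
        */*) return 0 ;;             # at least two path components
        *)   echo "refusing: top-level path '$target'" >&2; return 1 ;;
    esac
}

# Example (variable is a placeholder):
#   check_rm_target "$release_dir" && rm -rf -- "$release_dir"
```

The empty-path check matters most in scripts, where an unset variable can silently turn `rm -rf "$dir/"` into `rm -rf /`.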
Implement regular backups; the author cites examples where payment platforms performed full backups every two hours or every twenty minutes.
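As an illustration of a scheduled backup, the sketch below archives a directory into a timestamped tarball and verifies that the archive is readable. The function name, paths, and the use of tar are assumptions; the platforms the author mentions most likely used database-specific tooling instead.

```shell
# Hypothetical helper: archive a directory into a timestamped .tar.gz
# under dest_dir and print the archive path.
make_backup() {
    src_dir=$1
    dest_dir=$2
    [ -d "$src_dir" ] || { echo "make_backup: $src_dir is not a directory" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    archive="$dest_dir/$(basename "$src_dir")-$(date +%Y%m%d-%H%M%S).tar.gz"
    # -C stores paths relative to the parent directory
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Confirm the archive can be listed before reporting success
    tar -tzf "$archive" >/dev/null || return 1
    printf '%s\n' "$archive"
}

# Example crontab entries (script path is a placeholder):
#   0 */2 * * *    /usr/local/bin/backup.sh    # every two hours
#   */20 * * * *   /usr/local/bin/backup.sh    # every twenty minutes
```

A backup that has never been restored is unproven, so a periodic test restore is worth scheduling alongside the backup itself.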
Prioritize stability over speed; avoid deploying untested software (e.g., new Nginx + PHP‑FPM stacks) in production.
Maintain strict confidentiality for all sensitive data to prevent leaks.
3. Security Measures
SSH hardening: change the default port, disable root login, use key-based authentication and grant administrative rights through sudo rules, restrict access by source IP, and use host-blocking tools (for example DenyHosts or fail2ban) to block repeated login attempts.
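To make the SSH items concrete, the sketch below prints a candidate `sshd_config` fragment rather than editing the live file. The port number, the `deploy` user, and the 203.0.113.* address range (a documentation-only range) are all placeholders, and the function name is hypothetical.

```shell
# Hypothetical helper: print hardening directives for review.
# It writes to stdout only; nothing on the system is changed.
print_sshd_hardening() {
    port=${1:-2222}    # placeholder non-default port
    cat <<EOF
Port $port
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
AllowUsers deploy@203.0.113.*
EOF
}

# After merging the directives into sshd_config, run `sshd -t` to check
# the result, and keep an existing session open while restarting sshd.
```

Confirm that key-based login works for the intended account before disabling password authentication, since the two changes together are a common cause of lockouts.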
Enable a firewall with a default‑deny policy, only opening required service ports.
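A default-deny policy can be sketched with iptables, assuming that is the firewall in use (the source does not name one). The hypothetical helper below only prints the commands for review; applying them requires root and should be done from a console session or with a rollback plan.

```shell
# Hypothetical helper: print iptables commands for a default-deny INPUT
# policy that opens only the TCP ports given as arguments.
print_firewall_rules() {
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # The DROP policy comes last so the accept rules are already in place
    # when it takes effect.
    echo "iptables -P INPUT DROP"
}

# Example: print_firewall_rules 22 80 443
```

Setting the policy after the accept rules keeps the current SSH session alive while the rules are loaded.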
Apply the principle of least privilege: run services with non‑root users and limit permissions to the minimum necessary.
Deploy intrusion‑detection and log‑monitoring tools to watch critical files (e.g., /etc/passwd, /etc/my.cnf) and log directories for suspicious activity.
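Dedicated intrusion-detection tools do far more, but the core idea of watching critical files can be sketched with checksums. The example assumes GNU coreutils `sha256sum`; the function names and the baseline location are hypothetical.

```shell
# Hypothetical helpers: record checksums for a set of files, then detect
# later modifications by re-checking them.
record_baseline() {
    baseline=$1
    shift
    sha256sum "$@" > "$baseline"
}

verify_baseline() {
    # Returns non-zero if any listed file is missing or has changed
    sha256sum -c "$1" >/dev/null 2>&1
}

# Example (baseline path is a placeholder; store it off-host or read-only):
#   record_baseline /var/lib/baseline.sha256 /etc/passwd /etc/my.cnf
#   verify_baseline /var/lib/baseline.sha256 || echo "critical file changed"
```

The baseline itself must be protected, since an attacker who can rewrite it can hide their changes.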
4. Routine Monitoring
System health monitoring: track CPU, memory, disk usage, network I/O, login activity, and changes to critical files.
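As one small instance of health monitoring, a disk usage check can be built on `df -P`, whose POSIX output format is stable enough to parse. The function names and the 80% threshold in the example are illustrative assumptions; a real deployment would normally rely on a monitoring system.

```shell
# Hypothetical helper: print the usage percentage (without the % sign)
# of the filesystem that contains the given path.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed if usage is at or above the threshold.
disk_over_threshold() {
    pct=$(disk_usage_pct "$1")
    [ "$pct" -ge "$2" ]
}

# Example:
#   disk_over_threshold /var 80 && echo "WARNING: /var is filling up"
```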
Service monitoring: observe metrics for web servers, databases, load balancers, etc., to quickly spot performance bottlenecks.
Log monitoring: collect hardware, OS, and application error logs; while less useful during normal operation, they become vital when incidents occur.
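A simple starting point for log monitoring is counting suspicious lines so that a sudden jump stands out. The keyword list and function name below are assumptions and would need tuning for each application's log format.

```shell
# Hypothetical helper: count lines that mention an error-like keyword
# (case-insensitive). Prints 0 when nothing matches.
count_errors() {
    [ -r "$1" ] || { echo "count_errors: cannot read $1" >&2; return 1; }
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c -i -E 'error|fail|critical' "$1" || true
}

# Example (path is a placeholder):
#   count_errors /var/log/app/app.log
```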
5. Performance Tuning
Understand how the software actually works at runtime (e.g., why Nginx's event-driven model often handles high concurrency better than Apache's process- or thread-based models) before adjusting parameters.
Follow a tuning framework: identify the bottleneck, analyze logs, adjust OS/hardware first, then move to database configuration as a last step.
Change only one parameter at a time to isolate its impact.
Conduct benchmark tests to verify that tuning yields real‑world performance gains and aligns with business requirements.
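The one-parameter-at-a-time rule pairs naturally with a repeatable measurement. The sketch below times a command over several runs and prints a labeled result in milliseconds; it assumes GNU `date` (for `%N` nanoseconds), and the function name and labels are hypothetical. A dedicated load-testing tool would give more realistic numbers.

```shell
# Hypothetical helper: run a command N times and print "label,elapsed_ms".
bench() {
    label=$1
    runs=$2
    shift 2
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" >/dev/null 2>&1 || :    # ignore the command's exit status
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo "$label,$(( (end - start) / 1000000 ))"
}

# Example (URL and labels are placeholders):
#   bench "baseline" 50 curl -s http://localhost/
#   # change exactly one setting, reload the service, then:
#   bench "one-setting-changed" 50 curl -s http://localhost/
```

Recording each result under a label that names the single setting changed makes it possible to attribute a gain or regression to that setting.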
6. Ops Mindset
Stay calm under pressure; avoid hasty actions on critical data, especially during stressful moments.
Take responsibility for data integrity; regular backups are non‑negotiable.
Investigate root causes of failures rather than merely applying quick fixes.
Separate testing and production environments and avoid multitasking during critical operations.
Liangxu Linux
Liangxu, a self-taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge: fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc.
