Essential Ops Checklist: From Safe Commands to Performance Tuning
This article shares practical operations guidelines covering safe command usage, backup strategies, security hardening, daily monitoring, performance tuning, and the right mindset to prevent data loss and ensure stable, secure Linux server management.
1. Online Operation Standards
Test before use: The author recounts first learning Linux on VMs, the temptation to experiment directly on real servers, and a mishap caused by changing the SSH login method without testing it first, which was resolved by restoring a backed-up sshd_config file.
rm -rf pitfalls: An rsync mistake accidentally deleted the source data because the source and destination directories were swapped, which underlines the importance of testing and backups.
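One way to guard against a reversed sync is to preview it first. The sketch below is only an illustration: it assumes the sync used rsync's --delete option (the article does not say which options were involved), and the function names build_rsync_command and check_direction are hypothetical.

```python
from pathlib import Path


def build_rsync_command(source: str, destination: str, dry_run: bool = True) -> list[str]:
    """Build an rsync command line; with dry_run=True rsync only reports changes."""
    cmd = ["rsync", "-a", "--delete", "--itemize-changes"]
    if dry_run:
        cmd.append("--dry-run")
    # A trailing slash on the source means "sync the contents of this directory".
    cmd += [source.rstrip("/") + "/", destination.rstrip("/") + "/"]
    return cmd


def check_direction(source: str, destination: str) -> None:
    """Reject syncs that look reversed, such as an empty or missing source."""
    src = Path(source)
    if not src.is_dir():
        raise ValueError(f"source {source!r} is not a directory")
    if not any(src.iterdir()):
        raise ValueError(f"source {source!r} is empty; are the arguments reversed?")
    if src.resolve() == Path(destination).resolve():
        raise ValueError("source and destination are the same directory")
```

The returned list can be passed to subprocess.run once the dry-run output has been read and looks right; only then would dry_run=False be used.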
Confirm before pressing Enter: A carelessly typed command such as rm -rf /var is dangerous; a single slip can cause severe damage.
Avoid multiple operators: When several people edit the same server's configuration at once, their changes conflict, which creates confusion and delays problem resolution.
Back up before changes: Always back up files, configuration files in particular, before modifying them; a database backup would have prevented the data loss in the rsync incident.
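The habit of backing up a file before editing it can be scripted. This is a minimal sketch, not the author's tooling; backup_file and restore_file are hypothetical helper names, and the timestamped ".bak" naming is an assumption.

```python
import shutil
import time
from pathlib import Path
from typing import Optional


def backup_file(path: str, backup_dir: Optional[str] = None) -> Path:
    """Copy `path` to a timestamped .bak file and return the backup's path."""
    src = Path(path)
    if not src.is_file():
        raise FileNotFoundError(path)
    target_dir = Path(backup_dir) if backup_dir else src.parent
    target_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = target_dir / f"{src.name}.{stamp}.bak"
    shutil.copy2(src, dest)  # copy2 also preserves timestamps and permission bits
    return dest


def restore_file(backup: Path, original: str) -> None:
    """Overwrite `original` with the contents of a previously made backup."""
    shutil.copy2(backup, original)
```

Calling backup_file on sshd_config before an edit and restore_file afterwards mirrors the recovery described in the SSH mishap.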
2. Data Handling
Use rm -rf carefully: Publicly reported disasters caused by rm -rf / illustrate the need for caution.
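For scripted deletions, a small guard can stand in for the "pause before Enter" habit. The sketch below is an assumption-laden illustration: the PROTECTED deny-list and the names is_protected and safe_rmtree are hypothetical, and the list would need extending for a real system.

```python
import shutil
from pathlib import Path

# Hypothetical deny-list of top-level paths that must never be removed.
PROTECTED = {
    Path(p).resolve()
    for p in ("/", "/bin", "/boot", "/etc", "/home", "/lib", "/root", "/sbin", "/usr", "/var")
}


def is_protected(target: str) -> bool:
    """True if the target resolves to one of the protected paths."""
    return Path(target).resolve() in PROTECTED


def safe_rmtree(target: str, confirm: str) -> None:
    """Delete a directory tree only if the caller retypes its resolved path."""
    resolved = Path(target).resolve()
    if resolved in PROTECTED:
        raise PermissionError(f"refusing to delete protected path {resolved}")
    if confirm != str(resolved):
        raise ValueError(f"confirmation does not match {resolved}")
    shutil.rmtree(resolved)
```

Requiring the full resolved path as confirmation forces the operator to read what will actually be deleted, including after symlinks and ".." segments are resolved.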
Backup is paramount: The author cites third-party payment and loan platforms that run a full backup every two hours, or even every twenty minutes, to stress that frequent backups are essential.
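A backup schedule is only useful if someone notices when it stops running. The following sketch checks whether the newest file in a backup directory is recent enough; the function names, the directory layout, and the use of file modification time as the backup time are all assumptions.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age(backup_dir: str, pattern: str = "*",
                      now: Optional[float] = None) -> Optional[float]:
    """Return the age in seconds of the most recent backup file, or None if there is none."""
    files = [p for p in Path(backup_dir).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    current = now if now is not None else time.time()
    return current - newest


def backup_is_fresh(backup_dir: str, max_age_seconds: float,
                    pattern: str = "*", now: Optional[float] = None) -> bool:
    """True if at least one backup exists and the newest is within the allowed age."""
    age = newest_backup_age(backup_dir, pattern, now)
    return age is not None and age <= max_age_seconds
```

For the two-hour interval mentioned above, max_age_seconds would be 2 * 60 * 60, perhaps with some slack for the time the backup itself takes.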
Stability over speed: In production, prioritize stability and availability over the newest software; a stack that has not been tested in your environment (the author's example is nginx + php-fpm) can crash frequently.
Confidentiality: With data leaks so widespread, protecting sensitive data is non-negotiable.
3. Security Practices
SSH hardening
Change the default port (recognizing that determined attackers can still scan).
Disable root login.
Log in as a normal user with key authentication, governed by sudo rules and IP restrictions.
Deploy a host-deny style intrusion-prevention tool that blocks hosts after repeated failed login attempts.
Restrict which of the users listed in /etc/passwd are allowed to log in.
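Several of the points above map onto sshd_config directives and can be audited mechanically. The sketch below is a simplified illustration: parse_sshd_config and audit_sshd are hypothetical names, the parser ignores Match blocks, Include files, and the "keyword=value" form, and the fallback defaults are those of recent OpenSSH releases.

```python
def parse_sshd_config(text: str) -> dict[str, str]:
    """Parse 'Keyword value' lines; sshd uses the first value it sees for a keyword."""
    settings: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        settings.setdefault(key.lower(), value.strip())  # keywords are case-insensitive
    return settings


def audit_sshd(settings: dict[str, str]) -> list[str]:
    """Return human-readable findings for the hardening points in the list above."""
    findings = []
    if settings.get("port", "22") == "22":
        findings.append("SSH listens on the default port 22")
    if settings.get("permitrootlogin", "prohibit-password").lower() != "no":
        findings.append("root login is not fully disabled")
    if settings.get("passwordauthentication", "yes").lower() != "no":
        findings.append("password authentication is enabled")
    if "allowusers" not in settings and "allowgroups" not in settings:
        findings.append("no AllowUsers/AllowGroups restriction")
    return findings
```

An empty findings list means the file satisfies these four checks only; it says nothing about sudo rules or IP restrictions, which live outside sshd_config.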
Firewall: Enable a firewall in production, apply a default-deny policy, and explicitly allow only the required service ports.
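As an illustration of default-deny, the sketch below generates an ordered rule list as text without executing anything. It assumes iptables as the firewall tool, which the article does not name, and default_deny_rules is a hypothetical helper.

```python
def default_deny_rules(allowed_tcp_ports: list[int]) -> list[str]:
    """Return iptables commands: allow rules first, the DROP policy last."""
    rules = [
        # Keep loopback traffic and already-established connections working.
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for port in sorted(set(allowed_tcp_ports)):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Setting the policy last avoids cutting off the current SSH session
    # before its allow rule exists.
    rules.append("iptables -P INPUT DROP")
    return rules
```

Printing the list and reviewing it before running any of the commands fits the "test first, confirm before Enter" advice from the first section.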
Fine-grained permissions: Run each service as the least-privileged user that works, not as root.
Intrusion detection & log monitoring: Use third-party tools to watch critical files (e.g., /etc/passwd, /etc/my.cnf) and centralize the monitoring of security-related logs; such monitoring also aids post-incident analysis.
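The core idea behind watching critical files is comparing their current hashes to a trusted baseline. This is a minimal sketch of that idea, not a replacement for a dedicated tool; the names file_digest, snapshot, and changed_files are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Optional


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths: Iterable[str]) -> dict[str, Optional[str]]:
    """Map each path to its digest, or None when the file is missing."""
    return {p: file_digest(p) if Path(p).is_file() else None for p in paths}


def changed_files(baseline: dict[str, Optional[str]],
                  current: dict[str, Optional[str]]) -> list[str]:
    """List baseline paths whose digest changed or which disappeared."""
    return sorted(p for p in baseline if baseline[p] != current.get(p))
```

A baseline taken with snapshot(["/etc/passwd", "/etc/my.cnf"]) would need to be stored somewhere an intruder cannot rewrite it, otherwise the comparison proves little.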
4. Daily Monitoring
System monitoring: Track hardware utilization (CPU, memory, disk, network) and OS events such as logins and changes to critical files.
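A threshold check over a few basic metrics shows the shape of system monitoring. The sketch below uses only the Python standard library; the metric names, the limits, and the functions collect_metrics and check_thresholds are illustrative assumptions, and os.getloadavg is available on Unix only.

```python
import os
import shutil


def collect_metrics(path: str = "/") -> dict[str, float]:
    """Gather disk usage (percent) and 1-minute load per CPU core."""
    usage = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()  # Unix only
    return {
        "disk_percent": 100.0 * usage.used / usage.total,
        "load_per_cpu": load1 / (os.cpu_count() or 1),
    }


def check_thresholds(metrics: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return one alert string per metric that exceeds its limit."""
    return [
        f"{name}={value:.1f} exceeds {limits[name]:.1f}"
        for name, value in metrics.items()
        if name in limits and value > limits[name]
    ]
```

In practice a monitoring system would collect these values continuously; the point here is only that every tracked metric needs an explicit limit and an alert path.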
Service monitoring: Monitor web, database, and load-balancer metrics to detect performance bottlenecks quickly.
Log monitoring: Collect hardware, OS, and application error logs; they matter little during normal operation but become vital when something goes wrong.
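As one concrete case of OS log monitoring, failed SSH logins can be counted per source address. The sketch assumes the common OpenSSH message form "Failed password for <user> from <ip> port <n>"; the threshold and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed sshd message format; adjust the pattern to match your own logs.
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def failed_logins_by_ip(lines: Iterable[str]) -> Counter:
    """Count failed password attempts per source IP address."""
    counts: Counter = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_ips(lines: Iterable[str], threshold: int = 5) -> list[str]:
    """Return the IPs with at least `threshold` failed attempts, sorted."""
    counts = failed_logins_by_ip(lines)
    return sorted(ip for ip, n in counts.items() if n >= threshold)
```

The same counting is what blocking tools automate; running it over collected logs after an incident helps reconstruct what happened.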
5. Performance Tuning
Understand runtime mechanisms: Before tuning, understand how the software works (for example, why nginx handles many concurrent connections more efficiently than Apache), and be able to read the source code if needed.
Tuning framework and order: Analyze bottlenecks, review logs, and plan the tuning steps; adjust hardware and the OS first, and only then tweak database configuration.
Change one parameter at a time: Adjust a single setting per iteration so that any effect can be attributed to that change.
Benchmark testing: Run baseline tests before tuning and repeat them afterwards to verify that the change improves performance and stability; works such as "High Performance MySQL" describe the methodology.
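Comparing a baseline run with a run after a single change can be sketched as follows. The helpers benchmark and improvement_percent are hypothetical, and timing a Python callable stands in for whatever workload is actually being tuned.

```python
import statistics
import time
from typing import Callable


def benchmark(func: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of `func` over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def improvement_percent(baseline: float, tuned: float) -> float:
    """Positive when the tuned run is faster than the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - tuned) / baseline
```

Using the median of several runs reduces the influence of one-off spikes, and changing one setting between the two measurements keeps the comparison meaningful.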
6. Ops Mindset
Control your mindset: Stay calm under pressure; avoid rash actions such as deleting critical data in a stressful moment.
Be responsible for data: Never skip backups; data loss can be catastrophic.
Root-cause analysis: Investigate the underlying issue, such as OOM kills that caused MySQL to crash, instead of merely applying a quick fix.
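When a service such as MySQL keeps dying, checking the kernel log for OOM kills is one quick root-cause probe. The sketch assumes the kernel's "Killed process <pid> (<name>)" message form; find_oom_kills is a hypothetical helper, and the log lines would come from a source such as dmesg or the system journal.

```python
import re
from typing import Iterable

# Assumed kernel log format for OOM kills; verify against your own kernel's output.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (pid, process name) for every OOM kill found in the log lines."""
    kills = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            kills.append((int(match.group(1)), match.group(2)))
    return kills
```

Finding mysqld in the result points toward memory sizing or a leak as the real problem, which a simple restart would only postpone.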
Separate test and production: Verify changes on the appropriate machine, and avoid keeping multiple terminal windows open during critical operations so that a command does not land on the wrong server.
Original link: https://zhuanlan.zhihu.com/p/365519427