Essential Ops Lessons: Prevent Data Loss, Secure Servers, and Optimize Performance
Drawing from three and a half years of system administration, this article outlines practical guidelines for safe online operations, data protection, server security, continuous monitoring, performance tuning, and the right mindset to avoid costly mishaps in production environments.
1. Online Operation Guidelines
Test before using – The author recounts early mistakes such as switching from PuTTY to Xshell without testing first: after sshd was restarted he was locked out of the server, and only a backup of sshd_config saved the day.
Beware of rsync pitfalls – An rsync command run with the source and destination swapped deleted production data, which shows why any file‑synchronisation operation needs a test run and a backup first.
Never skip backups – The author treats this as the most basic rule of any operation: without a backup, a mistake cannot be undone.
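The first two lessons can be sketched in a few lines of Python. This is a minimal illustration, not the author's tooling: backup_file and rsync_command are hypothetical helper names. The first copies a configuration file to a timestamped backup before it is edited; the second builds an rsync invocation that defaults to --dry-run, so the direction of the copy can be reviewed before anything changes.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, stamp=None):
    """Copy `path` to `<name>.bak.<stamp>` in the same directory and return the copy."""
    src = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.bak.{stamp}")
    shutil.copy2(src, dst)  # copy2 also preserves timestamps and permission bits
    return dst


def rsync_command(source, dest, delete=False, dry_run=True):
    """Build an rsync argument list; it is a dry run unless explicitly disabled."""
    cmd = ["rsync", "-a", "--itemize-changes"]
    if delete:
        cmd.append("--delete")  # removes files in dest that are missing from source
    if dry_run:
        cmd.append("--dry-run")  # report what would change without changing it
    cmd += [source, dest]
    return cmd
```

A possible workflow is to print the dry-run command, run it, read the itemized changes, and only then rebuild the command with dry_run=False and pass it to subprocess.run.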
2. Data Handling
Use rm -rf with extreme caution – Even a single typo can erase critical directories; always double‑check commands before execution.
Backup is paramount – Regular backups (e.g., every two hours for a payment platform, every 20 minutes for a loan platform) prevent irreversible data loss.
Stability over speed – Prefer proven, stable software in production; avoid deploying untested stacks such as a new nginx+php‑fpm combination without thorough validation.
Confidentiality matters – With increasing incidents of data leaks and back‑door exploits, protecting sensitive data is non‑negotiable.
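One way to make the rm -rf warning concrete is a guard that scripts call before deleting anything. The sketch below is an assumption-laden example rather than a standard tool: is_safe_to_delete and the PROTECTED set are hypothetical, and the list of protected directories would need to match the real server layout.

```python
import os

# Hypothetical list of directories that must never be removed by a script.
PROTECTED = {"/", "/bin", "/boot", "/data", "/etc", "/home", "/usr", "/var"}


def is_safe_to_delete(path, protected=PROTECTED):
    """Return False for empty paths, protected paths, and parents of protected paths."""
    if not path or not path.strip():
        # An empty variable is the classic cause of `rm -rf $DIR/` hitting `/`.
        return False
    # Check the path both as written (normalised) and with symlinks resolved.
    candidates = {os.path.abspath(path), os.path.realpath(path)}
    for candidate in candidates:
        if candidate in protected:
            return False
        prefix = candidate.rstrip("/") + "/"
        if any(p.startswith(prefix) for p in protected):
            return False  # deleting a parent would take the protected path with it
    return True
```

A deletion script could call this check first and abort with an error message whenever it returns False, instead of passing the path straight to shutil.rmtree or rm.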
3. Security Practices
SSH hardening – Change the default port, disable root login, enforce key‑based authentication and grant privileges through sudo rules, restrict access by IP, and block brute‑force attempts with a tool such as DenyHosts or with /etc/hosts.deny rules.
Firewall configuration – Enable a firewall in production, adopt a default‑deny policy, and explicitly allow only required service ports.
Fine‑grained permissions – Run services with the least privileged non‑root accounts and limit permission scopes as tightly as possible.
Intrusion detection and log monitoring – Deploy third‑party agents to watch critical files (e.g., /etc/passwd, /etc/my.cnf) and centralise logs (e.g., /var/log/secure, /var/log/messages) for real‑time alerts.
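The SSH hardening items above can be checked mechanically. The following Python sketch audits the text of an sshd_config file for three of them (non-default port, root login disabled, password authentication disabled). The function name audit_sshd_config is hypothetical, and the parser is deliberately simplified: it assumes whitespace-separated "Keyword value" lines, keeps the first occurrence of each keyword, and stops at the first Match block instead of evaluating conditional overrides.

```python
def audit_sshd_config(text):
    """Return a list of hardening findings for the given sshd_config text."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].lower(), parts[1].strip()
        if key == "match":
            break  # simplification: conditional blocks are not evaluated
        settings.setdefault(key, value)  # keep the first value seen for a keyword

    findings = []
    if settings.get("port", "22") == "22":
        findings.append("Port is still the default 22")
    if settings.get("permitrootlogin", "prohibit-password").lower() != "no":
        findings.append("PermitRootLogin is not set to 'no'")
    if settings.get("passwordauthentication", "yes").lower() != "no":
        findings.append("PasswordAuthentication is not set to 'no'")
    if settings.get("pubkeyauthentication", "yes").lower() != "yes":
        findings.append("PubkeyAuthentication is disabled")
    return findings
```

Running it against a copy of the file before restarting sshd gives a quick review list; it does not replace validating the file with sshd's own test mode or keeping a second session open while the change is applied.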
4. Daily Monitoring
System health monitoring – Track CPU, memory, disk usage, network traffic, and login activity to spot failing hardware and abnormal behaviour early.
Service health monitoring – Observe key metrics of web servers, databases, load balancers, etc., to quickly spot performance bottlenecks.
Log monitoring – Continuously collect OS and application error logs; they may seem of little use while systems are stable, but they become vital when incidents occur.
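As a rough illustration of system health monitoring, the sketch below collects two basic metrics with the Python standard library and compares them against thresholds. The function names, metric names, and threshold values are all assumptions made for the example; a production setup would normally rely on a dedicated monitoring system rather than a script like this.

```python
import os
import shutil


def collect_metrics(path="/"):
    """Collect disk usage for `path` and the 1-minute load per CPU (Unix only)."""
    usage = shutil.disk_usage(path)
    load1, _, _ = os.getloadavg()
    return {
        "disk_used_pct": usage.used / usage.total * 100,
        "load_per_cpu": load1 / (os.cpu_count() or 1),
    }


def find_alerts(metrics, thresholds):
    """Return one message per metric whose value exceeds its threshold."""
    return [
        f"{name}={metrics[name]:.1f} exceeds {limit}"
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    ]
```

For example, find_alerts(collect_metrics(), {"disk_used_pct": 85, "load_per_cpu": 1.5}) could be run from cron and its output forwarded to whatever alerting channel the team already uses.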
5. Performance Tuning
Understand underlying mechanisms – Knowing why nginx outperforms Apache (event‑driven architecture, async I/O) and being able to read source code prevents blind parameter changes.
Tuning framework and order – Identify the bottleneck first (via logs and metrics), then work in order: resolve OS and hardware issues first and leave database configuration changes for last.
Change one parameter at a time – Isolating each adjustment avoids confusion about which change produced an effect.
Benchmarking – Run baseline tests to verify improvements; reference materials such as the third edition of “High Performance MySQL” are recommended.
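The "change one parameter at a time" and benchmarking advice can be sketched as a small timing harness. This is an illustrative example only: benchmark and improvement_pct are hypothetical names, and the callable being timed stands in for whatever workload (a query, a request, a batch job) is being tuned.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Run `fn` several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def improvement_pct(baseline_s, candidate_s):
    """Percentage reduction in run time relative to the baseline (negative means slower)."""
    return (baseline_s - candidate_s) / baseline_s * 100
```

The intended use is to record a baseline, change exactly one setting, rerun the same workload, and compare the two medians; the median is used here so that a single slow outlier run does not dominate the result.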
6. Ops Mindset
Control emotions – Stay calm during critical failures (e.g., accidental rm -rf /data) to prevent rash decisions.
Take responsibility for data – Treat production data with the seriousness of a mission‑critical asset; always have a recovery plan.
Root‑cause analysis – Investigate recurring issues thoroughly; an example given is MySQL OOM kills caused by insufficient memory and missing swap.
Separate test and production environments – Verify operations on the correct machine, limit the number of open terminals, and avoid performing risky actions during high‑stress periods.
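The advice to verify that an operation runs on the correct machine can be enforced in scripts. The sketch below is a hypothetical guard, not something described in the source: require_host and the host names used with it are assumptions, and it simply refuses to continue when the local hostname differs from the one the operator expects.

```python
import socket


def require_host(expected, actual=None):
    """Raise RuntimeError unless this machine's hostname matches `expected`."""
    actual = actual if actual is not None else socket.gethostname()
    if actual.strip().lower() != expected.strip().lower():
        raise RuntimeError(
            f"refusing to run: expected host {expected!r}, but this is {actual!r}"
        )
    return actual
```

Calling require_host("db-test-01") at the top of a maintenance script means that a script started in the wrong terminal window fails immediately instead of touching production data.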
dbaplus Community
