Essential Ops Practices: Prevent Disasters with Backups, Security, and Monitoring
This guide shares practical Linux operations lessons—ranging from cautious command use, rigorous backup habits, and secure SSH configurations to comprehensive monitoring and performance tuning—to help teams avoid costly mistakes and maintain stable, reliable services.
1. Online Operation Guidelines
1. Test Before Use
When learning Linux on virtual machines, it is easy to develop risky habits that cause trouble once you have access to real servers. For example, changing the SSH daemon configuration without testing it can lock you out of the server.
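One way to lower the lock-out risk is to run a syntax check before reloading a service. The sketch below is a minimal illustration, not a complete procedure: `apply_if_valid` is a hypothetical helper, `sshd -t` is OpenSSH's configuration test mode, and the unit name `sshd` is an assumption (Debian-family systems call it `ssh`).

```sh
# Hypothetical helper: run an "apply" command only when a "validate"
# command succeeds. Both arguments are command strings.
apply_if_valid() {
    validate_cmd=$1
    apply_cmd=$2
    if sh -c "$validate_cmd"; then
        sh -c "$apply_cmd"
    else
        echo "validation failed; nothing was applied" >&2
        return 1
    fi
}

# Example (requires root; the unit name is an assumption):
#   apply_if_valid "sshd -t" "systemctl reload sshd"
```

A syntax check does not prove you can still log in, so keep the current session open and confirm a fresh login from a second terminal before closing it.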
Another example is misuse of rsync for file synchronization: if the source and destination are reversed, it can overwrite or, with the --delete option, remove the very data you meant to copy.
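A dry run makes a reversed source and destination visible before anything is touched. The wrapper below is a sketch under stated assumptions: `safe_mirror` is a hypothetical name, and the policy of refusing an empty source is one possible safeguard, not an rsync feature.

```sh
# Hypothetical wrapper around rsync --delete. It refuses to mirror from a
# missing or empty source, then shows a dry run before the real transfer.
safe_mirror() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "source '$src' is not a directory" >&2
        return 1
    fi
    # An empty source combined with --delete would wipe the destination.
    if [ -z "$(ls -A "$src")" ]; then
        echo "source '$src' is empty; refusing to mirror" >&2
        return 1
    fi
    # -n (--dry-run) lists what would change without touching any files.
    rsync -a --delete -n -v "$src"/ "$dst"/ || return 1
    printf 'Apply these changes to %s? [y/N] ' "$dst"
    read -r answer
    [ "$answer" = "y" ] && rsync -a --delete "$src"/ "$dst"/
}
```

If the dry-run listing shows deletions you did not expect, the arguments are probably in the wrong order.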
2. Double‑Check Before Enter
Commands like rm -rf /var can cause severe damage if executed accidentally, especially when working quickly or over a slow network connection.
By the time you realize the command has run, your heart skips a beat.
One mistake is enough to teach you to be cautious; these incidents can happen to anyone.
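A small guard can force the pause that a quick Enter skips. This is a sketch only: `is_protected_path` and `careful_rm` are hypothetical names, and the list of protected directories is illustrative rather than exhaustive.

```sh
# Hypothetical guard: succeeds (status 0) when the path is one that should
# never be removed recursively. An empty argument and "/" also count.
is_protected_path() {
    case "${1%/}" in
        ""|/bin|/boot|/dev|/etc|/home|/lib|/lib64|/opt|/proc|/root|/sbin|/sys|/usr|/var)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical wrapper: shows the target and the host name, then asks
# for confirmation before deleting anything.
careful_rm() {
    target=$1
    if is_protected_path "$target"; then
        echo "refusing to remove protected path: $target" >&2
        return 1
    fi
    printf 'Really remove %s on %s? [y/N] ' "$target" "$(uname -n)"
    read -r answer
    [ "$answer" = "y" ] && rm -rf -- "$target"
}
```

Printing the host name in the prompt helps when several terminals are open and it is easy to type into the wrong one.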
3. Avoid Multiple Operators
When many people share the root password, concurrent changes can lead to conflicting configurations and wasted troubleshooting time.
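Where several operators still work on the same host, a simple maintenance lock can at least make concurrent changes visible. The sketch below assumes a lock location of `/tmp/ops-maintenance.lock` and uses the hypothetical helpers `acquire_lock` and `release_lock`. It is a convention among cooperating operators, not an access control, and it does not replace individual accounts.

```sh
# Hypothetical maintenance lock. mkdir is atomic: it either creates the
# directory or fails, so only one operator can hold the lock at a time.
LOCK_DIR=${LOCK_DIR:-/tmp/ops-maintenance.lock}   # assumed location

acquire_lock() {
    if mkdir "$LOCK_DIR" 2>/dev/null; then
        # Record who holds the lock so others know whom to ask.
        echo "$(id -un) since $(date)" > "$LOCK_DIR/owner"
        return 0
    fi
    echo "locked by: $(cat "$LOCK_DIR/owner" 2>/dev/null)" >&2
    return 1
}

release_lock() {
    rm -rf -- "$LOCK_DIR"
}
```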
4. Backup Before Changing
Always back up configuration files (e.g., .conf files) before editing them, and comment out the original options instead of overwriting them.
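The habit is easier to keep when it is a single command. A minimal sketch, assuming a hypothetical helper named `backup_file` and a `.bak.<timestamp>` naming scheme:

```sh
# Hypothetical helper: copy a file to <name>.bak.<timestamp> before
# editing it, and print the name of the copy.
backup_file() {
    file=$1
    backup="$file.bak.$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode, ownership and timestamps where permitted.
    cp -p -- "$file" "$backup" || return 1
    echo "$backup"
}

# Example: backup_file /etc/ssh/sshd_config
```

The timestamp keeps earlier copies from being overwritten, so each edit has its own rollback point.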
2. Data‑Related Practices
1. Use rm -rf Carefully
A small mistake with destructive commands can cause massive data loss; verify any deletion thoroughly.
2. Backup Is Paramount
Regular backups are essential. In one company, third‑party payment systems are backed up every two hours, while a loan platform backs up every 20 minutes.
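Intervals like these are usually driven by a scheduler. The sketch below shows one possible shape for such a job: `make_backup`, the archive naming, the retention count, and the script path in the crontab lines are all assumptions for illustration, not the setup used at the company mentioned above.

```sh
# Hypothetical backup job: archive a directory under a timestamped name
# and keep only the newest $keep archives in the destination.
make_backup() {
    src=$1
    dest=$2
    keep=${3:-12}
    mkdir -p "$dest" || return 1
    archive="$dest/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    # List archives newest first and delete everything past the first $keep.
    ls -1t "$dest"/backup-*.tar.gz | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do rm -f -- "$old"; done
    echo "$archive"
}

# Crontab entries matching the two intervals (script path is a placeholder):
#   0 */2 * * *   /usr/local/bin/backup.sh    # every two hours
#   */20 * * * *  /usr/local/bin/backup.sh    # every 20 minutes
```

An archive that stays on the same host shares that host's fate, so a real job would also copy it elsewhere and test a restore from time to time.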
3. Stability Over Speed
Prioritize a stable, reliable environment over the fastest setup; avoid deploying untested software in production.
4. Confidentiality Is Critical
Given frequent data leaks, protecting sensitive data with proper confidentiality measures is mandatory.
3. Security Measures
1. SSH Hardening
Change the default port.
Disable root login.
Use regular user accounts with key authentication, sudo rules, and IP restrictions.
Deploy tools like DenyHosts to block brute‑force attempts.
Audit users listed in /etc/passwd.
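Several items on this checklist can be verified with a short script. The following is a rough sketch, not a full audit: `sshd_setting`, `audit_sshd_config`, and `uid0_accounts` are hypothetical helpers, the parser assumes whitespace-separated `Keyword value` lines, and it ignores `Match` blocks and `Include` files.

```sh
# Hypothetical lookup: print the first value of an sshd_config keyword.
# OpenSSH uses the first value it finds; keywords are case-insensitive.
sshd_setting() {
    awk -v key="$2" '
        /^[[:space:]]*#/ { next }                       # skip comments
        tolower($1) == tolower(key) { print $2; exit }
    ' "$1"
}

# Hypothetical audit: warn about settings that contradict the checklist.
audit_sshd_config() {
    conf=${1:-/etc/ssh/sshd_config}
    rc=0
    [ "$(sshd_setting "$conf" PermitRootLogin)" = "no" ] ||
        { echo "WARN: PermitRootLogin is not 'no'"; rc=1; }
    [ "$(sshd_setting "$conf" PasswordAuthentication)" = "no" ] ||
        { echo "WARN: PasswordAuthentication is not 'no'"; rc=1; }
    port=$(sshd_setting "$conf" Port)
    [ -n "$port" ] && [ "$port" != "22" ] ||
        { echo "WARN: sshd still listens on the default port 22"; rc=1; }
    return $rc
}

# Accounts with UID 0 other than root deserve a close look.
uid0_accounts() {
    awk -F: '$3 == 0 { print $1 }' "${1:-/etc/passwd}"
}
```

On a live system, `sshd -T` prints the effective configuration and is a more reliable source than parsing the file by hand.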
2. Firewall
Enable firewalls in production and follow the principle of least privilege: drop all traffic by default and allow only required ports.
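A default-deny policy can be sketched with iptables as follows. The generator only prints the commands so they can be reviewed before anyone runs them. `print_firewall_rules` is a hypothetical helper, the example ports are placeholders, and the rules cover inbound IPv4 TCP traffic only.

```sh
# Hypothetical generator: prints (does not run) iptables commands for a
# default-deny inbound policy that allows only the listed TCP ports.
print_firewall_rules() {
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # Set the DROP policy last so an SSH session is not cut off before
    # its allow rule exists.
    echo "iptables -P INPUT DROP"
}

# Example: print_firewall_rules 2222 443
```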
3. Fine‑Grained Permissions
Run services with non‑root users whenever possible and limit permissions to the minimum necessary.
4. Intrusion Detection and Log Monitoring
Use third‑party tools to monitor critical system and service configuration files for changes.
Centralize log monitoring for files such as /var/log/secure and /var/log/messages.
Detect port scans and block offending IPs via /etc/hosts.deny.
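Two of these ideas can be sketched in a few lines each: a checksum baseline for configuration files and a per-address count of failed SSH logins. Everything named here is an assumption for illustration: the helpers `record_baseline`, `check_baseline`, and `failed_logins_by_ip`, the baseline location, and the OpenSSH "Failed password ... from <ip>" message format. Dedicated tools such as AIDE cover far more than this.

```sh
# Part 1: hypothetical integrity check built on sha256sum (GNU coreutils).
BASELINE=${BASELINE:-/var/lib/ops/config.sha256}   # assumed location

# record_baseline FILE... stores the current checksums.
record_baseline() {
    sha256sum -- "$@" > "$BASELINE"
}

# Succeeds only if every recorded file still matches its checksum;
# reports files that changed or disappeared.
check_baseline() {
    sha256sum -c --quiet "$BASELINE"
}

# Part 2: hypothetical report of failed SSH logins per source address,
# printed as "<count> <ip>" for addresses that reach the threshold.
failed_logins_by_ip() {
    log=$1
    threshold=${2:-5}
    awk -v min="$threshold" '
        /Failed password/ {
            # The address follows the word "from" in sshd messages.
            for (i = 1; i < NF; i++)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END {
            for (ip in count)
                if (count[ip] >= min) print count[ip], ip
        }
    ' "$log" | sort -rn
}

# Example: failed_logins_by_ip /var/log/secure 10
```

Addresses from the report can then be reviewed before being blocked. An entry in /etc/hosts.deny only has an effect when the service was built with TCP Wrappers support.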
4. Daily Monitoring
1. System Health Monitoring
Track hardware usage (CPU, memory, disk, network) and OS metrics, including login activity and critical file changes.
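As one small example of such a check, the sketch below flags file systems that cross a usage threshold. `disks_over_threshold` is a hypothetical helper. It reads `df -P` output from standard input, which keeps it easy to test, and it assumes mount points without spaces.

```sh
# Hypothetical check: read `df -P` output on stdin and print mount points
# whose usage is at or above the given percentage.
disks_over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            used = $5
            sub(/%/, "", used)
            if (used + 0 >= limit + 0) print $6, used "%"
        }
    '
}

# Example: df -P | disks_over_threshold 90
```

In practice a monitoring system would collect this continuously. A cron job that mails the output is the simplest stand-in.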
2. Service Monitoring
Monitor web, database, and load‑balancer services to quickly identify performance bottlenecks.
3. Log Monitoring
Collect and analyze logs from hardware, OS, and applications to detect issues early.
5. Performance Tuning
1. Understand Underlying Mechanisms
Before tuning, study how software (e.g., Nginx vs. Apache) works internally; otherwise, tuning becomes guesswork.
2. Tuning Framework and Order
Identify bottlenecks, analyze logs, and plan tuning steps; prioritize hardware and OS before database configuration.
3. Change One Parameter at a Time
Isolate the impact of each change to avoid confusion.
4. Benchmark Testing
Use benchmark tests to verify the effectiveness of tuning and to assess new software versions.
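Even a crude, repeatable measurement is better than an impression. The sketch below times a command over several runs so the same number can be compared before and after a single parameter change. `mean_runtime_ms` is a hypothetical helper, it relies on GNU date's `%N` format, and the URL in the example is a placeholder.

```sh
# Hypothetical timer: run a command N times and print the mean wall-clock
# time per run in whole milliseconds.
mean_runtime_ms() {
    runs=$1
    shift
    start=$(date +%s%N)              # nanoseconds since the epoch (GNU date)
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" > /dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}

# Example: measure, change one parameter, then measure again.
#   mean_runtime_ms 20 curl -s http://localhost/
```

Dedicated load-testing tools give far richer results. The point here is only to keep the measurement identical across runs.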
6. Ops Mindset
1. Control Your Mood
Avoid making critical changes when stressed; if possible, defer risky operations.
2. Take Responsibility for Data
Never treat production data lightly; lack of backups leads to severe consequences.
3. Investigate Root Causes
When issues recur, dig deeper to find underlying problems such as memory shortages or software bugs.
4. Test Before Production
Always verify changes in a controlled environment before applying them to live systems.
Open Source Linux
Focused on sharing Linux/Unix content, covering fundamentals, system development, network programming, automation/operations, cloud computing, and related professional knowledge.