
Essential Ops Practices: Prevent Disasters with Backups, Security, and Monitoring

This guide shares practical Linux operations lessons, from cautious command use, rigorous backup habits, and secure SSH configuration to comprehensive monitoring and performance tuning, to help teams avoid costly mistakes and keep services stable and reliable.

Open Source Linux

Oct 19, 2021


1. Production Operation Guidelines

1. Test Before Use

When learning Linux on virtual machines, it is easy to develop risky habits that cause trouble once you have access to real servers. For example, changing the SSH daemon configuration without testing it first can lock you out of the server.

Another example is misuse of rsync for file synchronization: if the source and destination are reversed, a sync can overwrite the good copy with stale data, and with --delete it can remove files you meant to keep.
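One way to make the rsync habit concrete is to preview every sync before running it. The sketch below is a minimal illustration, not a definitive tool: the function names are hypothetical, and it assumes rsync is installed and that its standard -a, --itemize-changes, --delete, and --dry-run options are acceptable for your use case.

```python
import shlex
import subprocess


def build_rsync_command(source, destination, delete=False, dry_run=True):
    """Build an rsync argument list; dry_run defaults to True so nothing changes."""
    command = ["rsync", "-a", "--itemize-changes"]
    if delete:
        # --delete removes destination files that are absent from the source.
        command.append("--delete")
    if dry_run:
        # --dry-run reports what would change without touching any file.
        command.append("--dry-run")
    command += [source, destination]
    return command


def preview_then_sync(source, destination, delete=False):
    """Run a dry run first, then sync for real only after explicit confirmation."""
    preview = build_rsync_command(source, destination, delete, dry_run=True)
    print("Preview:", shlex.join(preview))
    subprocess.run(preview, check=True)
    if input("Apply these changes? [y/N] ").strip().lower() != "y":
        return False
    real = build_rsync_command(source, destination, delete, dry_run=False)
    subprocess.run(real, check=True)
    return True
```

Reading the itemized dry-run output is usually enough to notice that the direction is reversed before any file is changed.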

2. Double‑Check Before Enter

Commands like rm -rf /var can cause severe damage if executed accidentally, especially when working quickly or with slow network connections.

When you realize the command has already run, your heart skips a beat.

One mistake is enough to teach you to be cautious; these incidents can happen to anyone.

3. Avoid Multiple Operators

When many people share root passwords, concurrent changes can lead to conflicting configurations and wasted troubleshooting time.
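A lightweight way to avoid concurrent changes is an advisory lock file that records who is working on the host. This is only a sketch under stated assumptions: the MaintenanceLock class and the lock path are hypothetical, and the lock works only if every operator agrees to check it before making changes.

```python
import os


class MaintenanceLock:
    """Advisory lock file so that only one operator changes a host at a time."""

    def __init__(self, path):
        self.path = path

    def acquire(self, owner, note=""):
        """Create the lock file; return False if someone else already holds it."""
        try:
            # O_EXCL makes creation fail if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            return False
        with os.fdopen(fd, "w") as handle:
            handle.write(f"{owner} {note}\n")
        return True

    def holder(self):
        """Return the recorded owner and note, or None if the host is free."""
        try:
            with open(self.path) as handle:
                return handle.read().strip()
        except FileNotFoundError:
            return None

    def release(self):
        os.remove(self.path)
```

Individual accounts with sudo, rather than a shared root password, remain the stronger fix because they also leave an audit trail.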

4. Backup Before Changing

Always back up configuration files (e.g., .conf) before editing them, and comment out the original options instead of overwriting them so the previous values stay visible.
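The backup step is easy to automate. The following sketch copies a file to a timestamped sibling before you edit it; the backup_file name and the .bak naming scheme are illustrative assumptions, not an established convention.

```python
import shutil
import time
from pathlib import Path


def backup_file(path, now=None):
    """Copy path to '<name>.<timestamp>.bak' in the same directory and return the copy."""
    source = Path(path)
    # time.localtime(None) uses the current time; pass 'now' only for testing.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    target = source.with_name(f"{source.name}.{stamp}.bak")
    if target.exists():
        raise FileExistsError(f"backup already exists: {target}")
    # copy2 also preserves the modification time and permission bits.
    shutil.copy2(source, target)
    return target
```

Calling backup_file("/etc/nginx/nginx.conf") before opening the editor gives you a known-good copy to restore if the change goes wrong.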

2. Data‑Related Practices

1. Use rm -rf Carefully

A small mistake with destructive commands can cause massive data loss; verify any deletion thoroughly.
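Part of that verification can be scripted. The sketch below refuses any deletion target that is a well-known system directory or that falls outside an explicitly allowed root; the PROTECTED set and the is_safe_to_delete helper are assumptions chosen for illustration, so adjust them to your own layout.

```python
from pathlib import Path

# Directories that should never be deleted wholesale (illustrative, not exhaustive).
PROTECTED = {"/", "/bin", "/boot", "/etc", "/home", "/lib", "/root", "/usr", "/var"}


def is_safe_to_delete(target, allowed_root):
    """Return True only if target lies strictly inside allowed_root."""
    # resolve() normalizes '..' and follows symlinks, so tricks like
    # '/srv/app/tmp/../../..' are evaluated as the path they really point to.
    resolved = Path(target).resolve()
    root = Path(allowed_root).resolve()
    if str(resolved) in PROTECTED:
        return False
    return root in resolved.parents
```

A cleanup script can call this check before deleting anything, which also catches the case where an empty variable turns the target into "/".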

2. Backup Is Paramount

Regular backups are essential. In one company, third‑party payment systems are backed up every two hours, while a loan platform backs up every 20 minutes.
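A backup schedule only helps if someone notices when it stops running. As a rough sketch, the check below flags a backup as overdue once it is older than its interval plus a grace period. The two intervals come from the example above; the service labels, the grace period, and the function name are assumptions.

```python
from datetime import datetime, timedelta

# Intervals quoted in the text; the keys are illustrative labels.
BACKUP_INTERVALS = {
    "payment": timedelta(hours=2),
    "loan": timedelta(minutes=20),
}


def is_backup_overdue(last_backup, now, interval, grace=timedelta(minutes=5)):
    """Return True if the newest backup is older than interval plus grace."""
    return now - last_backup > interval + grace


if __name__ == "__main__":
    now = datetime.now()
    last = now - timedelta(minutes=30)  # pretend the last backup ran 30 minutes ago
    for service, interval in BACKUP_INTERVALS.items():
        state = "OVERDUE" if is_backup_overdue(last, now, interval) else "ok"
        print(f"{service}: {state}")
```

In practice last_backup would come from the modification time of the newest backup file or from the backup tool's own report.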

3. Stability Over Speed

Prioritize a stable, reliable environment over the fastest setup; avoid deploying untested software in production.

4. Confidentiality Is Critical

Given frequent data leaks, protecting sensitive data with proper confidentiality measures is mandatory.

3. Security Measures

1. SSH Hardening

Change the default port.

Disable root login.

Log in as regular users with key authentication, grant privileges through sudo rules, and restrict access by source IP.

Deploy tools like DenyHosts to block brute‑force attempts.

Audit the accounts listed in /etc/passwd and check which of them can actually log in.
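Several of the hardening items above can be checked automatically against the text of sshd_config. The sketch below is a deliberately simplified audit: the function names are hypothetical, it ignores Match blocks, Include directives, and the keyword=value syntax, and it keeps only the first occurrence of each keyword, which is how sshd treats most (but not all) options. The assumed defaults (Port 22, PermitRootLogin prohibit-password, PasswordAuthentication yes) reflect recent OpenSSH releases and may differ on your distribution.

```python
def parse_sshd_config(text):
    """Return {lowercased keyword: value}, keeping the first occurrence of each keyword."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(keyword, value)
    return settings


def audit_sshd_config(text):
    """Return a list of findings; an empty list means every check passed."""
    settings = parse_sshd_config(text)
    findings = []
    if settings.get("port", "22") == "22":
        findings.append("Port is still the default 22")
    if settings.get("permitrootlogin", "prohibit-password").lower() != "no":
        findings.append("PermitRootLogin is not set to no")
    if settings.get("passwordauthentication", "yes").lower() != "no":
        findings.append("PasswordAuthentication is not set to no")
    if "allowusers" not in settings:
        findings.append("AllowUsers is not set, so any account may attempt to log in")
    return findings


if __name__ == "__main__":
    with open("/etc/ssh/sshd_config") as handle:
        for finding in audit_sshd_config(handle.read()):
            print("WARN:", finding)
```

AllowUsers accepts user@host patterns, which is one way to combine the "regular users" and "IP restrictions" items. Whatever you change, validate the file with sshd -t and keep an existing session open until a new login succeeds.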

2. Firewall

Enable firewalls in production and follow the principle of least privilege: drop all traffic by default and allow only required ports.
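To illustrate the default-deny idea, the sketch below only generates iptables commands as text; it does not apply them. The function name and the chosen ports are assumptions, and real rule sets usually also cover IPv6, ICMP, and outbound traffic.

```python
def build_iptables_rules(allowed_tcp_ports):
    """Return iptables commands that accept only the listed inbound TCP ports."""
    rules = [
        # Always allow loopback traffic and replies to connections already open.
        "iptables -A INPUT -i lo -j ACCEPT",
        "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    ]
    for port in sorted(set(allowed_tcp_ports)):
        if not 1 <= port <= 65535:
            raise ValueError(f"invalid TCP port: {port}")
        rules.append(f"iptables -A INPUT -p tcp --dport {port} -j ACCEPT")
    # Set the default policy last so the accept rules are in place
    # before everything else starts being dropped.
    rules.append("iptables -P INPUT DROP")
    return rules


if __name__ == "__main__":
    for rule in build_iptables_rules([2222, 80, 443]):
        print(rule)
```

Reviewing the printed commands before running them follows the same "double-check before Enter" habit described earlier, and makes it harder to drop your own SSH port by accident.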

3. Fine‑Grained Permissions

Run services with non‑root users whenever possible and limit permissions to the minimum necessary.

4. Intrusion Detection and Log Monitoring

Use third‑party tools to monitor critical system and service configuration files for changes.

Centralize log monitoring for files such as /var/log/secure and /var/log/messages.

Detect port scans and block offending IPs via /etc/hosts.deny.
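As a small example of log-based detection, the sketch below counts failed SSH password attempts per source address in lines such as those found in /var/log/secure, then formats the worst offenders as /etc/hosts.deny entries. The function names and the threshold are assumptions, the pattern covers only the common "Failed password" message, and the output is useful only if your sshd build still honors TCP Wrappers; otherwise feed the addresses to your firewall instead.

```python
import re
from collections import Counter

# Matches e.g. "Failed password for invalid user admin from 203.0.113.5 port 50000 ssh2"
FAILED_LOGIN = re.compile(r"Failed password for (?:invalid user )?\S+ from (\S+) port \d+")


def count_failed_logins(lines):
    """Count failed password attempts per source address."""
    counts = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def hosts_deny_entries(counts, threshold=5):
    """Format addresses with at least 'threshold' failures as hosts.deny lines."""
    return [f"sshd: {ip}" for ip, hits in sorted(counts.items()) if hits >= threshold]


if __name__ == "__main__":
    with open("/var/log/secure", errors="replace") as handle:
        for entry in hosts_deny_entries(count_failed_logins(handle)):
            print(entry)
```

Tools such as DenyHosts automate the same loop; the point of the sketch is only to show what they look for.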

4. Daily Monitoring

1. System Health Monitoring

Track hardware usage (CPU, memory, disk, network) and OS metrics, including login activity and critical file changes.
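A health check can start very small. The sketch below measures disk usage with the standard library and maps any metric onto OK, WARNING, or CRITICAL; the thresholds and function names are assumptions, and a real deployment would normally rely on a monitoring system rather than a hand-rolled script.

```python
import shutil


def disk_usage_percent(path="/"):
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def classify(value, warn, critical):
    """Map a metric value onto a simple three-level status."""
    if value >= critical:
        return "CRITICAL"
    if value >= warn:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    percent = disk_usage_percent("/")
    print(f"disk /: {percent:.1f}% used, status {classify(percent, warn=80, critical=90)}")
```

The same classify helper can be reused for memory, load, or any other numeric metric you collect.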

2. Service Monitoring

Monitor web, database, and load‑balancer services to quickly identify performance bottlenecks.
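The simplest service probe is a timed TCP connection. The sketch below returns the connect latency or None if the port cannot be reached; the function name, host names, and ports are illustrative assumptions, and an open port does not prove the service behind it is healthy.

```python
import socket
import time


def check_tcp_port(host, port, timeout=2.0):
    """Return the connect latency in seconds, or None if the port is unreachable."""
    started = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.perf_counter() - started
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return None


if __name__ == "__main__":
    # Hypothetical targets: a web server and a MySQL database on the local host.
    for name, port in [("web", 80), ("database", 3306)]:
        latency = check_tcp_port("127.0.0.1", port)
        status = "down" if latency is None else f"up ({latency * 1000:.1f} ms)"
        print(f"{name}: {status}")
```

Recording the latency over time, not just up or down, is what helps locate a bottleneck before it becomes an outage.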

3. Log Monitoring

Collect and analyze logs from hardware, OS, and applications to detect issues early.

5. Performance Tuning

1. Understand Underlying Mechanisms

Before tuning, study how software (e.g., Nginx vs. Apache) works internally; otherwise, tuning becomes guesswork.

2. Tuning Framework and Order

Identify bottlenecks, analyze logs, and plan tuning steps; prioritize hardware and OS before database configuration.

3. Change One Parameter at a Time

Isolate the impact of each change to avoid confusion.

4. Benchmark Testing

Use benchmark tests to verify the effectiveness of tuning and to assess new software versions.
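Changing one parameter at a time and benchmarking each change go together: measure a baseline, change a single setting, measure again, and compare. The helper below is a minimal sketch of that loop; the function names are hypothetical, and the workload passed in stands for whatever request or query you are tuning.

```python
import statistics
import time


def benchmark(func, repeats=5):
    """Return the median wall-clock time of func() over several runs."""
    samples = []
    for _ in range(repeats):
        started = time.perf_counter()
        func()
        samples.append(time.perf_counter() - started)
    # The median is less sensitive to a single slow outlier than the mean.
    return statistics.median(samples)


def relative_change(baseline, candidate):
    """Negative values mean the candidate is faster than the baseline."""
    return (candidate - baseline) / baseline


if __name__ == "__main__":
    before = benchmark(lambda: sorted(range(200_000), reverse=True))
    after = benchmark(lambda: list(range(199_999, -1, -1)))
    print(f"change: {relative_change(before, after):+.1%}")
```

Keeping the baseline number next to each change makes it obvious which parameter actually helped and which only added risk.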

6. Ops Mindset

1. Control Your Mood

Avoid making critical changes when stressed; if possible, defer risky operations.

2. Take Responsibility for Data

Never treat production data lightly; lack of backups leads to severe consequences.

3. Investigate Root Causes

When issues recur, dig deeper to find underlying problems such as memory shortages or software bugs.

4. Test Before Production

Always verify changes in a controlled environment before applying them to live systems.
