
Essential Ops Practices: Prevent Disasters with Backups, Security, and Monitoring

This guide shares practical Linux operations lessons, from cautious command use, rigorous backup habits, and secure SSH configuration to comprehensive monitoring and performance tuning, to help teams avoid costly mistakes and keep services stable and reliable.

Open Source Linux

1. Online Operation Guidelines

1. Test Before Use

When learning Linux on virtual machines, it is easy to develop risky habits that cause trouble once you have access to real servers. For example, changing the SSH daemon configuration without testing it first can lock you out of the server.
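
One way to reduce the lockout risk is to validate the configuration before reloading the service. The sketch below is a minimal illustration, not a complete procedure: reload_if_valid is a hypothetical helper name, and the example in the comment assumes OpenSSH (whose sshd -t flag checks the configuration without starting the daemon) and a systemd unit that may be called ssh or sshd depending on the distribution.

```shell
# Hypothetical helper: run a validation command and reload only if it succeeds.
# Usage: reload_if_valid VALIDATE_CMD RELOAD_CMD
reload_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "validation failed; service not reloaded" >&2
        return 1
    fi
}

# Example (assumed unit name; adjust for your distribution):
#   reload_if_valid "sshd -t" "systemctl reload sshd"
```

Even after a successful reload, it is safer to keep the current session open and confirm that a new connection works from a second terminal before logging out.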

Another example is misusing rsync for file synchronization: with the --delete option, reversing the source and destination makes rsync remove the very data you meant to copy.
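
A dry run makes a reversed source and destination visible before any file is touched. The following is a rough sketch under stated assumptions: sync_preview is a hypothetical wrapper name, and the empty-source check is only a heuristic for reversed arguments, not a guarantee.

```shell
# Hypothetical wrapper: refuse to sync from a missing or empty source,
# then show a dry run so deletions can be reviewed before the real run.
# Usage: sync_preview SRC_DIR DST_DIR
sync_preview() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "source $src is not a directory" >&2
        return 1
    fi
    if [ -z "$(ls -A "$src")" ]; then
        echo "source $src is empty; are the arguments reversed?" >&2
        return 1
    fi
    # -n (--dry-run) lists what would change without modifying any file.
    rsync -avn --delete "$src"/ "$dst"/
}
```

Lines beginning with "deleting" in the dry-run output are the files that a real run would remove from the destination.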

2. Double‑Check Before Pressing Enter

Commands like rm -rf /var can cause severe damage if executed accidentally, especially when you are working quickly or over a slow network connection.

When you realize the command has already run, your heart skips a beat.

One mistake is enough to teach you to be cautious; these incidents can happen to anyone.

3. Avoid Multiple Operators

When many people share root passwords, concurrent changes can lead to conflicting configurations and wasted troubleshooting time.

4. Backup Before Changing

Always back up configuration files (e.g., .conf files) before editing them, and comment out the original option instead of overwriting it, so the previous value stays visible next to the new one.
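
As an illustration of the backup habit, the sketch below copies a file to a timestamped sibling before it is edited. The function name backup_conf and the .bak.TIMESTAMP suffix are assumptions chosen for this example.

```shell
# Hypothetical helper: copy FILE to FILE.bak.YYYYmmdd-HHMMSS before editing.
# cp -p preserves mode and timestamps (and ownership where permitted).
# Prints the path of the backup on success.
backup_conf() {
    backup="$1.bak.$(date +%Y%m%d-%H%M%S)"
    cp -p "$1" "$backup" && echo "$backup"
}

# Example:
#   backup_conf /etc/ssh/sshd_config
```

The timestamp keeps successive backups from overwriting each other, which matters when a file is edited several times in one day.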

2. Data‑Related Practices

1. Use rm -rf Carefully

A small mistake with destructive commands can cause massive data loss; verify any deletion thoroughly.
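
One possible safeguard is a wrapper that rejects obviously dangerous targets before rm -rf runs. This is a sketch only: careful_rm is a hypothetical name, and the rule it applies (refuse empty arguments, ".", "..", the root directory, and anything directly under the root) is an assumption that will not fit every environment.

```shell
# Hypothetical guard around rm -rf. The body runs in a subshell, so the
# variables and exit calls below do not affect the calling shell.
# Usage: careful_rm PATH
careful_rm() (
    target=$1
    if [ -z "$target" ]; then
        echo "refusing: empty path" >&2
        exit 1
    fi
    name=$(basename -- "$target")
    case $name in
        /|.|..)
            echo "refusing: $target" >&2
            exit 1
            ;;
    esac
    # Resolve the parent directory to a physical absolute path.
    parent=$(CDPATH= cd -- "$(dirname -- "$target")" && pwd -P) || exit 1
    if [ "$parent" = "/" ]; then
        echo "refusing: $target is a top-level path" >&2
        exit 1
    fi
    rm -rf -- "$target"
)
```

With this guard, careful_rm /var is rejected, while careful_rm /var/tmp/build-cache proceeds. The empty-path check also catches the classic mistake of an unset variable in a command such as rm -rf "$DIR".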

2. Backup Is Paramount

Regular backups are essential. In one company, third‑party payment systems are backed up every two hours, while a loan platform backs up every 20 minutes.
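
A scheduled backup job could look roughly like the sketch below. The function name backup_dir, the archive naming scheme, and the script path in the crontab comment are all assumptions made for illustration. The two cron schedules simply mirror the intervals mentioned above.

```shell
# Hypothetical backup job: archive SRC_DIR into DEST_DIR under a timestamped
# name and print the archive path. Keep DEST_DIR outside SRC_DIR.
# Usage: backup_dir SRC_DIR DEST_DIR
backup_dir() {
    archive="$2/backup-$(date +%Y%m%d-%H%M%S).tar.gz"
    mkdir -p "$2" || return 1
    # -C makes the paths inside the archive relative to SRC_DIR.
    tar -czf "$archive" -C "$1" . || return 1
    # Confirm the archive can be read back before reporting success.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Example crontab entries (assumed script path):
#   0 */2 * * *    /usr/local/bin/backup.sh   # every two hours
#   */20 * * * *   /usr/local/bin/backup.sh   # every 20 minutes
```

Listing the archive after writing it is only a basic readability check. A periodic restore test is still the only real proof that a backup works.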

3. Stability Over Speed

Prioritize a stable, reliable environment over the fastest setup; avoid deploying untested software in production.

4. Confidentiality Is Critical

Given frequent data leaks, protecting sensitive data with proper confidentiality measures is mandatory.

3. Security Measures

1. SSH Hardening

Change the default port.

Disable root login.

Use regular users with key authentication, sudo rules, and IP restrictions.

Deploy tools like DenyHosts to block brute‑force attempts.

Audit users listed in /etc/passwd.
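
Several items in this checklist can be verified with a small script. The sketch below is an assumption-laden illustration: the function names are hypothetical, the parser reads only the first uncommented occurrence of each keyword, and it ignores Match blocks and Include files.

```shell
# Hypothetical helper: print the first uncommented value of KEYWORD in FILE.
sshd_value() {
    awk -v k="$2" 'tolower($1) == tolower(k) { print tolower($2); exit }' "$1"
}

# Hypothetical audit: warn about missing hardening items in an
# sshd_config-style file. Returns non-zero if any warning was printed.
# Usage: audit_sshd_config /etc/ssh/sshd_config
audit_sshd_config() {
    cfg=$1
    rc=0
    if [ "$(sshd_value "$cfg" PermitRootLogin)" != "no" ]; then
        echo "WARN: PermitRootLogin is not set to no"
        rc=1
    fi
    if [ "$(sshd_value "$cfg" PasswordAuthentication)" != "no" ]; then
        echo "WARN: PasswordAuthentication is not set to no"
        rc=1
    fi
    port=$(sshd_value "$cfg" Port)
    if [ -z "$port" ] || [ "$port" = "22" ]; then
        echo "WARN: sshd listens on the default port 22"
        rc=1
    fi
    return $rc
}

# Hypothetical helper: list accounts with UID 0 in a passwd-format file.
# Usage: uid0_accounts /etc/passwd
uid0_accounts() {
    awk -F: '$3 == 0 { print $1 }' "$1"
}
```

On a live server, sshd -T prints the effective configuration after Match and Include processing, which is more reliable than parsing the file. Any UID 0 account other than root deserves a close look.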

2. Firewall

Enable firewalls in production and follow the principle of least privilege: drop all traffic by default and allow only required ports.
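
The default-drop principle can be sketched with iptables as shown below. This assumes an IPv4 host managed directly with iptables. Hosts that use nftables, firewalld, or ufw need the equivalent rules in those tools. The function name print_firewall_rules is hypothetical, and it only prints commands for review rather than executing them.

```shell
# Hypothetical generator: print iptables commands for a default-drop INPUT
# policy that allows only the given TCP ports. Nothing is executed.
# Usage: print_firewall_rules 22 80 443
print_firewall_rules() {
    # Validate every port before printing anything.
    for port in "$@"; do
        case $port in
            ''|*[!0-9]*)
                echo "invalid port: $port" >&2
                return 1
                ;;
        esac
    done
    echo "iptables -A INPUT -i lo -j ACCEPT"
    echo "iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
    for port in "$@"; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # The DROP policy comes last so an existing SSH session is not cut off
    # before its ACCEPT rule exists.
    echo "iptables -P INPUT DROP"
}
```

Printing the rules first gives a second person the chance to review them, in the same spirit as backing up before changing. IPv6 traffic needs a matching ip6tables rule set.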

3. Fine‑Grained Permissions

Run services with non‑root users whenever possible and limit permissions to the minimum necessary.

4. Intrusion Detection and Log Monitoring

Use third‑party tools to monitor critical system and service configuration files for changes.
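
Dedicated tools such as AIDE handle this more thoroughly, but the underlying idea can be illustrated with checksums. The sketch below assumes sha256sum from GNU coreutils. The function names and the baseline file are hypothetical.

```shell
# Hypothetical integrity check built on sha256sum.
# record_baseline BASELINE FILE...   store checksums of the listed files
# verify_baseline BASELINE           return non-zero if any file changed
record_baseline() {
    baseline=$1
    shift
    sha256sum "$@" > "$baseline"
}

verify_baseline() {
    # -c re-reads each listed file and compares it with the stored checksum.
    sha256sum -c "$1"
}

# Example:
#   record_baseline /root/conf.sha256 /etc/ssh/sshd_config /etc/passwd
#   verify_baseline /root/conf.sha256
```

The baseline is only trustworthy if an intruder cannot rewrite it, so it is better kept on another host or on read-only media.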

Centralize log monitoring for files such as /var/log/secure and /var/log/messages.

Detect port scans and block offending IPs via /etc/hosts.deny.
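
As a starting point for spotting brute-force sources, the sketch below counts failed SSH password attempts per IP address. It assumes OpenSSH log lines of the form "Failed password for NAME from IP port N ssh2". The function name failed_logins and the default threshold of 5 are arbitrary choices for this example.

```shell
# Hypothetical report: count failed SSH password attempts per source IP and
# print "COUNT IP" for every address with at least MIN_COUNT failures.
# Usage: failed_logins LOGFILE [MIN_COUNT]
failed_logins() {
    awk -v min="${2:-5}" '
        /Failed password for/ {
            # Scan from the right: the address follows the last "from".
            for (i = NF - 1; i >= 1; i--)
                if ($i == "from") { count[$(i + 1)]++; break }
        }
        END {
            for (ip in count)
                if (count[ip] >= min) print count[ip], ip
        }
    ' "$1" | sort -rn
}

# Example:
#   failed_logins /var/log/secure 10
```

Note that /etc/hosts.deny only affects services built with TCP Wrappers support, which may not include the sshd shipped by current distributions. In that case a firewall rule or a tool that manages firewall rules is the more dependable way to block an address.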

4. Daily Monitoring

1. System Health Monitoring

Track hardware usage (CPU, memory, disk, network) and OS metrics, including login activity and critical file changes.
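
A full monitoring system covers far more, but a single threshold check shows the basic pattern. The sketch below assumes the POSIX output format of df -P and mount points without spaces. The function name over_threshold is hypothetical.

```shell
# Hypothetical check: read `df -P` output on stdin and print the mount point
# and usage of every filesystem above THRESHOLD percent.
# Usage: df -P | over_threshold 90
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {                      # skip the header line
            used = $5                 # Capacity column, e.g. "95%"
            sub(/%/, "", used)
            if (used + 0 > limit + 0) print $6, used "%"
        }
    '
}
```

Reading from standard input keeps the parsing separate from the df call, so the same function can be tested against saved output or fed from a remote host over SSH.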

2. Service Monitoring

Monitor web, database, and load‑balancer services to quickly identify performance bottlenecks.

3. Log Monitoring

Collect and analyze logs from hardware, OS, and applications to detect issues early.

5. Performance Tuning

1. Understand Underlying Mechanisms

Before tuning, study how software (e.g., Nginx vs. Apache) works internally; otherwise, tuning becomes guesswork.

2. Tuning Framework and Order

Identify bottlenecks, analyze logs, and plan tuning steps; prioritize hardware and OS before database configuration.

3. Change One Parameter at a Time

Isolate the impact of each change to avoid confusion.

4. Benchmark Testing

Use benchmark tests to verify the effectiveness of tuning and to assess new software versions.

6. Ops Mindset

1. Control Your Mood

Avoid making critical changes when stressed; if possible, defer risky operations.

2. Take Responsibility for Data

Never treat production data lightly; lack of backups leads to severe consequences.

3. Investigate Root Causes

When issues recur, dig deeper to find underlying problems such as memory shortages or software bugs.

4. Test Before Production

Always verify changes in a controlled environment before applying them to live systems.
