Essential DBA & Ops Practices to Prevent System Failures

This article outlines ten practical guidelines for DBAs and system administrators: rollback‑ready changes, cautious use of destructive commands, informative prompts, verified backups, respect for production, thorough handovers, alerting and monitoring, careful failover, meticulous checks, and simplicity. Together they help minimize costly system outages.

Efficient Ops

Jan 10, 2019

1. Ensure changes can be rolled back and are tested in an identical environment

Operations is a discipline built on experience and trial and error. Protect the production site: test every change in an environment identical to production first, and make sure it can be reverted if needed.
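One minimal way to make a file-level change reversible is to snapshot the file before touching it. The bash sketch below is only an illustration: the helper names `backup_file` and `rollback_file` and the `.bak` suffix are assumptions, not something the article prescribes.

```shell
# Hypothetical helpers: snapshot a file before a change, restore it on demand.

# Copy <file> to <file>.bak, preserving mode and timestamps.
# "command cp" bypasses any interactive alias such as cp='cp -i'.
backup_file() {
    command cp -p -- "$1" "$1.bak"
}

# Restore the snapshot taken by backup_file; fail if none exists.
rollback_file() {
    if [ ! -f "$1.bak" ]; then
        echo "no backup found for $1" >&2
        return 1
    fi
    command cp -p -- "$1.bak" "$1"
}
```

Typical use is `backup_file app.conf`, then edit and test, then `rollback_file app.conf` if the change misbehaves. Database changes need their own rollback script; this only covers plain files.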

2. Treat destructive operations with extreme caution

Examples for Oracle: TRUNCATE TABLE table_name, DELETE FROM table_name, DROP TABLE table_name. All are easy to run. TRUNCATE and DROP are DDL and cannot be undone with ROLLBACK, and even a DELETE that can be rolled back is costly.

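Examples for Linux: rm -r removes the named directories and everything beneath them, and with a wildcard (rm -r *) it removes everything under the current directory. Many users add interactive aliases to prevent accidents: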

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
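The aliases above only apply in interactive shells, and a later -f on the command line overrides -i, so they are a reminder rather than a guarantee. As a complementary sketch, files can be moved aside instead of deleted. The function name `safe_rm` and the `TRASH_DIR` location are assumptions made for illustration.

```shell
# Hypothetical trash location; override by exporting TRASH_DIR.
TRASH_DIR="${TRASH_DIR:-$HOME/.trash}"

# Move the given paths into a timestamped subdirectory of TRASH_DIR
# instead of deleting them, so a mistake can still be undone.
safe_rm() {
    local dest
    dest="$TRASH_DIR/$(date +%Y%m%d-%H%M%S)-$$"
    mkdir -p -- "$dest" || return 1
    command mv -- "$@" "$dest"/
}
```

The trash directory still has to be purged on a schedule, and moving across filesystems is a full copy, so this suits configuration files and scripts better than large data files.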

3. Configure informative command prompts

Before executing commands, know whether you are on the primary or standby, the current directory, schema, session, and time.

Oracle example:

set sqlprompt 'RAC-node1-primary@10g>>'
RAC-node1-primary@10g>>

For Linux, customize PS1 to display host, user, and directory.
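A possible bash version is sketched below. The `DB_ROLE` variable and the `build_ps1` helper are hypothetical names introduced here; the escapes `\u`, `\h`, `\w`, `\t` and `\$` are standard bash prompt sequences for user, host, working directory, time, and the `$`/`#` marker.

```shell
# Hypothetical role label, set per host (for example in ~/.bashrc).
DB_ROLE="${DB_ROLE:-standby}"

# Build a bash prompt string of the form: [role] user@host:cwd time $
# The backslash escapes are left unexpanded here; bash expands them
# each time it draws the prompt.
build_ps1() {
    printf '[%s] \\u@\\h:\\w \\t \\$ ' "$1"
}

PS1="$(build_ps1 "$DB_ROLE")"
```

With `DB_ROLE=primary`, the prompt would look something like `[primary] oracle@node1:/u01/app 14:02:31 $`, where the user, host, and path shown are made-up values.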

4. Back up and verify backup integrity

Backups are essential; they can be classified as cold/hot, real‑time/non‑real‑time, physical/logical. Even with real‑time hot backups, you still need non‑real‑time backups to recover from logical errors such as accidental DELETE statements.

Always validate backups by restoring them to an empty database.
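A restore test is the real proof. As a cheaper first gate before it, a checksum recorded at backup time can catch truncated or corrupted files. This is a sketch assuming GNU coreutils `sha256sum`; the function names and the `.sha256` sidecar file are illustrative choices.

```shell
# Record a SHA-256 checksum next to the backup file right after writing it.
record_checksum() {
    local dir base
    dir=$(dirname -- "$1") && base=$(basename -- "$1") || return 1
    ( cd -- "$dir" && sha256sum -- "$base" > "$base.sha256" )
}

# Re-check the backup against the recorded checksum.
# Returns non-zero if the file changed or the checksum file is missing.
verify_checksum() {
    local dir base
    dir=$(dirname -- "$1") && base=$(basename -- "$1") || return 1
    ( cd -- "$dir" && sha256sum --check --quiet -- "$base.sha256" )
}
```

A matching checksum only shows the file is the one that was written. It says nothing about whether the database can actually be restored from it, which is why the restore test remains necessary.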

5. Treat production environments with reverence

Adopt a professional ethic like an accountant's. Run regular health checks, e.g., Oracle RDA (Remote Diagnostic Agent) reports, Linux password aging policies, and network isolation.

6. Handover and vacation periods are high‑risk

When taking over work, confirm pending change plans repeatedly with the person handing over. Before leaving, document your procedures and prepare detailed handover documents that specify the required actions and contacts.

7. Build alerting and performance monitoring

Alerting lets you know about anomalies instantly; monitoring provides historical performance data for trend analysis and optimization.
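The simplest form of alerting is a threshold check run from cron. The sketch below is illustrative only: the function names, the message format, and the 90% limit in the usage note are assumptions, and the `df` parsing assumes a filesystem name without spaces.

```shell
# Print the usage percentage (integer, no % sign) of the filesystem
# that holds the given path, taken from POSIX-format df output.
disk_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print an ALERT line and return 1 when usage is at or above the limit;
# otherwise print an OK line and return 0.
check_usage() {
    local pct=$1 limit=$2
    if [ "$pct" -ge "$limit" ]; then
        echo "ALERT: disk usage ${pct}% >= ${limit}%"
        return 1
    fi
    echo "OK: disk usage ${pct}%"
}
```

A cron job could call `check_usage "$(disk_usage_pct /)" 90` and send the output to whatever notification channel the team uses. Appending the same figure to a file on every run gives the historical series needed for trend analysis.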

8. Use automatic failover cautiously

In Oracle Data Guard, an automatic failover to a standby that has not yet received every committed transaction loses those transactions, which can mean lost orders and revenue.

9. Be meticulous and double‑check everything

For a planned change, work through the following steps in order:

Notify stakeholders weeks in advance via email and phone.

Write scripts on a test machine and conduct a peer review.

Copy scripts to production after testing.

Record the exact sequence of commands.

Confirm with all parties the steps, timing, impact, and rollback plan.

Log out, then log back in before running the script.

Execute the script while monitoring output from another terminal.
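Two of the steps above, confirming where you are and recording the exact sequence of commands, can be partly enforced by the script itself. The helpers below are a sketch with hypothetical names (`require_host`, `run_logged`), not a complete change-management tool.

```shell
# Refuse to continue unless the script runs on the intended host.
require_host() {
    local expected=$1 actual
    actual=$(uname -n)
    if [ "$actual" != "$expected" ]; then
        echo "refusing to run: on $actual, expected $expected" >&2
        return 1
    fi
}

# Append a timestamped record of the command to the log, then run it
# with its output appended to the same log.
run_logged() {
    local log=$1
    shift
    printf '%s $ %s\n' "$(date '+%F %T')" "$*" >> "$log"
    "$@" >> "$log" 2>&1
}
```

A change script might start with `require_host db-node1 || exit 1` (the host name is a made-up example) and wrap each step in `run_logged /tmp/change.log ...`, while a second terminal follows the log with `tail -f`.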

10. Simplicity is the ultimate sophistication

Resist the temptation to adopt new architectures, tools, or hardware unless they are truly needed in production. Prefer built‑in Linux commands over complex third‑party software; simple text‑based tools are often more reliable.

Wishing all operations professionals smooth, fault‑free work.
