10 Essential Practices to Prevent DBA and Ops Disasters
Learn ten practical strategies that help DBAs and operations engineers avoid costly system failures and keep production reliable: safe change rollbacks, cautious use of destructive commands, robust backups, clear prompts, vigilant monitoring, and disciplined handovers.
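1. Make every change reversible and test it in an identical environment
Operations is learned through experience, and experience includes mistakes. Test each change in an environment that matches production, and prepare a rollback path before you execute it.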
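As one way to picture a rollback path for a file-based change, the sketch below snapshots a file before it is edited and restores it on demand. The function names backup_before_change and rollback_change are hypothetical, and the sketch assumes the change touches a single regular file; database changes need their own rollback scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not from the original article.

# Copy the file to a timestamped backup next to it and print the backup path.
backup_before_change() {
    local target="$1"
    local backup
    backup="${target}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$target" "$backup" && printf '%s\n' "$backup"
}

# Restore the file from a backup produced by backup_before_change.
rollback_change() {
    local target="$1" backup="$2"
    cp -p -- "$backup" "$target"
}
```

A typical sequence would be to capture the printed backup path in a variable, apply and verify the edit, and call rollback_change with the file and that path if verification fails.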
2. Be extremely cautious with destructive operations
Destructive commands include Oracle's TRUNCATE TABLE table_name, DELETE FROM table_name (especially without a WHERE clause), and DROP TABLE table_name, and Linux's rm -r. On Linux, aliases such as the following add a confirmation prompt:
alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'
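By default, bash expands aliases only in interactive shells, so the prompts above do not protect scripts. A complementary approach, sketched here under assumptions, is a "soft delete" that moves files into a holding directory instead of removing them. The name soft_rm and the TRASH_DIR location are hypothetical, and the holding directory should sit on the same filesystem so that the move is cheap.

```shell
#!/usr/bin/env bash
# Hypothetical "soft delete" helper; TRASH_DIR is an assumed location.
TRASH_DIR="${TRASH_DIR:-$HOME/.trash}"

# Move the given paths into a fresh dated subdirectory of TRASH_DIR
# instead of deleting them, so they can be recovered later.
soft_rm() {
    [ "$#" -gt 0 ] || { echo "usage: soft_rm PATH..." >&2; return 2; }
    mkdir -p -- "$TRASH_DIR" || return 1
    local dest
    dest="$(mktemp -d "$TRASH_DIR/$(date +%Y%m%d).XXXXXX")" || return 1
    mv -- "$@" "$dest"/
}
```

The holding directory still has to be purged on a schedule, but that purge can run days later, after the change has been confirmed safe.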
3. Set informative command prompts
Before executing anything, know where you are: primary or standby, current directory, schema, session, time, and so on. In Oracle SQL*Plus you can set the prompt with set sqlprompt 'RAC-node1-primary@10g>>'. In Linux bash, customize PS1.
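For the bash side, a minimal sketch of such a prompt might look like the following. HOST_ROLE is an assumed variable, not a standard one; it would have to be set per host, for example in a profile script.

```shell
# Assumed variable: HOST_ROLE is not a standard name; set it per host.
HOST_ROLE="${HOST_ROLE:-primary}"

# \t = time (HH:MM:SS), \u = user, \h = short host name, \w = working directory,
# \$ = '#' for root and '$' otherwise.
PS1='[\t] \u@\h ('"${HOST_ROLE}"') \w\$ '
```

With this in place every command line starts with the time, user, host, role, and directory, which also leaves a useful trail in terminal logs.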
4. Backup and verify backup integrity
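Backups are essential. Distinguish cold from hot, real-time from non-real-time, and physical from logical backups. A 24/7 OLTP system needs real-time hot backups, but keep non-real-time backups as well: a real-time copy replicates a logical error, such as a mistaken delete, along with everything else, so only an older backup can recover from it. Always test backups by restoring them to an empty database.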
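A restore test is the real proof, but a cheap first check is to confirm that a backup file has not changed since it was written. The sketch below assumes GNU coreutils sha256sum is available; the function names record_checksum and verify_checksum are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes GNU coreutils sha256sum.

# Write the SHA-256 of backup file $1 to a sidecar file named $1.sha256.
record_checksum() {
    sha256sum -- "$1" > "$1.sha256"
}

# Exit 0 if backup file $1 still matches its sidecar checksum, non-zero otherwise.
verify_checksum() {
    sha256sum --check --quiet -- "$1.sha256"
}
```

A matching checksum only shows that the file was not corrupted in storage or transfer. It says nothing about whether the backup can be restored, so it supplements the restore test rather than replacing it.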
5. Maintain reverence for production environments
Treat production with the kind of professional ethics an accountant brings to the books, and run regular health checks (e.g., Oracle RDA, Linux password aging, network isolation).
6. Handle handovers and vacations carefully
When taking over work, repeatedly confirm change plans, document procedures, and verify details with the original operator before executing.
7. Build alerting and performance monitoring
Alerts notify you of anomalies promptly; monitoring provides historical performance data for trend analysis and proactive optimization.
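As a small illustration of both halves, the sketch below checks disk usage against a threshold and prints a timestamped line either way. The function names and the default limit of 90 percent are assumptions; in practice the output would be sent to whatever alerting channel and log store the team already uses.

```shell
#!/usr/bin/env bash
# Hypothetical monitoring helpers.

# Print the used-space percentage (digits only) for the filesystem holding $1.
disk_usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed (exit 0) when usage $1 is at or above threshold $2.
over_threshold() {
    [ "$1" -ge "$2" ]
}

# Print an ALERT or OK line for mount point $1 against limit $2 (default 90).
check_disk() {
    local mount="$1" limit="${2:-90}" used
    used="$(disk_usage_percent "$mount")"
    [ -n "$used" ] || return 2
    if over_threshold "$used" "$limit"; then
        echo "ALERT $(date '+%F %T') ${mount} at ${used}% (limit ${limit}%)"
    else
        echo "OK $(date '+%F %T') ${mount} at ${used}%"
    fi
}
```

Run from cron and appended to a file, the OK lines become the historical series used for trend analysis, while the ALERT lines are the ones worth forwarding to a person.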
8. Apply automation cautiously
Automatic failover, for example with Oracle Data Guard, can lose data if the failover happens before the standby has received all redo from the primary.
9. Be meticulous and double‑check everything
Follow a disciplined checklist: notify stakeholders weeks in advance, review scripts on test machines, copy to production, verify steps, confirm with team, and monitor execution.
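One item of such a checklist, verifying that you are on the intended machine, can be enforced in the script itself. The sketch below is a hypothetical guard named confirm_target: it refuses to continue unless the host matches the one in the change plan and the operator types that host name.

```shell
#!/usr/bin/env bash
# Hypothetical guard: ask the operator to type the host name before a change runs.
# $1 = host the change plan was written for. Reads one line from stdin.
confirm_target() {
    local expected="$1" actual answer
    actual="$(uname -n)"
    if [ "$actual" != "$expected" ]; then
        echo "Refusing: plan targets ${expected}, but this host is ${actual}" >&2
        return 1
    fi
    printf 'You are on %s. Type the host name to continue: ' "$actual" >&2
    read -r answer
    [ "$answer" = "$expected" ]
}
```

A change script could start with a call such as confirm_target followed by the planned host name and exit if it fails, so a script copied to the wrong server stops before it does anything.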
10. Simplicity is the ultimate sophistication
Prefer built‑in system commands and simple scripts over complex third‑party tools; the native Linux CLI often provides the most efficient solution.
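As a small example of what the standard tools can do on their own, the sketch below ranks the most frequent values of the first field in a log file using only awk, sort, uniq, and head. The function name top_first_field is hypothetical, and it assumes a whitespace-separated log whose first field is the value of interest, such as a client address.

```shell
#!/usr/bin/env bash
# Hypothetical helper built only from standard tools.
# $1 = log file, $2 = number of rows to show (default 5).
# Prints "count value" lines, most frequent first.
top_first_field() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

Nothing here needs installing, and each stage of the pipeline can be run and inspected on its own when the result looks wrong.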
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.