
Why Ops Must Respect Data: Essential Backup and Release Practices

This article argues that operations teams must treat data with respect. It covers backup strategy, routine file-system maintenance, database and big-data safeguards, disciplined release processes, and careful change management as ways to reduce risk and keep systems stable.

Efficient Ops

Brief Overview

“Respect for data” was raised in a morning meeting and resonated deeply; recent incidents of data loss and service outages remind us that ops must anticipate risks and protect data proactively.

Definition of Data

From an ops perspective, data is not an isolated entity; it exists throughout daily processes such as routine maintenance, changes, and incident handling. Therefore, analysis should consider all stages where data resides.

We broadly group the areas where data is at risk as follows:

Data backup

File system + routine maintenance

Databases

Big data

Business version releases

Requirement changes

Data Backup

Data backup is the first line of defense. Depending on storage capacity and recovery speed, backups can be local or off‑site (excluding multi‑datacenter disaster recovery). Backup retention periods must be defined to avoid uncontrolled growth in storage costs. The content to be backed up includes:

System-level configuration files: kernel parameters, hosts file entries, crontab jobs, environment variables, firewall settings, etc.

Application-level configuration files: Nginx, Java applications, middleware, DNS, etc.

Log data: application logs, Nginx logs, etc.

Database backups: binlog, logical dumps, configuration files, slow-query logs.
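
As a rough illustration of pairing a backup with a retention period, the following bash sketch archives a list of files into a timestamped tarball and prunes old archives. The function name backup_configs, the config-*.tar.gz naming scheme, and the retention value are all assumptions made for this example, not something the original article prescribes.

```shell
#!/usr/bin/env bash
# backup_configs: hypothetical helper that archives the given paths into a
# timestamped tarball and prunes archives older than the retention period.
backup_configs() {
    local backup_dir="$1" retention_days="$2"; shift 2
    local stamp archive
    stamp="$(date +%Y%m%d-%H%M%S)"
    archive="${backup_dir}/config-${stamp}.tar.gz"
    mkdir -p "$backup_dir" || return 1
    # Archive every path passed after the first two arguments.
    tar -czf "$archive" "$@" || return 1
    # Retention: delete archives last modified more than retention_days ago.
    find "$backup_dir" -maxdepth 1 -name 'config-*.tar.gz' \
        -mtime +"$retention_days" -delete
    # Print the archive path so callers can ship it off-site if needed.
    echo "$archive"
}
```

A call such as backup_configs /backup/config 30 /etc/sysctl.conf /etc/hosts would keep about a month of archives; the paths here are only examples.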

File System + Routine Maintenance

Routine maintenance such as disk cleanup and file handling carries a high risk of data loss; operators must stay focused and use secondary confirmation when executing dangerous commands.
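
One possible form of secondary confirmation is a small wrapper that makes the operator retype the target before the command runs. The sketch below assumes bash; confirm_then_run is a hypothetical helper name, not a standard tool.

```shell
#!/usr/bin/env bash
# confirm_then_run: hypothetical helper. The first argument is the target the
# operator must retype; the remaining arguments are the command to execute.
confirm_then_run() {
    local target="$1"; shift
    local answer
    printf 'About to run: %s\nType "%s" to confirm: ' "$*" "$target" >&2
    read -r answer
    if [ "$answer" != "$target" ]; then
        echo "Confirmation failed; nothing was executed." >&2
        return 1
    fi
    # Only reached when the operator typed the target exactly.
    "$@"
}
```

For example, confirm_then_run /data/tmp/cache rm -rf -- /data/tmp/cache runs the deletion only after the path has been typed a second time.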

Here is a useful Linux tip to prevent accidental rm deletions:

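<code>[Ops tip: using the colon in bash to guard rm against accidental deletion]
In bash, the colon (:) is more than a separator: on its own it is a builtin null command, and it also appears in parameter-expansion syntax.
The parameter-expansion form can be used to guard rm against accidental deletion, as the following example shows.
Format: ${parameter:-test}
Behavior: if parameter is unset or empty, the expansion yields test; otherwise it yields the value of parameter.
Command: rm -rf ${dest:-test}
Usage: when dest is empty, test is deleted; when dest is not empty, the path stored in dest is deleted.
Risky case: rm -rf /$dest. When dest is unset or empty, the command becomes rm -rf /, which would delete the system root directory and crash the system.
Improvement: rm -rf /${dest:-test}. When dest is unset or empty, test is substituted, so the command becomes rm -rf /test; deleting that directory is harmless as long as nothing important lives there.</code>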
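
A related standard expansion, ${parameter:?message}, aborts with an error instead of substituting a fallback, which may be preferable when there is no safe default path. The sketch below shows both forms side by side; resolve_target and safe_clean are hypothetical function names used only for illustration.

```shell
#!/usr/bin/env bash
# resolve_target: shows what the fallback form from the tip expands to.
resolve_target() {
    local dest="${1:-}"
    echo "/${dest:-test}"
}

# safe_clean: hypothetical wrapper using the ":?" form. If base or dest is
# unset or empty, the expansion fails and rm is never executed. The subshell
# keeps that failure from terminating the calling script.
safe_clean() {
    local base="${1:-}" dest="${2:-}"
    ( rm -rf -- "${base:?base must not be empty}/${dest:?dest must not be empty}" )
}
```

With this version, safe_clean /data "" returns a non-zero status and deletes nothing, whereas the fallback form would silently delete /data/test.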

Database + Big Data

Both databases and big-data platforms hold core data. Command filtering via bastion hosts can block dangerous operations such as drop, truncate, and delete for databases and hdfs dfs -rm for big data.
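
To make the idea of command filtering concrete, here is a minimal deny-list check in bash. It is only a sketch: real bastion hosts ship their own rule engines, and the function name is_dangerous_command and the pattern below are assumptions built from the examples in the text.

```shell
#!/usr/bin/env bash
# is_dangerous_command: returns 0 when the command line matches the deny list.
# Keywords are matched case-insensitively as whole words, so a column named
# "dropped_at" does not trigger the filter.
is_dangerous_command() {
    local cmd="$1"
    local sql='(^|[^[:alnum:]_])(drop|truncate|delete)([^[:alnum:]_]|$)'
    local hdfs='hdfs[[:space:]]+dfs[[:space:]]+-rm'
    grep -Eiq -- "${sql}|${hdfs}" <<< "$cmd"
}
```

A wrapper could call is_dangerous_command on each submitted statement and reject it, or route it to a review step, before anything reaches the database or cluster.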

Standardized processes and tools should be used. For databases, tools like Archery can audit SQL. Big-data ecosystems may lack a single management tool, so regular backups remain essential there.

Business Version Release

Version releases are high‑pressure events with many failure points, including chaotic configuration files, environment contamination, poor Git branch management, ad‑hoc releases, and insufficient testing.

Effective DevOps practices require:

Strict code management with feature‑branch isolation.

Environment‑specific configuration files.

Physical or logical isolation of test and production environments.

Comprehensive functional testing before release.

Designated release dates with no changes on non‑release days.

Standardized, parameterized automated release pipelines.

Alert suppression or dedicated alert handling during the release window, so that monitoring stays meaningful.
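
Two of the practices above, environment-specific configuration files and designated release dates, can be enforced by a small gate at the start of a release pipeline. The bash sketch below is one way to do that; the app.ENV.conf naming scheme, the environment names, and the default release days (Tuesday and Thursday) are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Release days as ISO weekday numbers (1 = Monday ... 7 = Sunday).
# The default of Tuesday and Thursday is an assumed policy.
RELEASE_DAYS="${RELEASE_DAYS:-2 4}"

# config_for_env: prints the config file for a known environment, or fails.
config_for_env() {
    local env="$1" config_dir="$2"
    case "$env" in
        dev|test|prod) ;;
        *) echo "unknown environment: $env" >&2; return 1 ;;
    esac
    local file="${config_dir}/app.${env}.conf"
    [ -f "$file" ] || { echo "missing config: $file" >&2; return 1; }
    echo "$file"
}

# is_release_day: succeeds only on a designated release day. The weekday can
# be passed explicitly; by default it is taken from the system clock.
is_release_day() {
    local today="${1:-$(date +%u)}"
    local day
    for day in $RELEASE_DAYS; do
        [ "$day" = "$today" ] && return 0
    done
    return 1
}
```

A pipeline step could then run is_release_day and config_for_env prod ./conf first and stop immediately if either check fails.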

Requirement Changes

Small changes can cause production incidents. A pre‑change plan should include:

Identify the business scope affected by the change.

Notify responsible parties and define key change milestones.

Specify the change procedure and steps.

Prepare data backup and recovery plans.

Schedule the change outside peak business hours.
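
Parts of this plan can be checked automatically before a change starts. The following bash sketch verifies that the current hour falls outside an assumed peak window and that a backup file was written recently; the window of 09:00 to 21:00, the 24-hour freshness limit, and both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed peak window: 09:00 (inclusive) to 21:00 (exclusive).
PEAK_START="${PEAK_START:-9}"
PEAK_END="${PEAK_END:-21}"

# outside_peak_hours: succeeds when the given hour (default: now) is off-peak.
outside_peak_hours() {
    local hour="${1:-$(date +%H)}"
    hour=$((10#$hour))   # force base 10 so "08" and "09" are not read as octal
    [ "$hour" -lt "$PEAK_START" ] || [ "$hour" -ge "$PEAK_END" ]
}

# backup_is_fresh: succeeds when the backup file exists and was modified
# within the last max_age_minutes (default: 1440, i.e. 24 hours).
backup_is_fresh() {
    local backup_file="$1" max_age_minutes="${2:-1440}"
    [ -f "$backup_file" ] || return 1
    # find prints the path only if the file is newer than the age limit.
    [ -n "$(find "$backup_file" -mmin -"$max_age_minutes")" ]
}
```

A change script might begin with outside_peak_hours && backup_is_fresh /backup/db/latest.sql || exit 1, where the backup path is again only an example.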

Conclusion

Data is ubiquitous, and data risks follow it everywhere; therefore, ops must maintain a reverent attitude toward data regardless of experience level. Continuous vigilance is essential for data security.

Source: originally published on the “Great Ops” public account.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
