Why Alertmanager Config Keeps Getting Overwritten in TiDB Clusters and How to Fix It
This guide explains why the Alertmanager configuration file in a TiDB cluster is repeatedly overwritten during reloads, analyzes error logs and TiUP documentation, and provides step‑by‑step instructions to edit the topology, set a custom config file, reload the service, and verify the fix.
Background
When configuring TiDB cluster alerts, the edited Alertmanager configuration file /data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml keeps being reset to its default content, so alerts are lost after the TiDB cluster or the Alertmanager component restarts.
Investigation Process
1. Check previous errors
During a reload command, the following error was observed:
Error: init config failed: serverIP:9093: transfer from /home/tidb/.tiup/storage/cluster/clusters/cluster-name/config-cache/alertmanager_serverIP.yml to /data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml failed: failed to scp /home/tidb/.tiup/storage/cluster/clusters/cluster-name/config-cache/alertmanager_serverIP.yml to tidb@serverIP:/data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml: Process exited with status 1
The error shows that during a reload, alertmanager.yml is overwritten by alertmanager_serverIP.yml from TiUP's config cache.
After editing alertmanager_serverIP.yml with custom alert rules and running reload again, both files reverted to the default content.
It was also discovered that /data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml was owned by the root user; it should be owned by the tidb user. This is the likely reason the scp transfer, which runs as tidb, exited with status 1.
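The ownership check described above can be scripted. The following is a minimal sketch, assuming a Linux host with GNU coreutils `stat`; the function names `file_owner` and `owned_by` are hypothetical helpers, not part of TiUP.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for inspecting file ownership.
# Assumption: GNU coreutils `stat` (the -c format option) is available.

# Print "owner:group" for the given file.
file_owner() {
  stat -c '%U:%G' "$1"
}

# Return 0 if the file is owned by the expected user, non-zero otherwise.
owned_by() {
  local file="$1" expected_user="$2"
  [ "$(stat -c '%U' "$file")" = "$expected_user" ]
}

# Example use on the Alertmanager host (path taken from this article):
#   f=/data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml
#   owned_by "$f" tidb || sudo chown tidb:tidb "$f"
```

The `chown` in the commented example only runs when the owner is not tidb, so it is safe to repeat.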
2. Official documentation
TiUP deploys TiDB clusters together with Prometheus, Grafana, and Alertmanager. During deploy, scale-out, scale-in, or reload operations, TiUP regenerates the monitoring components' configuration files from its own parameters, which overwrites any manual edits.
To customize these components, you need to add the corresponding configuration items in the cluster topology file topology.yaml.
Solution
1. Edit alert configuration file
vim /data1/tidb-deploy/alertmanager-9093/conf/alertmanager_ziroom.yml
2. Modify topology.yaml to add config_file
tiup cluster edit-config cluster-name
alertmanager_servers:
  - host: serverIP
    ssh_port: 22
    web_port: 9093
    cluster_port: 9094
    deploy_dir: /data1/tidb-deploy/alertmanager-9093
    data_dir: /data1/tidb-data/alertmanager-9093
    log_dir: /data1/tidb-deploy/alertmanager-9093/log
    arch: amd64
    os: linux
    config_file: /data1/tidb-deploy/alertmanager-9093/conf/alertmanager_ziroom.yml
3. Reload Alertmanager nodes
tiup cluster reload cluster-name -R alertmanager
4. Verify configuration files
-rw-r--r--. 1 tidb tidb 1.4K Dec 20 20:41 alertmanager.yml
-rw-rw-r--. 1 tidb tidb 1.4K Dec 20 20:20 alertmanager_ziroom.yml
Both files now contain the custom configuration and are no longer overwritten; alerts are received correctly.
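Matching sizes in a directory listing do not prove the contents are identical. As a sketch, a byte-for-byte comparison with `cmp` can confirm that the active file matches the custom one after a reload; `configs_match` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the two files are byte-for-byte identical.
# `cmp -s` is silent and reports the result through its exit status.
configs_match() {
  cmp -s "$1" "$2"
}

# Example use after `tiup cluster reload` (paths taken from this article):
#   conf=/data1/tidb-deploy/alertmanager-9093/conf
#   if configs_match "$conf/alertmanager.yml" "$conf/alertmanager_ziroom.yml"; then
#     echo "active config matches custom config"
#   else
#     echo "active config differs from custom config" >&2
#   fi
```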
Related Notes
Which configuration file does Alertmanager actually use?
The config_file entry points to /data1/tidb-deploy/alertmanager-9093/conf/alertmanager_ziroom.yml, but the Alertmanager service still reads /data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml. That path is set in the run script and cannot be changed manually.
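One way to confirm which file the service reads is to look for Alertmanager's `--config.file` flag in the run script. The sketch below assumes the script lives at scripts/run_alertmanager.sh under the deploy directory and passes the flag in the form `--config.file=...`; both are assumptions to check on your own host, and `config_file_flag` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the first --config.file=... flag
# found in the given script, with surrounding double quotes removed.
# Prints nothing if the flag is absent.
config_file_flag() {
  grep -o -- '--config\.file=[^ ]*' "$1" | head -n 1 | cut -d= -f2- | tr -d '"'
}

# Example use (script location is an assumption, not confirmed by this article):
#   config_file_flag /data1/tidb-deploy/alertmanager-9093/scripts/run_alertmanager.sh
```

A relative value in the output is resolved against the directory the script changes into before starting Alertmanager.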
How is alertmanager.yml generated?
During a reload, TiUP copies the custom file over the actual configuration file, in effect running:
scp /data1/tidb-deploy/alertmanager-9093/conf/alertmanager_ziroom.yml tidb@serverIP:/data1/tidb-deploy/alertmanager-9093/conf/alertmanager.yml
This explains why setting config_file in topology.yaml resolves the overwriting issue: the file that TiUP pushes on every reload is now the custom one instead of the default.
