Rebuilding an OceanBase Node Using the server_permanent_offline_time Parameter
This guide explains how to use the OceanBase server_permanent_offline_time parameter to permanently offline a faulty node, rebuild its data, and restore normal operation, including preparation, command steps, verification, and recommended settings for production.
It describes a practical way to recover a node whose data files are damaged or missing by temporarily lowering server_permanent_offline_time, the parameter that controls how long a node must stay offline before it is marked permanently offline.
Principle
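server_permanent_offline_time sets how long a crashed node may stay offline before it is considered permanently offline. Once the downtime exceeds this value, the node's replicas are removed from their Paxos groups and rebuilt on other available nodes in the same zone. In a cluster with one node per zone, such as the one used below, there is no other node to rebuild on, so the replicas are rebuilt on the failed node itself after it rejoins with empty storage. The default is 3600 seconds; lowering it makes a failed node go permanently offline sooner, so data reconstruction also starts sooner.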
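The rule above reduces to a single timeout comparison. The following is a minimal Python sketch of that comparison only; it does not model how OceanBase actually detects a failure (heartbeats, leases, RootService logic), and DownNode, is_permanently_offline, and earliest_rebuild_time are hypothetical names used for illustration.

```python
from dataclasses import dataclass

# Documented default for server_permanent_offline_time, in seconds.
DEFAULT_PERMANENT_OFFLINE_TIME_S = 3600.0


@dataclass
class DownNode:
    """Hypothetical record of a crashed node and when it stopped responding."""
    address: str
    down_since: float  # epoch seconds


def is_permanently_offline(node: DownNode, now: float,
                           permanent_offline_time_s: float = DEFAULT_PERMANENT_OFFLINE_TIME_S) -> bool:
    """True once the node's downtime exceeds server_permanent_offline_time."""
    downtime = now - node.down_since
    return downtime > permanent_offline_time_s


def earliest_rebuild_time(node: DownNode,
                          permanent_offline_time_s: float = DEFAULT_PERMANENT_OFFLINE_TIME_S) -> float:
    """Moment after which the node counts as permanently offline."""
    return node.down_since + permanent_offline_time_s


if __name__ == "__main__":
    node = DownNode("10.186.64.79", down_since=0.0)
    # Two minutes after the crash: still "temporarily" offline with the default,
    # but already permanently offline once the threshold is lowered to 60 seconds.
    print(is_permanently_offline(node, now=120.0))
    print(is_permanently_offline(node, now=120.0, permanent_offline_time_s=60))
```

With the default value a crashed node waits an hour before its replicas are rebuilt; with 60 seconds the same node qualifies after one minute, which is why the experiment lowers the value temporarily.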
Official Recommendations
Database version upgrade: set to 72 hours.
OBServer hardware replacement: set to 4 hours.
OBServer clean‑up scenario: set to 10 minutes.
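The recommended values can be kept as duration strings and checked before they are applied. This Python sketch assumes the parameter accepts values with an s, m, or h suffix (as in '60s' or '3600s'); the scenario keys and the helpers to_seconds and set_statement are hypothetical, and the statement is only built as text, not executed.

```python
import re

# Recommended values from the list above; the keys are illustrative labels only.
RECOMMENDED = {
    "version_upgrade": "72h",
    "hardware_replacement": "4h",
    "observer_cleanup": "10m",
}

# Assumed duration suffixes and their length in seconds.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '60s', '10m', or '72h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def set_statement(duration: str) -> str:
    """Build the ALTER SYSTEM statement that applies the given duration."""
    to_seconds(duration)  # reject malformed values before emitting SQL
    return f"ALTER SYSTEM SET server_permanent_offline_time='{duration}';"


if __name__ == "__main__":
    for scenario, duration in RECOMMENDED.items():
        print(f"{scenario}: {to_seconds(duration)} s -> {set_statement(duration)}")
```

Converting to seconds makes the spread obvious: 72 hours is 72 times the 3600-second default, while the 10-minute cleanup value is one sixth of it.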
Preparation
Deploy a three‑node OceanBase cluster with an OBProxy, create a tenant sysbench_tenant (primary_zone=RANDOM), and note the IPs:
OceanBase 3.1.2: 10.186.64.74, 10.186.64.75, 10.186.64.79
OBProxy 3.2.3: 10.186.60.3
Generate test data using sysbench:
sysbench ./oltp_insert.lua --mysql-host=10.186.60.3 --mysql-port=2883 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant" --mysql-password=sysbench --tables=1 --table_size=10000 --threads=1 --time=600 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062,5157,4038 prepare
Experiment Steps
Keep sysbench writing data continuously (the same command with run in place of prepare) so the cluster stays under load throughout the test.
Delete the data files on node 10.186.64.79 (zone3).
Reduce server_permanent_offline_time to 60 seconds from the sys tenant: ALTER SYSTEM SET server_permanent_offline_time='60s';
Stop the node from serving requests with ALTER SYSTEM ISOLATE SERVER or ALTER SYSTEM STOP SERVER.
Kill the observer process and wait until the node is marked permanently offline; confirm this in __all_rootservice_event_history.
Clear the node's remaining log and sstable files, then restart the observer process; this triggers automatic data reconstruction.
When the partition counts in all zones are equal again, start the node with ALTER SYSTEM START SERVER.
Restore server_permanent_offline_time to its original value of 3600 seconds: ALTER SYSTEM SET server_permanent_offline_time='3600s';
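The procedure above can be summarized as a Python sketch that lists the SQL statements in order and checks the "partition counts are equal" condition before the node is started again. It rests on several assumptions: port 2882 (OceanBase's default RPC port) is appended to the node IP and must match your deployment, STOP SERVER is used rather than ISOLATE SERVER, host-side actions appear only as comments, and the per-replica zone list is expected to come from whichever replica view you query. rebuild_plan and zones_balanced are hypothetical helpers, and zone1/zone2 are placeholder names for the zones of the other two nodes.

```python
from collections import Counter

# Assumed address: node IP from the article plus the default observer RPC port.
NODE = "10.186.64.79:2882"
# zone3 comes from the article; zone1 and zone2 are placeholder names.
ZONES = ["zone1", "zone2", "zone3"]


def rebuild_plan(node: str, shrink_to: str = "60s", restore_to: str = "3600s") -> list[str]:
    """Ordered steps for the manual rebuild; SQL is meant for the sys tenant."""
    return [
        f"ALTER SYSTEM SET server_permanent_offline_time='{shrink_to}';",
        f"ALTER SYSTEM STOP SERVER '{node}';",
        "-- host: kill observer, wait for the permanent-offline event",
        "-- host: clear log and sstable files, then restart observer",
        "-- wait until every zone holds the same number of replicas",
        f"ALTER SYSTEM START SERVER '{node}';",
        f"ALTER SYSTEM SET server_permanent_offline_time='{restore_to}';",
    ]


def zones_balanced(replica_zones: list[str], zones: list[str]) -> bool:
    """True when every expected zone hosts the same number of replicas.

    replica_zones holds one zone name per partition replica. Counter returns 0
    for a zone with no replicas, so a fully empty zone is reported as unbalanced.
    """
    counts = Counter(replica_zones)
    return len({counts[zone] for zone in zones}) == 1


if __name__ == "__main__":
    for step in rebuild_plan(NODE):
        print(step)
```

Passing the expected zones explicitly matters here: while the node is being rebuilt, zone3 may report no replicas at all, and a check that only compared the zones present in the query result would wrongly treat the cluster as balanced.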
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.