
How to Perform Daily Maintenance on GaussDB T Clusters Without Pitfalls

This guide walks you through the essential daily maintenance tasks for GaussDB T clusters, covering ETCD startup, cluster health checks, host resource monitoring, tablespace usage, abnormal wait events, log inspection, and common error troubleshooting with concrete commands and SQL examples.

dbaplus Community

1. Start ETCD and Bring Up the Cluster

After powering on the virtual machines, ensure the ETCD service is started before launching the rest of the cluster. Once ETCD is up, start all GaussDB T processes; the cluster should come online successfully.

2. Cluster Status Check

Verify that every node (CM, CN, DN, ETCD) is online. Compare today's node roles with the previous day's to detect unexpected primary/standby switchovers. If any node is offline, investigate the cause before moving on.
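The day-over-day comparison can be scripted once the node roles are captured in some structured form. The following is a minimal sketch, assuming each day's state has already been reduced to a name-to-role mapping; the node names and role labels are hypothetical, and how the snapshot is collected from the cluster is left out.

```python
def diff_node_roles(previous, current):
    """Compare two {node_name: role} snapshots and describe what changed."""
    changes = []
    for node in sorted(set(previous) | set(current)):
        before = previous.get(node)
        after = current.get(node)
        if before == after:
            continue
        if before is None:
            changes.append(f"{node}: new node, role {after}")
        elif after is None:
            changes.append(f"{node}: missing today, was {before}")
        else:
            changes.append(f"{node}: {before} -> {after}")
    return changes


# Hypothetical snapshots, for illustration only.
yesterday = {"CN_1": "primary", "DN_1": "primary", "DN_2": "standby"}
today = {"CN_1": "primary", "DN_1": "standby", "DN_2": "primary"}

for change in diff_node_roles(yesterday, today):
    print(change)
```

An empty result means no role changed overnight; any line printed is a switchover or a missing node worth investigating.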

3. Host Resource Usage (All Hosts)

Directory usage: run df -h to view filesystem utilization.

CPU, memory, and I/O: use vmstat, iostat, and free. In vmstat, pay special attention to the id column (CPU idle percentage) and the free column (idle memory, reported in KiB by default on Linux). In iostat, watch rMB/s, wMB/s, and %util (shown by iostat -xm) to assess device load.

Compare these metrics against your baseline; large deviations require further investigation.
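Both checks lend themselves to a small script. The sketch below assumes the filesystem report is captured with df -P (the POSIX output format, which is easier to parse than df -h) and that baseline metrics are kept as a simple name-to-value mapping. The 80% threshold, the 20% tolerance, the metric names, and the sample data are illustrative assumptions, not values from GaussDB T documentation.

```python
def filesystems_over(df_output, threshold_pct=80):
    """Return (mount_point, used_pct) pairs at or above threshold_pct.

    Expects the text printed by `df -P`: one header row, then six columns
    (filesystem, blocks, used, available, capacity, mount point).
    """
    flagged = []
    for line in df_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue
        pct = fields[4].rstrip("%")
        if pct.isdigit() and int(pct) >= threshold_pct:
            flagged.append((fields[5], int(pct)))
    return flagged


def deviations(current, baseline, tolerance=0.2):
    """Return {metric: relative_change} where |change| exceeds tolerance."""
    result = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # nothing to compare against
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            result[name] = round(change, 2)
    return result


# Hypothetical sample data, for illustration only.
sample_df = """Filesystem 1024-blocks Used Available Capacity Mounted on
/dev/sda1 51474912 12345678 39129234 24% /
/dev/sdb1 103081248 92773123 10308125 90% /gaussdb
"""
baseline = {"cpu_idle": 80, "util_pct": 20}
current = {"cpu_idle": 40, "util_pct": 22}

print(filesystems_over(sample_df))
print(deviations(current, baseline))
```

Here the /gaussdb filesystem is flagged at 90%, and cpu_idle is reported as having halved relative to the baseline while util_pct stays within tolerance.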

4. Verify Database Node Status

Check that all CN and DN nodes are in the open state, except standby (backup) DNs, which should be in the mount state.
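The expected-state rule can be expressed as a lookup and checked mechanically. This is a rough sketch, assuming the node list has already been collected as (name, role, state) tuples; the node names and the exact role and state spellings are assumptions and should be matched to what your status command prints.

```python
# Expected database state per role, following the rule described above.
EXPECTED_STATE = {"primary": "OPEN", "standby": "MOUNT"}


def unexpected_states(nodes):
    """Return the (name, role, state) tuples that violate EXPECTED_STATE.

    Nodes with an unrecognised role are reported too, so nothing is
    silently skipped.
    """
    problems = []
    for name, role, state in nodes:
        expected = EXPECTED_STATE.get(role.lower())
        if expected is None or state.upper() != expected:
            problems.append((name, role, state))
    return problems


# Hypothetical node list, for illustration only.
nodes = [
    ("CN_1", "primary", "OPEN"),
    ("DN_1", "primary", "OPEN"),
    ("DN_2", "standby", "OPEN"),
]
print(unexpected_states(nodes))
```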

5. Tablespace Usage Check

If you need a test tablespace, first connect to a CN, substituting your own password and CN address for the placeholders:

zsql omm/<password>@<cn_host>:8000 -q

Then create the tablespace. The SHARD keyword distributes the statement to the CN and DN nodes:

CREATE TABLESPACE tbs_test1 DATAFILE 'tbs_test1' size 100m SHARD;

Verify that the data files exist on both CN and DN nodes. To query usage, connect to the primary DN:

zsql / as sysdba -D /gaussdb/data/data_dn1 -q

Then run the following SQL to calculate total, free, and used space per tablespace:

set line 300
set pages 2000
set timing off
col tablespace_name for a25
col sum_GB for a15
col free_GB for a15
col use_percent for a15
-- used share = (total - free) / total, rounded to 4 places, then scaled to percent
select b.tablespace_name,
       round(sum(b.bytes)/1024/1024/1024,0) sum_GB,
       round(sum(nvl(a.bytes,0))/1024/1024/1024,0) free_GB,
       round((sum(b.bytes)-sum(nvl(a.bytes,0)))/sum(b.bytes),4)*100 use_percent,
       count(*)
from (select tablespace_name, file_id, sum(bytes) bytes
      from adm_free_space
      group by tablespace_name, file_id) a,
     adm_data_files b
where a.file_id(+) = b.file_id
  and a.tablespace_name(+) = b.tablespace_name
group by b.tablespace_name
having round((sum(b.bytes)-sum(nvl(a.bytes,0)))/sum(b.bytes),4)*100 >= 0
order by 4 desc;

Run this query on all primary CN and DN nodes.
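To make the arithmetic in the query concrete, and to turn its result into an alert, the usage percentage can be recomputed from total and free bytes. This is a minimal sketch, assuming the per-tablespace totals have already been fetched from each node; the 85% threshold and the sample sizes are illustrative assumptions.

```python
def used_percent(total_bytes, free_bytes):
    """Used share of a tablespace as a percentage, to two decimal places."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return round((total_bytes - free_bytes) / total_bytes * 100, 2)


def tablespaces_over(usage, threshold_pct=85.0):
    """Return (name, used_pct) pairs at or above threshold_pct, fullest first.

    usage maps tablespace name -> (total_bytes, free_bytes).
    """
    report = [
        (name, used_percent(total, free))
        for name, (total, free) in usage.items()
    ]
    flagged = [row for row in report if row[1] >= threshold_pct]
    return sorted(flagged, key=lambda row: row[1], reverse=True)


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical figures, for illustration only.
usage = {
    "SYSTEM": (10 * GIB, 4 * GIB),       # 6 of 10 GiB used
    "TBS_TEST1": (100 * MIB, 5 * MIB),   # 95 of 100 MiB used
}
print(tablespaces_over(usage))
```

A 100 MiB tablespace with 5 MiB free works out to (100 - 5) / 100 = 95% used, so only TBS_TEST1 crosses the assumed threshold.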

6. Abnormal Wait Event Inspection

Count the sessions that are waiting on locks, grouped by wait event:

col event format a38
select event, count(*)
from DV_SESSIONS
where LOCK_WAIT = 'Y'
group by event
order by 2 desc;

If TX waits appear, retrieve the blocking sessions:

select SID, SERIAL#, USERNAME, CURR_SCHEMA, CLIENT_IP, CLIENT_PORT, OSUSER, MACHINE, PROGRAM,
       STATUS, LOCK_WAIT, EVENT, MODULE, CURRENT_SQL
from dv_sessions
where sid in (select WAIT_SID from dv_sessions where event like '%TX%');

If a blocking session is inactive and originates from an application, coordinate with the application team, or kill the session:

ALTER SYSTEM KILL SESSION 'SID,SERIAL#';
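When several blockers are found, it can help to generate the kill statements for review instead of typing them by hand. The sketch below assumes the blocking-session rows have been exported as dictionaries keyed by the column names in the query above, and that an active session reports the STATUS value ACTIVE; that literal is an assumption, so check the values your instance returns. Nothing is executed against the database here.

```python
def kill_statement(sid, serial):
    """Build the KILL SESSION statement for one session.

    Both values are forced to non-negative integers so that nothing other
    than numbers can end up inside the quoted string.
    """
    sid, serial = int(sid), int(serial)
    if sid < 0 or serial < 0:
        raise ValueError("SID and SERIAL# must be non-negative")
    return f"ALTER SYSTEM KILL SESSION '{sid},{serial}';"


def kill_statements_for_inactive(blockers):
    """Return statements only for blockers that are not currently active."""
    return [
        kill_statement(row["SID"], row["SERIAL#"])
        for row in blockers
        if row["STATUS"].upper() != "ACTIVE"  # assumed status literal
    ]


# Hypothetical rows, for illustration only.
blockers = [
    {"SID": 45, "SERIAL#": 1203, "STATUS": "INACTIVE"},
    {"SID": 52, "SERIAL#": 88, "STATUS": "ACTIVE"},
]
for statement in kill_statements_for_inactive(blockers):
    print(statement)
```

Printing the statements keeps a person in the loop: a DBA can confirm with the application team before running any of them.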

7. Log Inspection

GaussDB T generates several logs useful for troubleshooting:

Run log: $GSDB_DATA/log/run/zengine.rlog (or the path defined by log_home).

Slow query log: $GSDB_DATA/log/longsql/zengine.lsql (records SQL exceeding LONGSQL_TIMEOUT).

Alarm log: $GSDB_DATA/log/zenith_alarm.log.

Operation log: $GSDB_DATA/log/oper/zsql.olog.

TRACE log: $GSDB_DATA/trc/zengine_00003_xxxxxx.trc (captures deadlock information).

Review these logs on both CN and DN nodes to pinpoint issues.
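A quick way to triage these files is to count how often each GS- error code appears. The following sketch assumes only that error codes show up in the text as GS- followed by five digits, as in the codes listed in this article; the sample log lines and their layout are made up for illustration and do not reflect the real zengine log format.

```python
import re
from collections import Counter

# Matches codes such as GS-00716.
GS_CODE = re.compile(r"\bGS-\d{5}\b")


def count_error_codes(lines):
    """Count occurrences of each GS-xxxxx code across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(GS_CODE.findall(line))
    return counts


# Hypothetical log lines, for illustration only.
sample_log = [
    "2024-01-01 10:00:01 ERROR GS-00716 deadlock detected",
    "2024-01-01 10:05:42 ERROR GS-00713 no free undo page",
    "2024-01-01 10:07:13 ERROR GS-00716 deadlock detected",
    "2024-01-01 10:08:00 INFO checkpoint finished",
]
for code, hits in count_error_codes(sample_log).most_common():
    print(code, hits)
```

To scan a real file, pass an open file object instead of the list, for example open(path, errors="replace"), so that stray bytes in the log do not abort the scan.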

8. Common Error Codes and Remedies

GS-00716 – Deadlock detected: Check trace or run logs, analyze the deadlock type and offending SQL, and adjust the application logic.

GS-00715 – Snapshot outdated: Re‑run the SQL or optimize long‑running high‑cost queries.

GS-00713 – No free undo page: Increase UNDO tablespace size or kill large transactions to free undo space.

GS-00305 – Network API timeout: Ensure network connectivity and stability.

GS-00774 – Failover in progress, cannot connect: Stop the primary host, wait for the standby to become primary, then demote the old primary.

GS-00839 – Flush redo file failed: Check the operating system and disk for failures.

Automating these daily checks with scripts or using Database Manager for alert analysis can greatly improve system stability and reduce manual errors.

