Databases 12 min read

Troubleshooting SQL Exception Retries in OceanBase Using Logs and gv$sql_audit

This article demonstrates how to diagnose and resolve repeated UPDATE failures in OceanBase by extracting information from Top SQL, log files, and the gv$sql_audit view, identifying error codes such as 6005, 6003, and 6212, and applying corrective actions like adjusting timeout settings, killing lock‑holding sessions, and checking lock statistics.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Troubleshooting SQL Exception Retries in OceanBase Using Logs and gv$sql_audit

1 Introduction

We observed abnormal SQL statements on the OCP cloud platform Top SQL interface without detailed error messages. This article uses a SQL exception retry scenario and demonstrates how to locate the exact cause using log files and the gv$sql_audit view.

2 Background

OceanBase 3.X Enterprise Edition in MySQL mode.

A customer reported a slow UPDATE on a table during performance testing, which never succeeded.

The gv$sql_audit information for this SQL has been cleared, and Top SQL does not provide a specific error reason, so investigation must rely on log files.

Initial status on the OCP cloud platform shows:

UPDATE statement retried 440 times

Average response time 5491.15 ms

The second half of the article extends this scenario.

3 Investigation Process

1. Export Top SQL

Select the required columns (e.g., SQL ID, retry count) in the Top SQL UI.

2. Copy SQL Text

Copy the full SQL statement for further analysis.

3. Locate the UPDATE Statement in Logs

On the relevant server, run grep to count how many times the SQL was printed:

# grep -i "UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1" observer.log.2024040916* | wc -l
505

The log shows 505 executions within one hour, indicating a possible anomaly. The SQL is usually printed in observer.log when an error occurs.

4. Search for Error Positions

Search the logs for the SQL text and the ret= pattern to find related error codes such as 4012 and 6003.

# grep -i "UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1" observer.log.2024040916* | grep "ret="

5. Retrieve Error Information

Search for the trace_id to view the main error messages, which include write‑conflict errors (on_wlock_retry, lock_for_write conflict), error code 6005, and a failure to update the row.

6. Write‑Conflict Log Sample

# grep -i "YB420ABA40A1-000615A29EDEEA36-0-0" observer.log.2024040916* | grep "lock_for_write conflict"

7. Confirm Retry Behavior

Verify that the trace_id is associated with a retrying SQL:

# grep -i "YB420ABA40A1-000615A29EDEEA36-0-0" observer.log.2024040916* | grep -i "will retry"

8. Error Code Meaning

6005 : Update operation lock failure.

6003 : Wait lock timeout.

6212 : Statement timeout.

For statement timeout, check the tenant variable ob_query_timeout and compare the cur_query_start_time with the timeout timestamp to confirm the timeout matches the configured value (10 s in the example).

4 Problem Summary

The UPDATE statement UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1 failed to acquire a lock, and the tenant's ob_query_timeout (10 s) triggered error code 6212, causing the statement to roll back.

Possible causes:

A long‑running transaction holds a lock, blocking other sessions.

Concurrent updates on the same row with flawed concurrency logic.

5 Optimization Measures

Set appropriate timeout values.

Implement proper concurrency control in application code.

Monitor long‑transaction alerts.

6 Extended Scenario

If the SQL is continuously retrying and the gv$sql_audit view is still available, follow these steps:

1. Copy SQL ID from OCP Cloud Platform

2. Query Main Error Codes Using SQL ID

select /*+ PARALLEL(8) */
  trace_id,
  sid,
  tenant_name,
  svr_ip,
  svr_port,
  retry_cnt,
  ret_code,
  query_sql,
  usec_to_time(request_time) as start_time
from gv$sql_audit
where sql_id = 'D884EA797E73F466819BAE2AE4AC1FE1'
  and retry_cnt > 1
group by ret_code
order by retry_cnt desc;

3. Retrieve Session ID

select /*+ PARALLEL(8) */
  trace_id,
  sid,
  tenant_name,
  svr_ip,
  svr_port,
  retry_cnt,
  ret_code,
  query_sql,
  usec_to_time(request_time) as start_time
from gv$sql_audit
where sql_id = 'D884EA797E73F466819BAE2AE4AC1FE1'
group by sid
order by request_time desc;

4. Query Table ID

select database_name,
       table_id,
       table_name,
       tenant_id,
       tenant_name
from oceanbase.gv$table
where tenant_id = 1001
  and database_name = 'evan'
  and table_name = 'evan_zheng';

5. Query Lock‑Holder Transaction Info

select * from __all_virtual_trans_lock_stat where table_id = '1100611139453778'\G

6. Query Lock‑Waiter Transaction Info

select * from __all_virtual_lock_wait_stat where table_id = '1100611139453778'\G

7. Query SQL of Lock‑Holder Session

select trace_id,
       usec_to_time(request_time),
       query_sql
from gv$sql_audit
where TENANT_ID = 1001
  and USER_NAME = 'root'
  and SID = '3221616444'
order by request_time desc;

8. Query SQL of Lock‑Waiter Session

select trace_id,
       usec_to_time(request_time),
       query_sql
from gv$sql_audit
where TENANT_ID = 1001
  and USER_NAME = 'root'
  and SID = '3221618060'
order by request_time desc;

The lock‑holder and lock‑waiter both target evan_zheng.id=1. The lock‑holder opened a manual transaction and did not commit, causing the waiter to keep retrying the UPDATE.

9. Kill Lock‑Holder Session

After confirming the risk, kill the lock‑holder session to release the lock.

7 Other Error Codes

ob_query_timeout : Controls statement execution time limit; exceeding it returns error 6212 (default 10 s).

ob_trx_timeout : Controls transaction timeout; exceeding it returns error 6210, requiring a ROLLBACK.

ob_trx_idle_timeout : Controls maximum idle time for a transaction; exceeding it returns error 6224, also requiring a ROLLBACK.

8 References

OceanBase Database Log Interpretation Example – https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000207691

OceanBase Database Transaction Troubleshooting Guide – https://www.oceanbase.com/knowledge-base/oceanbase-database-20000000026

OceanBase Row Lock Issue Troubleshooting Guide – https://www.oceanbase.com/knowledge-base/oceanbase-database-20000000016

Transaction Control Overview – https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000642694

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLdatabasetroubleshootingOceanBaseErrorCodes
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.