Troubleshooting SQL Exception Retries in OceanBase Using Logs and gv$sql_audit
This article demonstrates how to diagnose and resolve repeated UPDATE failures in OceanBase by extracting information from Top SQL, log files, and the gv$sql_audit view, identifying error codes such as 6005, 6003, and 6212, and applying corrective actions like adjusting timeout settings, killing lock‑holding sessions, and checking lock statistics.
1 Introduction
We observed abnormal SQL statements on the OCP cloud platform Top SQL interface without detailed error messages. This article uses a SQL exception retry scenario and demonstrates how to locate the exact cause using log files and the gv$sql_audit view.
2 Background
OceanBase 3.X Enterprise Edition in MySQL mode.
A customer reported a slow UPDATE on a table during performance testing, which never succeeded.
The gv$sql_audit information for this SQL has been cleared, and Top SQL does not provide a specific error reason, so investigation must rely on log files.
Initial status on the OCP cloud platform shows:
UPDATE statement retried 440 times
Average response time 5491.15 ms
The second half of the article extends this scenario.
3 Investigation Process
1. Export Top SQL
Select the required columns (e.g., SQL ID, retry count) in the Top SQL UI.
2. Copy SQL Text
Copy the full SQL statement for further analysis.
3. Locate the UPDATE Statement in Logs
On the relevant server, run grep to count how many times the SQL was printed:
# grep -i "UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1" observer.log.2024040916* | wc -l
505The log shows 505 executions within one hour, indicating a possible anomaly. The SQL is usually printed in observer.log when an error occurs.
4. Search for Error Positions
Search the logs for the SQL text and the ret= pattern to find related error codes such as 4012 and 6003.
# grep -i "UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1" observer.log.2024040916* | grep "ret="5. Retrieve Error Information
Search for the trace_id to view the main error messages, which include write‑conflict errors (on_wlock_retry, lock_for_write conflict), error code 6005, and a failure to update the row.
6. Write‑Conflict Log Sample
# grep -i "YB420ABA40A1-000615A29EDEEA36-0-0" observer.log.2024040916* | grep "lock_for_write conflict"7. Confirm Retry Behavior
Verify that the trace_id is associated with a retrying SQL:
# grep -i "YB420ABA40A1-000615A29EDEEA36-0-0" observer.log.2024040916* | grep -i "will retry"8. Error Code Meaning
6005 : Update operation lock failure.
6003 : Wait lock timeout.
6212 : Statement timeout.
For statement timeout, check the tenant variable ob_query_timeout and compare the cur_query_start_time with the timeout timestamp to confirm the timeout matches the configured value (10 s in the example).
4 Problem Summary
The UPDATE statement UPDATE evan.evan_zheng SET name = 'test0409' WHERE id = 1 failed to acquire a lock, and the tenant's ob_query_timeout (10 s) triggered error code 6212, causing the statement to roll back.
Possible causes:
A long‑running transaction holds a lock, blocking other sessions.
Concurrent updates on the same row with flawed concurrency logic.
5 Optimization Measures
Set appropriate timeout values.
Implement proper concurrency control in application code.
Monitor long‑transaction alerts.
6 Extended Scenario
If the SQL is continuously retrying and the gv$sql_audit view is still available, follow these steps:
1. Copy SQL ID from OCP Cloud Platform
2. Query Main Error Codes Using SQL ID
select /*+ PARALLEL(8) */
trace_id,
sid,
tenant_name,
svr_ip,
svr_port,
retry_cnt,
ret_code,
query_sql,
usec_to_time(request_time) as start_time
from gv$sql_audit
where sql_id = 'D884EA797E73F466819BAE2AE4AC1FE1'
and retry_cnt > 1
group by ret_code
order by retry_cnt desc;3. Retrieve Session ID
select /*+ PARALLEL(8) */
trace_id,
sid,
tenant_name,
svr_ip,
svr_port,
retry_cnt,
ret_code,
query_sql,
usec_to_time(request_time) as start_time
from gv$sql_audit
where sql_id = 'D884EA797E73F466819BAE2AE4AC1FE1'
group by sid
order by request_time desc;4. Query Table ID
select database_name,
table_id,
table_name,
tenant_id,
tenant_name
from oceanbase.gv$table
where tenant_id = 1001
and database_name = 'evan'
and table_name = 'evan_zheng';5. Query Lock‑Holder Transaction Info
select * from __all_virtual_trans_lock_stat where table_id = '1100611139453778'\G6. Query Lock‑Waiter Transaction Info
select * from __all_virtual_lock_wait_stat where table_id = '1100611139453778'\G7. Query SQL of Lock‑Holder Session
select trace_id,
usec_to_time(request_time),
query_sql
from gv$sql_audit
where TENANT_ID = 1001
and USER_NAME = 'root'
and SID = '3221616444'
order by request_time desc;8. Query SQL of Lock‑Waiter Session
select trace_id,
usec_to_time(request_time),
query_sql
from gv$sql_audit
where TENANT_ID = 1001
and USER_NAME = 'root'
and SID = '3221618060'
order by request_time desc;The lock‑holder and lock‑waiter both target evan_zheng.id=1. The lock‑holder opened a manual transaction and did not commit, causing the waiter to keep retrying the UPDATE.
9. Kill Lock‑Holder Session
After confirming the risk, kill the lock‑holder session to release the lock.
7 Other Error Codes
ob_query_timeout : Controls statement execution time limit; exceeding it returns error 6212 (default 10 s).
ob_trx_timeout : Controls transaction timeout; exceeding it returns error 6210, requiring a ROLLBACK.
ob_trx_idle_timeout : Controls maximum idle time for a transaction; exceeding it returns error 6224, also requiring a ROLLBACK.
8 References
OceanBase Database Log Interpretation Example – https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000207691
OceanBase Database Transaction Troubleshooting Guide – https://www.oceanbase.com/knowledge-base/oceanbase-database-20000000026
OceanBase Row Lock Issue Troubleshooting Guide – https://www.oceanbase.com/knowledge-base/oceanbase-database-20000000016
Transaction Control Overview – https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000642694
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
