Analyzing and Optimizing Slow OceanBase SQL Queries Using TopSQL and Execution Plan Rewrites
This article investigates a slow OceanBase SQL query by examining CPU and TopSQL metrics, dissecting the execution plan, identifying batch join inefficiencies, rewriting the SQL for better performance, and demonstrating a reduction in execution time from over 2600 seconds to under one second.
1. Background
1.1 Problem Description
A business system reports extremely slow responses from the OceanBase database, causing service degradation. Monitoring shows CPU usage spiking to 100% during the incident.
TopSQL captured a single slow SQL with an average execution time exceeding 2800 seconds.
1.2 Environment Information
Database: OceanBase Enterprise Edition 4.2.1.9
Architecture: 1-1-1, PrimaryZone Random
Tenant Specification: 30c300g
1.3 The Slowest TopSQL
The SQL text and its SQL‑ID were retrieved.
select *
from (
select A.*, RowNum as Fast_RowNum
from (
SELECT *
FROM (
SELECT TABLE1.C1, TABLE1.C2, xxx, TABLE1.P1, TABLE1.P2
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE2.PID = TABLE1.ID AND TABLE2.ISMAIN = '1'
LEFT JOIN TABLE6 ON TABLE6.ID = TABLE2.INBANK
LEFT JOIN TABLE4 ON TABLE4.ID = TABLE2.COUNTRY
WHERE TABLE1.STATE_ISENABLED = '1'
) T
where 1 = 1
and ID in (
SELECT ID FROM TABLE1 WHERE domaintype = '2' AND EXISTS (
SELECT PID FROM TABLE3 WHERE DOMAINID IN (
SELECT DOMAINID FROM TABLE5 WHERE ORGID = '0000000000000000001' AND ORGISENABLED = '1'
) AND DOMAINISENABLED = '1' AND TABLE1.ID = PID
)
)
) A
where RowNum < 51
)
where Fast_RowNum > 0;2. Analysis Process
2.1 SQL_AUDIT Analysis
The execution time distribution for the SQL_ID shows a baseline execution time above 2600 seconds.
2.2 Execution Plan Analysis
The original execution plan is displayed below:
===========================================================================================================================
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
---------------------------------------------------------------------------------------------------------------------------
|0 |COUNT | |50 |396423 |
|1 |└─SUBPLAN SCAN |VIEW1 |50 |396423 |
|2 | └─LIMIT | |50 |396423 |
|3 | └─NESTED-LOOP OUTER JOIN | |50 |396423 |
|4 | ├─NESTED-LOOP OUTER JOIN| |50 |395510 |
|5 | │ ├─SUBPLAN SCAN |VIEW5 |38 |394193 |
|6 | │ │ └─LIMIT | |38 |394193 |
|7 | │ │ └─NESTED-LOOP SEMI JOIN| |38 |394193 |
|8 | │ │ ├─TABLE FULL SCAN|TABLE1 |6463 |2028 |
|9 | │ │ └─SUBPLAN SCAN |VIEW3 |1 |60 |
|... (truncated for brevity) ...
===========================================================================================================================Execution Plan Interpretation
The optimizer first performs a full table scan on TABLE1 , using it as the driver table. For each driver row, it repeatedly scans the right‑hand sub‑operators, leading to massive row‑wise processing.
The sub‑plan scan of VIEW3 is executed 287 0000 times.
The join does not benefit from batch processing ( use_batch=false ).
2.3 SQL_Plan_Monitor Analysis
Dynamic monitoring confirms that operators 7‑13 repeatedly rescan the inner tables, causing the long execution time.
+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+
| OP_ID | OP | OUTPUT_ROWS | RESCAN | THREADS | OPEN_DT | ROW_DT | OPEN_TIME | CLOSE_TIME | FIRST_ROW_TIME | LAST_ROW_EOF_TIME |
+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+
| 7 | PHY_NESTED_LOOP_JOIN | 14 | 856237 | 1 | NULL | +000000000 00:02:19.682129 | 2025-03-10 17:02:17.955383 | NULL | 2025-03-10 17:02:20.792033 | 2025-03-10 17:05:24.567077 |
... (truncated) ...
+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+2.4 Why Batch Join Does Not Take Effect
OceanBase supports Batch Rescan for regular NLJ, but the current semi‑join implementation cannot use this optimization, so use_batch=false .
2.5 SQL Rewrite
The original SQL is rewritten to keep the left joins unchanged, extract complex filters into a separate view, replace the EXISTS with an inner join, and add DISTINCT to avoid duplicates.
SELECT *
FROM (
SELECT A.*, ROWNUM AS Fast_RowNum
FROM (
SELECT T.*
FROM (
SELECT TABLE1.C1, TABLE1.C2, xxx, TABLE1.P1, TABLE1.P2
FROM TABLE1
LEFT JOIN TABLE2 ON TABLE2.PID = TABLE1.ID AND TABLE2.ISMAIN = '1'
LEFT JOIN TABLE6 ON TABLE6.ID = TABLE2.INBANK
LEFT JOIN TABLE4 ON TABLE4.ID = TABLE2.COUNTRY
WHERE TABLE1.STATE_ISENABLED = '1'
) T
JOIN (
SELECT DISTINCT P.ID
FROM TABLE1 P
JOIN TABLE3 D ON P.ID = D.PID AND D.DOMAINISENABLED = '1'
WHERE D.DOMAINID IN (
SELECT DOMAINID FROM TABLE5 WHERE ORGID = '0000000000000000001' AND ORGISENABLED = '1'
) AND P.DOMAINTYPE = '2'
) NEW ON T.ID = NEW.ID
) A
WHERE ROWNUM < 51
) WHERE Fast_RowNum > 0;Rewrite Principles
Preserve left joins in the main query.
Move complex filters to an independent view.
Replace EXISTS with an inner join.
Add DISTINCT to guarantee unique IDs.
Rewritten Execution Plan (excerpt)
=========================================================================================================================
|ID|OPERATOR |NAME |EST.ROWS|EST.TIME(us)|
-------------------------------------------------------------------------------------------------------------------------
|0 |COUNT | |50 |5271 |
|1 |└─SUBPLAN SCAN |VIEW1 |50 |5271 |
|2 | └─LIMIT | |50 |5271 |
|3 | └─NESTED-LOOP OUTER JOIN | |50 |5271 |
|4 | ├─NESTED-LOOP OUTER JOIN| |50 |4358 |
|5 | │ ├─SUBPLAN SCAN |VIEW2 |38 |3041 |
|... (truncated) ...
=========================================================================================================================The rewritten plan shows that the driver table now contains only 42 rows, allowing the optimizer to use batch join ( use_batch=true ) and dramatically reduce the number of full‑table scans.
Conclusion
Before optimization, the original SQL used TABLE1 as the driver in a nested‑loop semi‑join, resulting in over 2 600 seconds of execution time. After rewriting, the driver set shrank to 42 rows, batch join was enabled, and execution time dropped to 0.6 seconds.
Performance Comparison
Oracle: 1.79 s
OceanBase (original): >2600 s
OceanBase (optimized): 0.6 s
Supplementary Knowledge
1. Why does rewriting EXISTS to a semi‑join sometimes slow down the query? See reference [2].
2. How to verify whether the batch parameter is effective?
select * from CDB_OB_SYS_VARIABLES where name like '%batch%' and tenant_id=1054;Result shows _nlj_batching_enabled = 1 , indicating batch processing is enabled.
References
[1] Batch Rescan: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000221518#1-title-DAS%20Group%20Rescan
[2] Why does rewriting EXISTS to a semi‑join cause slowdown?: https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000477585?back=kb
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.