Databases 27 min read

Master OceanBase Slow Query Optimization: From Trace IDs to Manual Stats

This article walks through practical techniques for optimizing slow queries in OceanBase, covering essential data collection, distinguishing trace_id, sql_id, and plan_id, handling multiple sql_id scenarios, and manually gathering statistics across versions, with detailed SQL examples, tables, and visual aids.

Aikesheng Open Source Community

Sep 9, 2025

Master OceanBase Slow Query Optimization: From Trace IDs to Manual Stats

Author's note: Three years ago, when I first encountered OceanBase, I was a long‑time MySQL operations practitioner and felt lost with the new SQL optimization system. After accumulating operational experience, I organized my practical insights into a blog, which was later discovered by the OB official team and invited for a live sharing titled "OceanBase SQL Performance Optimization in Practice". This article compiles the live content and notes for community learning.

Outline

1. What information should be collected for OceanBase slow queries?

1.1 Basic execution information

1.2 View the actual execution plan

1.3 View the optimal execution plan

1.4 View execution profile

1.5 Obtain table structure and data volume

1.6 Three common questions

2. How to precisely distinguish trace_id, sql_id, and plan_id?

2.1 Background

2.2 Differences among the three IDs

2.3 How to obtain each ID

3. Why do multiple sql_id values appear for the same type of SQL?

3.1 Background

3.2 Quick parameterization

3.3 Summary

4. 3.X vs. 4.X: Manual statistics collection in OceanBase

4.1 Differences in manual statistics collection

4.2 DBMS_STATS method

4.3 ANALYZE method

4.4 Statistics verification

1. OceanBase Slow Query Information Collection

1.1 Basic Execution Information

-- Query under system tenant (root@sys)
SELECT
    usec_to_time(request_time) AS `Start Time`,
    elapsed_time AS `Total Elapsed Time`,
    execute_time AS `Plan Execution Time`,
    QUEUE_TIME AS `Queue Wait Time`,
    EVENT AS `Longest Wait Event`,
    TOTAL_WAIT_TIME_MICRO AS `Total Wait Time`,
    TOTAL_WAITS AS `Total Wait Count`,
    RPC_COUNT AS `RPC Count`,
    NET_TIME AS `Network Time`,
    NET_WAIT_TIME AS `Network Wait Time`,
    svr_ip,
    trace_id,
    plan_type,
    REQUEST_TYPE AS `Plan Type`,
    query_sql,
    sql_id,
    plan_id
FROM oceanbase.GV$SQL_AUDIT   -- Use GV$OB_SQL_AUDIT for 4.x
WHERE query_sql LIKE 'select%*%from%wms.sbtest1%'
  AND DB_NAME NOT IN ('oceanbase')
ORDER BY request_time DESC
LIMIT 3;

SQL tenant selection:

Execute the query in the business tenant (MySQL mode) if possible.

In Oracle mode, syntax incompatibility may arise.

It is recommended to run the query under the root@sys tenant to ensure smooth execution.

PLAN_TYPE: Execution Plan Type

Plan Type

Efficiency

Main Optimization Direction

Local Execution Plan

✅ High

Maintain locality, avoid degradation

Remote Execution Plan

⚠️ Medium

Improve routing, use ODP affinity

Distributed Execution Plan

❌ Low

Design table structure, table groups, parallel execution

Locating Performance Bottlenecks

Condition

Problem定位

ELAPSED_TIME far greater than EXECUTE_TIME; NET_WAIT_TIME/QUEUE_TIME high

Non‑database stage consumes most time, possibly at application/network layer

EXECUTE_TIME high proportion, CPU_TIME/IO_TIME high

Database side bottleneck, need to optimize plan or indexes

1.2 View the Actual Execution Plan

OceanBase caches execution plans in two views:

gv$plan_cache_plan_stat

gv$plan_cache_plan_explain

The latter shows the plan actually used at runtime and is more accurate.

View Comparison

Comparison Dimension

gv$plan_cache_plan_stat

gv$plan_cache_plan_explain

Function Description

Records detailed information of all plans in the cache and aggregate execution statistics for each plan.

Displays detailed physical execution plan information for a specific SQL in the cache.

Data Content

Includes statistics such as parameterization, execution count, memory usage, hit rate, etc.

Includes operator name, table name, estimated rows, cost, operator attributes, etc.

Query Conditions

Can filter directly by SQL ID or other fields.

Must specify IP, PORT, TENANT_ID, PLAN_ID.

1.3 View the Optimal Execution Plan

explain extended

1.4 View Execution Profile

The execution profile displays the detailed process of SQL execution, including operator order and data flow, helping to uncover potential performance issues.

SELECT
    op_id, op, output_rows, rescan, threads,
    close_time - open_time AS open_dt,
    last_row_eof_time - first_row_time AS row_dt,
    open_time, close_time, first_row_time, last_row_eof_time
FROM (
    SELECT plan_line_id AS op_id,
           concat(lpad(' ', max(plan_depth), ' '), plan_operation) AS op,
           sum(output_rows) AS output_rows,
           sum(STARTS) AS rescan,
           min(first_refresh_time) AS open_time,
           max(last_refresh_time) AS close_time,
           min(first_change_time) AS first_row_time,
           max(last_change_time) AS last_row_eof_time,
           sum(1) AS threads
    FROM gv$sql_plan_monitor
    WHERE trace_id = 'YB421E0001A5-000610C5B56FFFFE-0-0'
    GROUP BY plan_line_id, plan_operation, plan_depth
) a;

1.5 Obtain Table Structure and Data Volume

Join fields are not primary keys and lack indexes.

Implicit conversion of join fields.

Column cardinality.

Large table vs. small table.

1.6 Three Common Questions

Is a small table always suitable as a driver table?

Answer: Not necessarily.

No index or poor index.

Uneven data distribution.

Complex join conditions.

Is a large table always unsuitable as a driver table?

Answer: Not necessarily. Scenarios where a large table can be a driver:

Strong filter conditions exist.

Hash join with uniform data distribution.

Reasonable index design on the large table.

Cost‑based optimizer selection.

When to use hash join?

Large tables.

Effective filter conditions that reduce data size significantly.

For detailed case analysis, see the referenced article "SQL Optimization Comparison: Driver Table vs Hash Join".

2. Precisely Distinguishing trace_id, sql_id, plan_id

2.1 Background

During SQL optimization, the fields trace_id, sql_id, and plan_id are often confusing.

2.2 Differences Among the Three IDs

2.2.1 trace_id

trace_id

is an internal SQL‑level ID in OceanBase, defaulting to Y0-0000000000000000-0-0. It identifies the execution process of a SQL and is crucial for troubleshooting. When multiple statements are sent together with allowmultiqueries=true, they share the same trace_id.

A single trace_id may correspond to multiple SQL statements.

Binding execution plans can be done via trace_id (see OB operations guide).

2.2.2 sql_id

sql_id

uniquely identifies a specific SQL statement; it is the MD5 hash (128‑bit) of the normalized SQL text. The same SQL executed multiple times yields the same sql_id, while different statements have different values. It can be used for plan binding.

Binding execution plans can be achieved using sql_id when the SQL is parameterized.

2.2.3 plan_id

plan_id

uniquely identifies a plan in the plan_cache on a single OBServer. It is an incrementing value managed by the plan_cache module and resets after an OBServer restart. Together with tenant_id, it determines a specific execution plan.

2.2.4 Example Values

-- 4.x using GV$OB_SQL_AUDIT
SELECT trace_id, sql_id, plan_id FROM oceanbase.GV$SQL_AUDIT LIMIT 1;
+-----------------------------------+----------------------------------+------+
| trace_id                         | sql_id                           | plan_id |
+-----------------------------------+----------------------------------+------+
| YB420ABA381C-00063097C87624F3-0-0 | 2351C88EB78C36BCCD2F397CF7C3901C | 9409 |
+-----------------------------------+----------------------------------+------+

2.3 How to Obtain Each ID

2.3.1 Get trace_id

Method 1: Execute SELECT last_trace_id(); in the current session. SELECT last_trace_id(); Method 2: Query GV$SQL_AUDIT (or GV$OB_SQL_AUDIT for 4.x).

2.3.2 Get sql_id

Query gv$plan_cache_plan_stat.

Query gv$sql_audit.

Choose the appropriate view based on the scenario.

2.3.3 Get plan_id

Obtain from observer.log or via queries on v$sql_audit or gv$sql_audit:

SELECT plan_id, query_sql FROM gv$sql_audit WHERE query_sql LIKE '%total_waits%' LIMIT 1;

Alternatively, extract from OCP white‑screen logs.

3. Why Do Multiple sql_id Values Appear for the Same Type of SQL?

3.1 Background

When binding execution plans, sql_id is a key factor. Different sql_id values for similar SQL indicate that parameterization was not applied.

3.2 Quick Parameterization

3.2.1 Query sql_id

SELECT query_sql, sql_id FROM oceanbase.GV$SQL_AUDIT
WHERE query_sql LIKE 'select count(*) from sbtest1%'
  AND DB_NAME NOT IN ('oceanbase')
ORDER BY request_time DESC
LIMIT 3;

3.2.2 Applicable Scenarios

When filter predicates differ but the normalized SQL remains the same, sql_id stays consistent.

+--------------------------------------+----------------------------------+
| query_sql                           | sql_id                           |
+--------------------------------------+----------------------------------+
| select count(*) from sbtest1 where k like '7%' | 690923EE7BBA536BA7392977418F6058 |
| select count(*) from sbtest1 where k like '6%' | 690923EE7BBA536BA7392977418F6058 |
| select count(*) from sbtest1               | CCBA8AE1C7D1C165CD098E3490DE5A22 |
+--------------------------------------+----------------------------------+

When the predicate values differ but the structure is identical, sql_id remains the same.

3.2.3 Non‑Parameterizable Scenarios

Cases that generate different sql_id values:

Variable number of elements in an IN clause.

Differences in case or whitespace.

Different constants after ORDER BY.

Different constants after GROUP BY.

Different string format functions.

Different function arguments leading to different results.

Functions with hidden information (e.g., timestamps).

-- IN clause example
SELECT query_sql, sql_id FROM ... WHERE id IN (1,3);
SELECT query_sql, sql_id FROM ... WHERE id IN (4,5,6);
-- Case/space example
SELECT * FROM sbtest1 WHERE id = 111;
select * from sbtest1 where id = 111;
-- ORDER BY constant example
SELECT * FROM sbtest1 WHERE id = 1 ORDER BY 1 LIMIT 10;
SELECT * FROM sbtest1 WHERE id = 1 ORDER BY 2 LIMIT 10;
-- GROUP BY constant example
SELECT * FROM sbtest1 WHERE id = 1 GROUP BY 2 LIMIT 10;
SELECT * FROM sbtest1 WHERE id = 1 GROUP BY 1 LIMIT 10;
-- String format example
SELECT DATE_FORMAT('2010-06-00','%m');
SELECT DATE_FORMAT('2010-06-00','%y');
-- Function argument example
SELECT SUBSTR('abcdefg',3,4);
SELECT SUBSTR('abcdefg',1,2);
-- Hidden info example
SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19.015');
SELECT UNIX_TIMESTAMP('2015-11-13 10:20:19.014');

3.3 Summary

The following situations generate multiple distinct sql_id values and should be considered when binding plans:

Variable number of elements in an IN clause.

Case differences or extra whitespace.

Different constants after ORDER BY.

Different constants after GROUP BY.

Different string format specifications.

Different function arguments with different results.

Functions containing hidden information that affect execution.

4. 3.X vs. 4.X: Manual Statistics Collection in OceanBase

4.1 Differences in Manual Statistics Collection

4.1.1 Command Support Differences

Version

MySQL Mode

Oracle Mode

3.X

Only ANALYZE command supported

Both DBMS_STATS package and ANALYZE command supported

4.X

Both DBMS_STATS package and ANALYZE command supported

4.1.2 Daily Merge Differences

Improvement

V3.x and Earlier

V4.0 and Later

Statistics collection timing

During daily merge period

Decoupled, independent collection

Accuracy

Incremental merge may be inaccurate

Full scan, more accurate

Histogram support

Not supported

Supported

Plan stability

Easily affected by merge

More stable

Management flexibility

Depends on merge

Supports manual + automatic + window configuration

4.2 DBMS_STATS Method

Note: For 3.X MySQL tenant mode, DBMS_STATS is not recommended.

4.2.1 Table‑Level Statistics

-- Small table (less than 10 million rows)
CALL dbms_stats.gather_table_stats('test','t1');

-- Large table (more than 10 million rows)
CALL dbms_stats.gather_table_stats('test','t1', degree=>8);

4.2.2 Partitioned Table Statistics

CALL dbms_stats.gather_table_stats('test','t_part', degree=>2, granularity=>'APPROX_GLOBAL AND PARTITION');

4.2.3 Schema‑Level Statistics

-- Collect statistics for all objects in TEST schema
CALL dbms_stats.gather_schema_stats('TEST');

-- With parallelism
CALL dbms_stats.gather_schema_stats('TEST', degree=>'16');

-- Collect statistics for a specific large table with column granularity size 1
CALL dbms_stats.gather_table_stats('test','big_table', degree=>128, method_opt=>'for all columns size 1');

-- Lock statistics
CALL dbms_stats.lock_table_stats('test','big_table');

-- Unlock statistics
CALL dbms_stats.unlock_table_stats('test','big_table');

4.3 ANALYZE Method

Note: For 3.X MySQL tenant mode, ANALYZE is the recommended way.

4.3.1 ANALYZE Syntax in MySQL Mode

-- Update histogram for specific columns
ANALYZE TABLE sbtest1 UPDATE HISTOGRAM ON id,k,c,pad WITH 30 BUCKETS;

To collect statistics for all columns, enable Oracle‑mode extensions first:

ALTER SYSTEM SET enable_sql_extension = TRUE;
ANALYZE TABLE products COMPUTE STATISTICS FOR ALL COLUMNS SIZE 30;

4.4 Statistics Verification

Applicable to OceanBase 4.2 and later.

MySQL Mode

SELECT DISTINCT DATABASE_NAME, TABLE_NAME
FROM oceanbase.DBA_OB_TABLE_STAT_STALE_INFO
WHERE DATABASE_NAME NOT IN ('oceanbase','mysql','__recyclebin')
  AND (IS_STALE = 'YES' OR LAST_ANALYZED_TIME IS NULL);

Oracle Mode

SELECT DISTINCT OWNER, TABLE_NAME
FROM sys.DBA_OB_TABLE_STAT_STALE_INFO
WHERE OWNER NOT IN ('oceanbase','__recyclebin')
  AND (IS_STALE = 'YES' OR LAST_ANALYZED_TIME IS NULL);

References

[1] SQL Optimization Comparison: Driver Table vs Hash Join – https://cloud.tencent.com/developer/article/2537372

[2] OB Operations – How to Effectively Use Outline – https://open.oceanbase.com/blog/16392302106

[3] Quick Parameterization – https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000002013760

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

SQL optimization OceanBase Trace ID SQL ID Plan ID Statistics Collection

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.