Databases 15 min read

Analyzing Slow Queries in OceanBase: Log‑Based Diagnosis and Performance Optimization

This article consolidates practical experience and author insights to guide OceanBase engineers in quickly diagnosing and resolving slow‑SQL performance issues by leveraging ODP and OBServer logs, key metrics, and systematic analysis steps.

Aikesheng Open Source Community

May 25, 2023

Analyzing Slow Queries in OceanBase: Log‑Based Diagnosis and Performance Optimization

This article aggregates senior engineers' experience and the author's understanding to help OceanBase (OB) engineers quickly troubleshoot slow SQL execution problems using key logs and systematic analysis.

Background

Production architectures that connect applications to OB typically follow two patterns:

Application → OBProxy → OBServer

Application → OBProxy‑Sharding → OBServer

The first pattern covers most customers, while the second is a sharding scenario; both are collectively referred to as ODP (OceanBase Database Proxy).

When a statement takes a long time, the investigation points include network time from the application to ODP, ODP execution time, network time from ODP to OBServer, and OBServer execution time.

From Which Information Should You Start?

Most of the time, you obtain timing details from the following ODP components:

obproxy_digest.log : audit log that records failed SQL statements and those whose execution time exceeds query_digest_time_threshold (default 2 ms).

obproxy_slow.log : slow‑SQL request log that records statements whose execution time exceeds slow_query_time_threshold (default 500 ms).

obproxy.log : the general ODP log.

In obproxy_digest.log and obproxy_slow.log, columns 15‑18 (e.g., 8353us,179us,0us,5785us) represent ODP total processing time, ODP preprocessing time, ODP connection acquisition time, and OBServer execution time respectively. Example:

2023-05-04<br/>16:46:03.513268,test_obproxy,,,,test:ob_mysql:sbtest,OB_MYSQL,sbtest1,sbtest1,COM_QUERY<br/>,SELECT,failed,1064,select t1.*%2Ct2.* from sbtest1 t1%2Csbtest2 t2 where t1.id = t2.id where id <10000,8353us,179us,0us,5785us,Y0-7FA25BB4A2E0,YB420ABA3FAC-0005FA2415BE0F81-0-0,,,0,10.186.63.172:2881

The start of ODP total time is when ODP receives the client request; the end is when ODP writes all data back to the client. Preprocessing time includes the lookup of oceanbase.__all_virtual_proxy_schema on the leader. Connection acquisition time is currently always 0. OBServer execution time spans from ODP sending the request to OBServer until the first record is returned.

Because the sum of the last three items does not always equal the total, a long total time with short preprocessing and OBServer times often indicates that the bottleneck lies between OBServer returning the first row and ODP sending the full result set to the client, which is common for large result sets.

OBServer Components

gv$audit_sql : view that shows source, execution status, and other statistics for every SQL request on an OBServer. Important fields include request_time, sql_id, plan_id, plan_type, trace_id, svr_ip, client_ip, user_name, db_name, elapsed_time, queue_time, execute_time, retry_cnt, table_scan, ret_code, and query_sql.

observer.log : comprehensive OBServer log; useful entries are often located via trace_id (or TraceID) from OCP's SQL diagnostic feature.

Common OceanBase Slow‑Query Analysis Paths

1. ODP Takes Long Time to Write Data Back to the Application

When the result set is large, ODP may spend considerable time returning data. Example from obproxy.log shows client_response_bytes:185002181 and client_response_write_time_us=1130104, indicating ~185 MB of data took about 1.1 s to send.

2. Slow "Get Location Cache" in ODP

ODP must locate the leader partition for each table; this is usually fast and cached, but occasional delays (e.g., pl_lookup_time_us=4998993, ~5 s) signal a problem that should be escalated to OB developers.

3. Table Routing Choices

OceanBase supports Local, Remote, and Distributed plans. Remote plans indicate sub‑optimal routing, often caused by ODP routing issues, inability to compute partition IDs, global indexes, or missing secondary routing.

4. OBServer Write Throttling

When memstore usage reaches writing_throttling_trigger_percentage (default 100), write throttling is triggered. The throttling duration is limited by writing_throttling_maximum_duration (default 1 h). Throttling records appear in observer.log with keywords like "report write throttle".

5. Access Plan Issues

If a query does not hit the plan cache ( IS_HIT_PLAN=0) or the cached plan changes unexpectedly, execution time can increase. Use explain extended <query_sql> for the current plan, but to see the plan used by a slow query, retrieve SVR_IP, SVR_PORT, TENANT_ID, and PLAN_ID from gv$sql_audit and query gv$plan_cache_plan_explain.

6. OBServer Lock Waits

OceanBase uses MVCC; lock waits usually occur on write requests (lock_for_write). Indicators include large elapsed_time, small execute_time, and high retry_cnt in gv$sql_audit, plus corresponding entries in observer.log.

7. Problematic SQL Statements

After ruling out the above, most slow queries stem from the SQL itself—missing indexes, inefficient predicates, etc. Retrieve the full SQL text from gv$sql_audit or ODP logs, then run explain to examine the execution plan and identify issues such as full‑table scans.

These are the typical causes of slow queries in OceanBase and the corresponding analysis approaches to help readers diagnose and resolve performance problems.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance-analysis OceanBase OBproxy Database Logs

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.