Analyzing Slow Queries in OceanBase: Log‑Based Diagnosis and Performance Optimization
This article consolidates practical experience and author insights to guide OceanBase engineers in quickly diagnosing and resolving slow‑SQL performance issues by leveraging ODP and OBServer logs, key metrics, and systematic analysis steps.
This article aggregates senior engineers' experience and the author's understanding to help OceanBase (OB) engineers quickly troubleshoot slow SQL execution problems using key logs and systematic analysis.
Background
Production architectures that connect applications to OB typically follow two patterns:
Application → OBProxy → OBServer
Application → OBProxy‑Sharding → OBServer
The first pattern covers most customers, while the second is a sharding scenario; both are collectively referred to as ODP (OceanBase Database Proxy).
When a statement takes a long time, the investigation points include network time from the application to ODP, ODP execution time, network time from ODP to OBServer, and OBServer execution time.
From Which Information Should You Start?
Most of the time, you obtain timing details from the following ODP components:
obproxy_digest.log : audit log that records failed SQL statements and those whose execution time exceeds query_digest_time_threshold (default 2 ms).
obproxy_slow.log : slow‑SQL request log that records statements whose execution time exceeds slow_query_time_threshold (default 500 ms).
obproxy.log : the general ODP log.
In obproxy_digest.log and obproxy_slow.log , columns 15‑18 (e.g., 8353us,179us,0us,5785us ) represent ODP total processing time, ODP preprocessing time, ODP connection acquisition time, and OBServer execution time respectively. Example:
2023-05-04
16:46:03.513268,test_obproxy,,,,test:ob_mysql:sbtest,OB_MYSQL,sbtest1,sbtest1,COM_QUERY
,SELECT,failed,1064,select t1.*%2Ct2.* from sbtest1 t1%2Csbtest2 t2 where t1.id = t2.id where id <10000,8353us,179us,0us,5785us,Y0-7FA25BB4A2E0,YB420ABA3FAC-0005FA2415BE0F81-0-0,,,0,10.186.63.172:2881The start of ODP total time is when ODP receives the client request; the end is when ODP writes all data back to the client. Preprocessing time includes the lookup of oceanbase.__all_virtual_proxy_schema on the leader. Connection acquisition time is currently always 0. OBServer execution time spans from ODP sending the request to OBServer until the first record is returned.
Because the sum of the last three items does not always equal the total, a long total time with short preprocessing and OBServer times often indicates that the bottleneck lies between OBServer returning the first row and ODP sending the full result set to the client, which is common for large result sets.
OBServer Components
gv$audit_sql : view that shows source, execution status, and other statistics for every SQL request on an OBServer. Important fields include request_time , sql_id , plan_id , plan_type , trace_id , svr_ip , client_ip , user_name , db_name , elapsed_time , queue_time , execute_time , retry_cnt , table_scan , ret_code , and query_sql .
observer.log : comprehensive OBServer log; useful entries are often located via trace_id (or TraceID) from OCP's SQL diagnostic feature.
Common OceanBase Slow‑Query Analysis Paths
1. ODP Takes Long Time to Write Data Back to the Application
When the result set is large, ODP may spend considerable time returning data. Example from obproxy.log shows client_response_bytes:185002181 and client_response_write_time_us=1130104 , indicating ~185 MB of data took about 1.1 s to send.
2. Slow "Get Location Cache" in ODP
ODP must locate the leader partition for each table; this is usually fast and cached, but occasional delays (e.g., pl_lookup_time_us=4998993 , ~5 s) signal a problem that should be escalated to OB developers.
3. Table Routing Choices
OceanBase supports Local, Remote, and Distributed plans. Remote plans indicate sub‑optimal routing, often caused by ODP routing issues, inability to compute partition IDs, global indexes, or missing secondary routing.
4. OBServer Write Throttling
When memstore usage reaches writing_throttling_trigger_percentage (default 100), write throttling is triggered. The throttling duration is limited by writing_throttling_maximum_duration (default 1 h). Throttling records appear in observer.log with keywords like "report write throttle".
5. Access Plan Issues
If a query does not hit the plan cache ( IS_HIT_PLAN=0 ) or the cached plan changes unexpectedly, execution time can increase. Use explain extended <query_sql> for the current plan, but to see the plan used by a slow query, retrieve SVR_IP , SVR_PORT , TENANT_ID , and PLAN_ID from gv$sql_audit and query gv$plan_cache_plan_explain .
6. OBServer Lock Waits
OceanBase uses MVCC; lock waits usually occur on write requests (lock_for_write). Indicators include large elapsed_time , small execute_time , and high retry_cnt in gv$sql_audit , plus corresponding entries in observer.log .
7. Problematic SQL Statements
After ruling out the above, most slow queries stem from the SQL itself—missing indexes, inefficient predicates, etc. Retrieve the full SQL text from gv$sql_audit or ODP logs, then run explain to examine the execution plan and identify issues such as full‑table scans.
These are the typical causes of slow queries in OceanBase and the corresponding analysis approaches to help readers diagnose and resolve performance problems.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.