Databases 12 min read

Analyzing and Optimizing Slow Scalar Subqueries in OceanBase 3.2.3.3

This article examines why a scalar subquery in an OceanBase 3.2.3.3 SQL statement takes over 1000 seconds, breaks down the execution plan, identifies the costly nested-loop behavior, and presents a rewrite using WITH and LEFT JOIN that reduces the cost from 788 million to 3.6 million and cuts runtime to about 10 seconds.

Aikesheng Open Source Community

Jul 25, 2024

Analyzing and Optimizing Slow Scalar Subqueries in OceanBase 3.2.3.3

1 Problem Description

Database version: OceanBase 3.2.3.3. The following SQL statement runs for more than 1000 seconds. This article uses this example to discuss the reasons behind the slowness of scalar subqueries and the corresponding optimization methods.

select
  rq.processinstid processinstid,
  rq.question_id questionId,
  rq.question_no questionNo,
  to_char(rq.rev_start_date, 'yyyy-MM-dd') revStartDate,
  (
    select e.name
    from e
    where e.category_code = 'REV_SOURCE'
      and e.code = rq.rev_source
  ) revSource,
  (
    select e.name
    from e
    where e.category_code = 'QUESTION_TYPE'
      and e.code = rq.question_type
  ) questionType,
  rq.question_summary questionSummary,
  rq.question_desc questionDesc,
  to_char(rq.question_discover_date, 'yyyy-MM-dd') questionDiscoverDate,
  rq.aud_project_type audProjectType,
  (
    select d.dept_name
    from d
    where d.dept_id = rq.check_dept
  ) checkDept,
  (
    select to_char(wm_concat(distinct(k.org_name)))
    from o, k
    where o.question_id = rq.question_id
      and o.ASC_ORG = k.org_id
      and o.REFORM_TYPE = '0'
  ) ascOrg,
  (
    select to_char(wm_concat(distinct(k.dept_name)))
    from o, fnd_dept_t k
    where o.question_id = rq.question_id
      and o.MAIN_REV_DEPT = k.dept_id
      and o.REFORM_TYPE = '0'
  ) mainRevDept,
  (
    select e.name
    from e
    where e.category_code = 'REV_FINISH_STATE'
      and e.code = rq.rev_finish_state
  ) revFinishState,
  to_char(rq.compliance_date, 'yyyy-MM-dd') complianceDATE
from rq
  left join REM_QUESTION_PLAN_T t on rq.question_id = t.question_id
  left join fnd_org_t org on t.ASC_ORG = org.org_id
where 1 = 1
  and rq.asc_org is null
  and (
    t.asc_org in (
      select f.org_id
      from f
      where f.org_type = 'G'
    )
    or rq.created_by_org in (
      select f.org_id
      from f
      where f.org_type = 'G'
    )
  )
  and rq.company_type = 'G';

2 Analysis Process

The execution plan is shown below:

===========================================================
|ID|OPERATOR          |NAME               |EST. ROWS|COST    |
-----------------------------------------------------------
|0 |SUBPLAN FILTER    |                    |6283     |788388847|
|1 | SUBPLAN FILTER   |                    |6283     |1325483 |
|2 |  HASH OUTER JOIN |                    |8377     |210530  |
|3 |   TABLE SCAN      |RQ                 |7966     |77932   |
|4 |   TABLE SCAN      |T                  |152919   |59150   |
|5 |  TABLE SCAN      |F                  |440      |2763    |
|6 |  TABLE SCAN      |F                  |440      |2763    |
|7 | TABLE SCAN        |E(SYS_C0011218)   |1        |92      |
|8 | TABLE SCAN        |E(SYS_C0011218)   |1        |92      |
|9 | TABLE GET         |D                  |1        |46      |
|10| SCALAR GROUP BY  |                    |1        |62483   |
|11|  NESTED-LOOP JOIN|                    |1        |62483   |
|12|   TABLE SCAN      |O                  |1        |62468   |
|13|   TABLE GET       |K                  |1        |28      |
|14| SCALAR GROUP BY  |                    |1        |62483   |
|15|  NESTED-LOOP JOIN|                    |1        |62483   |
|16|   TABLE SCAN      |O                  |1        |62468   |
|17|   TABLE GET       |K                  |1        |27      |
|18| TABLE SCAN        |E(SYS_C0011218)   |1        |92      |
===========================================================

Each sub‑operator has a low individual cost, but the total cost is huge. The SQL can be logically divided into two parts:

Scalar subquery – the sub‑queries in the projection list.

Outer query – the FROM clause with joins and additional sub‑queries.

The execution flow is therefore:

Execute the outer query first, producing result set r (operators 1‑6).

For each row of r , execute the scalar subqueries (operators 7‑18).

Repeat until all rows of r have been processed.

To pinpoint the bottleneck, the outer query (operators 1‑6) was run alone and returned 130,000 rows quickly, indicating that each scalar subquery is executed 130,000 times.

The scalar subqueries corresponding to operators 7, 8, 9 and 18 can use indexes and are fast; keeping only those reduces the total runtime dramatically.

The real problem lies with operators 10 and 14. Their subqueries join tables o and k inside the scalar subquery, causing the join to be performed 130,000 times.

The two problematic scalar subqueries are shown below (the highlighted part is the join that costs ~200 ms per execution):

select
  xxx,
  (
    select to_char(wm_concat(distinct(k.org_name)))
    from REM_QUESTION_PLAN_T o, fnd_org_t k
    where o.question_id = rq.question_id
      and o.ASC_ORG = k.org_id
      and o.REFORM_TYPE = '0'
  ) ascOrg,
  (
    select to_char(wm_concat(distinct(k.dept_name)))
    from REM_QUESTION_PLAN_T o, fnd_dept_t k
    where o.question_id = rq.question_id
      and o.MAIN_REV_DEPT = k.dept_id
      and o.REFORM_TYPE = '0'
  ) mainRevDept,
  xxx
from t   -- outer query, 130k rows

3 Conclusion

Scalar subqueries are executed as a SUBPLAN FILTER (equivalent to a NESTED‑LOOP JOIN ). Their performance depends on two factors:

The size of the outer query result set.

The efficiency of the subquery itself.

If the outer result set is large and the subquery contains additional joins without proper indexes, the nested‑loop execution becomes extremely slow. In such cases the subquery should be rewritten as a regular LEFT JOIN , allowing the optimizer to choose more efficient join algorithms such as HASH JOIN.

4 Optimization方案 (Optimization Plan)

Because the scalar subqueries contain aggregation, they should be aggregated first (using GROUP BY ) and then joined to the outer query. The rewritten SQL (partial) is:

with t1 as (
  select o.question_id,
         to_char(wm_concat(distinct(k.org_name))) as org_name
  from REM_QUESTION_PLAN_T o, fnd_org_t k
  where o.ASC_ORG = k.org_id
    and o.REFORM_TYPE = '0'
  group by o.question_id
),
 t2 as (
  select o.question_id,
         to_char(wm_concat(distinct(k.dept_name))) as dept_name
  from REM_QUESTION_PLAN_T o, fnd_dept_t k
  where o.MAIN_REV_DEPT = k.dept_id
    and o.REFORM_TYPE = '0'
  group by o.question_id
)
select xxx,
       t1.org_name as ascOrg,
       t2.dept_name as mainRevDept,
       xxx
from t   -- outer query, 130k rows
left join t1 on t.question_id = t1.question_id
left join t2 on t.question_id = t2.question_id;

After the rewrite the execution plan changes to use HASH OUTER JOIN . The total cost drops from 788 million to 3.6 million and the runtime falls from >1000 seconds to about 10 seconds.

=============================================================
|ID|OPERATOR          |NAME               |EST. ROWS|COST   |
-------------------------------------------------------------
|0 |SUBPLAN FILTER    |                    |6318     |3653489|
|1 | MERGE GROUP BY    |                    |6318     |1636701|
|2 |  SORT             |                    |6318     |1632074|
|3 |   SUBPLAN FILTER  |                    |6318     |1613799|
|4 |    HASH OUTER JOIN|                    |8424     |492531 |
|5 |     HASH OUTER JOIN|                   |8377     |331672 |
|6 |      MERGE OUTER JOIN|                |7966     |198317 |
|7 |        TABLE SCAN |RQ                 |7966     |77932  |
|8 |        SUBPLAN SCAN|T2                |2351     |119098 |
|9 |         MERGE GROUP BY|               |2351     |119062 |
|10|          SORT     |                    |2352     |118658 |
|11|           HASH JOIN|                    |2352     |113818 |
|12|            TABLE SCAN|K               |22268    |8614   |
|13|            TABLE SCAN|O               |76460    |60075  |
|14|          TABLE SCAN|T                |152919   |59150  |
|15|        SUBPLAN SCAN|T1               |76415    |118014 |
|16|         HASH JOIN |                    |76415    |116865 |
|17|          TABLE SCAN|K               |7033     |2721   |
|18|          TABLE SCAN|O               |76460    |60075  |
|19|          TABLE SCAN|F               |440      |2763   |
|20|          TABLE SCAN|F               |440      |2763   |
|21|          TABLE SCAN|E(SYS_C0011218) |1        |92     |
|22|          TABLE SCAN|E(SYS_C0011218) |1        |92     |
|23|          TABLE GET |D                |1        |46     |
|24|          TABLE SCAN|E(SYS_C0011218) |1        |92     |
=============================================================

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization SQL database OceanBase Scalar Subquery

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.