Databases 11 min read

Why Scalar Subqueries Are Slow in OceanBase and How to Optimize Them

This article analyzes why a scalar subquery in an OceanBase 3.2.3.3 SQL statement runs over 1000 seconds, explains the execution plan details, and presents a rewrite using WITH and LEFT JOIN to replace SUBPLAN FILTER, reducing cost from 788 million to 3.6 million and execution time to 10 seconds.

Aikesheng Open Source Community

Jul 24, 2024

Why Scalar Subqueries Are Slow in OceanBase and How to Optimize Them

1 Problem Description

Database version: OceanBase 3.2.3.3. The following SQL runs for more than 1000 seconds. This example is used to discuss the reasons why scalar subqueries are slow and how to optimize them.

select
  rq.processinstid processinstid,
  rq.question_id questionId,
  rq.question_no questionNo,
  to_char(rq.rev_start_date, 'yyyy-MM-dd') revStartDate,
  (
    select e.name
    from e
    where e.category_code = 'REV_SOURCE'
      and e.code = rq.rev_source
  ) revSource,
  (
    select e.name
    from e
    where e.category_code = 'QUESTION_TYPE'
      and e.code = rq.question_type
  ) questionType,
  rq.question_summary questionSummary,
  rq.question_desc questionDesc,
  to_char(rq.question_discover_date, 'yyyy-MM-dd') questionDiscoverDate,
  rq.aud_project_type audProjectType,
  (
    select d.dept_name
    from d
    where d.dept_id = rq.check_dept
  ) checkDept,
  (
    select to_char(wm_concat(distinct(k.org_name)))
    from o, k
    where o.question_id = rq.question_id
      and o.ASC_ORG = k.org_id
      and o.REFORM_TYPE = '0'
  ) ascOrg,
  (
    select to_char(wm_concat(distinct(k.dept_name)))
    from o, fnd_dept_t k
    where o.question_id = rq.question_id
      and o.MAIN_REV_DEPT = k.dept_id
      and o.REFORM_TYPE = '0'
  ) mainRevDept,
  (
    select e.name
    from e
    where e.category_code = 'REV_FINISH_STATE'
      and e.code = rq.rev_finish_state
  ) revFinishState,
  to_char(rq.compliance_date, 'yyyy-MM-dd') complianceDATE
from
  rq
  left join REM_QUESTION_PLAN_T t on rq.question_id = t.question_id
  left join fnd_org_t org on t.ASC_ORG = org.org_id
where
  1 = 1
  and rq.asc_org is null
  and (
    t.asc_org in (
      select f.org_id
      from f
      where f.org_type = 'G'
    )
    or rq.created_by_org in (
      select f.org_id
      from f
      where f.org_type = 'G'
    )
  )
  and rq.company_type = 'G';

2 Analysis Process

The execution plan shows low cost per operator but a huge total cost. The SQL can be split into two logical parts: the scalar subqueries in the projection and the outer query (the FROM clause with joins and subqueries).

The execution logic is:

Execute the outer query first, producing result set r (operators 1‑6).

For each row of r , execute the scalar subqueries (operators 7‑18).

Repeat until all rows are processed.

Running the outer query alone returns about 130 000 rows quickly, which means the scalar subqueries are executed 130 000 times.

Operators 10 and 14 correspond to two scalar subqueries that join tables o and k inside the subquery, causing 130 000 joins and huge latency.

select
  xxx,
  (
    select to_char(wm_concat(distinct(k.org_name)))
    from REM_QUESTION_PLAN_T o, fnd_org_t k
    where o.question_id = rq.question_id
      and o.ASC_ORG = k.org_id
      and o.REFORM_TYPE = '0'
  ) ascOrg,
  (
    select to_char(wm_concat(distinct(k.dept_name)))
    from REM_QUESTION_PLAN_T o, fnd_dept_t k
    where o.question_id = rq.question_id
      and o.MAIN_REV_DEPT = k.dept_id
      and o.REFORM_TYPE = '0'
  ) mainRevDept,
  xxx
from t -- outer query, about 130k rows;

3 Conclusion

Scalar subqueries are implemented as SUBPLAN FILTER (equivalent to a NESTED‑LOOP JOIN ). Their performance depends on two factors: the size of the outer result set and the efficiency of the subquery itself. When the outer set is large and the subquery contains additional joins, the cost grows dramatically.

The only effective way to improve such a query is to rewrite the scalar subqueries as LEFT JOIN s, allowing the optimizer to choose more efficient join algorithms such as HASH JOIN.

4 Optimization Solution

First aggregate the subquery results with GROUP BY, then join them to the outer query. The rewritten SQL uses two CTEs ( t1 and t2) to pre‑compute the aggregated values.

with t1 as (
  select o.question_id,
         to_char(wm_concat(distinct(k.org_name))) as org_name
  from REM_QUESTION_PLAN_T o, fnd_org_t k
  where o.ASC_ORG = k.org_id
    and o.REFORM_TYPE = '0'
  group by o.question_id
),
 t2 as (
  select o.question_id,
         to_char(wm_concat(distinct(k.dept_name))) as dept_name
  from REM_QUESTION_PLAN_T o, fnd_dept_t k
  where o.MAIN_REV_DEPT = k.dept_id
    and o.REFORM_TYPE = '0'
  group by o.question_id
)
select xxx,
       t1.org_name as ascOrg,
       t2.dept_name as mainRevDept,
       xxx
from t -- outer query, about 130k rows
left join t1 on t.question_id = t1.question_id
left join t2 on t.question_id = t2.question_id;

The new execution plan uses HASH OUTER JOIN , reducing the estimated cost from 788 million to 3.65 million and the actual execution time from over 1000 seconds to about 10 seconds.

=============================================================
|ID|OPERATOR          |NAME               |EST. ROWS|COST   |
-------------------------------------------------------------
|0 |SUBPLAN FILTER    |                    |6318     |3653489|
|1 | MERGE GROUP BY   |                    |6318     |1636701|
|2 |  SORT            |                    |6318     |1632074|
|3 |   SUBPLAN FILTER |                    |6318     |1613799|
|4 |    HASH OUTER JOIN|                    |8424     |492531 |
|5 |     HASH OUTER JOIN|                    |8377     |331672 |
|6 |      MERGE OUTER JOIN|                |7966     |198317 |
|7 |       TABLE SCAN  |RQ                 |7966     |77932  |
|8 |       SUBPLAN SCAN|T2                 |2351     |119098 |
|9 |        MERGE GROUP BY|                |2351     |119062 |
|10|         SORT      |                    |2352     |118658 |
|11|          HASH JOIN|                    |2352     |113818 |
|12|           TABLE SCAN|K               |22268    |8614   |
|13|           TABLE SCAN|O               |76460    |60075  |
|14|        TABLE SCAN |T                 |152919   |59150  |
|15|      SUBPLAN SCAN |T1                |76415    |118014 |
|16|       HASH JOIN   |                    |76415    |116865 |
|17|        TABLE SCAN |K                 |7033     |2721   |
|18|        TABLE SCAN |O                 |76460    |60075  |
|19|      TABLE SCAN   |F                 |440      |2763   |
|20|      TABLE SCAN   |F                 |440      |2763   |
|21| TABLE SCAN        |E(SYS_C0011218)   |1        |92     |
|22| TABLE SCAN        |E(SYS_C0011218)   |1        |92     |
|23| TABLE GET         |D                 |1        |46     |
|24| TABLE SCAN        |E(SYS_C0011218)   |1        |92     |
=============================================================

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sql-optimization Database Performance OceanBase Scalar Subquery

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.