
OceanBase Query Optimizer: Challenges, Techniques, and Engineering Practices

This article examines the core challenges of query optimization in relational databases—accurate statistics, massive plan spaces, and efficient plan management—and explains how OceanBase addresses them through logical/physical row concepts, real‑time statistics, distributed two‑stage planning, adaptive caching, and plan evolution mechanisms.

AntTech

May 29, 2019


The query optimizer is the heart of a relational database system, responsible for enumerating equivalent execution plans, estimating their costs using statistics and a cost model, and selecting the plan with the lowest estimated resource consumption.
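As a rough illustration of that enumerate, estimate, and select loop, the sketch below picks the cheapest of two candidate plans. The `Plan` fields, the two cost constants, and the plan names are hypothetical and chosen only for the example; a production cost model has many more terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    rows_scanned: int    # rows the plan is expected to read
    random_lookups: int  # extra index-to-table lookups


# Hypothetical per-unit costs, not taken from any real optimizer.
COST_PER_ROW = 1.0
COST_PER_LOOKUP = 4.0


def estimate_cost(plan: Plan) -> float:
    """Toy cost model: sequential row cost plus random lookup cost."""
    return plan.rows_scanned * COST_PER_ROW + plan.random_lookups * COST_PER_LOOKUP


def choose_plan(candidates: list) -> Plan:
    """Return the candidate with the lowest estimated cost."""
    return min(candidates, key=estimate_cost)


candidates = [
    Plan("full_table_scan", rows_scanned=100_000, random_lookups=0),
    Plan("index_scan_with_lookup", rows_scanned=500, random_lookups=500),
]
best = choose_plan(candidates)
print(best.name, estimate_cost(best))
```

The choice is only as good as the row estimates fed into `estimate_cost`, which is why statistics quality comes first among the challenges that follow.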

Key challenges remain: (1) obtaining precise statistics and a dynamic cost model, (2) handling the enormous search space of possible plans—especially with modern storage engines like LSM‑Tree and distributed architectures—and (3) managing plans efficiently through caching and evolution.

Challenge 1 – Precise Statistics and Cost Model: Traditional statistics suffer from sampling error and staleness, while static cost models cannot reflect runtime variations in CPU, I/O, and network usage.

Challenge 2 – Massive Plan Space: LSM‑Tree storage separates static baseline data from incremental in‑memory data, so the number of rows a scan actually reads can diverge from the logical row count that cost estimation traditionally relies on. Distributed execution further multiplies the number of operator implementations and partitioning choices.

Challenge 3 – Efficient Plan Management: Effective caching (parameterized plans) and evolution (ensuring new plans do not cause performance regression) are essential for high‑concurrency, low‑latency workloads.

OceanBase tackles these issues with several engineering solutions:

It introduces the concepts of logical rows (traditional row count) and physical rows (actual rows read from LSM‑Tree), enabling real‑time statistics that combine baseline and delta data.
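The difference between the two counts can be sketched with a deliberately simplified model. The `RangeStats` fields and both formulas are assumptions made for illustration (for instance, that every delta entry in the scanned range is read exactly once and that deletes target baseline rows); they are not OceanBase's actual estimator.

```python
from dataclasses import dataclass


@dataclass
class RangeStats:
    baseline_rows: int  # rows in the static baseline data for the scanned range
    delta_inserts: int  # new rows that exist only in the in-memory delta
    delta_updates: int  # newer versions of rows already in the baseline
    delta_deletes: int  # delete markers for rows in the baseline


def logical_rows(s: RangeStats) -> int:
    """Rows the query sees once baseline and delta are merged."""
    return s.baseline_rows + s.delta_inserts - s.delta_deletes


def physical_rows(s: RangeStats) -> int:
    """Entries the scan has to read before the merge (simplified)."""
    return s.baseline_rows + s.delta_inserts + s.delta_updates + s.delta_deletes


# A range where most baseline rows were deleted since the last compaction.
stats = RangeStats(baseline_rows=1000, delta_inserts=50,
                   delta_updates=200, delta_deletes=900)
print(logical_rows(stats), physical_rows(stats))
```

In this example the query returns few rows but the scan still touches many entries, so a cost model driven only by the logical count would underestimate the work.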

By default, OceanBase also handles dependencies between predicates on the columns of a multi‑column index instead of treating those predicates as independent, which improves selectivity estimation.
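To see why predicate dependency matters, compare the classic independence assumption with an estimate based on the joint number of distinct values of the index columns. The source does not describe how OceanBase computes this, so the joint-NDV approach, the function names, and the figures below are assumptions used only to show the effect.

```python
def independent_selectivity(ndv_a: int, ndv_b: int) -> float:
    """Equality on two columns, assuming the columns are independent."""
    return (1.0 / ndv_a) * (1.0 / ndv_b)


def joint_selectivity(ndv_ab: int) -> float:
    """Equality on two columns, using the distinct count of the column pair."""
    return 1.0 / ndv_ab


# Hypothetical table with an index on (country, city): each city belongs to
# one country, so the pair has no more distinct values than city alone.
table_rows = 1_000_000
ndv_country, ndv_city, ndv_country_city = 50, 1_000, 1_000

rows_independent = round(table_rows * independent_selectivity(ndv_country, ndv_city))
rows_joint = round(table_rows * joint_selectivity(ndv_country_city))
print(rows_independent, rows_joint)
```

The independence assumption underestimates the result size by a factor of 50 here, which is enough to push an optimizer toward the wrong access path or join order.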

For distributed optimization, OceanBase adopts a two‑stage approach. The first stage generates an optimal local plan under the assumption that all tables are local; the second stage applies heuristic rules to choose distributed operator implementations. A one‑stage approach, planned for the future, will enumerate all distributed alternatives, with pruning.
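A minimal sketch of the two stages for a single join is shown below. The `TableInfo` type, the broadcast threshold, and the three rules are hypothetical; the source only says that heuristic rules are applied, not which ones. The partition-wise check is also simplified, since a real system must confirm that both tables use the same partitioning scheme and not just the same key.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TableInfo:
    name: str
    rows: int
    partition_key: Optional[str]


BROADCAST_ROW_LIMIT = 10_000  # hypothetical threshold


def choose_local_join(left: TableInfo, right: TableInfo) -> Tuple[TableInfo, TableInfo]:
    """Stage 1: ignore data location and use the smaller table as build side."""
    return (left, right) if left.rows <= right.rows else (right, left)


def choose_distribution(build: TableInfo, probe: TableInfo, join_key: str) -> str:
    """Stage 2: pick a distributed join method with simple heuristic rules."""
    if build.partition_key == join_key and probe.partition_key == join_key:
        return "partition_wise"        # no data movement needed
    if build.rows <= BROADCAST_ROW_LIMIT:
        return "broadcast_build_side"  # copy the small side to every node
    return "hash_repartition"          # shuffle both sides on the join key


orders = TableInfo("orders", rows=1_000_000, partition_key="customer_id")
customers = TableInfo("customers", rows=5_000, partition_key="region")

build, probe = choose_local_join(orders, customers)
method = choose_distribution(build, probe, join_key="customer_id")
print(build.name, method)
```

Because stage 1 never considers data movement, it can commit to a join order whose distributed cost is poor, which is the limitation a one‑stage enumeration is meant to remove.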

The system uses a parameterized plan cache and an adaptive plan‑matching mechanism that monitors execution feedback and splits the selectivity space into multiple plans when needed.
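The two ideas can be sketched together: literals are replaced with placeholders so that similar statements share one cache key, and each cached entry can hold several plans that cover different selectivity ranges. The regular expression, the `AdaptiveEntry` class, and the split policy are assumptions for illustration and do not reflect OceanBase's internal design.

```python
import re
from bisect import bisect_right

# Matches single-quoted strings and standalone numbers (simplified SQL lexing).
_LITERAL = re.compile(r"'[^']*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql: str):
    """Replace literals with '?' and return (cache_key, extracted_params)."""
    params = _LITERAL.findall(sql)
    return _LITERAL.sub("?", sql), params


class AdaptiveEntry:
    """Plans for one parameterized statement, split by selectivity."""

    def __init__(self, plan: str):
        self.bounds = []     # split points in (0, 1), kept sorted
        self.plans = [plan]  # plans[i] covers the i-th selectivity interval

    def lookup(self, selectivity: float) -> str:
        return self.plans[bisect_right(self.bounds, selectivity)]

    def split(self, at: float, plan_above: str) -> None:
        """Use plan_above from selectivity `at` up to the next split point."""
        i = bisect_right(self.bounds, at)
        self.bounds.insert(i, at)
        self.plans.insert(i + 1, plan_above)


cache = {}
key, params = parameterize("SELECT * FROM t WHERE c1 = 42 AND c2 = 'abc'")
cache[key] = AdaptiveEntry("index_scan")

# Hypothetical feedback: executions with selectivity above 10% ran slowly
# under index_scan, so that part of the range gets its own plan.
cache[key].split(0.10, "full_scan")

print(key, params)
print(cache[key].lookup(0.01), cache[key].lookup(0.50))
```

A single cached plan is the cheap common case; the entry grows only when feedback shows that one plan cannot serve the whole parameter range.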

Plan evolution is driven by real traffic rather than scheduled background jobs, allowing immediate adoption of better plans such as newly created indexes.
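One way to picture traffic-driven evolution is to route a small share of real executions to the candidate plan and promote it only if it proves faster. The `PlanEvolution` class, the every-tenth-execution routing, the sample threshold, and the mean-latency comparison are all hypothetical simplifications, not OceanBase's actual policy.

```python
from statistics import mean
from typing import Optional


class PlanEvolution:
    """Trial a candidate plan on a share of live traffic, then decide."""

    def __init__(self, baseline: str, candidate: str,
                 trial_every: int = 10, min_samples: int = 3):
        self.baseline = baseline
        self.candidate: Optional[str] = candidate
        self.trial_every = trial_every
        self.min_samples = min_samples
        self.executions = 0
        self.samples = {baseline: [], candidate: []}

    def pick_plan(self) -> str:
        """Send every trial_every-th execution to the candidate."""
        self.executions += 1
        if self.candidate is not None and self.executions % self.trial_every == 0:
            return self.candidate
        return self.baseline

    def record(self, plan: str, latency_ms: float) -> None:
        self.samples[plan].append(latency_ms)
        self._maybe_decide()

    def _maybe_decide(self) -> None:
        if self.candidate is None:
            return
        cand = self.samples[self.candidate]
        base = self.samples[self.baseline]
        if len(cand) < self.min_samples or not base:
            return
        if mean(cand) < mean(base):
            self.baseline = self.candidate  # promote the better plan
        self.candidate = None               # either way, the trial ends


# Simulated latencies for a plan that uses a newly created index.
latency = {"full_scan": 20.0, "new_index_scan": 2.0}
evo = PlanEvolution("full_scan", "new_index_scan")
for _ in range(40):
    plan = evo.pick_plan()
    evo.record(plan, latency[plan])
print(evo.baseline)
```

Because the decision uses measurements from real executions, a regression in the candidate affects only the trial share of traffic before the candidate is discarded.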

Overall, OceanBase’s optimizer is built to serve both OLTP and OLAP workloads, leveraging its shared‑nothing distributed architecture, LSM‑Tree storage, and extensive plan management to deliver a robust, high‑performance query execution engine.


