Performance Degradation After Data Updates in OceanBase and Its Optimization Techniques
The article investigates why pure‑read QPS drops significantly after bulk updates in OceanBase, reproduces the issue with a sysbench workload, analyses flame‑graph and SQL audit data, explains the LSM‑Tree read‑amplification mechanism, and proposes practical mitigation steps such as major freeze, plan binding, index creation, and the queuing‑table feature.
Background – During a pure‑read performance test on OceanBase, the author observed a noticeable QPS decline after performing update operations. The article details the reproduction steps and the environment used for the experiment.
Reproduction Steps
Environment Preparation
Deploy OB
Deploy a single‑node OceanBase using OBD.
Version: OceanBase 4.0.0.0 CE
IP: 10.186.16.122
Key parameters (default values) relevant to the experiment:
memstore_limit_percentage: percentage of tenant memory that the MemStore may use (default 50)
freeze_trigger_percentage: memory usage threshold that triggers a global freeze (default 20)
major_compact_trigger: number of minor merges that triggers a major merge (default 0)
minor_compact_trigger: number of Mini SSTables that triggers a minor merge (default 2)
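The parameters above can be inspected, and adjusted if needed, from the sys tenant. A minimal sketch (the value 5 below is purely illustrative, not a tuning recommendation):

```sql
-- Check the current values of the compaction-related parameters
SHOW PARAMETERS LIKE 'freeze_trigger_percentage';
SHOW PARAMETERS LIKE 'major_compact_trigger';
SHOW PARAMETERS LIKE 'minor_compact_trigger';

-- Example: have minor merges roll up into a major merge sooner
ALTER SYSTEM SET major_compact_trigger = 5;
```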
Create sysbench tenant
create resource unit sysbench_unit max_cpu 26, memory_size '21g';
create resource pool sysbench_pool unit = 'sysbench_unit', unit_num = 1, zone_list=('zone1');
create tenant sysbench_tenant resource_pool_list=('sysbench_pool'), charset=utf8mb4, zone_list=('zone1'), primary_zone=RANDOM set variables ob_compatibility_mode='mysql', ob_tcp_invited_nodes='%';
Data preparation
Create 30 tables, each with 1,000,000 rows:
sysbench ./oltp_read_only.lua --mysql-host=10.186.16.122 --mysql-port=12881 --mysql-db=sysbenchdb --mysql-user="sysbench@sysbench_tenant" --mysql-password=sysbench --tables=30 --table_size=1000000 --threads=256 --time=60 --report-interval=10 --db-driver=mysql --db-ps-mode=disable --skip-trx=on --mysql-ignore-errors=6002,6004,4012,2013,4016,1062 prepare
Environment tuning – manually trigger a major merge
ALTER SYSTEM MAJOR FREEZE TENANT=ALL;
-- View merge progress
SELECT * FROM oceanbase.CDB_OB_ZONE_MAJOR_COMPACTION\G
Pure‑read QPS before data updates
sysbench ./oltp_read_only.lua ... run
Five runs produced QPS values of around 340k.
Pure‑read QPS after write‑only workload (updates/deletes/inserts)
sysbench ./oltp_write_only.lua ... run
After the write‑only workload, five pure‑read runs dropped to roughly 170k QPS, a ~50% reduction.
Pure‑read QPS after a manual major merge
ALTER SYSTEM MAJOR FREEZE TENANT=ALL;
QPS recovered to the original level (≈330k–350k).
Phenomenon Summary
Bulk updates roughly halve pure‑read QPS (from ~340k to ~170k); a manual major merge restores it to the original level.
Investigation Process
Method 1 – Flame Graph
A flame‑graph comparison before and after the updates shows an additional hot spot in oceanbase::blocksstable::ObMultiVersionMicroBlockRowScanner::inner_get_next_row, indicating heavy multi‑version row processing.
Further source inspection shows that ObMultiVersionMicroBlockRowScanner::inner_get_next_row_impl runs a while loop that reads and merges multiple versions of a row, invoking do_compact and fuse_row. The code is reproduced below:
int ObMultiVersionMicroBlockRowScanner::inner_get_next_row_impl(const ObDatumRow *&ret_row)
{
  int ret = OB_SUCCESS;
  bool final_result = false;
  bool found_first_row = false;
  bool have_uncommited_row = false;
  const ObDatumRow *multi_version_row = NULL;
  ret_row = NULL;
  while (OB_SUCC(ret)) {
    // locate cursor
    if (OB_FAIL(locate_cursor_to_read(found_first_row))) {
      if (OB_UNLIKELY(OB_ITER_END != ret)) {
        LOG_WARN("failed to locate cursor to read", K(ret));
      }
    }
    while (OB_SUCC(ret)) {
      multi_version_row = NULL;
      bool version_fit = false;
      if (read_row_direct_flag_) {
        if (OB_FAIL(inner_get_next_row_directly(multi_version_row, version_fit, final_result))) {
          if (OB_UNLIKELY(OB_ITER_END != ret)) {
            LOG_WARN("failed to inner get next row directly", K(ret));
          }
        }
      } else if (OB_FAIL(inner_inner_get_next_row(multi_version_row, version_fit, final_result, have_uncommited_row))) {
        if (OB_UNLIKELY(OB_ITER_END != ret)) {
          LOG_WARN("failed to inner get next row", K(ret));
        }
      }
      if (OB_SUCC(ret)) {
        if (!version_fit) {
          // do nothing
        } else if (OB_FAIL(do_compact(multi_version_row, row_, final_result))) {
          LOG_WARN("failed to do compact", K(ret));
        } else {
          if (OB_NOT_NULL(context_)) {
            ++context_->table_store_stat_.physical_read_cnt_;
          }
          if (have_uncommited_row) {
            row_.set_have_uncommited_row();
          }
        }
        if ((OB_SUCC(ret) && final_result) || OB_ITER_END == ret) {
          ret = OB_SUCCESS;
          if (OB_FAIL(cache_cur_micro_row(found_first_row, final_result))) {
            LOG_WARN("failed to cache cur micro row", K(ret));
          }
          break;
        }
      }
    }
    if (OB_SUCC(ret) && finish_scanning_cur_rowkey_) {
      if (!is_row_empty(prev_micro_row_)) {
        ret_row = &prev_micro_row_;
      } else if (!is_row_empty(row_)) {
        ret_row = &row_;
      }
      if (NULL != ret_row) {
        (const_cast<ObDatumRow *>(ret_row))->mvcc_row_flag_.set_uncommitted_row(false);
        const_cast<ObDatumRow *>(ret_row)->trans_id_.reset();
        break;
      }
    }
  }
  if (OB_NOT_NULL(ret_row)) {
    if (!ret_row->is_valid()) {
      LOG_ERROR("row is invalid", KPC(ret_row));
    } else {
      LOG_DEBUG("row is valid", KPC(ret_row));
      if (OB_NOT_NULL(context_)) {
        ++context_->table_store_stat_.logical_read_cnt_;
      }
    }
  }
  return ret;
}
Method 2 – SQL Execution Audit
Using GV$OB_SQL_AUDIT, the author compared execution details of the same SELECT before and after the update workload. The audit shows increased MEMSTORE_READ_ROW_COUNT and SSSTORE_READ_ROW_COUNT (about 233 rows read in total for a query that returns only 100 rows), confirming read amplification.
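A sketch of the kind of audit query involved; the LIKE pattern below is a hypothetical filter for the sysbench statement, to be adapted to the actual SQL text:

```sql
-- Compare row-read counters for recent executions of the target SELECT
SELECT SQL_ID, ELAPSED_TIME, EXECUTE_TIME,
       MEMSTORE_READ_ROW_COUNT, SSSTORE_READ_ROW_COUNT, RETURN_ROWS
FROM oceanbase.GV$OB_SQL_AUDIT
WHERE QUERY_SQL LIKE 'SELECT c FROM sbtest%'
ORDER BY REQUEST_TIME DESC
LIMIT 10;
```

MEMSTORE_READ_ROW_COUNT and SSSTORE_READ_ROW_COUNT together give the physical rows visited; comparing them against RETURN_ROWS makes the amplification visible.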
Plan analysis via V$OB_PLAN_CACHE_PLAN_EXPLAIN reveals the query uses a primary‑key index with a PHY_TABLE_SCAN followed by a PHY_SORT . The plan is already optimal, so the slowdown originates from the storage engine rather than the optimizer.
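The plan lookup can be sketched as follows; the tenant_id, ip, port, and plan_id values are placeholders that would come from V$OB_PLAN_CACHE_PLAN_STAT for the statement in question:

```sql
-- Fetch the cached physical plan for one statement
SELECT OPERATOR, NAME, ROWS, COST
FROM oceanbase.V$OB_PLAN_CACHE_PLAN_EXPLAIN
WHERE TENANT_ID = 1002
  AND IP = '10.186.16.122'
  AND PORT = 2882
  AND PLAN_ID = 1234;
```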
Conclusion
OceanBase’s LSM‑Tree storage stores data in a baseline plus incremental fashion. After massive insert/delete/update operations, a single logical row may have many physical versions spread across MemTables and SSTables. Reading such a row requires scanning and merging all versions, leading to significant read amplification and QPS degradation.
Performance Improvement Methods
For cases where a usable index exists but the optimizer still chooses a full‑table scan, use plan binding to force the optimal plan.
If the main filter column lacks an index, create an appropriate index online and bind the plan.
When indexes cannot be added or the workload is range‑scan heavy, manually trigger a major merge to clean up obsolete versions, reducing the amount of data scanned.
From OceanBase 2.2.7 onward, the buffer minor merge feature (setting the table mode to 'queuing') handles queuing‑style tables automatically, eliminating ineffective scans. Note that Community Edition 4.0 no longer supports explicitly marking tables as queuing; the capability is integrated adaptively in later 4.x releases.
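The mitigation steps above can be sketched in SQL. Table, index, and outline names here are hypothetical, and the queuing mode statement applies only to versions that still support it (not Community Edition 4.0):

```sql
-- 1) Plan binding: create an outline that forces the desired index
CREATE OUTLINE ol_sbtest1 ON
  SELECT /*+ INDEX(sbtest1 idx_c) */ c FROM sbtest1 WHERE c = 'x';

-- 2) Create the missing index (index builds are online in OceanBase)
CREATE INDEX idx_c ON sbtest1 (c);

-- 3) Manually trigger a major merge to purge obsolete row versions
ALTER SYSTEM MAJOR FREEZE TENANT = ALL;

-- 4) Mark a buffer-style table as queuing (2.2.7+ / enterprise editions)
ALTER TABLE sbtest1 SET TABLE_MODE = 'queuing';
```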
References
"Queuing Table Slow Query Issue" – https://www.oceanbase.com/docs/enterprise-oceanbase-database-cn-10000000000945692
"Large‑Scale Data Processing Slow Access Problem Handling" – https://ask.oceanbase.com/t/topic/35602375
"OceanBase Queuing Table (buffer table) Best Practice" – https://open.oceanbase.com/blog/2239494400
"Is Queuing Table Unsupported in OB 4.0?" – https://ask.oceanbase.com/t/topic/35601606
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source tools and services for MySQL, releases a premium open‑source component every year on October 24 ("1024"), and continuously operates and maintains them.