Databases 21 min read

Fast CPU Performance Evaluation on Domestic Servers & OceanBase Optimization

This article explains why domestic Haiguang/Kunpeng servers lag behind Intel CPUs, presents a quick stored‑procedure method to benchmark CPU performance, details OceanBase tenant CPU specifications and tuning parameters, and offers concrete optimization techniques for high‑concurrency TP and massive‑data AP scenarios, including queuing tables, sequence cache, and index redesign.

ITPUB

In the current wave of database digital transformation, many enterprises are adopting domestic Haiguang (Hygon) or Kunpeng servers. Their CPU performance is generally weaker than that of traditional Intel servers, and the gap shows up as noticeable bottlenecks in massive data processing and high‑concurrency workloads.

Fast CPU Performance Evaluation Method

Because CPU performance on domestic servers cannot be significantly improved in the short term, the core solution is to reduce application CPU load. A quick benchmark can be performed with a stored procedure that repeatedly executes a CPU‑intensive power function and measures the elapsed time.

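CREATE OR REPLACE PROCEDURE test_cpu
IS
  test_power NUMBER;  -- receives the result of each power() call
  g_start    NUMBER;  -- start time, in hundredths of a second
  elapsed_cs NUMBER;  -- elapsed time, in hundredths of a second
BEGIN
  g_start := dbms_utility.get_time;
  -- Run one million CPU-bound evaluations through the SQL engine
  FOR v IN 1..1000000 LOOP
    SELECT power(32,64) INTO test_power FROM dual;
  END LOOP;
  elapsed_cs := (dbms_utility.get_time - g_start);
  dbms_output.put_line('CPU elapsed time (centiseconds): ' || elapsed_cs);
END test_cpu;
/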

Running this procedure took about 13.739 seconds on an Intel server and about 25.921 seconds on a domestic server. The ratio of the two times is roughly 0.53, meaning the domestic server delivered only about half of the Intel server's throughput on this test. A gap of this size has to be addressed through application‑level optimization.
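As a minimal sketch of how the measurement might be run and compared, the statements below assume a client that understands SET SERVEROUTPUT ON (for example a SQL*Plus‑style command line) and that test_cpu has already been created. The ratio is simply the Intel time divided by the domestic time quoted above, which evaluates to about 0.53.

```sql
-- Show dbms_output messages in this client session
SET SERVEROUTPUT ON

-- Run the benchmark once; repeat it several times and average for a steadier figure
BEGIN
  test_cpu;
END;
/

-- Ratio of the two elapsed times quoted in the text (Intel / domestic)
SELECT ROUND(13.739 / 25.921, 2) AS perf_ratio FROM dual;
```

Because the loop runs in a single session, the result reflects single‑thread speed rather than the throughput of the whole server.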

OceanBase Tenant CPU Specification Design

The capacity of an OceanBase server depends on the CPU model, core frequency, and core count, as well as on SQL complexity and the volume of remote SQL and distributed transactions. Capacity is measured in transactions per second (TPS), queries per second (QPS), and response time (RT). Typical guidelines are:

Read‑heavy workloads: 1000‑2000 QPS per logical CPU.

Update‑heavy workloads: 100‑200 TPS per logical CPU (assuming 10‑20 SQL statements per transaction).

To keep CPU usage below 70 %, a 64‑core x86 server can handle up to ~70 000 QPS or ~7 000 TPS. When a tenant exceeds these limits, application architecture or code must be optimized.
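The source does not show how the ~70 000 QPS and ~7 000 TPS figures were derived. One reading, offered here as an assumption, is that the 70 % ceiling is applied to all 64 logical CPUs and multiplied by the per‑CPU guidelines; the mid‑range rates (1500 QPS and 150 TPS per CPU) then land close to the quoted numbers.

```sql
-- Assumption: 64 logical CPUs, 70 % usage ceiling, per-CPU rates from the guidelines above
SELECT 64 * 0.70        AS usable_cpus,  -- 44.8
       64 * 0.70 * 1000 AS qps_low,      -- 44800
       64 * 0.70 * 2000 AS qps_high,     -- 89600
       64 * 0.70 * 1500 AS qps_mid,      -- 67200, close to the quoted ~70 000
       64 * 0.70 * 150  AS tps_mid       -- 6720, close to the quoted ~7 000
FROM dual;
```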

OceanBase tenant configuration includes max_cpu and min_cpu, the maximum and minimum number of logical CPU cores allocated to the tenant. Actual CPU usage can exceed these values because of the cpu_quota_concurrency parameter, which defaults to 4 and sets how many active threads each logical core of the tenant's quota may run:

active_threads = unit_min_cpu * cpu_quota_concurrency

Here unit_min_cpu is the tenant's min_cpu. For example, a tenant with min_cpu = 10 and cpu_quota_concurrency = 4 can run 40 active threads, so its CPU usage can reach 400 % of the nominal 10‑core allocation.
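The following sketch checks the parameter and reproduces the worked example. SHOW PARAMETERS is an OceanBase‑specific statement; the privileges it needs and the scope of cpu_quota_concurrency may differ between versions, so treat the first statement as an assumption to verify against your deployment.

```sql
-- Inspect the current setting (OceanBase-specific; verify availability in your version)
SHOW PARAMETERS LIKE 'cpu_quota_concurrency';

-- Worked example from the text: min_cpu = 10, cpu_quota_concurrency = 4
SELECT 10 * 4              AS active_threads,      -- 40
       (10 * 4) / 10 * 100 AS max_cpu_usage_pct    -- 400
FROM dual;
```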

NUMA considerations: NUMA is typically disabled on Intel servers and enabled on domestic servers. With NUMA enabled, performance can hit a “knee” once the number of concurrent threads exceeds the physical core count; beyond that point, per‑thread performance drops sharply.

TP Scenario High‑Concurrency Optimization Ideas

In massive‑concurrency systems, the high‑frequency SQL statements are usually short queries; they should use primary‑key lookups and avoid multi‑table joins. In a real‑world insurance claim processing system (≈20 000 cases per day, ~50 TB of data), average CPU usage was above 80 % before optimization.

Key optimizations applied:

Queuing table conversion: Setting TABLE_MODE='queuing' enables automatic minor‑merge cleanup of deleted rows, reducing full‑table scans caused by large numbers of logically deleted records.

ALTER TABLE user_table TABLE_MODE = 'queuing';

Sequence cache increase: Raising the cache from 20 to 2000 improves insert throughput by ~2.7×.
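The sequence cache change could look roughly like this. The sequence name claim_id_seq is hypothetical; the source does not name the sequences it changed.

```sql
-- Hypothetical sequence created with the small cache of 20
CREATE SEQUENCE claim_id_seq START WITH 1 INCREMENT BY 1 CACHE 20;

-- Raise the cache so that far fewer NEXTVAL calls have to refresh the cached range
ALTER SEQUENCE claim_id_seq CACHE 2000;

-- Typical use when generating a key for an insert
SELECT claim_id_seq.NEXTVAL FROM dual;
```

A larger cache usually means larger gaps in the generated values after a restart, so this suits keys that only need to be unique, not gap‑free.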

Index redesign: Adding a function‑based index on a CASE expression over deleted_status, and rewriting queries to filter on the same expression, dramatically reduces the number of rows scanned.

CREATE INDEX status_idx ON BPM_USER_TASKCOUNT_DETAIL (CASE deleted_status WHEN '0' THEN '0' END);
-- Query predicate rewritten to match the indexed expression:
WHERE status = '0' AND (CASE deleted_status WHEN '0' THEN '0' END) = '0'
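To try the pattern in isolation, the sketch below uses a stand‑in table, task_detail_demo, with hypothetical columns; it is not the definition of BPM_USER_TASKCOUNT_DETAIL. The predicate has to repeat the indexed CASE expression exactly for the optimizer to be able to match it to the index.

```sql
-- Stand-in table (hypothetical definition)
CREATE TABLE task_detail_demo (
  id             NUMBER PRIMARY KEY,
  status         VARCHAR2(1),
  deleted_status VARCHAR2(1)
);

-- Function-based index on the same CASE expression the article uses
CREATE INDEX task_detail_demo_idx
  ON task_detail_demo (CASE deleted_status WHEN '0' THEN '0' END);

-- Full query using the rewritten predicate
SELECT COUNT(*)
FROM task_detail_demo
WHERE status = '0'
  AND (CASE deleted_status WHEN '0' THEN '0' END) = '0';

-- Check whether the plan picks the index (EXPLAIN in this form is OceanBase syntax)
EXPLAIN
SELECT COUNT(*)
FROM task_detail_demo
WHERE status = '0'
  AND (CASE deleted_status WHEN '0' THEN '0' END) = '0';
```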

Update statement refinement: Removing unnecessary columns from composite indexes and ensuring the most selective column is indexed first cuts full‑table scans.

UPDATE fssc_expensebillline_ods SET status='S' WHERE BATCHNO='...';
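The source does not show the index definitions behind this update. The sketch below assumes an original composite index led by STATUS, replaced by a single‑column index on the filter column BATCHNO; the table, the index names, and the batch value are all hypothetical stand‑ins.

```sql
-- Stand-in table with only the two columns the example needs (hypothetical)
CREATE TABLE expensebillline_demo (
  BATCHNO VARCHAR2(32) NOT NULL,
  STATUS  VARCHAR2(1)
);

-- Assumed original shape: composite index whose leading column is not the filter column
CREATE INDEX idx_demo_status_batchno ON expensebillline_demo (STATUS, BATCHNO);

-- Redesigned: single-column index on the column the UPDATE filters on
DROP INDEX idx_demo_status_batchno;
CREATE INDEX idx_demo_batchno ON expensebillline_demo (BATCHNO);

-- The update can now locate its rows through the BATCHNO index
UPDATE expensebillline_demo SET STATUS = 'S' WHERE BATCHNO = 'B0001';
```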

JDBC connection string tuning, PL object invalidation, and sequence cache adjustments further lowered average CPU load by ~25 %.

AP Scenario Massive Data‑Processing Optimization Ideas

AP workloads involve large data volumes, complex SQL, many sub‑queries and joins, and heavy PL usage. Migration from Oracle to OceanBase requires adapting to different execution characteristics.

Typical techniques:

Convert global indexes to local indexes, improving insert throughput by 20‑30 % for tables with tens of millions of rows.
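A minimal sketch of a local index on a partitioned table, using a hypothetical table policy_events_demo. With LOCAL, each table partition gets its own index partition, so an insert maintains index entries alongside the partition it writes to; in a distributed deployment this may avoid cross‑node index maintenance.

```sql
-- Hypothetical hash-partitioned table
CREATE TABLE policy_events_demo (
  event_id  NUMBER       NOT NULL,
  policy_no VARCHAR2(32) NOT NULL,
  status    VARCHAR2(1)
)
PARTITION BY HASH (event_id) PARTITIONS 8;

-- Local index: one index partition per table partition
CREATE INDEX idx_pe_policy_no ON policy_events_demo (policy_no) LOCAL;
```

The trade‑off is on the read side: a query that filters only on policy_no has to probe the index in every partition, so the conversion fits insert‑heavy tables best.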

Replace scalar sub‑queries with joins, as OceanBase executes joins more efficiently than scalar sub‑queries.
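The rewrite could look like the following, using two hypothetical tables. The two queries return the same rows only because customer_id is unique in customers_demo; the LEFT JOIN keeps orders that have no matching customer, just as the scalar sub‑query would return NULL for them.

```sql
-- Hypothetical tables for illustration
CREATE TABLE customers_demo (
  customer_id   NUMBER PRIMARY KEY,
  customer_name VARCHAR2(100)
);
CREATE TABLE orders_demo (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER,
  amount      NUMBER(12,2)
);

-- Before: scalar sub-query evaluated for each order row
SELECT o.order_id,
       o.amount,
       (SELECT c.customer_name
          FROM customers_demo c
         WHERE c.customer_id = o.customer_id) AS customer_name
FROM orders_demo o;

-- After: equivalent outer join
SELECT o.order_id,
       o.amount,
       c.customer_name
FROM orders_demo o
LEFT JOIN customers_demo c
       ON c.customer_id = o.customer_id;
```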

Leverage partition keys to eliminate partitions early, reducing data scanned.
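As an illustration of partition elimination, the sketch below uses a hypothetical range‑partitioned table. Because the predicate is written directly on the partition key event_date, the optimizer can restrict the scan to the p2024 partition instead of reading the whole table.

```sql
-- Hypothetical table partitioned by year on event_date
CREATE TABLE claim_events_demo (
  event_id   NUMBER NOT NULL,
  event_date DATE   NOT NULL,
  amount     NUMBER(12,2)
)
PARTITION BY RANGE (event_date) (
  PARTITION p2023 VALUES LESS THAN (TO_DATE('2024-01-01', 'YYYY-MM-DD')),
  PARTITION p2024 VALUES LESS THAN (TO_DATE('2025-01-01', 'YYYY-MM-DD')),
  PARTITION pmax  VALUES LESS THAN (MAXVALUE)
);

-- Filter on the bare partition key so that only p2024 needs to be scanned
SELECT SUM(amount)
FROM claim_events_demo
WHERE event_date >= TO_DATE('2024-01-01', 'YYYY-MM-DD')
  AND event_date <  TO_DATE('2024-02-01', 'YYYY-MM-DD');
```

Wrapping the partition key in a function, for example TO_CHAR(event_date, 'YYYY') = '2024', generally prevents this kind of pruning.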

Remove redundant GROUP BY clauses and redesign FOR‑loops to minimize nesting and move invariant logic outside the loop.

Identify and drop redundant indexes; for example, a composite index on (STATUS, BATCHNO) was reduced to a single‑column index on BATCHNO because the two columns are fully correlated.

Transform deep FOR loops (up to 7 levels) into WHILE loops, cutting memory consumption of the sqlexecutor module from 108 GB to roughly 14 GB (a 7.71× reduction).
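The source does not show the original procedures, so the block below is only an illustrative sketch of two of the loop changes listed above: an invariant lookup moved out of the loop, and a WHILE loop with an explicit counter in place of a FOR loop. All names and values are hypothetical, and whether a WHILE loop lowers memory use in a given OceanBase version is something to measure rather than assume.

```sql
DECLARE
  v_rate  NUMBER;            -- loop-invariant value (hypothetical)
  v_total NUMBER := 0;
  i       PLS_INTEGER := 1;  -- explicit loop counter
BEGIN
  -- Invariant lookup hoisted: executed once instead of once per iteration
  SELECT 0.06 INTO v_rate FROM dual;

  -- WHILE loop in place of "FOR i IN 1..1000 LOOP"
  WHILE i <= 1000 LOOP
    v_total := v_total + i * v_rate;
    i := i + 1;
  END LOOP;

  dbms_output.put_line('total = ' || v_total);
END;
/
```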

Before‑after performance charts (illustrated below) show significant reductions in CPU usage and query latency after applying the above changes.

[Figure: Optimization result chart]

Conclusion

The article presented a rapid CPU performance evaluation method for domestic servers, detailed OceanBase tenant CPU configuration, and offered concrete optimization strategies for both high‑concurrency TP scenarios and massive‑data AP scenarios. By focusing on index design, queuing tables, sequence cache tuning, and loop restructuring, CPU load can be dramatically reduced, helping enterprises overcome the inherent performance limits of domestic hardware and supporting cost‑effective digital transformation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
