Boosting GaussDB Performance: Inside Huawei’s BiSheng Compiler Optimizations
This article explains how Huawei's BiSheng compiler improves GaussDB performance through architecture-level, module-level, and function-level optimizations, including inline expansion, instruction prefetching, auto-vectorization, link-time optimization, and feedback-guided optimization. It also outlines future development plans.
Application performance can be improved at architecture, module, and function levels. GaussDB emphasizes high performance, and the BiSheng compiler provides extensive optimizations to further boost it.
What is the BiSheng Compiler?
BiSheng is Huawei Compiler Lab's toolchain for general-purpose processors. It supports C/C++/Fortran and includes deep optimizations for the Kunpeng architecture that deliver up to 30% higher SPEC CPU 2017 scores than open-source GCC.
High Performance : deep compilation optimizations, enhanced multi‑core parallelism, automatic vectorization, and increased instruction and data throughput.
Multi‑Architecture Support : supports domestic Arm chips (e.g., Feiteng) and other architectures such as x86, RISC‑V, LoongArch.
High Reliability : extensive test suites, daily >1M test cases, security coding tools, timely CVE fixes, and automotive‑grade safety certification.
2.1 Function Inlining (inline)
Function calls incur overhead; inlining replaces calls with the function body, reducing this cost. Example before inlining:
```c
int square(int x) { return x * x; }

int calculate(int a) { return square(a) + square(a + 1); }
```

After automatic inlining by the compiler:

```c
int calculate(int a) { return (a * a) + ((a + 1) * (a + 1)); }
```

Inlining eliminates the call overhead and enables further compile-time optimizations.
2.2 Instruction Prefetch Optimization
Instruction prefetching loads future instructions into cache based on predicted execution paths, reducing latency. The BiSheng compiler inserts the prfm prefetch instruction on Kunpeng platforms, as shown in the GaussDB example.
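The same idea can be expressed at the source level. A minimal sketch, assuming the GCC/Clang `__builtin_prefetch` builtin (the function name `prefetch_sum` and the lookahead distance of 16 elements are illustrative choices, not from the article); on AArch64 the compiler lowers this hint to a `prfm` instruction, and BiSheng can insert such prefetches automatically without source changes:

```c
#include <stddef.h>

/* Sum an array while issuing software prefetch hints a few elements
 * ahead of the current position, so the data is already in cache when
 * the loop reaches it. On AArch64 this hint becomes a prfm instruction. */
long prefetch_sum(const long *data, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&data[i + 16], 0 /* read */, 3 /* high locality */);
        sum += data[i];
    }
    return sum;
}
```

The lookahead distance should roughly match memory latency divided by the per-element work; tuning it per platform is what makes compiler-inserted prefetching attractive.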
2.3 Automatic Vectorization
The compiler leverages SIMD instructions (Arm NEON/SVE, x86 SSE/AVX) to process multiple data elements per instruction. Vectorization can reduce four separate operations to a single SIMD instruction, dramatically improving throughput.
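A loop of the shape below is a typical auto-vectorization candidate (the function name `vec_add` is illustrative): the iterations are independent, so at `-O2`/`-O3` the compiler can map several of them onto one NEON or SSE/AVX instruction.

```c
/* Element-wise addition: each iteration is independent, so the compiler
 * can process several elements per SIMD instruction instead of one
 * scalar add per iteration. */
void vec_add(const float *a, const float *b, float *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Keeping loop bodies free of cross-iteration dependencies and hard-to-analyze pointer aliasing is what lets the vectorizer fire; `restrict` qualifiers on the pointers can help the compiler prove independence.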
2.4 Link‑Time Optimization (LTO)
LTO merges all compilation units at link time, enabling cross‑module optimizations such as additional inlining, function specialization, dead‑code elimination, and constant propagation, at the cost of longer compile times.
2.5 Feedback‑Guided Optimization (CFGO)
For workloads with heavy control flow and data‑segment access (e.g., databases), CFGO collects runtime profiles to guide more accurate optimization decisions, improving generated code quality.
Performance Gains
Combined with GaussDB, the BiSheng compiler delivers up to 30% higher TPC-C and 13% higher TPC-H performance, with overall application speedups of 5-10%.
Conclusion and Outlook
Compilers and databases are foundational software for critical industries. Future work includes completing the GaussDB migration to BiSheng, joint innovation for typical customer scenarios, and breakthroughs in compiler/VM technologies to further boost PL/SQL performance.
Huawei Cloud Developer Alliance
The Huawei Cloud Developer Alliance creates a tech sharing platform for developers and partners, gathering Huawei Cloud product knowledge, event updates, expert talks, and more. Together we continuously innovate to build the cloud foundation of an intelligent world.