
How ByteDance Tackled C++ Compilation Bottlenecks and Massive Binary Bloat

ByteDance's STE team dissected the severe compile‑time delays and oversized binary artifacts in their data‑center C++ applications, presenting root‑cause analyses, LLVM bug fixes, and a suite of optimization techniques that together cut build times by up to 50% and reduced binary size by over 80%.

ByteDance SYS Tech

Background

ByteDance's data‑center C++ applications have grown in complexity, exposing two critical problems for the C++ compilation toolchain: long compilation times and excessively large binary artifacts. The STE team investigated the root causes and implemented solutions that dramatically reduced build time and binary size.

C++ Compilation Toolchain Overview

The typical toolchain consists of a compiler (clang, gcc), a linker (ld, gold, lld), a debugger (gdb, lldb), binutils, high-performance libraries, the STL, libc, and runtime libraries such as ASAN, unwind, and coverage instrumentation.

Challenges Faced

Large data-center services see tail compilation latencies of up to 60 minutes and binaries as large as 10 GB. Specific challenges include:

Performance pressure on every toolchain component when combined with FDO/ThinLTO, ASAN, and coverage.

Upgrade difficulties due to differing compiler handling of undefined behavior, ABI incompatibilities, and multi‑architecture support.

Complexity of advanced optimizations such as ThinLTO, AutoFDO, and Propeller.

Impact of instrumentation tools (coverage, sanitizers, PGO, XRay) on binary size.

Case Study 1: GVN Bottleneck in clang 11 / gcc 8

Using clang's -ftime-trace, the team found that Global Value Numbering (GVN) consumed 1637 s in the clang 11/gcc 8 configuration, while the same pass took only 177 s with clang 16/gcc 13. The root cause was heavy use of std::char_traits::length() on constant strings: each call inlined a __constant_string_p loop that the GVN pass then had to work through.

Back-porting a patch from GCC 9.1 eliminated the loop and reduced compilation time by roughly 50%.
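The call pattern behind this case can be sketched in a few lines. This is only an illustration of where std::char_traits<char>::length() shows up and of one way to avoid scanning a literal at all; it is not the GCC patch itself, and the helper names literal_length and literal_length_from_type are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: the kind of call that is repeated throughout a large
// code base. Anything that builds a string from a const char* ends up asking
// std::char_traits<char>::length() where the terminating '\0' is.
inline std::size_t literal_length(const char* s) {
    assert(s != nullptr);
    return std::char_traits<char>::length(s);
}

// Alternative that never scans the characters: the array bound of a string
// literal is part of its type, so the length is available without a loop.
template <std::size_t N>
constexpr std::size_t literal_length_from_type(const char (&)[N]) {
    return N - 1;  // N includes the trailing '\0'
}

static_assert(literal_length_from_type("thin-lto") == 8,
              "length is known at compile time");
```

Both helpers return the same value for a literal; the difference is how much inlined code the optimizer has to look at for each call site.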

Case Study 2: AutoFDO + ThinLTO

Enabling AutoFDO together with ThinLTO caused BranchProbabilityAnalysis() to dominate compile time (≈ 2/3 of total, > 100 min) due to a massive increase in basic blocks. The team contributed a cache for loop-exit blocks to LLVM (PR 93451), cutting compile time to 17.4% of the original.
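The general idea of caching loop-exit blocks can be shown on a toy control-flow graph. The sketch below is not the code from the LLVM patch: ToyLoop and ExitBlockCache are hypothetical types invented for illustration, and the only point is that the exit set is computed on the first query and reused afterwards instead of being rescanned for every block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: block ids are indices, succs[b] lists the successors
// of block b, and in_loop[b] says whether b belongs to the loop of interest.
struct ToyLoop {
    std::vector<std::vector<int>> succs;
    std::vector<bool> in_loop;
};

class ExitBlockCache {
public:
    explicit ExitBlockCache(const ToyLoop& loop) : loop_(loop) {}

    // Blocks outside the loop that are direct successors of a block inside it.
    // The scan over the CFG happens once; later calls return the cached result.
    const std::vector<int>& exit_blocks() {
        if (!computed_) {
            const std::size_t n = loop_.succs.size();
            std::vector<bool> seen(n, false);
            for (std::size_t b = 0; b < n; ++b) {
                if (!loop_.in_loop[b]) continue;
                for (int s : loop_.succs[b]) {
                    assert(s >= 0 && static_cast<std::size_t>(s) < n);
                    if (!loop_.in_loop[s] && !seen[s]) {
                        seen[s] = true;
                        exits_.push_back(s);
                    }
                }
            }
            computed_ = true;
            ++scans_;
        }
        return exits_;
    }

    // Number of full CFG scans performed so far (for demonstration only).
    std::size_t scans() const { return scans_; }

private:
    const ToyLoop& loop_;
    std::vector<int> exits_;
    bool computed_ = false;
    std::size_t scans_ = 0;
};
```

When the number of basic blocks grows sharply, as it did here, avoiding a repeated scan per query is what keeps the analysis cost from multiplying.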

Large Binary Challenges

Binaries larger than several gigabytes cause relocation overflow errors at link time, both for ordinary code and data references and within the DWARF debug sections. The article explains the mechanics of relocation overflow and shows how oversized .text, .rodata, .bss, and .debug sections push symbol references out of range.
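The arithmetic behind a relocation overflow can be checked directly. The sketch below assumes x86-64 and a 32-bit PC-relative relocation such as R_X86_64_PC32, which stores S + A - P (symbol address plus addend minus the patched location) in a signed 32-bit field; the architecture is an assumption, and fits_pc32 is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true when S + A - P fits in a signed 32-bit field, i.e. when a
// 32-bit PC-relative relocation can encode the reference.
// Assumes all addresses are below 2^63 so the signed arithmetic cannot wrap.
inline bool fits_pc32(std::uint64_t symbol, std::int64_t addend,
                      std::uint64_t place) {
    assert(symbol <= static_cast<std::uint64_t>(
                         std::numeric_limits<std::int64_t>::max()));
    assert(place <= static_cast<std::uint64_t>(
                        std::numeric_limits<std::int64_t>::max()));
    const std::int64_t value = static_cast<std::int64_t>(symbol) + addend -
                               static_cast<std::int64_t>(place);
    return value >= std::numeric_limits<std::int32_t>::min() &&
           value <= std::numeric_limits<std::int32_t>::max();
}
```

A reference that spans 1 GiB fits, while one that spans 3 GiB does not, which is why growth in any large section placed between the referencing code and its target can trigger the error.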

Optimization Strategies

Adjust compiler flags to shrink debug sections and enable size-optimizing options (e.g., -Os, -Oz, -flto, -fdata-sections, -ffunction-sections, and the linker option --gc-sections).

Adopt Split DWARF to separate debug information into DWO files.

Patch llvm‑dwp to handle large .debug_info sections correctly.

Disable whole‑archive linking for rarely used libraries.

Use outline instrumentation for sanitizers (-fsanitize-address-outline-instrumentation) and disable global variable instrumentation.

Perform code‑base cleanup guided by coverage tools.
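To make the section-based flags from the first strategy concrete, here is a small translation unit with one referenced and one unreferenced function. The build line in the comment is an assumed example rather than the team's actual configuration, and the function names are hypothetical.

```cpp
#include <cassert>

// Assumed build line (not taken from the article):
//   clang++ -Os -ffunction-sections -fdata-sections -Wl,--gc-sections demo.cpp
// With -ffunction-sections each function below is emitted into its own
// .text.* section, so the linker can drop sections that nothing references.

// Referenced by the program: its section is kept.
int checksum(const unsigned char* data, int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum = (sum * 31 + data[i]) % 65521;
    }
    return sum;
}

// Never referenced: a candidate for removal when linking with --gc-sections.
int legacy_checksum(const unsigned char* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

Without per-function sections the linker sees one .text blob per object file and has to keep or drop it as a whole, which is why the compiler flags and the linker flag are used together.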

Community Contributions

The STE team contributed several LLVM bug fixes and performance patches upstream, including PR 88477, PR 93451, PR 96188, and PR 95771, making the fixes available to the open-source community.

Future Directions

Ongoing work includes exploring DWARF 5 and 64-bit DWARF (DWARF64), extending BOLT/Propeller/AutoFDO integration, improving sanitizer reporting with LLMs, and leveraging ClangIR for deeper static analysis.
