
Code Coverage Instrumentation Using LLVM and GCOV

The article explains how to instrument code for coverage using LLVM’s modular compiler infrastructure—adding compile‑time probes that generate .gcno files, runtime counters that produce .gcda files, and then parsing these with GCOV, while detailing LLVM IR, basic blocks, flow graphs, and extensions for custom plugins.

Amap Tech

Background

As the business grows rapidly, its code logic becomes more complex, and the quality of QA testing becomes crucial to product stability after release. QA testing generally consists of two workflows, automated testing and manual testing, and both need code coverage data. Coverage solutions for automated testing are already mature on both the client and server sides, so this article focuses on collecting coverage during manual testing, which relies mainly on code instrumentation.

Instrumentation Process

The diagram shows the key steps and technologies on both the client and server sides. This article concentrates on the compilation stage, where the instrumented package is built by inserting probes into the IR (intermediate representation).

Compilation Stage

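During compilation, coverage options are passed to the compiler (for Clang and GCC, -ftest-coverage together with -fprofile-arcs, or the shorthand --coverage), so that each compiled source file produces a corresponding .gcno file.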

Runtime Stage

When the instrumented binary runs, the inserted counters record how many times each arc (control‑flow edge) executes. The coverage runtime writes these counts to .gcda files, by default when the program exits.
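The following is a minimal Python sketch of the runtime side. It assumes a JSON file as a stand‑in for the binary .gcda format, and the file name, COUNTERS, write_counts and install are all hypothetical. It shows two behaviors of the coverage runtime. First, counts live in memory while the program runs and are written out at exit. Second, a new run adds its counts to those already in the file.

```python
import atexit
import json
import os
import tempfile

# Hypothetical in-memory counters, one list per instrumented function.
COUNTERS = {"f": [0, 0, 0, 0]}


def write_counts(path, counters):
    """Write counters to path, adding them to any counts from earlier runs.

    JSON stands in for the binary .gcda format in this sketch.
    """
    merged = {}
    if os.path.exists(path):
        with open(path) as fh:
            merged = json.load(fh)
    for func, values in counters.items():
        old = merged.get(func, [0] * len(values))
        merged[func] = [a + b for a, b in zip(old, values)]
    with open(path, "w") as fh:
        json.dump(merged, fh)


def install(path):
    """Write COUNTERS to path automatically when the process exits normally."""
    atexit.register(write_counts, path, COUNTERS)


# Usage: two runs writing to the same data file accumulate their counts.
DATA_FILE = os.path.join(tempfile.mkdtemp(), "f.gcda.json")
write_counts(DATA_FILE, {"f": [1, 0, 1, 1]})  # first run
write_counts(DATA_FILE, {"f": [0, 2, 0, 2]})  # second run
with open(DATA_FILE) as fh:
    print(json.load(fh))  # {'f': [1, 2, 1, 3]}
```

An exit hook only fires on a normal exit, so a process that is killed never writes its counts. In this sketch, write_counts can also be called directly to flush on demand.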

Parsing Stage

Finally, the binary .gcno and .gcda files are parsed (for example with gcov) and the resulting coverage data is visualized.

Compiler Internals

The core operation in the compilation stage is instrumenting the IR file. An IR file is the intermediate representation produced by the compiler front‑end and consumed by the back‑end.

Language Processing System

A typical language processing pipeline consists of a front‑end, optimizer, and back‑end. Traditional compilers tightly couple front‑end and back‑end, making it hard to support new languages or hardware. LLVM provides a modular, reusable compiler suite that separates these concerns.

LLVM

Further reading: http://www.aosabook.org/en/llvm.html (Chris Lattner's chapter on LLVM in The Architecture of Open Source Applications)

LLVM is an open‑source collection of modular, reusable compiler and toolchain technologies. Front‑ends and compilers built on LLVM exist for many languages (for example Kotlin, Ruby, Python, Haskell, Java, D, PHP and Lua), and it supports both ahead‑of‑time (AOT) and just‑in‑time (JIT) compilation. In 2012, LLVM received the ACM Software System Award.

Unlike traditional compilers, LLVM’s front‑end produces a language‑agnostic IR, enabling easy addition of new language front‑ends.

iOS & macOS Platform Compilers

Xcode switched from GCC to LLVM in version 5, making LLVM the default compiler for iOS and macOS development. Swift and Objective‑C share most of this toolchain, including the LLVM optimizer and back‑end. They differ mainly in the front‑end (the Swift compiler versus Clang).

Clang

Clang is LLVM's front‑end for C, C++ and Objective‑C. It parses source code into an abstract syntax tree (AST) and lowers it to LLVM IR, and it is known for fast compilation, low memory usage and clear diagnostics. Early benchmarks reported that, compared to GCC, Clang compiled Objective‑C code about three times faster and used roughly one‑fifth of the memory.

LLVM IR

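LLVM intermediate representation (IR) is the bridge between the front‑end and the back‑end. It can be stored as .ll files (human‑readable text) or .bc files (binary bitcode). IR is largely independent of both the source language and the target architecture, so the same optimization and instrumentation passes can run on code from any front‑end.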

Three‑Address Code

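Three‑address code (TAC) expresses each instruction in the form x = y op z. Each instruction has at most one operator on the right‑hand side and at most three addresses (two operands and a result). Breaking expressions into such small steps simplifies code generation and optimization.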

// Example: translating x + y * z into three-address code
t1 = y * z
t2 = x + t1
// Loop example: do { i = i + 1; } while (a[i] < 10);
L: i = i + 1
   t1 = a[i]
   if t1 < 10 goto L
// t1 and t2 are temporaries generated by the compiler
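The lowering can be sketched in Python as a post‑order walk over an expression tree. The walk emits one x = y op z instruction per operator and creates a fresh temporary for each intermediate result. The tuple encoding of expressions and the to_tac name are assumptions made for illustration.

```python
from itertools import count


def to_tac(expr):
    """Lower an expression tree into three-address code.

    expr is either a variable name (str) or a tuple (op, lhs, rhs).
    Returns (instructions, name_holding_the_result).
    """
    code = []
    temps = count(1)  # supplies 1, 2, 3, ... for temporaries t1, t2, t3, ...

    def lower(node):
        if isinstance(node, str):  # a variable is already an address
            return node
        op, lhs, rhs = node
        a = lower(lhs)  # operands first (post-order walk)
        b = lower(rhs)
        t = f"t{next(temps)}"  # fresh compiler-generated temporary
        code.append(f"{t} = {a} {op} {b}")  # exactly one operator per instruction
        return t

    return code, lower(expr)


# x + y * z
instrs, result = to_tac(("+", "x", ("*", "y", "z")))
for line in instrs:
    print(line)  # prints "t1 = y * z", then "t2 = x + t1"
```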

Basic Blocks

A basic block is a maximal sequence of consecutive three‑address instructions with a single entry point and no internal jumps.

Control can only enter at the first instruction.

Except for the last instruction, control does not exit the block.

If the first instruction executes, all instructions in the block execute.

The algorithm to form basic blocks:

Identify leader instructions: the first instruction, any target of a jump, and any instruction immediately following a jump.

Each leader starts a new basic block that extends up to (but not including) the next leader or the end of the code.

Example:

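// Source: set the 10×10 matrix a to the identity matrix
for(i=1;i<=10;i++) {
    for(j=1;j<=10;j++) {
        a[i,j] = 0.0;
    }
}
for(i=1;i<=10;i++) {
    a[i,i] = 1.0;
}
// Three-address code (a is row-major, 1-based, with 8-byte elements)
       i = 1               // leader (first instruction)
OUTER: j = 1               // leader (target of a jump)
INNER: t1 = 10*i           // leader (target of a jump)
       t2 = t1 + j
       t3 = 8*t2
       t4 = t3 - 88        // t4 = 8*((i-1)*10 + (j-1)), byte offset of a[i,j]
       a[t4] = 0.0
       j = j + 1
       if j <= 10 goto INNER
       i = i + 1           // leader (follows a jump)
       if i <= 10 goto OUTER
       i = 1               // leader (follows a jump)
DIAG:  t5 = i - 1          // leader (target of a jump)
       t6 = 88*t5          // t6 = 8*((i-1)*10 + (i-1)), byte offset of a[i,i]
       a[t6] = 1.0
       i = i + 1
       if i <= 10 goto DIAG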
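The leader rule can be applied mechanically. The Python sketch below re‑encodes the listing with instructions numbered from 1, so that "goto N" names instruction N. It then finds the leaders and cuts the program into blocks. PROGRAM, find_leaders and basic_blocks are illustrative names, not part of any compiler API.

```python
import re

# The identity-matrix listing, numbered from 1 so that "goto N" names instruction N.
PROGRAM = [
    "i = 1",               # 1
    "j = 1",               # 2
    "t1 = 10*i",           # 3
    "t2 = t1 + j",         # 4
    "t3 = 8*t2",           # 5
    "t4 = t3 - 88",        # 6
    "a[t4] = 0.0",         # 7
    "j = j + 1",           # 8
    "if j <= 10 goto 3",   # 9
    "i = i + 1",           # 10
    "if i <= 10 goto 2",   # 11
    "i = 1",               # 12
    "t5 = i - 1",          # 13
    "t6 = 88*t5",          # 14
    "a[t6] = 1.0",         # 15
    "i = i + 1",           # 16
    "if i <= 10 goto 13",  # 17
]

JUMP = re.compile(r"goto (\d+)$")


def find_leaders(program):
    """Return the sorted 1-based indices of leader instructions."""
    leaders = {1}  # rule 1: the first instruction
    for idx, instr in enumerate(program, start=1):
        m = JUMP.search(instr)
        if m:
            leaders.add(int(m.group(1)))  # rule 2: any jump target
            if idx < len(program):
                leaders.add(idx + 1)  # rule 3: the instruction after a jump
    return sorted(leaders)


def basic_blocks(program):
    """Each leader starts a block that runs up to (not including) the next leader."""
    bounds = find_leaders(program) + [len(program) + 1]
    return [list(range(start, end)) for start, end in zip(bounds, bounds[1:])]


for n, block in enumerate(basic_blocks(PROGRAM), start=1):
    print(f"B{n}: {block}")
```

On this program the sketch reports leaders 1, 2, 3, 10, 12 and 13, which gives six basic blocks.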

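Applying the leader rule yields six basic blocks:

B1 (i = 1)
B2 (j = 1)
B3 (the inner‑loop body, from t1 = 10*i through its conditional jump)
B4 (i = i + 1 and the outer loop's conditional jump)
B5 (the second i = 1)
B6 (the diagonal loop, from t5 = i - 1 to the end)

The resulting basic blocks are visualized in the following flow graph: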

Flow Graph

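A flow graph represents basic blocks as nodes and possible transfers of control as edges. There is an edge from block B to block C if C's first instruction can run immediately after B's last instruction. This happens in two cases:

B ends with a jump to C.
C follows B in program order, and B does not end with an unconditional jump.

Entry and exit nodes are added to model program start and termination.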
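Given the blocks, the edges follow from each block's last instruction. This Python sketch assumes a hypothetical compact encoding of the six blocks from the identity‑matrix example. Each entry records the block's name, its first instruction number, the target of the jump that ends it (if any), and whether that jump is conditional. A block that does not end in an unconditional jump also falls through to the next block in program order, and the last block falls through to EXIT.

```python
def flow_graph(blocks):
    """blocks: ordered list of (name, first_instr, jump_target, conditional).

    Returns the set of edges (src, dst), including ENTRY and EXIT nodes.
    """
    by_first = {first: name for name, first, _target, _cond in blocks}
    edges = {("ENTRY", blocks[0][0])}  # program start enters the first block
    for pos, (name, _first, target, conditional) in enumerate(blocks):
        if target is not None:
            edges.add((name, by_first[target]))  # the jump edge
        if target is None or conditional:
            # fall-through edge to the block that follows in program order
            nxt = blocks[pos + 1][0] if pos + 1 < len(blocks) else "EXIT"
            edges.add((name, nxt))
    return edges


# Hypothetical encoding of the identity-matrix blocks (instruction numbers 1..17).
BLOCKS = [
    ("B1", 1, None, False),
    ("B2", 2, None, False),
    ("B3", 3, 3, True),     # ends with: if j <= 10 goto 3
    ("B4", 10, 2, True),    # ends with: if i <= 10 goto 2
    ("B5", 12, None, False),
    ("B6", 13, 13, True),   # ends with: if i <= 10 goto 13
]

EDGES = flow_graph(BLOCKS)
for src, dst in sorted(EDGES):
    print(f"{src} -> {dst}")
```

For the example the sketch yields ten edges. They include the self‑loops B3 → B3 (inner loop) and B6 → B6 (diagonal loop).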

Instrumentation Logic

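The instrumentation pass (LLVM's GCOVProfiling.cpp, listed in the references) works in two nested loops. The outer loop visits each function, and the inner loop visits that function's basic blocks. Each function gets a counter array with one 64‑bit slot per outgoing control‑flow edge, where a block that returns counts as having one edge. A block with n successors therefore owns n consecutive entries of this ctr array. The code inserted for that block increments the slot of the edge that is actually taken.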
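The counter layout can be simulated in Python. This is a loose model of the scheme described above, not LLVM's code. The CFG dictionary, the EXIT pseudo‑successor that gives a returning block its one slot, and the helper names are assumptions. The function being modelled is a hypothetical int f(int x) { if (x < 0) x = -x; return x; }.

```python
def assign_counter_slots(cfg):
    """Give every outgoing edge of every block its own counter slot.

    cfg maps block name -> list of successors, in block order; an empty list
    means the block returns, which is modelled as one edge to "EXIT".
    Returns (slot_count, {(src, dst): slot_index}).
    """
    slots = {}
    for block, succs in cfg.items():
        for succ in succs or ["EXIT"]:
            slots[(block, succ)] = len(slots)  # a block's slots are consecutive
    return len(slots), slots


def run(path, ctr, slots):
    """Simulate one instrumented call: increment the slot of every edge taken."""
    for src, dst in zip(path, path[1:] + ["EXIT"]):
        ctr[slots[(src, dst)]] += 1  # the inserted code: load, add 1, store


# Hypothetical CFG of: int f(int x) { if (x < 0) x = -x; return x; }
CFG = {
    "entry": ["neg", "ret"],  # conditional branch: two successors, two slots
    "neg": ["ret"],           # unconditional branch: one slot
    "ret": [],                # return: one slot
}

n, slots = assign_counter_slots(CFG)
ctr = [0] * n  # stands in for the function's counter array
run(["entry", "neg", "ret"], ctr, slots)  # f(-3)
run(["entry", "ret"], ctr, slots)         # f(5)
run(["entry", "ret"], ctr, slots)         # f(7)
print(ctr)  # [1, 2, 1, 3]
```

Each call increments exactly one slot per edge it traverses. After f(-3), f(5) and f(7), the negation edge has been taken once and the return slot three times.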

GCOV, GCNO, GCDA

GCOV (gcov) is GNU's test coverage tool. Used with GCC, it reports statement and branch coverage for C/C++ code. Clang emits the same file formats, and LLVM ships a compatible reader as llvm-cov gcov.

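GCNO (notes) files are generated at compile time (with -ftest-coverage). They describe each function's basic blocks, the arcs between them, and the source lines each block covers.

GCDA (data) files are produced at run time by code compiled with -fprofile-arcs. They record how many times each arc actually executed, and counts from repeated runs are added to the existing file. Neither file is human‑readable on its own.

gcov combines the GCNO and GCDA information into an annotated source listing (a .gcov file) with per‑line execution counts. Other front‑end tools build more detailed coverage reports from the same data.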
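The combination step can be sketched in Python for a tiny function. NOTES and ARC_COUNTS are hypothetical in‑memory stand‑ins for the .gcno and .gcda contents, and the arc counts correspond to calling f(-3), f(5) and f(7). A block's count is recovered by summing its outgoing arc counts, because every execution of a block leaves through exactly one arc. Taking the largest block count on each line is a simplification of gcov's real line‑count rule. The output follows gcov's column layout, with - for lines that contain no code and ##### for code that never ran.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two binary files:
# notes (.gcno-like): which source lines each basic block covers
NOTES = {"entry": [1, 2], "neg": [3], "ret": [4]}
# data (.gcda-like): how many times each arc was taken at run time
ARC_COUNTS = {("entry", "neg"): 1, ("entry", "ret"): 2,
              ("neg", "ret"): 1, ("ret", "EXIT"): 3}

SOURCE = [
    "int f(int x) {",
    "    if (x < 0)",
    "        x = -x;",
    "    return x;",
    "}",
]


def block_counts(arc_counts):
    """A block runs once per time control leaves it: sum its outgoing arcs."""
    counts = defaultdict(int)
    for (src, _dst), n in arc_counts.items():
        counts[src] += n
    return counts


def line_counts(notes, arc_counts):
    """Simplified rule: a line's count is the largest count of any block on it."""
    per_block = block_counts(arc_counts)
    lines = {}
    for block, block_lines in notes.items():
        for line in block_lines:
            lines[line] = max(lines.get(line, 0), per_block[block])
    return lines


def report(source, notes, arc_counts):
    """Format gcov-style rows: count (or - / #####), line number, source text."""
    counts = line_counts(notes, arc_counts)
    rows = []
    for no, text in enumerate(source, start=1):
        c = counts.get(no)
        mark = "-" if c is None else ("#####" if c == 0 else str(c))
        rows.append(f"{mark:>9}:{no:>5}:{text}")
    return rows


print("\n".join(report(SOURCE, NOTES, ARC_COUNTS)))
```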

Technical Extensions

With the knowledge presented, developers can customize the instrumentation logic, write Xcode compiler plugins, or even create new languages by extending LLVM’s infrastructure.

References

Source code: https://github.com/llvm-mirror/llvm/blob/release_70/lib/Transforms/Instrumentation/GCOVProfiling.cpp

LLVM BasicBlock documentation: https://llvm.org/doxygen/group__LLVMCCoreValueBasicBlock.html#ga444a4024b92a990e9ab311c336e74633

GCOV manual: https://gcc.gnu.org/onlinedocs/gcc/Gcov.html
