Debugging a GCC O3 Loop Vectorization Crash: Analysis, Root Cause, and Fix
A segmentation fault in a simple tile‑index copy loop was traced to GCC 9.2’s -O3 loop‑vectorization pass miscalculating structure offsets, and the issue was resolved by disabling -ftree-loop-vectorize or upgrading the compiler, as newer GCC versions and Clang handle the code correctly.
Background: A customer reported a consistently reproducible segmentation fault. At first glance it looked like an ordinary use-after-free or out-of-bounds access, but finding the real cause required deeper analysis.
Initial Investigation: The team examined the code that copies data from an array of TileContentIndexStruct into an array of TileContentIndex. The copy function is a straightforward loop:
```cpp
// Copies `count` TileContentIndexStruct records into a newly allocated
// TileContentIndex array. The memory comes from new[], so the caller must
// release it with delete[] through a TileContentIndex* pointer.
void* readTileContentIndexCallback(TileContentIndexStruct *tileIndexData, int32_t count) {
    TileContentIndex* tileContentIndexList = new TileContentIndex[count];
    for (int32_t index = 0; index < count; index++) {
        TileContentIndexStruct &inData = tileIndexData[index];
        TileContentIndex &outData = tileContentIndexList[index];
        // Field-by-field copy with explicit width conversions.
        outData.urID = (uint16_t)inData.urCode;
        outData.adcode = (uint32_t)inData.adcode;
        outData.level = (uint16_t)inData.levelNumber;
        outData.southWestTileId = (uint32_t)inData.southWestTileId;
        outData.numRows = (uint16_t)inData.numRows;
        outData.numColumns = (uint16_t)inData.numColumns;
        outData.tileIndex = inData.tileContentIndex;
    }
    return tileContentIndexList;
}
```

The code looks harmless, yet the crash persisted.
Discovery: A teammate noticed that the crash appeared only after the optimization level was changed from -O2 to -O3. Reverting to -O2 eliminated the crash.
Exploring GCC Optimizations: The options that -O3 enables on top of -O2 include many loop-related flags, such as -floop-interchange, -floop-unroll-and-jam, -ftree-loop-distribution, -funswitch-loops, and -ftree-loop-vectorize. The team compiled a minimal demo with -O3 and examined the generated assembly. This pointed to the loop-vectorization pass (-ftree-loop-vectorize) as the source of the erroneous behavior.
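The minimal demo itself is not shown here. The sketch below is a hypothetical main.cpp in that spirit, with several assumptions:

- Both struct layouts are guesses, because the real definitions are not shown.
- The input struct is padded to 40 bytes on a typical 64-bit target, to match the size reported for TileContentIndexStruct.
- `countMismatches` is an illustrative helper, not part of the SDK.

Every field gets a distinct, index-dependent value. That way, a wrong stride or field offset shows up as a mismatch even in runs that do not happen to crash.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input layout: 6 x 4 bytes + 8-byte pointer + 8 bytes = 40 bytes on LP64.
struct TileContentIndexStruct {
    int32_t urCode;
    int32_t adcode;
    int32_t levelNumber;
    int32_t southWestTileId;
    int32_t numRows;
    int32_t numColumns;
    const uint8_t* tileContentIndex;  // assumed: pointer to tile data
    int64_t reserved;                 // hypothetical padding field
};

// Hypothetical output layout.
struct TileContentIndex {
    uint16_t urID;
    uint32_t adcode;
    uint16_t level;
    uint32_t southWestTileId;
    uint16_t numRows;
    uint16_t numColumns;
    const uint8_t* tileIndex;
};

// The copy loop as given in the article.
void* readTileContentIndexCallback(TileContentIndexStruct *tileIndexData, int32_t count) {
    TileContentIndex* tileContentIndexList = new TileContentIndex[count];
    for (int32_t index = 0; index < count; index++) {
        TileContentIndexStruct &inData = tileIndexData[index];
        TileContentIndex &outData = tileContentIndexList[index];
        outData.urID = (uint16_t)inData.urCode;
        outData.adcode = (uint32_t)inData.adcode;
        outData.level = (uint16_t)inData.levelNumber;
        outData.southWestTileId = (uint32_t)inData.southWestTileId;
        outData.numRows = (uint16_t)inData.numRows;
        outData.numColumns = (uint16_t)inData.numColumns;
        outData.tileIndex = inData.tileContentIndex;
    }
    return tileContentIndexList;
}

// Fills `count` inputs with distinct per-field values, runs the copy, and
// returns how many output elements disagree with their input element.
int32_t countMismatches(int32_t count) {
    assert(count >= 0);
    std::vector<TileContentIndexStruct> input(static_cast<std::size_t>(count));
    std::vector<uint8_t> payload(static_cast<std::size_t>(count));
    for (int32_t i = 0; i < count; ++i) {
        TileContentIndexStruct& s = input[i];
        s.urCode = 100 + i;
        s.adcode = 110000 + i;
        s.levelNumber = 10 + i;
        s.southWestTileId = 500000 + i;
        s.numRows = 1 + i;
        s.numColumns = 2 + i;
        s.tileContentIndex = payload.data() + i;
        s.reserved = -1;
    }
    auto* out = static_cast<TileContentIndex*>(
        readTileContentIndexCallback(input.data(), count));
    int32_t mismatches = 0;
    for (int32_t i = 0; i < count; ++i) {
        const TileContentIndexStruct& s = input[i];
        const TileContentIndex& d = out[i];
        if (d.urID != static_cast<uint16_t>(s.urCode) ||
            d.adcode != static_cast<uint32_t>(s.adcode) ||
            d.level != static_cast<uint16_t>(s.levelNumber) ||
            d.southWestTileId != static_cast<uint32_t>(s.southWestTileId) ||
            d.numRows != static_cast<uint16_t>(s.numRows) ||
            d.numColumns != static_cast<uint16_t>(s.numColumns) ||
            d.tileIndex != s.tileContentIndex) {
            ++mismatches;
        }
    }
    delete[] out;
    return mismatches;
}
```

On a correct compiler, `countMismatches(37)` returns 0 at any optimization level. The count 37 is deliberately not a multiple of 4 or 8, so if the loop is vectorized, both the vector body and the scalar remainder run. Because the layouts are guesses, this sketch is not guaranteed to trigger the GCC 9.2 bug.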
Reproducing the Issue: Compiling to assembly at -O2, then assembling and linking, produced a correct binary:

```sh
g++ -O2 -S -o main2.s main.cpp
g++ -o main2 main2.s
```

Switching to -O3 reproduced the crash:

```sh
g++ -O3 -S -o mainO3.s main.cpp
g++ -o mainO3 mainO3.s
```
Pinpointing the Faulty Optimization: Keeping -O3 but disabling -ftree-loop-vectorize fixed the problem:

```sh
g++ -O3 -fno-tree-loop-vectorize -S -o main_fixed.s main.cpp
g++ -o main_fixed main_fixed.s
```

The crash disappeared, confirming that the loop-vectorization pass was responsible. The generated code shows that the pass mis-calculated the structure size: it treated the 40-byte TileContentIndexStruct as if it were 8 bytes. As a result, it produced incorrect offsets for the tileContentIndex field.
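The size error translates directly into a wrong per-iteration stride. The sketch below is a small compile-time check of that arithmetic. It assumes a simplified model in which each element's address is computed as base + index × element size. It uses the reported figures: a 40-byte struct, an effective 8-byte size, and four elements per vectorized iteration. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kActualStructSize = 40;  // sizeof(TileContentIndexStruct)
constexpr std::size_t kAssumedStructSize = 8;  // size the vectorized code effectively used
constexpr std::size_t kLanes = 4;              // elements per vectorized iteration

// Bytes the source pointer must advance per vectorized iteration.
constexpr std::size_t strideBytes(std::size_t elementSize, std::size_t lanes) {
    return elementSize * lanes;
}

// Byte offset from which lane `lane` of iteration `iter` reads its element.
constexpr std::size_t elementOffset(std::size_t elementSize, std::size_t lanes,
                                    std::size_t iter, std::size_t lane) {
    return (iter * lanes + lane) * elementSize;
}

static_assert(strideBytes(kActualStructSize, kLanes) == 160, "correct stride");
static_assert(strideBytes(kAssumedStructSize, kLanes) == 32, "stride in the faulty code");
// Under the 8-byte assumption, element 5 (iteration 1, lane 1) is read from
// byte 40, which is where element 1 actually begins.
static_assert(elementOffset(kAssumedStructSize, kLanes, 1, 1) == 40, "misread offset");
```

The 32- and 160-byte values are the two strides involved in the manual assembly patch.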
Assembly Deep-Dive: The generated AArch64 (ARMv8-A) assembly uses NEON instructions (zip1, ins, xtn, etc.) to process four elements per iteration. However, the vectorized code reads the source structure fields at the wrong offsets. As a result, a pointer value ends up being interpreted as an integer length, which ultimately causes the segmentation fault.
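The sketch below is a scalar C++ model of the overall shape a loop vectorizer gives such a loop. It is not the actual compiler output. The vector body handles VF consecutive elements per iteration, and a scalar epilogue handles the count % VF leftovers. `Src`, `Dst`, and `copyStripMined` are hypothetical two-field stand-ins for the real structs and loop. The template parameter VF lets the same sketch model GCC's four-element body or Clang's eight-element body.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified element types.
struct Src { int32_t code; int32_t rows; };
struct Dst { uint16_t id; uint16_t rows; };

template <int32_t VF>
void copyStripMined(const Src* in, Dst* out, int32_t count) {
    static_assert(VF > 0, "vectorization factor must be positive");
    int32_t i = 0;
    const int32_t vecEnd = count - count % VF;  // loop bound for the vector body
    // Vector body: each iteration covers VF elements, i.e. VF * sizeof(Src)
    // bytes of input. This per-iteration stride must match the real struct size.
    for (; i < vecEnd; i += VF) {
        uint16_t ids[VF];
        uint16_t rows[VF];
        // Load and narrow each lane (on NEON: vector loads, shuffles such as
        // zip1/ins, and xtn to narrow 32-bit lanes to 16 bits).
        for (int32_t lane = 0; lane < VF; ++lane) {
            ids[lane] = static_cast<uint16_t>(in[i + lane].code);
            rows[lane] = static_cast<uint16_t>(in[i + lane].rows);
        }
        // Store the lanes to the output elements.
        for (int32_t lane = 0; lane < VF; ++lane) {
            out[i + lane].id = ids[lane];
            out[i + lane].rows = rows[lane];
        }
    }
    // Scalar epilogue for the remaining count % VF elements.
    for (; i < count; ++i) {
        out[i].id = static_cast<uint16_t>(in[i].code);
        out[i].rows = static_cast<uint16_t>(in[i].rows);
    }
}
```

Both the body stride and the vecEnd bound correspond to the quantities that have to be right in the generated assembly.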
Patch and Verification: The team manually corrected the offset calculations in the assembly, for example changing the per-iteration stride from 32 to 160 bytes and adjusting the loop bounds. The patched binary runs without crashing. This patch was applied to the output of the original GCC 9.2. By comparison, GCC 10.3 no longer applies the problematic vectorization to this loop, and Clang's vectorizer handles it correctly, processing eight elements per iteration.
Conclusion: The root cause is a bug in GCC 9.2's -ftree-loop-vectorize optimization, triggered by this specific loop pattern. The practical workaround for the SDK delivery team is to keep -O3 and add -fno-tree-loop-vectorize. Upgrading the compiler or switching to Clang also avoids the issue.
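If the flag cannot be changed for the whole build, GCC can also apply the option to a single function through its `optimize` function attribute. The sketch below assumes only the affected function needs protecting. `NO_LOOP_VECTORIZE`, `copyRecords`, and the record types are hypothetical names. Note that GCC's documentation recommends the `optimize` attribute mainly for debugging rather than production code, so the command-line flag remains the more conservative choice.

```cpp
#include <cassert>
#include <cstdint>

// GCC only: compile the marked function as if -fno-tree-loop-vectorize were
// given. The attribute is skipped for other compilers (Clang, for example,
// does not support it).
#if defined(__GNUC__) && !defined(__clang__)
#define NO_LOOP_VECTORIZE __attribute__((optimize("no-tree-loop-vectorize")))
#else
#define NO_LOOP_VECTORIZE
#endif

// Hypothetical element types standing in for the real structs.
struct InRecord { int32_t code; int32_t level; };
struct OutRecord { uint16_t id; uint16_t level; };

// A copy loop that GCC will not loop-vectorize, even when the rest of the
// translation unit is built with -O3.
NO_LOOP_VECTORIZE
void copyRecords(const InRecord* in, OutRecord* out, int32_t count) {
    for (int32_t i = 0; i < count; ++i) {
        out[i].id = static_cast<uint16_t>(in[i].code);
        out[i].level = static_cast<uint16_t>(in[i].level);
    }
}
```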
Amap Tech
Official Amap technology account showcasing all of Amap's technical innovations.