How to Seamlessly Migrate X86 C/C++ Code to Aarch64 TaiShan Servers
This guide details the migration of X86‑compiled C/C++ applications to Huawei TaiShan Aarch64 servers, covering language differences, required compiler versions, common build‑time errors, assembly rewrites, memory‑ordering quirks, floating‑point precision issues, and specific GCC flags to achieve correct and performant binaries.
Programming Language Overview
Compiled languages (e.g., C, C++) generate architecture‑specific machine code, so binaries built for X86 cannot run on TaiShan Aarch64 without recompilation. Interpreted languages (e.g., Java, Python) run platform‑independent bytecode or source, so applications can be deployed on the new servers unchanged once an Aarch64 build of the runtime (JVM, Python interpreter) is installed. Any native libraries they load, such as JNI libraries or C extension modules, still need recompilation.
Preparation
Install a GCC toolchain, preferably version 7.3 or newer; 4.8.5 is the minimum supported version.
Download: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/
Installation instructions: https://gcc.gnu.org/install/
Compilation Issues and Fixes
1.1 -m64 Compilation Option
Problem: gcc: error: unrecognized command line option ‘-m64’ – the -m64 flag targets X86‑64 and is unsupported on ARM64.
Solution: Replace it with the ARM64‑specific ABI flag -mabi=lp64. LP64 is already GCC's default ABI on Aarch64 Linux, so the flag can also simply be dropped.
1.2 Signed‑char Mismatch
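Problem: GCC warns that a comparison is always false due to the limited range of the data type. On X86 plain char defaults to signed, while on ARM64 it defaults to unsigned, so code that stores or compares negative values in a char changes behavior.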
Solution: Add the compiler flag -fsigned-char to force signed char on ARM64.
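The difference can be sketched in a few lines of C. The function names below are illustrative only; the point is that spelling the type as signed char gives the same result on both architectures, with or without -fsigned-char.

```c
#include <limits.h>

/* Returns 1 when plain char is signed on this target (the X86 default,
 * or ARM64 built with -fsigned-char) and 0 when it is unsigned. */
int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}

/* Portable: an explicit signed char can hold -1 on every target. */
int is_end_marker(signed char c) {
    return c == -1;
}

/* Non-portable variant kept for contrast: when plain char is unsigned,
 * -1 is stored as 255, the comparison is always false, and GCC warns. */
int is_end_marker_plain(char c) {
    return c == -1;
}
```

Where changing the build flags is not an option, replacing char with signed char (or int8_t) in the affected declarations is an alternative to -fsigned-char.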
2.1 Assembly Instruction Rewrite
Problem: Inline assembly written for X86 cannot be assembled on ARM64.
Solution: Rewrite the assembly sections using ARM64 intrinsics or GCC built‑ins. Example images illustrate the X86 version and the corresponding ARM64 implementation.
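As one possible shape for such a rewrite, the sketch below guards each inline-assembly variant with the compiler's architecture macros. The helper name read_hw_counter is hypothetical, and reading a hardware counter is only a convenient example; it is not taken from the original images.

```c
#include <stdint.h>

/* Hypothetical helper: reads a free-running hardware counter. */
uint64_t read_hw_counter(void) {
#if defined(__x86_64__)
    uint32_t lo, hi;
    /* X86: RDTSC returns the time-stamp counter in EDX:EAX. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#elif defined(__aarch64__)
    uint64_t ticks;
    /* ARM64: CNTVCT_EL0 is the virtual counter, readable from user space on Linux. */
    __asm__ __volatile__("mrs %0, cntvct_el0" : "=r"(ticks));
    return ticks;
#else
    return 0; /* no counter known for this target */
#endif
}
```

The two counters tick at different rates, so code that converts ticks to time needs a per-architecture frequency as well; only the pattern of isolating the assembly behind one function is the point here.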
2.2 CRC32 Instruction Replacement
Problem: X86 uses crc32b / crc32q which are unknown on ARM64.
Solution: Replace with ARM64 equivalents crc32cb, crc32ch, crc32cw, crc32cx and compile with -mcpu=generic+crc. The table image shows the mapping.
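A hedged sketch of the replacement using the ACLE intrinsic __crc32cb from arm_acle.h instead of hand-written assembly, with a bitwise software fallback for builds where the CRC extension is not enabled. The function names crc32c_update_byte and crc32c are made up for this example; the initial value and final inversion follow the usual CRC-32C convention and may differ from what a given code base expects.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* __crc32cb and friends; needs a +crc target */
#endif

/* Feeds one byte into a running CRC-32C (Castagnoli) value. */
static uint32_t crc32c_update_byte(uint32_t crc, uint8_t byte) {
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
    return __crc32cb(crc, byte);          /* compiles to the crc32cb instruction */
#else
    /* Software fallback: reflected polynomial 0x82F63B78, one bit at a time. */
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
#endif
}

/* CRC-32C over a buffer, with the conventional initial value and final inversion. */
uint32_t crc32c(const void *data, size_t len) {
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = crc32c_update_byte(crc, *p++);
    return ~crc;
}
```

The wider intrinsics (__crc32ch, __crc32cw, __crc32cd) process 2, 4, and 8 bytes per call and are the natural next step once the byte-wise version is verified.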
2.3 BSWAP Replacement
Problem: X86 bswap instruction is not recognized on ARM64.
Solution: Use the ARM64 rev instruction; the before/after images demonstrate the change.
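A small sketch of two ways to make the change; the function names are illustrative. The first variant uses the GCC built-ins, which avoid assembly on either architecture. The second keeps inline assembly and selects bswap or rev per target.

```c
#include <stdint.h>

/* Preferred: GCC built-ins, no assembly needed on either architecture. */
uint32_t swap_u32(uint32_t v) { return __builtin_bswap32(v); }
uint64_t swap_u64(uint64_t v) { return __builtin_bswap64(v); }

/* Inline-assembly variant showing the instruction-level mapping. */
uint32_t swap_u32_asm(uint32_t v) {
#if defined(__aarch64__)
    uint32_t r;
    __asm__("rev %w0, %w1" : "=r"(r) : "r"(v));  /* %w selects the 32-bit register view */
    return r;
#elif defined(__x86_64__) || defined(__i386__)
    __asm__("bswap %0" : "+r"(v));
    return v;
#else
    return __builtin_bswap32(v);
#endif
}
```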
2.4 REP Instruction Replacement
Problem: X86 rep prefix is unsupported on ARM64.
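Solution: ARM64 has no instruction prefixes. Replace the construct with the GNU assembler .rept directive (closed by .endr), which repeats the enclosed instructions at assembly time; code snippets before and after the change are shown.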
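A minimal sketch, assuming the X86 code uses the common "rep; nop" idiom in a busy-wait loop. The macro name cpu_relax and the function spin_n are hypothetical; .rept and .endr are GNU assembler directives, so the repetition happens at assembly time rather than at run time.

```c
#if defined(__x86_64__) || defined(__i386__)
/* X86: "rep; nop" encodes the PAUSE hint. */
#define cpu_relax() __asm__ __volatile__("rep; nop" ::: "memory")
#elif defined(__aarch64__)
/* ARM64: emit the nop through the assembler's .rept/.endr directives. */
#define cpu_relax() __asm__ __volatile__(".rept 1\n\tnop\n\t.endr" ::: "memory")
#else
/* Other targets: compiler barrier only. */
#define cpu_relax() __asm__ __volatile__("" ::: "memory")
#endif

/* Spins for n iterations and returns how many were executed. */
unsigned spin_n(unsigned n) {
    unsigned i;
    for (i = 0; i < n; i++)
        cpu_relax();
    return i;
}
```

X86 string operations such as rep movsb are a different case; those are usually better replaced with memcpy or memset than with hand-written ARM64 loops.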
2.5 SSE/SSE2 Intrinsics Porting
Some modules rely on GCC‑provided SSE/SSE2 functions that lack ARM64 equivalents. Use the open‑source https://github.com/open-estuary/sse2neon.git project to obtain NEON‑based replacements. Steps:
Copy SSE2NEON.h into the project.
Remove existing SSE‑specific code sections (see image).
Include SSE2NEON.h in source files.
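The steps above can be sketched as a conditional include. This assumes SSE2NEON.h sits next to the source file and provides _mm_loadu_si128, _mm_add_epi32, and _mm_storeu_si128; check the copy you download before relying on that. HAVE_SSE2_API and add4_i32 are names invented for this example, and the scalar loop is only a fallback for builds where neither header is available.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>             /* native SSE2 on X86 */
#define HAVE_SSE2_API 1
#elif defined(__aarch64__) && defined(__has_include)
#if __has_include("SSE2NEON.h")
#include "SSE2NEON.h"              /* NEON-based replacements on ARM64 */
#define HAVE_SSE2_API 1
#endif
#endif

/* Adds four 32-bit integers element-wise: out[i] = a[i] + b[i]. */
void add4_i32(const int32_t *a, const int32_t *b, int32_t *out) {
#ifdef HAVE_SSE2_API
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++)    /* scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```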
2.6 Weak Memory Ordering
ARM64 employs a weak memory model, which can cause unexpected results in lock‑free code. Ensure that atomic operations are aligned to their natural size and that appropriate memory‑barrier instructions are used. The diagrams illustrate cache line sharing and out‑of‑order execution effects.
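A minimal sketch of the usual fix for a flag-and-payload handoff, written with the GCC __atomic built-ins. The names payload, ready_flag, publish, and try_consume are illustrative. Code like this often works on X86 with plain loads and stores because of its stronger ordering, then fails intermittently on ARM64 until the release/acquire pair is added.

```c
static int payload;      /* plain data, written before the flag is set */
static int ready_flag;   /* accessed only through the __atomic built-ins */

/* Producer: write the data, then publish it. */
void publish(int value) {
    payload = value;
    /* Release: the payload write cannot be reordered after this store. */
    __atomic_store_n(&ready_flag, 1, __ATOMIC_RELEASE);
}

/* Consumer: returns 1 and fills *out once the data has been published. */
int try_consume(int *out) {
    /* Acquire: the payload read cannot be reordered before this load. */
    if (!__atomic_load_n(&ready_flag, __ATOMIC_ACQUIRE))
        return 0;
    *out = payload;
    return 1;
}
```

In real use publish and try_consume run on different threads; the ordering arguments are what make the handoff safe, not the single-threaded call sequence.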
2.7 Atomic Operations on Misaligned Structures
Problem: Atomic instructions like ldaxr / stlxr require naturally aligned addresses; misaligned structs cause crashes.
Solution: Search for #pragma pack usages, remove forced byte‑packing on structs that participate in atomic operations, and ensure proper alignment.
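The effect of packing on a field used atomically can be checked with offsetof. The sketch below uses made-up struct names and assumes a 64-bit target; it shows the packed layout placing a 64-bit counter at offset 1 and the default layout padding it to a naturally aligned offset.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct packed_stats {        /* problematic layout */
    uint8_t  tag;
    uint64_t counter;        /* lands at offset 1: not naturally aligned */
};
#pragma pack(pop)

struct aligned_stats {       /* default layout */
    uint8_t  tag;
    uint64_t counter;        /* compiler inserts padding before this field */
};

/* Returns 1 when a 64-bit field at this offset is naturally aligned. */
int counter_is_naturally_aligned(size_t offset) {
    return offset % sizeof(uint64_t) == 0;
}

/* Atomic increment on the safely aligned layout; returns the new value. */
uint64_t bump(struct aligned_stats *s) {
    return __atomic_add_fetch(&s->counter, 1, __ATOMIC_SEQ_CST);
}
```

A compile-time check such as _Static_assert(offsetof(struct aligned_stats, counter) % 8 == 0, "counter must be aligned") can keep a later edit from reintroducing the problem.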
2.8 Hard‑Coded CPU Core Counts
Code that hard‑codes the number of CPU cores for thread affinity may under‑utilize TaiShan servers. Search for sched_setaffinity calls and replace static core numbers with sysconf(_SC_NPROCESSORS_CONF) to obtain the actual core count.
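A short sketch of the replacement, assuming workers are spread round-robin over the cores. The helper names are hypothetical; the value returned by cpu_for_worker is what would be passed to CPU_SET before calling sched_setaffinity.

```c
#include <unistd.h>

/* Number of configured cores, queried at run time instead of hard-coded. */
long configured_cpu_count(void) {
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;    /* fall back to 1 if the query fails */
}

/* Maps a worker index onto a core number in [0, configured_cpu_count()). */
int cpu_for_worker(unsigned worker_index) {
    return (int)(worker_index % (unsigned long)configured_cpu_count());
}
```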
2.9 Floating‑Point to Integer Conversion Differences
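Conversion of double to integer types behaves differently on X86 and TaiShan ARM64 when the value does not fit the target type (overflow/underflow). The C and C++ standards leave such out‑of‑range conversions undefined, so each platform is free to return what its hardware produces: the X86 implementation may produce an “indefinite integer value,” while ARM64 clamps to the min/max representable value. Tables illustrate the conversion results for long, unsigned long, int, and unsigned int. Adjust conversion code so that it does not depend on either behavior.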
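One way to adjust such code is to check the range explicitly before converting, so the result no longer depends on the hardware. The sketch below is for int only; the name double_to_int_saturating is made up here, and clamping (with NaN mapped to 0, as ARM64 does) is an assumption about the desired behavior, not a requirement.

```c
#include <limits.h>

/* Converts with explicit clamping so X86 and ARM64 agree on every input. */
int double_to_int_saturating(double x) {
    if (x != x)                          /* NaN compares unequal to itself */
        return 0;
    if (x >= (double)INT_MAX + 1.0)      /* too large for int */
        return INT_MAX;
    if (x <= (double)INT_MIN - 1.0)      /* too small for int */
        return INT_MIN;
    return (int)x;                       /* in range: truncates toward zero */
}
```

The same pattern applies to long and the unsigned types, with the bounds and the NaN policy chosen to match what the application previously relied on.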
Compiler Optimizations and Architecture‑Specific Flags
4.1 Floating‑Point Precision
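At -O2 and above, GCC on ARM64 may contract a multiply followed by an add into a single fused multiply‑add instruction (fmadd). The fused form rounds once instead of twice, so results can differ from X86 in the last digits of a double, around the 16th digit. Disable contraction with -ffp-contract=off.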
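The rounding difference can be made concrete with a small sketch. The function names are illustrative. mul_add_unfused forces the product to be rounded to double before the add by storing it in a volatile variable; this is a source-level workaround for a single expression, not a substitute for -ffp-contract=off.

```c
/* The compiler may turn this into one fmadd on ARM64 (single rounding). */
double mul_add_plain(double a, double b, double c) {
    return a * b + c;
}

/* The volatile store forces the product to be rounded to double first,
 * so the multiply and the add are always rounded separately. */
double mul_add_unfused(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```

With a = b = 1 + 2^-27 and c = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54. Rounded separately, the product becomes 1 + 2^-26 and the sum is exactly 0; a fused multiply‑add keeps the low bits and yields 2^-54, about 5.6e-17.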
4.2 Targeting Kunpeng Architecture
Add -march=armv8-a so that the compiler generates code for the ARMv8‑A instruction set implemented by the Kunpeng processor family.
4.3 Pipeline Tuning
For GCC 9.1+, enable the TSV110 pipeline with -mtune=tsv110 to fully exploit Kunpeng’s instruction‑level parallelism.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
