How to Seamlessly Migrate X86 C/C++ Code to Aarch64 TaiShan Servers

This guide details the migration of X86‑compiled C/C++ applications to Huawei TaiShan Aarch64 servers, covering language differences, required compiler versions, common build‑time errors, assembly rewrites, memory‑ordering quirks, floating‑point precision issues, and specific GCC flags to achieve correct and performant binaries.

Architects' Tech Alliance

Programming Language Overview

Compiled languages such as C and C++ generate architecture‑specific machine code, so binaries built for X86 cannot run on TaiShan Aarch64 servers without recompilation. Languages that run on a virtual machine or interpreter, such as Java and Python, deploy their bytecode or scripts directly on the new servers, but any native libraries they load (JNI modules, C extensions) must still be recompiled for Aarch64.

Preparation

Install a GCC toolchain: version 7.3.0 or newer is recommended, and 4.8.5 is the minimum supported.

Download: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/
Installation instructions: https://gcc.gnu.org/install/

Compilation Issues and Fixes

1.1 -m64 Compilation Option

Problem: gcc: error: unrecognized command line option ‘-m64’ – the -m64 flag targets X86‑64 and is unsupported on ARM64.

Solution: Remove -m64 or replace it with the ARM64 ABI flag -mabi=lp64 (the default on Aarch64).

1.2 Signed‑char Mismatch

Problem: GCC warns "comparison is always false due to limited range of data type". On X86 plain char defaults to signed, while on ARM64 it defaults to unsigned, so code such as `if (c < 0)` silently changes behavior.

Solution: Add the compiler flag -fsigned-char to force signed char on ARM64.
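A minimal illustration of the pitfall (the helper name is hypothetical):

```c
#include <limits.h>

/* Returns 1 when plain char can hold negative values.  On X86 this is 1
 * by default; on Aarch64 it is 0 unless the code is built with
 * -fsigned-char, which silently disables any `if (c < 0)` error checks. */
static int char_sees_negative(void)
{
    char c = (char)0xFF;   /* -1 if char is signed, 255 if unsigned */
    return c < 0;
}
```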

2.1 Assembly Instruction Rewrite

Problem: Inline assembly written for X86 cannot be assembled on ARM64.

Solution: Rewrite the assembly sections using ARM64 assembly, compiler intrinsics, or GCC built‑ins; in many cases a portable built‑in can replace the X86 inline assembly entirely.
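As a hedged sketch of a common case: an X86 `lock add` inline-assembly increment and a portable replacement using a GCC atomic built‑in (function names are illustrative):

```c
/* X86-only original: inline assembly that will not assemble on ARM64. */
#if defined(__x86_64__)
static void incr_x86(long *p)
{
    __asm__ __volatile__("lock addq $1, %0" : "+m"(*p));
}
#endif

/* Portable rewrite: GCC emits LOCK XADD on X86 and an atomic add
 * (LDADD or an LDXR/STXR loop) on Aarch64 from the same source. */
static void incr_portable(long *p)
{
    __atomic_fetch_add(p, 1, __ATOMIC_SEQ_CST);
}
```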

2.2 CRC32 Instruction Replacement

Problem: X86 code uses the SSE4.2 CRC32C instructions (crc32b, crc32w, crc32l, crc32q), which do not exist on ARM64.

Solution: Replace them with the ARM64 CRC32C instructions and compile with -mcpu=generic+crc (or -march=armv8-a+crc). The mapping is: crc32b → crc32cb (8‑bit), crc32w → crc32ch (16‑bit), crc32l → crc32cw (32‑bit), crc32q → crc32cx (64‑bit).
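A sketch of a CRC32C routine that uses the Aarch64 intrinsics from `<arm_acle.h>` when available and falls back to portable C elsewhere (the function name and fallback loop are illustrative; the Aarch64 path requires `-march=armv8-a+crc`):

```c
#include <stdint.h>
#include <string.h>
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* __crc32cb .. __crc32cd intrinsics */
#endif

/* CRC32C (Castagnoli); the check value for "123456789" is 0xE3069283. */
static uint32_t crc32c(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;
#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
    while (len >= 8) {                 /* crc32cx via the 64-bit intrinsic */
        uint64_t v;
        memcpy(&v, buf, 8);
        crc = __crc32cd(crc, v);
        buf += 8; len -= 8;
    }
    while (len--)                      /* crc32cb for the tail bytes */
        crc = __crc32cb(crc, *buf++);
#else
    while (len--) {                    /* bitwise fallback, reflected poly */
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
#endif
    return ~crc;
}
```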

2.3 BSWAP Replacement

Problem: X86 bswap instruction is not recognized on ARM64.

Solution: Use the ARM64 rev family of instructions (rev, rev32, rev16), or the portable __builtin_bswap* built‑ins, which reverse byte order the way bswap does.
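Rather than porting the assembly by hand, the GCC byte‑swap built‑ins compile to `bswap` on X86 and `rev` on Aarch64 (a sketch; the wrapper names are illustrative):

```c
#include <stdint.h>

/* Compiles to BSWAP on X86 and to REV on Aarch64 -- no inline asm needed. */
static uint32_t swap32(uint32_t x) { return __builtin_bswap32(x); }
static uint64_t swap64(uint64_t x) { return __builtin_bswap64(x); }
```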

2.4 REP Instruction Replacement

Problem: The X86 rep prefix, which repeats a string instruction (rep movsb, rep stosb, and so on) a run‑time number of times, has no ARM64 equivalent.

Solution: Rewrite the operation as an explicit loop or a call to memcpy/memset. Note that the GAS .rept directive sometimes suggested as a replacement only repeats a block of code a fixed number of times at assembly time, so it is not a run‑time substitute for rep.
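A sketch of a typical conversion: a `rep movsb` block copy and its portable replacement (function names are illustrative):

```c
#include <string.h>

/* X86-only original: REP MOVSB copies RCX bytes from RSI to RDI. */
#if defined(__x86_64__)
static void copy_x86(void *dst, const void *src, unsigned long n)
{
    __asm__ __volatile__("rep movsb"
                         : "+D"(dst), "+S"(src), "+c"(n)
                         : : "memory");
}
#endif

/* Portable rewrite: the compiler picks the best copy sequence per
 * target (load/store pairs on Aarch64). */
static void copy_portable(void *dst, const void *src, unsigned long n)
{
    memcpy(dst, src, n);
}
```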

2.5 SSE/SSE2 Intrinsics Porting

Some modules rely on SSE/SSE2 intrinsics that have no ARM64 equivalents. Use the open‑source https://github.com/open-estuary/sse2neon.git project to obtain NEON‑based replacements. Steps:

Copy SSE2NEON.h into the project.

Remove the existing SSE‑specific code paths (the x86‑only headers such as emmintrin.h and any guards around them).

Include SSE2NEON.h in source files.
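The resulting include switch can look like this sketch (the guard macros shown are illustrative; on X86 the native SSE2 header is kept):

```c
#if defined(__x86_64__) || defined(__i386__)
#include <emmintrin.h>   /* native SSE2 intrinsics */
#else
#include "SSE2NEON.h"    /* same _mm_* API, implemented with NEON */
#endif

#include <stdint.h>

/* Example intrinsic use that now builds unchanged on both architectures. */
static void add4(int32_t out[4], const int32_t a[4], const int32_t b[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```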

2.6 Weak Memory Ordering

ARM64 employs a weakly ordered memory model: a core's stores may become visible to other cores out of program order, which can break lock‑free code that was correct under X86's stronger ordering. Ensure that atomic variables are naturally aligned and that the necessary ordering is enforced, either with explicit barrier instructions (dmb) or, preferably, with C11/GCC atomic operations carrying acquire/release semantics.
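A minimal message‑passing sketch using C11 atomics: acquire/release ordering makes GCC emit the required LDAR/STLR instructions on Aarch64 while adding essentially no cost on X86 (the variable and function names are illustrative):

```c
#include <stdatomic.h>

static int payload;         /* ordinary data */
static atomic_int ready;    /* publication flag */

static void producer(void)
{
    payload = 42;                                           /* plain store */
    atomic_store_explicit(&ready, 1, memory_order_release); /* publish it  */
}

static int consumer(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;               /* spin until the producer's release store is seen */
    return payload;     /* the acquire load guarantees payload == 42 here  */
}
```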

2.7 Atomic Operations on Misaligned Structures

Problem: Atomic instructions like ldaxr / stlxr require naturally aligned addresses; misaligned structs cause crashes.

Solution: Search for #pragma pack usages, remove forced byte‑packing on structs that participate in atomic operations, and ensure proper alignment.
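The alignment difference can be checked directly (the struct names are illustrative):

```c
#include <stdalign.h>
#include <stddef.h>

/* Byte-packed: "value" lands at offset 1, so LDAXR/STLXR-based atomics
 * on it can fault on Aarch64. */
#pragma pack(push, 1)
struct packed_counter {
    char tag;
    long value;          /* misaligned at offset 1 */
};
#pragma pack(pop)

/* Natural alignment restored: safe for atomic access. */
struct aligned_counter {
    char tag;
    alignas(sizeof(long)) long value;   /* offset 8 on LP64 */
};
```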

2.8 Hard‑Coded CPU Core Counts

Code that hard‑codes the number of CPU cores for thread affinity may under‑utilize TaiShan servers. Search for sched_setaffinity calls and replace static core numbers with sysconf(_SC_NPROCESSORS_CONF) to obtain the actual core count.
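A sketch of the replacement pattern (the helper name is illustrative; returns the core count, or -1 on error):

```c
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>

/* Bind the calling thread to every core actually present, instead of a
 * hard-coded count carried over from the X86 machine. */
static long bind_to_all_cores(void)
{
    long ncores = sysconf(_SC_NPROCESSORS_CONF);
    if (ncores < 1)
        return -1;

    cpu_set_t set;
    CPU_ZERO(&set);
    for (long i = 0; i < ncores && i < CPU_SETSIZE; i++)
        CPU_SET((int)i, &set);

    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return ncores;
}
```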

2.9 Floating‑Point to Integer Conversion Differences

Conversion of double to integer types (long, unsigned long, int, unsigned int) behaves differently on X86 and TaiShan ARM64 for overflow, underflow, and NaN inputs. X86's cvttsd2si produces the "integer indefinite" value (0x80000000 or 0x8000000000000000, the type's minimum) for any out‑of‑range or NaN input, while ARM64's fcvtzs/fcvtzu saturate to the destination type's minimum or maximum (and convert NaN to 0). Since out‑of‑range conversion is undefined behavior in C, add explicit range checks rather than relying on either result.
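Because the bare cast is undefined for out‑of‑range values, an explicit clamp gives identical results on both architectures (a sketch; the helper name is illustrative):

```c
#include <limits.h>
#include <math.h>

/* Deterministic double -> int: NaN -> 0, out-of-range saturates.  This
 * mirrors Aarch64's fcvtzs; a bare cast on X86 would instead produce the
 * 0x80000000 "indefinite" value for these inputs. */
static int double_to_int_sat(double d)
{
    if (isnan(d))
        return 0;
    if (d >= (double)INT_MAX)
        return INT_MAX;
    if (d <= (double)INT_MIN)
        return INT_MIN;
    return (int)d;
}
```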

Compiler Optimizations and Architecture‑Specific Flags

4.1 Floating‑Point Precision

GCC at -O2 and above may contract a*b+c into a fused multiply‑add (fmadd on ARM64), which performs a single rounding instead of two; results can then differ from X86 in the low‑order bits, around the 16th significant digit of a double. Disable contraction with -ffp-contract=off when bit‑identical results are required.

4.2 Targeting Kunpeng Architecture

Add -march=armv8-a to target the ARMv8‑A architecture that the Kunpeng processor family implements; append extensions the code relies on as needed, for example -march=armv8-a+crc.

4.3 Pipeline Tuning

For GCC 9.1+, enable the TSV110 pipeline with -mtune=tsv110 to fully exploit Kunpeng’s instruction‑level parallelism.
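Putting the flags from this guide together, a representative Aarch64 build line might look like the following (the source and output file names are illustrative; drop -mtune=tsv110 on GCC versions older than 9.1):

```shell
gcc -O2 -march=armv8-a+crc -mtune=tsv110 \
    -fsigned-char -ffp-contract=off \
    -o app app.c
```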

Tags: code migration, GCC, aarch64, memory ordering, compiler flags, floating point precision, assembly rewrite
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
