How to Seamlessly Migrate X86 C/C++ Code to Aarch64 TaiShan Servers

This guide details the migration of X86‑compiled C/C++ applications to Huawei TaiShan Aarch64 servers, covering language differences, required compiler versions, common build‑time errors, assembly rewrites, memory‑ordering quirks, floating‑point precision issues, and specific GCC flags to achieve correct and performant binaries.

Architects' Tech Alliance

Jan 21, 2020

Programming Language Overview

Compiled languages such as C and C++ generate architecture-specific machine code, so binaries built for X86 cannot run on TaiShan Aarch64 servers until the source is recompiled. Interpreted languages such as Java and Python run platform-independent bytecode on a runtime, so applications can usually be deployed on the new servers unchanged. Any native libraries they load still need to be recompiled for Aarch64.

Preparation

Install a GCC toolchain. Version 7.3 or newer is recommended; 4.8.5 is the minimum supported version.

Download: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/

Installation instructions: https://gcc.gnu.org/install/

Compilation Issues and Fixes

1.1 -m64 Compilation Option

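Problem: The build fails with gcc: error: unrecognized command line option ‘-m64’. The -m64 flag selects the 64-bit X86 target and is not supported by GCC on ARM64.

Solution: Use the ARM64-specific ABI flag -mabi=lp64 instead. It selects the LP64 data model, in which int is 32 bits and long and pointers are 64 bits.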
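As a small illustration of what the LP64 data model implies for source code, the sketch below checks the type widths at compile time and at run time. The helper name is_lp64 is hypothetical and not part of the original article; the block assumes a C11-capable compiler for _Static_assert.

```c
#include <limits.h>

/* LP64: int stays 32 bits wide, long and pointers are 64 bits wide.
   These checks fail at compile time on a target with another data model. */
_Static_assert(sizeof(int) * CHAR_BIT == 32, "int is expected to be 32 bits");
_Static_assert(sizeof(long) * CHAR_BIT == 64, "long is expected to be 64 bits");
_Static_assert(sizeof(void *) * CHAR_BIT == 64, "pointers are expected to be 64 bits");

/* Hypothetical helper: returns 1 when this binary was built for LP64. */
int is_lp64(void)
{
    return sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8;
}
```

Code that already built with -m64 on 64-bit X86 Linux uses the same type widths, so checks like these are expected to pass unchanged after the flag is swapped.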

1.2 Signed‑char Mismatch

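Problem: GCC warns that a comparison is always false due to the limited range of the data type. Plain char is signed by default on X86 but unsigned by default on ARM64, so code that compares a char against a negative value changes behavior.

Solution: Add the compiler flag -fsigned-char to force plain char to be signed on ARM64, or declare the affected variables as signed char explicitly.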
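The following sketch shows how the difference surfaces and one way to make the code independent of the compiler flag. All function names and the 0xFF sentinel convention are hypothetical, chosen only for illustration. The cast of 0xFF to a signed type is implementation-defined in C; GCC yields -1.

```c
#include <limits.h>

/* Returns 1 when plain char is signed in this build, 0 when it is unsigned. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Fragile form: if plain char is unsigned, c holds 255 and can never
   equal -1, which is what triggers the "always false" warning. */
int is_sentinel_fragile(unsigned char raw)
{
    char c = (char)raw;
    return c == -1;
}

/* Portable form: the signedness is spelled out, so the result is the
   same with or without -fsigned-char. */
int is_sentinel(unsigned char raw)
{
    signed char c = (signed char)raw;
    return c == -1;
}
```

With -fsigned-char both functions agree on ARM64; without it only is_sentinel keeps its X86 behavior.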

2.1 Assembly Instruction Rewrite

Problem: Inline assembly written for X86 uses X86 mnemonics and registers, so the ARM64 assembler rejects it.

Solution: Rewrite the assembly sections using ARM64 instructions, ARM64 intrinsics, or GCC built-ins. The original article illustrates this with images of the X86 version and the corresponding ARM64 implementation.
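As a sketch of the built-in route, the example below replaces an X86 atomic add written in inline assembly with GCC's __atomic_fetch_add, which the compiler lowers to suitable instructions on either architecture. The function name fetch_add32 and the LEGACY_X86_ASM macro are hypothetical; the images in the original article may show a different routine.

```c
#include <stdint.h>

/* Atomically adds v to *p and returns the value *p held before the add. */
int32_t fetch_add32(volatile int32_t *p, int32_t v)
{
#if defined(LEGACY_X86_ASM) && defined(__x86_64__)
    /* Original X86-only form: lock xadd leaves the old value in the register. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory", "cc");
    return v;
#else
    /* Portable form: GCC built-in, no architecture-specific assembly. */
    return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
#endif
}
```

Preferring a built-in over hand-written ARM64 assembly keeps a single code path for both architectures, at the cost of leaving instruction selection to the compiler.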

2.2 CRC32 Instruction Replacement

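Problem: X86 code uses the crc32b / crc32q instructions, which the ARM64 assembler does not recognize.

Solution: Replace them with the ARM64 equivalents crc32cb, crc32ch, crc32cw, and crc32cx (8-, 16-, 32-, and 64-bit operands respectively), and compile with -mcpu=generic+crc. A table image in the original article shows the mapping.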
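A minimal sketch of the replacement, using the ACLE intrinsics from arm_acle.h rather than raw assembly: __crc32cb emits crc32cb and __crc32cd emits crc32cx. The function names and the software fallback are assumptions added so the example also builds on machines without the CRC extension; the 8-byte path assumes a little-endian target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>          /* available when built with +crc */
#define HAVE_ARM_CRC32 1
#endif

/* Bitwise CRC-32C update for one byte (reflected polynomial 0x82F63B78). */
static uint32_t crc32c_sw_byte(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* CRC-32C over a buffer, with the usual initial and final inversion. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
#ifdef HAVE_ARM_CRC32
    while (len >= 8) {                 /* crc32cx: 8 bytes per instruction */
        uint64_t v;
        memcpy(&v, p, sizeof v);
        crc = __crc32cd(crc, v);
        p += 8;
        len -= 8;
    }
    while (len--)                      /* crc32cb: remaining bytes */
        crc = __crc32cb(crc, *p++);
#else
    while (len--)                      /* software fallback */
        crc = crc32c_sw_byte(crc, *p++);
#endif
    return ~crc;
}
```

The "c" variants are used because they compute the Castagnoli polynomial, the same one the X86 crc32 instruction implements.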

2.3 BSWAP Replacement

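Problem: The X86 bswap instruction, which reverses the byte order of a register, is not recognized on ARM64.

Solution: Use the ARM64 rev instruction, which performs the same byte reversal. Before/after images in the original article demonstrate the change.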
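One way to sketch the change is shown below. The portable functions use GCC's __builtin_bswap32 and __builtin_bswap64, which the compiler can lower to bswap on X86 and rev on ARM64; the ARM64-only variant shows the rev instruction in inline assembly. The function names are hypothetical.

```c
#include <stdint.h>

/* Portable byte swap: the compiler picks the instruction for the target. */
uint32_t swap32(uint32_t x)
{
    return __builtin_bswap32(x);
}

uint64_t swap64(uint64_t x)
{
    return __builtin_bswap64(x);
}

#if defined(__aarch64__)
/* Direct ARM64 form: %w selects the 32-bit view of the register. */
uint32_t swap32_rev(uint32_t x)
{
    uint32_t r;
    __asm__("rev %w0, %w1" : "=r"(r) : "r"(x));
    return r;
}
#endif
```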

2.4 REP Instruction Replacement

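Problem: The X86 rep prefix, which repeats the instruction that follows it, is not supported on ARM64.

Solution: Replace it with the GNU assembler directive .rept, closed by .endr, which repeats the enclosed instructions a fixed number of times at assembly time. Code snippets before and after the change are shown in the original article.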
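A minimal sketch of the .rept form, assuming the repeated instruction is a nop and the count is a literal known at build time. The NOPS macro and short_delay function are hypothetical names; the snippets in the original article may differ.

```c
/* Emits n consecutive nop instructions. .rept/.endr is expanded by the
   assembler, so n must be a literal constant, not a run-time value. */
#define NOPS(n) __asm__ __volatile__(".rept " #n "\n\tnop\n\t.endr" ::: "memory")

/* Hypothetical delay helper: executes four nops and reports the count. */
int short_delay(void)
{
    NOPS(4);
    return 4;
}
```

Because the repetition happens at assembly time, this only covers fixed counts. Code that used rep with a run-time count, for example for string copies, is usually easier to port by calling a library routine such as memcpy.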

2.5 SSE/SSE2 Intrinsics Porting

Some modules rely on GCC‑provided SSE/SSE2 functions that lack ARM64 equivalents. Use the open‑source https://github.com/open-estuary/sse2neon.git project to obtain NEON‑based replacements. Steps:

Copy SSE2NEON.h into the project.

Remove the existing SSE-specific code sections that the header replaces (they are marked in an image in the original article).

Include SSE2NEON.h in source files.

2.6 Weak Memory Ordering

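ARM64 uses a weakly ordered memory model: loads and stores to different addresses may become visible to other cores in an order that differs from program order, whereas X86 preserves most of these orderings. Lock-free code that works on X86 can therefore produce unexpected results on ARM64. Ensure that atomic operations are aligned to their natural size and that the appropriate memory-barrier instructions are used. Diagrams in the original article illustrate cache-line sharing and the effects of out-of-order execution.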
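The sketch below shows the usual flag-and-payload pattern with C11 atomics, where a release store paired with an acquire load supplies the ordering that ARM64 does not give for free. The names payload, ready, publish, and try_consume are hypothetical, and the block assumes a compiler that ships stdatomic.h.

```c
#include <stdatomic.h>

static int payload;          /* ordinary data, written before the flag */
static atomic_int ready;     /* publication flag, zero-initialised */

/* Producer: the release store keeps the payload write from being
   reordered after the flag becomes visible. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: the acquire load keeps the payload read from being
   reordered before the flag check. Returns 1 and fills *out if ready. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return 0;
    *out = payload;
    return 1;
}
```

With a plain int flag and no barriers, this pattern tends to work on X86 but can read a stale payload on ARM64.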

2.7 Atomic Operations on Misaligned Structures

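Problem: Atomic instructions such as ldaxr / stlxr require naturally aligned addresses. When a packed struct places an atomically accessed field at a misaligned address, the access raises an alignment fault and the process crashes.

Solution: Search for #pragma pack usages, remove forced byte-packing from structs that participate in atomic operations, and make sure each atomically accessed field is aligned to its own size.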
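As an illustration, the sketch below contrasts a byte-packed struct with its naturally aligned counterpart and adds a compile-time guard on the field that is updated atomically. The struct and function names are hypothetical, and a 64-bit target with C11 support is assumed.

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)
struct stats_packed {
    uint8_t  flag;
    uint64_t counter;     /* forced to offset 1: misaligned for atomics */
};
#pragma pack(pop)

struct stats_aligned {
    uint8_t  flag;
    uint64_t counter;     /* compiler padding places this at offset 8 */
};

/* Build-time guard: fails if someone re-introduces packing. */
_Static_assert(offsetof(struct stats_aligned, counter) % sizeof(uint64_t) == 0,
               "counter must be naturally aligned for atomic access");

/* Returns 1 when a field of the given size sits on its natural boundary. */
int is_naturally_aligned(size_t offset, size_t size)
{
    return offset % size == 0;
}

/* Atomic increment on the aligned layout; returns the new value. */
uint64_t bump(struct stats_aligned *s)
{
    return __atomic_add_fetch(&s->counter, 1, __ATOMIC_SEQ_CST);
}
```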

2.8 Hard‑Coded CPU Core Counts

Code that hard‑codes the number of CPU cores for thread affinity may under‑utilize TaiShan servers. Search for sched_setaffinity calls and replace static core numbers with sysconf(_SC_NPROCESSORS_CONF) to obtain the actual core count.
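A minimal sketch of querying the core count at run time instead of hard-coding it. The helpers configured_cores and core_for_worker are hypothetical names, and the round-robin mapping is only one possible policy; the resulting core id is what would be placed in the CPU set passed to sched_setaffinity.

```c
#include <unistd.h>

/* Number of configured cores, with a fallback of 1 if the query fails. */
long configured_cores(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;
}

/* Maps a non-negative worker index onto an existing core id (round-robin),
   so the same code works on machines with different core counts. */
int core_for_worker(int worker_index, long ncores)
{
    return (int)(worker_index % ncores);
}
```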

2.9 Floating‑Point to Integer Conversion Differences

Converting a double to an integer type behaves differently on X86 and TaiShan ARM64 when the value lies outside the range of the target type. X86 may produce the "indefinite integer value", while ARM64 clamps to the minimum or maximum representable value. Tables in the original article list the conversion results for long, unsigned long, int, and unsigned int. Because the C standard leaves out-of-range conversions undefined, code that depends on either result should range-check or clamp the value explicitly before converting.
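One way to make the result identical on both architectures is to clamp before converting, as in the sketch below. The function name is hypothetical; mapping NaN to 0 is a design choice made here, picked because it matches what the ARM64 conversion instructions return.

```c
#include <limits.h>

/* Converts a double to int with explicit saturation, so the result does
   not depend on how the hardware handles out-of-range inputs. */
int double_to_int_saturating(double d)
{
    if (d != d)                       /* NaN compares unequal to itself */
        return 0;
    if (d >= (double)INT_MAX)
        return INT_MAX;
    if (d <= (double)INT_MIN)
        return INT_MIN;
    return (int)d;                    /* in range: truncates toward zero */
}
```

The same pattern applies to long, unsigned long, and unsigned int with the corresponding limits, taking care that LONG_MAX and ULONG_MAX are not exactly representable as a double.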

Compiler Optimizations and Architecture‑Specific Flags

4.1 Floating‑Point Precision

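At -O2 and above, GCC on ARM64 may contract a multiplication followed by an addition into a single fused multiply-add instruction (fmadd). The fused form rounds once instead of twice, so results can differ from X86 at around the 16th decimal digit. Disable the contraction with -ffp-contract=off.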
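The effect can be sketched with two versions of the same expression. In mul_add the compiler is free to fuse the operations when contraction is enabled; in mul_add_two_roundings a volatile temporary forces the product to be rounded before the addition, which is the behavior -ffp-contract=off gives everywhere. Both function names are hypothetical.

```c
/* May compile to a single fmadd when contraction is allowed. */
double mul_add(double a, double b, double c)
{
    return a * b + c;
}

/* Always rounds twice: once after the multiply, once after the add. */
double mul_add_two_roundings(double a, double b, double c)
{
    volatile double product = a * b;   /* rounded to double here */
    return product + c;                /* rounded again here */
}
```

For a = 1 + 2^-27, b = 1 - 2^-27, and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The two-rounding version therefore returns 0, while a fused multiply-add returns -2^-54.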

4.2 Targeting Kunpeng Architecture

Add -march=armv8-a so that GCC generates code for the Armv8-A instruction set implemented by the Kunpeng processor family.

4.3 Pipeline Tuning

For GCC 9.1+, enable the TSV110 pipeline with -mtune=tsv110 to fully exploit Kunpeng’s instruction‑level parallelism.

