Code Migration Experience: Porting C/C++ Applications from x86 to TaiShan aarch64 Servers

This article presents a comprehensive guide on migrating business code from x86 to TaiShan aarch64 servers, covering language differences, compilation toolchains, architecture‑specific issues such as compiler options, assembly rewrites, memory ordering, floating‑point behavior, and recommended GCC optimizations.

Architects' Tech Alliance

Jan 3, 2021

After completing two projects that moved business code from x86 servers to TaiShan aarch64 servers, the author shares the lessons learned. In the author's experience, the migrated code ran smoothly and efficiently on the TaiShan servers.

Programming Language Overview

High-level languages can be divided into compiled languages (e.g., C, C++) and interpreted languages (e.g., Java, Python). Compiled languages generate machine code that is tied to the CPU architecture, so binaries built for x86 cannot run on TaiShan without recompilation. Interpreted languages produce platform-independent bytecode that a virtual machine executes, so the same program can run on both x86 and TaiShan, provided the runtime itself is available for aarch64.

Preparation Work

To port C/C++ programs, install GCC on the TaiShan server. Version 7.3 or newer is recommended; 4.8.5 is the minimum. Download links: http://ftp.gnu.org/gnu/gcc/gcc-7.3.0/ ; installation instructions: https://gcc.gnu.org/install/ .

Migration Issues and Solutions

1.1 -m64 Compilation Option

Symptom: gcc error: unrecognized command line option ‘-m64’.

Cause: -m64 is an x86‑64 option; ARM64 does not support it.

Solution: Use the ARM64‑specific option -mabi=lp64.

1.2 Char Signedness

Symptom: warning: comparison is always false due to limited range of data type.

Cause: On x86, char defaults to signed; on ARM64 it defaults to unsigned.

Solution: Add the compiler flag -fsigned-char to force signed char on ARM64.
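The difference can be illustrated with a small sketch. The function names below are hypothetical and do not come from the original article. The first function relies on plain char being signed, which is the pattern that produces the warning above; the second gives the same answer on both architectures without needing -fsigned-char.

```c
#include <limits.h>

/* Relies on plain char being signed. On ARM64 without -fsigned-char,
 * (c < 0) is always false, which is what the warning reports. */
int is_high_byte_plain(char c)
{
    return c < 0;
}

/* Portable variant: test the top bit through unsigned char, so the result
 * does not depend on the signedness of plain char. */
int is_high_byte_portable(char c)
{
    return ((unsigned char)c & 0x80u) != 0;
}

/* Reports how the current compiler treats plain char (1 = signed). */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Where the code base is large, -fsigned-char is the quicker fix. Rewriting individual comparisons as in the portable variant removes the dependency on the flag altogether.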

2.1 Assembly Rewrite

Symptom: ARM assembly differs from x86; embedded assembly must be rewritten.

Solution: Re‑implement the assembly sections for ARM64.

2.2 Replace x86 CRC32 Instructions

Symptom: Compilation error: unknown mnemonic `crc32q` or unrecognized option ‘-msse4.2’.

Cause: x86 uses crc32b / crc32q; ARM64 uses crc32cb, crc32ch, crc32cw, crc32cx.

Solution: Replace x86 CRC32 instructions with the ARM64 equivalents and add the compiler flag -mcpu=generic+crc.
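As a rough sketch of the replacement, the function below computes CRC-32C through the ACLE intrinsic __crc32cb when the compiler targets ARM64 with the CRC extension enabled, and falls back to a bitwise software loop elsewhere. The function name crc32c, the byte-at-a-time loop, and the initial value and final inversion are assumptions of this example. The original x86 code may seed and finalize the checksum differently, in which case those two steps should be changed to match.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
#include <arm_acle.h>   /* declares __crc32cb, __crc32ch, __crc32cw, __crc32cd */
#endif

/* CRC-32C (Castagnoli) over a buffer, using the common convention of an
 * all-ones initial value and a final bit inversion. */
uint32_t crc32c(const void *data, size_t len)
{
    const uint8_t *p = (const uint8_t *)data;
    uint32_t crc = 0xFFFFFFFFu;

#if defined(__aarch64__) && defined(__ARM_FEATURE_CRC32)
    /* Hardware path: one byte per step keeps the sketch short. */
    while (len--)
        crc = __crc32cb(crc, *p++);
#else
    /* Software fallback: bitwise, reflected polynomial 0x82F63B78. */
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
#endif
    return crc ^ 0xFFFFFFFFu;
}
```

Using the intrinsics rather than inline assembly lets the compiler pick registers. A production version would normally consume 8 bytes per step with __crc32cd, the intrinsic form of crc32cx.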

2.3 Replace x86 bswap Instruction

Symptom: Compilation error: unknown mnemonic `bswap`.

Cause: bswap is an x86 byte‑swap instruction; ARM64 uses rev.

Solution: Substitute bswap with rev.
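One way to avoid architecture-specific assembly here is the GCC builtins __builtin_bswap32 and __builtin_bswap64, which compile on both x86 and ARM64. The sketch below shows the builtins together with an ARM64-only inline-assembly version using rev. The wrapper names are hypothetical.

```c
#include <stdint.h>

/* Portable byte swap: GCC selects the native instruction for the target. */
uint32_t swap_bytes32(uint32_t v) { return __builtin_bswap32(v); }
uint64_t swap_bytes64(uint64_t v) { return __builtin_bswap64(v); }

#if defined(__aarch64__)
/* Direct translation of an x86 "bswap" inline-assembly statement.
 * %w0 / %w1 select the 32-bit view of the registers. */
static inline uint32_t swap_bytes32_asm(uint32_t v)
{
    uint32_t r;
    __asm__("rev %w0, %w1" : "=r"(r) : "r"(v));
    return r;
}
#endif
```

The builtin form is usually preferable because the same source then builds unchanged on both platforms.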

2.4 Replace x86 rep Instruction

Symptom: Compilation error: unknown mnemonic `rep`.

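Cause: rep is an x86 repeat prefix and has no ARM64 instruction counterpart. The replacement referred to here, rept, is the GNU assembler directive pair .rept/.endr, which repeats a block of instructions at assembly time.

Solution: Replace rep with a .rept ... .endr block. Because .rept expands a fixed number of times when the code is assembled, it suits fixed-count cases such as a run of nop instructions. A rep-prefixed string operation whose count is only known at run time needs an explicit loop instead.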
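A common case is the spin-wait idiom "rep; nop", which is the encoding of the x86 pause instruction. The sketch below assumes that is how the original code used rep. It shows a fixed run of nop instructions built with .rept, plus a spin-wait hint that uses yield on ARM64. The macro and function names are hypothetical.

```c
/* Emits n consecutive nop instructions. n must be an integer literal because
 * it is pasted into the assembler text. .rept/.endr are GNU assembler
 * directives, so the repetition happens at assembly time. */
#define NOPS(n) __asm__ __volatile__(".rept " #n "\n\tnop\n\t.endr" ::: "memory")

/* Spin-wait hint: pause on x86, yield on ARM64. */
static inline void cpu_relax(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __asm__ __volatile__("pause" ::: "memory");
#elif defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier only */
#endif
}

/* Hypothetical helper: performs a fixed number of relax steps and returns
 * how many were executed. */
int spin_wait(int steps)
{
    int done = 0;
    while (done < steps) {
        cpu_relax();
        NOPS(4);
        done++;
    }
    return done;
}
```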

2.5 Fast Porting of Inline SSE/SSE2 Code

Some applications use GCC‑wrapped SSE/SSE2 functions that lack ARM64 equivalents. Use the open‑source project https://github.com/open-estuary/sse2neon.git to obtain ARM64 implementations.

Steps: copy SSE2NEON.h into the project, remove the includes of the original x86 SSE headers, and include the new header in their place.
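The header swap can be sketched as a conditional include, so that one source file builds on both platforms. This example assumes that SSE2NEON.h has been copied next to the source file and that it provides _mm_loadu_si128, _mm_add_epi32 and _mm_storeu_si128. The function add4_i32 and the HAVE_SSE2_API macro are hypothetical, and the scalar fallback is only there so the sketch still builds when neither header is present.

```c
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>            /* native SSE2 intrinsics on x86-64 */
#define HAVE_SSE2_API 1
#elif defined(__aarch64__) && defined(__has_include)
#if __has_include("SSE2NEON.h")
#include "SSE2NEON.h"             /* header copied in from the sse2neon project */
#define HAVE_SSE2_API 1
#endif
#endif

/* Adds four 32-bit integers element-wise: out[i] = a[i] + b[i]. */
void add4_i32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
#ifdef HAVE_SSE2_API
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    for (int i = 0; i < 4; i++)   /* scalar fallback if no header is available */
        out[i] = a[i] + b[i];
#endif
}
```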

2.6 Weak Memory Ordering Issues

Symptom: Program results differ from expectations due to weak memory ordering on ARM64.

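Cause: ARM64 has a weakly ordered memory model. Loads and stores to different addresses may become visible to other cores in an order different from program order, whereas x86 preserves most of these orderings. Lock-free code that happened to work on x86 can therefore observe out-of-order writes on ARM64.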

Solution: Identify lock‑free code and insert appropriate memory‑barrier instructions to enforce ordering.
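Instead of hand-written barrier instructions, C11 atomics with release and acquire ordering can express the required ordering portably. This is a substitute technique, not the exact fix described in the original article. The sketch shows the usual "write data, then set a flag" pattern; the variable and function names are hypothetical.

```c
#include <stdatomic.h>

static int payload;        /* ordinary data written by the producer */
static atomic_int ready;   /* flag that publishes the payload (starts at 0) */

/* Producer: write the data, then set the flag with release semantics so the
 * payload store cannot become visible after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: read the flag with acquire semantics. If it is set, the payload
 * written before the matching release store is guaranteed to be visible.
 * Returns 1 and fills *out on success, 0 if nothing is published yet. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        return 0;
    *out = payload;
    return 1;
}
```

With a plain int flag and no ordering, this pattern is exactly the kind of lock-free code that can appear correct on x86 and then fail intermittently on ARM64.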

2.7 Atomic Operations on Struct Members Causing Core Dumps

Symptom: Core dump when performing atomic operations on struct members.

Cause: ARM64 atomic instructions (ldaxr, stlxr) require naturally aligned addresses; forced byte-packing can break alignment.

Solution: Search for #pragma pack and ensure variables used in atomic ops are properly aligned.
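The alignment problem and one possible guard can be sketched as follows. The struct and function names are hypothetical. The packed struct shows how #pragma pack(1) moves a 64-bit member to an odd offset, while the second struct keeps the atomic member outside the packed region and checks its offset at compile time.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-packed layout: hits lands at offset 1, so it is not 8-byte aligned.
 * An atomic read-modify-write on such a member is what faults on ARM64. */
#pragma pack(push, 1)
struct packed_stats {
    uint8_t  state;
    uint64_t hits;
};
#pragma pack(pop)

/* Natural layout: the compiler pads after state so hits is 8-byte aligned. */
struct aligned_stats {
    uint8_t          state;
    _Atomic uint64_t hits;
};

/* Compile-time guard: fails the build if someone later packs this struct. */
_Static_assert(offsetof(struct aligned_stats, hits) % _Alignof(uint64_t) == 0,
               "atomic member must be naturally aligned");

/* Atomically increments the counter and returns the new value. */
uint64_t record_hit(struct aligned_stats *s)
{
    return atomic_fetch_add(&s->hits, 1) + 1;
}
```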

2.8 Hard‑Coded CPU Core Count

Hard‑coding the number of CPU cores can prevent full utilization on TaiShan servers. Search for sched_setaffinity usage and replace static core numbers with sysconf(_SC_NPROCESSORS_CONF).
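A minimal sketch of the replacement, assuming worker threads are numbered from 0 upward. The helper names are hypothetical. The CPU id returned by cpu_for_worker is the value that would then be passed to CPU_SET() before calling sched_setaffinity().

```c
#include <unistd.h>

/* Number of CPUs configured on this machine, with a floor of 1 in case
 * sysconf() reports an error (-1). */
long configured_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_CONF);
    return n > 0 ? n : 1;
}

/* Maps a non-negative worker index to a CPU id by wrapping around the
 * detected core count instead of a hard-coded constant. */
int cpu_for_worker(int worker_index)
{
    return (int)(worker_index % configured_cpus());
}
```

Wrapping by the detected count keeps the same binary valid on machines with different core counts.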

2.9 Double‑to‑Integer Conversion Differences

Floating‑point to integer conversion behaves differently on TaiShan versus x86, especially for overflow cases. On ARM64, conversions clamp to the representable range instead of producing an “indefinite” value.

Refer to the provided conversion tables for correct handling of double → long, unsigned long, int, and unsigned int.
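In C, converting a floating-point value that does not fit the destination integer type is undefined behavior, so portable code should not rely on either platform's result. The sketch below is a hypothetical helper that makes the out-of-range handling explicit for double → 64-bit signed integer. It clamps in the same way as the ARM64 behavior described above; mapping NaN to 0 is a choice made for this example.

```c
#include <stdint.h>

/* Converts a double to int64_t with explicit handling of the cases that
 * differ between architectures: NaN becomes 0 and out-of-range values clamp
 * to INT64_MIN / INT64_MAX. */
int64_t double_to_i64_sat(double d)
{
    if (d != d)                          /* NaN is the only value not equal to itself */
        return 0;
    if (d >= 9223372036854775808.0)      /* 2^63, the first value above INT64_MAX */
        return INT64_MAX;
    if (d <= -9223372036854775808.0)     /* -2^63 == INT64_MIN */
        return INT64_MIN;
    return (int64_t)d;                   /* in range: truncates toward zero */
}
```

Routing risky conversions through one helper gives identical results on x86 and TaiShan. Equivalent helpers for unsigned long, int and unsigned int follow the same pattern with their own limits.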

4.1 GCC Floating‑Point Optimization

Symptom: With -O2 or higher, floating‑point multiply‑add results differ after the 16th decimal place between x86 and ARM64.

Cause: GCC uses the fused‑multiply‑add instruction fmadd, which does not round the intermediate product.

Solution: Add the flag -ffp-contract=off to disable this optimization.
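The effect can be reproduced with a small sketch; the function names are hypothetical. The first function is the form the compiler may contract into a single fmadd. The second is a local alternative to the project-wide flag: it stores the product in a volatile double so that it is rounded before the addition.

```c
/* May be contracted into a single fmadd at -O2 on ARM64, in which case the
 * product is not rounded before the addition. */
double mul_add_contractible(double a, double b, double c)
{
    return a * b + c;
}

/* Storing the product in a volatile double forces it to be rounded to double
 * precision first, so the result is the same with or without contraction. */
double mul_add_separate(double a, double b, double c)
{
    volatile double product = a * b;
    return product + c;
}
```

For example, with a = b = 1 + 2^-27 and c = -(1 + 2^-26), the separately rounded version returns exactly 0, while a fused multiply-add keeps the low-order term 2^-54 of the product.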

4.2 Targeting Kunpeng Architecture

Add the compiler flag -march=armv8-a so that GCC generates code for the ARMv8-A architecture implemented by the Kunpeng processors in TaiShan servers.

4.3 Tuning for Kunpeng Pipeline

If using GCC 9.1+, add -mtune=tsv110 to exploit the TSV110 pipeline of the Kunpeng CPU.
