Architects' Tech Alliance
Jan 16, 2026 · Artificial Intelligence

Why Do GPUs and NPUs Produce Different FP16 Results? Uncovering AI Chip Precision Secrets

Engineers training large AI models often see noticeable differences in FP16/BF16 results between GPUs and NPUs, and even between generations of the same chip. The causes include floating‑point representation limits, hardware design choices, software library implementations, compiler optimizations, and nondeterministic parallel execution.

AI · GPU · NPU
0 likes · 10 min read
Architects' Tech Alliance
Jan 21, 2020 · Backend Development

How to Seamlessly Migrate X86 C/C++ Code to Aarch64 TaiShan Servers

This guide details how to migrate C/C++ applications built for x86 to Huawei TaiShan AArch64 servers, covering language differences, required compiler versions, common build‑time errors, assembly rewrites, memory‑ordering quirks, floating‑point precision issues, and the specific GCC flags needed to produce correct and performant binaries.

Compiler Flags · GCC · aarch64
0 likes · 14 min read
Baidu Intelligent Testing
Dec 9, 2015 · Fundamentals

Investigation of Query‑Diff Precision Differences Caused by CPU Instruction‑Set Variations (AVX vs SSE)

A detailed case study shows how a 1% precision difference discovered by query‑diff testing was traced to CPU instruction‑set discrepancies (AVX vs SSE), highlighting the impact of hardware‑level floating‑point optimizations on algorithmic results and providing practical debugging and mitigation guidelines.

AVX · CPU · Performance optimization
0 likes · 13 min read