Debugging Floating‑Point Precision Differences in Query‑Diff Tests: A CPU Instruction Set Case Study (AVX vs SSE)
This article details a query‑diff test that revealed a 1% floating‑point precision difference after a module upgrade, describes the systematic investigation of environment, compilation, and CPU instruction‑set variations (AVX vs SSE), and presents conclusions and recommendations for preventing similar issues.
In Baidu's Quality Assurance team, a query‑diff test on a C++ module (module A) exposed a precision difference of about 1% in the single‑precision float output Q after an upgrade, despite expectations of no change.
The investigation began by checking the environment: configuration files and word lists matched, and forward‑compatible deployment confirmed no configuration differences. Subsequent program checks ruled out random strategies, thread or process cache contamination, and type‑conversion errors.
Further analysis considered the compilation environment. Both old and new binaries were built on the company’s cloud compilation cluster, and local recompilation on identical machines reproduced the diff, eliminating compilation parameters as the cause.
Hardware comparison showed that the test machines ran the same operating system (CentOS 4.3) but different CPUs: the new environment ran on an Intel Xeon E5645, while the old one used a Xeon E5‑2620. Deploying the new version on a machine with the old environment's CPU eliminated the diff, pinpointing the CPU as the source.
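CPU instruction‑set differences were examined next. The Xeon E5645 (Westmere) supports SSE up to SSE4.2 but not AVX, whereas the Xeon E5‑2620 (Sandy Bridge) adds AVX. Neither instruction set is inherently more precise for single‑precision arithmetic. What changes is the vector width (a 128‑bit SSE register holds 4 floats, a 256‑bit AVX register holds 8), and with it the order in which partial results are accumulated. The fused multiply‑add (FMA) instructions that arrived on later CPUs alongside AVX2 go further and round once per multiply‑add instead of twice. Intel's MKL library, used indirectly by the module, selects its code path at run time: it uses AVX‑optimized kernels on CPUs that support AVX and falls back to SSE on those that do not.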
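The effect of FMA's single rounding can be illustrated with a small emulation. This is a sketch, not the module's or MKL's code: `f32`, `unfused_multiply_add`, and `fused_multiply_add` are hypothetical helpers, the inputs are chosen so that the product lands exactly halfway between two single‑precision values, and the "fused" path is only a valid emulation because the double‑precision intermediate happens to be exact for these inputs.

```python
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def unfused_multiply_add(a, b, c):
    """Multiply, round to float32, then add and round again (two roundings)."""
    product = f32(a * b)
    return f32(product + c)


def fused_multiply_add(a, b, c):
    """Emulate an FMA: form a*b + c and round to float32 only once.

    Assumption: a*b + c is exact in double precision for the inputs used
    below. This is not a general-purpose FMA implementation.
    """
    return f32(a * b + c)


# Illustrative inputs, all exactly representable in float32.
A = f32(1.0 + 2.0 ** -12)
B = A
C = f32(-(1.0 + 2.0 ** -11))

if __name__ == "__main__":
    print(unfused_multiply_add(A, B, C))  # 0.0: the 2**-24 term was rounded away
    print(fused_multiply_add(A, B, C))    # 2**-24, about 5.96e-08
```

Both functions evaluate the same expression, yet one returns zero and the other a small non-zero value, which is the kind of last-bit discrepancy a query‑diff test will surface.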
The legacy x87 floating‑point unit (FPU) computes in 80‑bit extended precision internally, whereas SSE and AVX round every single‑precision result to 32 bits. A difference of roughly one bit in the last place between the AVX and SSE code paths therefore accumulates through the module's matrix calculations, producing the observed diff at the ten‑thousandth decimal place.
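How accumulation order alone can change a single‑precision sum can be sketched as follows. The model is a simplification under stated assumptions: `lane_sum` is a hypothetical helper that treats a SIMD register as `width` independent accumulators (4 for SSE, 8 for AVX) and then adds the lanes left to right. The actual reduction order inside MKL is not described in the source, and the input values are deliberately extreme so that the effect is visible in a short list.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lane_sum(values, width):
    """Sum values in float32 using `width` parallel accumulators.

    Element i is added to lane i % width, mimicking a vectorized loop.
    The final lane-by-lane reduction order is an assumption of this sketch.
    """
    lanes = [0.0] * width
    for i, v in enumerate(values):
        lanes[i % width] = f32(lanes[i % width] + f32(v))
    total = 0.0
    for lane in lanes:
        total = f32(total + lane)
    return total


# Exaggerated data: near 1e8, adjacent float32 values are 8 apart,
# so adding 1.0 to 1e8 is lost to rounding.
VALUES = [1e8, 1.0, -1e8, 1.0, 1.0, 1.0, 1.0, 1.0]

if __name__ == "__main__":
    print(lane_sum(VALUES, 4))  # 2.0 with four accumulators (SSE-like)
    print(lane_sum(VALUES, 8))  # 5.0 with eight accumulators (AVX-like)
    print(math.fsum(VALUES))    # 6.0, the exact sum
```

With realistic data the two results usually differ only in the last bit or two, but the mechanism is the same: the same inputs, grouped differently, are rounded at different points.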
The case concludes that the precision difference stems from CPU‑dependent instruction‑set optimizations (AVX vs SSE). Recommendations include ensuring identical CPU models for test environments, adding hardware checks before query‑diff runs, confirming AVX support on production machines, and auditing other modules for similar instruction‑set usage.
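A hardware check of the kind recommended above could look roughly like this on Linux. It is a minimal sketch: the function names and the two sample texts are hypothetical, the flag lists are abbreviated, and a real check would read `/proc/cpuinfo` on each test machine rather than use hard-coded strings.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def widest_simd(flags):
    """Name the most capable SIMD extension present, checking newest first."""
    for name in ("avx2", "avx", "sse4_2", "sse2"):
        if name in flags:
            return name
    return "unknown"


def same_simd_level(cpuinfo_a, cpuinfo_b):
    """True when both machines report the same top SIMD extension."""
    level_a = widest_simd(parse_cpu_flags(cpuinfo_a))
    level_b = widest_simd(parse_cpu_flags(cpuinfo_b))
    return level_a == level_b


# Abbreviated, made-up samples standing in for two test machines.
WITHOUT_AVX = "model name\t: sample CPU A\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
WITH_AVX = "model name\t: sample CPU B\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2 avx\n"

if __name__ == "__main__":
    if not same_simd_level(WITHOUT_AVX, WITH_AVX):
        print("SIMD capabilities differ; a query-diff may reflect hardware, not code")
```

Comparing the reported flags rather than the model name keeps the check meaningful when two different CPU models expose the same instruction sets.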
Additional insights: SIMD optimizations can greatly improve performance (for single precision, an SSE instruction processes 4 values and an AVX instruction 8, for theoretical speedups of about 4× and 8×), but developers must manage iteration counts to avoid cumulative precision errors. Query‑diff testing can also be extended to assess the broader compatibility impact of hardware and system libraries.
