Why Adding Zero Can Slow Your Code 7×: The Hidden Cost of Denormalized Floats
An intriguing experiment shows that a seemingly harmless "y += 0" operation can be about seven times slower than "y += 0.1f" because of how the CPU handles denormalized (subnormal) floating‑point numbers. The article explains the binary representation, the performance impact, and how to flush subnormals to zero for faster code.
Interesting Experiment
The article opens with a puzzling Stack Overflow example: two C++ loops that differ only in whether they add and subtract 0.1f or 0. The two versions look equivalent, yet the second runs about seven times slower.
#include <iostream>
using namespace std;
int main() {
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < 90000000; ++j) {
        y *= x;
        y /= z;
        y += 0.1f; // or y += 0;
        y -= 0.1f; // or y -= 0;
    }
    cout << y << endl; // use y so an optimizing compiler cannot drop the loop
    return 0;
}
Running the two binaries on a MacBook Pro (the effect should reproduce on any SSE2‑capable CPU) yields:
real 0m1.490s (with 0.1f)
real 0m9.895s (with 0)
The author then provides a detailed analysis of floating‑point representation.
Review of Binary Floating‑Point
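A single‑precision (32‑bit) IEEE 754 float consists of three fields: Sign (1 bit), Exponent (8 bits, stored with a bias of 127), and Mantissa (23 bits). The article walks through converting a decimal number (e.g., 5.2) to binary, normalizing it to the form 1.xxx × 2^e, and filling the three fields. For 5.2 the normalized form is 1.0100110011... × 2^2, so the stored exponent is 2 + 127 = 129.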
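To make the three fields concrete, here is a minimal sketch that splits a float into its raw sign, exponent, and mantissa bits. The names FloatFields and decompose are hypothetical helpers introduced for illustration, and the code assumes float is a 32‑bit IEEE 754 type, which holds on mainstream platforms.

```cpp
#include <cstdint>
#include <cstring>

// The three fields of a 32-bit IEEE 754 single-precision value.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 bits, without the implicit leading 1
};

// Hypothetical helper: copies the raw bits of a float and splits them.
// Assumes float is a 32-bit IEEE 754 type.
FloatFields decompose(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bits
    FloatFields fields;
    fields.sign = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}
```

Calling decompose(5.2f) gives sign 0, exponent 129 (that is, 2 + 127), and mantissa 0x266666, which matches the hand conversion described above.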
When the exponent would be less than the minimum normal exponent, the number becomes a denormalized (subnormal) number: the exponent field is all zeros and the leading bit before the binary point is 0 instead of the usual implicit 1. This extends the range of representable values toward zero, but it can incur a large performance penalty, because many CPUs handle subnormal operands or results through a slow microcode assist rather than the normal fast path.
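The boundary between normal and subnormal values can be probed with the standard library. The sketch below uses std::fpclassify and std::numeric_limits; the wrapper names (is_subnormal, smallest_normal, smallest_subnormal) are hypothetical, and the behavior described assumes the default floating‑point environment, with no flush‑to‑zero mode and no -ffast-math.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true when value is nonzero but below the normal range.
bool is_subnormal(float value) {
    return std::fpclassify(value) == FP_SUBNORMAL;
}

// Smallest positive normal float, roughly 1.18e-38.
float smallest_normal() {
    return std::numeric_limits<float>::min();
}

// Smallest positive subnormal float, roughly 1.4e-45.
float smallest_subnormal() {
    return std::numeric_limits<float>::denorm_min();
}
```

Under those assumptions, is_subnormal(smallest_normal() / 2.0f) is true, while is_subnormal(smallest_normal()) and is_subnormal(0.0f) are both false.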
Back to the Experiment
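In the loop, each iteration multiplies y by 1.1/1.123 ≈ 0.98, so y shrinks steadily toward zero. With y += 0 and y -= 0 nothing interrupts that decay: after a few thousand iterations y drops into the subnormal range, and the CPU performs costly subnormal arithmetic for the rest of the loop. Adding and then subtracting 0.1f avoids this, not because the constant is tiny but because it is far larger than the shrinking y. The sum rounds away the low bits of y, so after the subtraction y is either zero or a small normal value, and the loop stays fast.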
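One way to check this explanation is to replay the loop body and report when y first becomes subnormal. The function below is a hypothetical sketch, not the article's code: it takes the added constant as a parameter and assumes float arithmetic is evaluated in single precision under the default floating‑point environment (as with SSE on x86‑64).

```cpp
#include <cmath>

// Hypothetical helper: runs the loop body from the experiment and returns the
// zero-based iteration at which y is first subnormal, or -1 if that never
// happens within max_iterations.
int first_subnormal_iteration(float addend, int max_iterations) {
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < max_iterations; ++j) {
        y *= x;
        y /= z;
        y += addend;  // 0.1f in the fast variant, 0.0f in the slow one
        y -= addend;
        if (std::fpclassify(y) == FP_SUBNORMAL) {
            return j;
        }
    }
    return -1;
}
```

With an addend of 0.0f the function should report a subnormal value after a few thousand iterations, while with 0.1f it should return -1, since every result of the subtraction is either zero or a normal value.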
Solution
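The article shows that flushing subnormals to zero with fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV); restores the performance of the y += 0 version to a level comparable to the 0.1f version. This constant comes from the macOS <fenv.h> and applies to SSE arithmetic; the trade‑off is that values below the normal range are treated as zero.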
#include <fenv.h>
#include <iostream>
using namespace std;
int main() {
    // Flush subnormals to zero for SSE arithmetic (constant from the macOS <fenv.h>).
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < 90000000; ++j) {
        y *= x;
        y /= z;
        y += 0;
        y -= 0;
    }
    cout << y << endl; // use y so an optimizing compiler cannot drop the loop
    return 0;
}
Thus, the experiment demonstrates how denormalized floating‑point numbers can dramatically degrade performance and how to mitigate the issue.
Java Backend Technology
