Why Adding Zero Can Slow Down Your C++ Loop 7×: The Hidden Cost of Denormal Floats
An intriguing experiment shows that replacing a harmless "y += 0" with "y += 0.1f" makes a tight C++ loop run seven times faster, revealing how denormalized floating‑point numbers trigger costly CPU paths and how understanding IEEE‑754 representation can restore performance.
Interesting Experiment
Consider a puzzling experiment originally found on Stack Overflow: two nearly identical C++ programs that differ only in whether the loop adds and subtracts 0.1f or 0.
#include <iostream>
#include <string>
using namespace std;
int main() {
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0.1f; // version A
        y -= 0.1f;
    }
    return 0;
}

#include <iostream>
#include <string>
using namespace std;
int main() {
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0; // version B
        y -= 0;
    }
    return 0;
}

Both programs are mathematically equivalent, yet timing on a MacBook Pro shows version A finishing in about 1.5 seconds while version B takes roughly 10 seconds, a nearly seven‑fold slowdown.
Review of Floating‑Point Binary Representation
A single‑precision (32‑bit) IEEE‑754 floating‑point number consists of three fields:
Sign (1 bit)
Exponent (8 bits, bias 127 for single precision)
Mantissa (23 bits)
For example, the decimal 5.2 is converted to binary as 101.001100110011…, then normalized to 1.01001100110011… × 2². The exponent field stores 2 + 127 = 129 (10000001₂), and the leading 1 of the mantissa is implicit, so only the fraction bits after the binary point are stored.
What Is a Denormalized Number?
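When the exponent field is all zeros and the mantissa is non‑zero, the number is denormalized (also called subnormal): the implicit leading bit becomes 0 and the exponent is fixed at −126. This allows representation of values smaller than the smallest normalized number (about 1.18 × 10⁻³⁸ for single precision), at the cost of reduced precision and significantly slower arithmetic on many CPUs.
Floating‑point pitfalls in general have caused real‑world failures. A frequently cited case is the 1991 Patriot missile incident, where accumulated rounding error in the system's time calculation led to a missed intercept; that failure stemmed from limited precision rather than from denormals specifically.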
Back to the Experiment
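During the loop, each iteration multiplies y by x/z ≈ 0.98, so y shrinks toward zero and, after a few thousand iterations, drops below the smallest normalized float. With y += 0, nothing pushes it back: y stays denormalized, so every remaining iteration takes the CPU's expensive denormal path, causing the slowdown. With y += 0.1f followed by y -= 0.1f, the sum is rounded to the precision of values near 0.1, which discards the tiny low‑order bits; y ends each iteration as either zero or a normalized number, and execution stays fast.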
Controlling Denormals
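On macOS with an x86 CPU, programmers can disable denormal handling with fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);, which makes SSE arithmetic treat denormals as zero. This macro comes from Apple's fenv.h and is not part of standard C or C++. After inserting this call, the y += 0 version runs as fast as the 0.1f version, confirming the performance impact of denormals. The trade‑off is that results in the denormal range are no longer preserved.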
#include <iostream>
#include <string>
#include <fenv.h>
using namespace std;
int main() {
    // Apple-specific: treat denormals as zero in SSE arithmetic
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0;
        y -= 0;
    }
    return 0;
}
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
