Why Adding Zero Can Slow Down Your C++ Loop 7×: The Hidden Cost of Denormal Floats

An intriguing experiment shows that replacing a harmless "y += 0" with "y += 0.1f" makes a tight C++ loop run seven times faster, revealing how denormalized floating‑point numbers trigger costly CPU paths and how understanding IEEE‑754 representation can restore performance.

Programmer DD

Sep 22, 2019

Interesting Experiment

The puzzle comes from a Stack Overflow question: two nearly identical C++ programs differ only in whether a long loop adds and then subtracts 0.1f (version A) or 0 (version B).

#include <iostream>
#include <string>
using namespace std;

int main() {
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0.1f; // version A
        y -= 0.1f;
    }
    return 0;
}

#include <iostream>
#include <string>
using namespace std;

int main() {
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0; // version B
        y -= 0;
    }
    return 0;
}

Mathematically, both loops compute the same thing, since adding and then subtracting a constant should leave y unchanged. Yet timing on a MacBook Pro shows version A finishing in about 1.5 seconds while version B takes roughly 10 seconds, close to a seven-fold slowdown.
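To reproduce the measurement, the loop can be wrapped in a small timing helper. The sketch below is an illustration, not the original benchmark: time_loop and its parameters are hypothetical names, and the final value of y is written out through a pointer so that an optimizing compiler cannot discard the loop.

```cpp
#include <chrono>

// Hypothetical helper: runs the article's loop with the constant c
// (0.1f for version A, 0.0f for version B) and returns the elapsed seconds.
double time_loop(float c, int iterations, float* result) {
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    const auto start = std::chrono::steady_clock::now();
    for (int j = 0; j < iterations; ++j) {
        y *= x;
        y /= z;
        y += c;
        y -= c;
    }
    const auto stop = std::chrono::steady_clock::now();
    *result = y;  // publish y so the optimizer cannot remove the loop
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_loop(0.1f, 90000000, &out) and time_loop(0.0f, 90000000, &out) gives the two timings to compare. The absolute numbers depend on the CPU, the compiler, and the optimization flags.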

Review of Floating‑Point Binary Representation

An IEEE‑754 single‑precision float (32 bits) consists of three fields:

Sign (1 bit)

Exponent (8 bits, bias 127 for single precision)

Mantissa (23 bits)

For example, the decimal 5.2 is converted to binary as 101.001100110011…, then normalized to 1.01001100110011… × 2². The exponent field stores 2 + 127 = 129 (10000001₂). The leading 1 of the mantissa is implicit, so only the fractional bits (0100110011…) are stored in the 23‑bit mantissa field, rounded to fit.
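The layout can be checked by copying a float's bit pattern into an integer and masking out the fields. This is a minimal sketch that assumes a 32-bit IEEE-754 float; FloatFields and decode are hypothetical names introduced for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type holding the three raw single-precision fields.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, the implicit leading 1 is not stored
};

FloatFields decode(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined way to read the bit pattern
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For 5.2f, decode returns sign 0 and exponent 129, matching the worked example.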

What Is a Denormalized Number?

When the exponent field is all zeros and the mantissa is non‑zero, the number is denormalized (also called subnormal): the implicit leading bit becomes 0 and the exponent is fixed at −126. This allows values smaller than the smallest normalized number (about 1.18 × 10⁻³⁸ in single precision) to be represented, at the cost of reduced precision and, on many CPUs, much slower arithmetic.
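The definition translates directly into a bit test. The sketch below assumes a 32-bit IEEE-754 float; is_denormal is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the exponent field is all zeros and the
// mantissa field is non-zero, i.e. the value is denormalized (subnormal).
bool is_denormal(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    const std::uint32_t exponent = (bits >> 23) & 0xFFu;
    const std::uint32_t mantissa = bits & 0x7FFFFFu;
    return exponent == 0 && mantissa != 0;
}
```

The standard library provides the same classification as std::fpclassify(value) == FP_SUBNORMAL from <cmath>, which is usually the better choice in real code.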

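Limited numeric precision has caused real‑world failures. A frequently cited example is the 1991 Patriot missile incident, in which accumulated error from representing 0.1 seconds in a 24‑bit fixed‑point register led to a missed intercept. That failure came from rounding error rather than from denormals, but it shows how small representation errors can add up.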

Back to the Experiment

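During the loop, y shrinks by a factor of about 0.98 per iteration (1.1 / 1.123), so it quickly drops below the smallest normalized float. With y += 0, nothing pulls it back: y and the intermediate results remain denormalized, so the CPU repeatedly takes the expensive denormal path, causing the slowdown. With y += 0.1f, the intermediate sum is close to 0.1, which is a normalized value, and the following subtraction leaves either zero or a multiple of the spacing between floats near 0.1 (about 7.5 × 10⁻⁹). Either way, y never enters the denormal range and the loop stays on the fast path.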
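This behaviour can be observed directly by classifying y after every operation. The sketch below is an illustration under the assumption that arithmetic is carried out in single precision (the default on x86-64) and that options such as -ffast-math are not used; first_denormal_iteration is a hypothetical helper name.

```cpp
#include <cmath>

// Hypothetical helper: runs the article's loop with the constant c and returns
// the index of the first iteration in which y is denormal after any of the
// four operations, or -1 if that never happens within max_iterations.
int first_denormal_iteration(float c, int max_iterations) {
    const auto denormal = [](float v) { return std::fpclassify(v) == FP_SUBNORMAL; };
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < max_iterations; ++j) {
        y *= x;
        if (denormal(y)) return j;
        y /= z;
        if (denormal(y)) return j;
        y += c;
        if (denormal(y)) return j;
        y -= c;
        if (denormal(y)) return j;
    }
    return -1;
}
```

With c = 0.0f the function reports an iteration in the low thousands, a tiny fraction of the 90,000,000 iterations in the experiment, so almost the entire run operates on denormals. With c = 0.1f it returns -1.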

Controlling Denormals

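Programmers can disable denormal handling with fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);. This constant is an Apple‑specific extension to <fenv.h> for Intel Macs: it sets the SSE control bits that treat denormal inputs as zero and flush denormal results to zero. After inserting this call, the y += 0 version runs as fast as the 0.1f version, confirming the performance impact of denormals.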

#include <iostream>
#include <string>
#include <fenv.h>
using namespace std;

int main() {
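    // Apple-specific (macOS on x86): flush denormal results to zero and
    // treat denormal inputs as zero for SSE arithmetic.
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);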
    const float x = 1.1;
    const float z = 1.123;
    float y = x;
    for (int j = 0; j < 90000000; j++) {
        y *= x;
        y /= z;
        y += 0;
        y -= 0;
    }
    return 0;
}
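On x86-64 platforms other than macOS, a comparable effect can usually be obtained with the SSE control-register macros from the compiler's intrinsics headers. This is a sketch under that assumption, not part of the original experiment; disable_sse_denormals is a hypothetical helper name, and on other architectures it simply reports that nothing was changed.

```cpp
#if defined(__x86_64__) || defined(_M_X64)
#include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
#include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
#endif

// Hypothetical helper: asks the SSE unit of the calling thread to flush
// denormal results to zero (FTZ) and to treat denormal inputs as zero (DAZ).
// Returns false on targets where this sketch does not apply.
bool disable_sse_denormals() {
#if defined(__x86_64__) || defined(_M_X64)
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
    return true;
#else
    return false;
#endif
}
```

The setting lives in the MXCSR register, so it applies per thread and must be made on every thread that runs the hot loop. It also trades IEEE-754 conformance for speed: results that would have been tiny non-zero values become exactly zero.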

