Why Adding Zero Can Slow Your Code 7×: The Hidden Cost of Denormalized Floats

An experiment shows that a seemingly harmless "y += 0" can run about seven times slower than "y += 0.1f" because of how CPUs handle denormalized (subnormal) floating-point numbers. This article reviews the binary representation, explains the performance impact, and shows how to disable subnormal handling for faster code.

Java Backend Technology

Sep 29, 2019

Interesting Experiment

The article starts with a puzzling StackOverflow example where two seemingly identical C++ loops differ only by y += 0.1f versus y += 0. Despite identical logic, the latter runs about seven times slower.

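// y is never read after the loop, so build without optimization;
// an optimizing compiler may otherwise drop the loop entirely.
int main() {
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < 90000000; ++j) {
        y *= x;     // together with the division below, shrinks y by a factor of about 0.98
        y /= z;
        y += 0.1f;  // slow variant: y += 0;
        y -= 0.1f;  // slow variant: y -= 0;
    }
    return 0;
}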

Timing the two binaries on a MacBook Pro (the effect should reproduce on any SSE2-capable CPU) gives:

real    0m1.490s  (with 0.1f)
real    0m9.895s  (with 0)

The author then provides a detailed analysis of floating‑point representation.

Review of Binary Floating‑Point

A single-precision float (IEEE 754 binary32) consists of three fields: Sign (1 bit), Exponent (8 bits, stored with a bias of 127), and Mantissa (23 bits). The article walks through converting a decimal number (e.g., 5.2) to binary, normalizing it to the form 1.xxx × 2^e, and filling the three fields; the leading 1 is implicit and is not stored.
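The field layout described above can be checked directly. The sketch below assumes that float is a 32-bit IEEE 754 value; it copies a float's bits into an integer and masks out the three fields. FloatFields and decode are illustrative names, not taken from the original article.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper type: the three fields of a single-precision float.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, stored with a bias of 127
    std::uint32_t mantissa;  // 23 bits (the leading 1 of a normal number is implicit)
};

// Copy the float's bytes into an integer and mask out each field.
// Assumption: float is a 32-bit IEEE 754 value on this platform.
FloatFields decode(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    std::uint32_t bits = 0;
    std::memcpy(&bits, &value, sizeof bits);
    return FloatFields{bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu};
}
```

For 5.2f, which normalizes to 1.3 × 2^2, decode returns sign 0, exponent 129 (2 + 127), and mantissa 0x266666.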

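When the exponent would fall below the minimum normal exponent (-126 for single precision), the value is stored as a denormalized (subnormal) number: the exponent field is all zeros and the bit before the binary point is 0 instead of the implicit 1. This lets values underflow gradually toward zero instead of jumping straight to it, but it can incur a large performance penalty, because many CPUs handle subnormal operands and results through a slow microcode path rather than in the regular floating-point hardware.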
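As a small illustration of where the subnormal range begins, the sketch below uses the standard std::fpclassify and std::numeric_limits facilities. The helper name is_subnormal is an assumption made for this example.

```cpp
#include <cmath>
#include <limits>

// True when value is subnormal: non-zero, but smaller in magnitude than the
// smallest normal float, std::numeric_limits<float>::min() (2^-126).
bool is_subnormal(float value) {
    return std::fpclassify(value) == FP_SUBNORMAL;
}
```

With default floating-point settings, std::numeric_limits<float>::min() is still normal, while std::numeric_limits<float>::denorm_min() (2^-149, the smallest positive float) is subnormal. Zero is classified separately and is not subnormal.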

Back to the Experiment

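In the loop, y *= x followed by y /= z shrinks y by a factor of about 0.98 (1.1 / 1.123) per iteration. With y += 0 and y -= 0 nothing counteracts that decay, so after a few thousand iterations y drops below the smallest normal float (about 1.18 × 10^-38) and becomes subnormal; every remaining iteration then performs costly subnormal arithmetic. With 0.1f, the sum y + 0.1f is rounded to the spacing of floats near 0.1 (2^-27, about 7.5 × 10^-9), so after the subtraction y is either exactly zero or at least that large. It never enters the subnormal range, and the loop stays fast.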
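The difference between the two variants can be observed without timing anything. The sketch below replays the loop body and reports whether y ever becomes subnormal. It assumes default floating-point settings (no flush-to-zero, no fast-math flags); loop_reaches_subnormal is a hypothetical helper name.

```cpp
#include <cmath>

// Runs the loop body from the experiment for `iterations` rounds and reports
// whether y was subnormal at the end of any round.
// `addend` selects the variant: 0.0f for "y += 0", 0.1f for "y += 0.1f".
bool loop_reaches_subnormal(float addend, int iterations) {
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < iterations; ++j) {
        y *= x;
        y /= z;
        y += addend;
        y -= addend;
        if (std::fpclassify(y) == FP_SUBNORMAL) {
            return true;  // y has decayed below the smallest normal float
        }
    }
    return false;
}
```

Under those assumptions, loop_reaches_subnormal(0.0f, 100000) returns true and loop_reaches_subnormal(0.1f, 100000) returns false, which matches the fast and slow timings above.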

Solution

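The article shows that calling fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV); at startup makes the y += 0 version run about as fast as the 0.1f version. The macro is specific to Apple's <fenv.h> on x86; it switches the SSE unit to a mode that treats subnormal operands as zero and flushes subnormal results to zero, trading a little accuracy near zero for speed.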

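#include <fenv.h>

int main() {
    // macOS on x86 only: install the default environment with SSE subnormals disabled.
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    const float x = 1.1f;
    const float z = 1.123f;
    float y = x;
    for (int j = 0; j < 90000000; ++j) {
        y *= x;
        y /= z;
        y += 0;  // results that would be subnormal are now flushed to zero
        y -= 0;
    }
    return 0;
}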
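On x86 toolchains that lack that macro, a roughly similar effect can be sketched with the SSE control-register intrinsic _MM_SET_FLUSH_ZERO_MODE from <xmmintrin.h>. This is not the article's method. The names HAVE_SSE_FTZ, enable_flush_to_zero, and half_of_smallest_normal are made up for this example, and the code does nothing on targets without SSE.

```cpp
#include <limits>

// HAVE_SSE_FTZ is a hypothetical macro for this sketch: 1 when the SSE
// control-register intrinsics are expected to be available.
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE_FTZ 1
#else
#define HAVE_SSE_FTZ 0
#endif

// Turns on flush-to-zero for the calling thread when built for SSE.
// Returns false when the intrinsic is unavailable (for example on non-x86 targets).
bool enable_flush_to_zero() {
#if HAVE_SSE_FTZ
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    return true;
#else
    return false;
#endif
}

// Halves the smallest normal float at run time; the exact result is subnormal.
// The volatile reads keep the compiler from folding the product at compile time.
float half_of_smallest_normal() {
    volatile float tiny = std::numeric_limits<float>::min();
    volatile float half = 0.5f;
    return tiny * half;
}
```

Before enable_flush_to_zero() is called, half_of_smallest_normal() yields a small positive subnormal value. After a successful call it yields 0. Flush-to-zero only covers results; subnormal inputs are governed by the separate denormals-are-zero bit, which _MM_SET_DENORMALS_ZERO_MODE in <pmmintrin.h> controls.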

Thus, the experiment demonstrates how denormalized floating‑point numbers can dramatically degrade performance and how to mitigate the issue.
