
Why Clean Code Can Slow Your C++ Programs by 15× – A Performance Deep Dive

An extensive performance analysis shows that strictly following clean‑code rules, such as using polymorphism instead of switch statements and keeping functions tiny, can dramatically increase execution time, with measured slowdowns of up to fifteen times. Table‑driven or flat‑structure alternatives run roughly an order of magnitude faster.

Programmer DD

Performance Impact of Clean‑Code Rules

Writing “clean” code is one of the most frequently repeated pieces of programming advice, especially for beginners. Many of its rules have no effect on runtime, but some do influence how code executes, and those can be measured objectively.

When we extract the rules that actually affect code structure, we obtain the following list:

Use polymorphism instead of if/else and switch;

Code should not know the internal structure of the objects it uses;

Strictly control function size;

Each function should do only one thing;

DRY (Don’t Repeat Yourself).

These rules dictate how we should write specific code fragments. The key question is: if we create code that follows these rules, how does its performance compare?

Example: Shape Hierarchy Using Polymorphism

/*========================================================================
   LISTING 22
========================================================================*/
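#include <cstdint>

// Supporting definitions assumed by all listings (used but not shown in the originals)
typedef float         f32;
typedef std::uint32_t u32;
static f32 const Pi32 = 3.14159265359f;

class shape_base {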
public:
    shape_base() {}
    virtual f32 Area() = 0;
};

class square : public shape_base {
public:
    square(f32 SideInit) : Side(SideInit) {}
    virtual f32 Area() { return Side * Side; }
private:
    f32 Side;
};

class rectangle : public shape_base {
public:
    rectangle(f32 WidthInit, f32 HeightInit) : Width(WidthInit), Height(HeightInit) {}
    virtual f32 Area() { return Width * Height; }
private:
    f32 Width, Height;
};

class triangle : public shape_base {
public:
    triangle(f32 BaseInit, f32 HeightInit) : Base(BaseInit), Height(HeightInit) {}
    virtual f32 Area() { return 0.5f * Base * Height; }
private:
    f32 Base, Height;
};

class circle : public shape_base {
public:
    circle(f32 RadiusInit) : Radius(RadiusInit) {}
    virtual f32 Area() { return Pi32 * Radius * Radius; }
private:
    f32 Radius;
};

The hierarchy follows the clean‑code rules: polymorphism, small functions, and each class does one thing.

Summing Areas with a Virtual Call Loop

/*========================================================================
   LISTING 23
========================================================================*/

f32 TotalAreaVTBL(u32 ShapeCount, shape_base **Shapes) {
    f32 Accum = 0.0f;
    for (u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex) {
        Accum += Shapes[ShapeIndex]->Area();
    }
    return Accum;
}

We deliberately avoid iterators because the clean‑code guide discourages them.
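The loop takes an array of pointers, so every shape is a separately allocated object that must be reached through a pointer before its virtual Area() can be called. The sketch below shows one way such input might be assembled. It assumes the f32, u32 and Pi32 definitions shown (the listings use but never define them), repeats only two of the four shape classes for brevity, adds a virtual destructor that the benchmark listing omits, and uses a hypothetical shape_list container; none of this is the original benchmark harness.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

typedef float         f32;   // assumed alias; the listings do not define it
typedef std::uint32_t u32;   // assumed alias; the listings do not define it
static const f32 Pi32 = 3.14159265359f;  // assumed value of Pi32

class shape_base {
public:
    virtual ~shape_base() {}  // added so shapes can be deleted through base pointers
    virtual f32 Area() = 0;
};

class square : public shape_base {
public:
    explicit square(f32 SideInit) : Side(SideInit) {}
    f32 Area() override { return Side * Side; }
private:
    f32 Side;
};

class circle : public shape_base {
public:
    explicit circle(f32 RadiusInit) : Radius(RadiusInit) {}
    f32 Area() override { return Pi32 * Radius * Radius; }
private:
    f32 Radius;
};

// Same loop as Listing 23: one indirect (virtual) call per shape.
f32 TotalAreaVTBL(u32 ShapeCount, shape_base **Shapes) {
    f32 Accum = 0.0f;
    for (u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex) {
        Accum += Shapes[ShapeIndex]->Area();
    }
    return Accum;
}

// Hypothetical container: owns each shape (one heap allocation per shape)
// and keeps the raw pointer array that TotalAreaVTBL expects.
struct shape_list {
    std::vector<std::unique_ptr<shape_base>> Owned;
    std::vector<shape_base *> Pointers;

    void Add(std::unique_ptr<shape_base> Shape) {
        assert(Shape != nullptr);
        Pointers.push_back(Shape.get());
        Owned.push_back(std::move(Shape));
    }
};
```

A caller adds shapes with List.Add(std::make_unique<square>(2.0f)) and then sums them with TotalAreaVTBL((u32)List.Pointers.size(), List.Pointers.data()). Keeping ownership in unique_ptr and the loop input as raw pointers leaves the summation loop exactly as in Listing 23.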

Manual Loop Unrolling

/*========================================================================
   LISTING 24
========================================================================*/

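f32 TotalAreaVTBL4(u32 ShapeCount, shape_base **Shapes) {
    // Four independent accumulators, so consecutive additions do not wait on each other
    f32 Accum0 = 0.0f;
    f32 Accum1 = 0.0f;
    f32 Accum2 = 0.0f;
    f32 Accum3 = 0.0f;
    u32 Count = ShapeCount / 4;
    while (Count--) {
        Accum0 += Shapes[0]->Area();
        Accum1 += Shapes[1]->Area();
        Accum2 += Shapes[2]->Area();
        Accum3 += Shapes[3]->Area();
        Shapes += 4;
    }
    // Handle the 0-3 shapes left over when ShapeCount is not a multiple of 4
    for (u32 Rest = ShapeCount % 4; Rest > 0; --Rest) {
        Accum0 += (*Shapes++)->Area();
    }
    return (Accum0 + Accum1 + Accum2 + Accum3);
}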

Running both versions in a simple test shows roughly the same cost for each: about 35 CPU cycles per shape (occasionally 34).

Performance comparison chart

The first measurement runs the code once (cold cache); the second repeats it many times (warm cache). The two results differ only modestly, and in both the clean‑code version needs about 35 CPU cycles per shape.
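The figures quoted here are costs per shape: the time for one pass over the whole array divided by the number of shapes. A portable sketch of that calculation follows, assuming std::chrono::steady_clock nanoseconds as a substitute for the CPU cycle counter behind the article's numbers (so the absolute values are not comparable) and a hypothetical NanosecondsPerShape helper. A single run approximates the cold-cache case; keeping the fastest of many repeated runs is one common way, assumed here, to approximate the warm-cache case.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

typedef float         f32;   // assumed alias
typedef std::uint32_t u32;   // assumed alias

// Hypothetical helper: times SumFn (one full pass over ShapeCount shapes)
// RepeatCount times and returns the fastest run in nanoseconds per shape.
// RepeatCount == 1 approximates a cold-cache measurement.
template <typename sum_fn>
double NanosecondsPerShape(sum_fn &&SumFn, u32 ShapeCount, u32 RepeatCount,
                           f32 *ResultOut) {
    assert(ShapeCount > 0 && RepeatCount > 0);
    using clock = std::chrono::steady_clock;
    double Best = std::numeric_limits<double>::max();
    for (u32 Run = 0; Run < RepeatCount; ++Run) {
        clock::time_point Start = clock::now();
        f32 Result = SumFn();   // e.g. a call to TotalAreaVTBL
        clock::time_point End = clock::now();
        *ResultOut = Result;    // store the sum so the timed work has an observable effect
        double Elapsed =
            std::chrono::duration<double, std::nano>(End - Start).count();
        if (Elapsed < Best) {
            Best = Elapsed;
        }
    }
    return Best / (double)ShapeCount;
}
```

For example, NanosecondsPerShape([&] { return TotalAreaVTBL(Count, Shapes); }, Count, 100, &Result) would time the polymorphic loop. Writing the sum through ResultOut gives each timed call a visible result, which discourages the compiler from discarding the work.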

Violating Rule 1: Using a Switch Instead of Polymorphism

If we replace the class hierarchy with a flat enum and a switch statement, we obtain the following code:

/*========================================================================
   LISTING 25
========================================================================*/
enum shape_type : u32 {
    Shape_Square,
    Shape_Rectangle,
    Shape_Triangle,
    Shape_Circle,
    Shape_Count,
};

struct shape_union {
    shape_type Type;
    f32 Width;
    f32 Height;
};

f32 GetAreaSwitch(shape_union Shape) {
    f32 Result = 0.0f;
    switch (Shape.Type) {
        case Shape_Square:   { Result = Shape.Width * Shape.Width; } break;
        case Shape_Rectangle:{ Result = Shape.Width * Shape.Height; } break;
        case Shape_Triangle: { Result = 0.5f * Shape.Width * Shape.Height; } break;
        case Shape_Circle:   { Result = Pi32 * Shape.Width * Shape.Width; } break;
        case Shape_Count:    {} break;
    }
    return Result;
}

The summation loop is almost identical to the virtual‑call version; only the per‑shape call changes:

/*========================================================================
   LISTING 26
========================================================================*/

f32 TotalAreaSwitch(u32 ShapeCount, shape_union *Shapes) {
    f32 Accum = 0.0f;
    for (u32 ShapeIndex = 0; ShapeIndex < ShapeCount; ++ShapeIndex) {
        Accum += GetAreaSwitch(Shapes[ShapeIndex]);
    }
    return Accum;
}
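Only the simple loop is listed for the switch version. A 4× unrolled variant can follow the same pattern as Listing 24. The self-contained sketch below, named TotalAreaSwitch4 here by analogy, assumes the same f32, u32 and Pi32 definitions, writes GetAreaSwitch with direct returns (same results as Listing 25), and adds a tail loop for counts that are not a multiple of four.

```cpp
#include <cassert>
#include <cstdint>

typedef float         f32;   // assumed alias
typedef std::uint32_t u32;   // assumed alias
static constexpr f32 Pi32 = 3.14159265359f;  // assumed value

enum shape_type : u32 { Shape_Square, Shape_Rectangle, Shape_Triangle, Shape_Circle, Shape_Count };

struct shape_union {
    shape_type Type;
    f32 Width;
    f32 Height;
};

// Same formulas as Listing 25, returning directly from each case.
f32 GetAreaSwitch(shape_union Shape) {
    switch (Shape.Type) {
        case Shape_Square:    return Shape.Width * Shape.Width;
        case Shape_Rectangle: return Shape.Width * Shape.Height;
        case Shape_Triangle:  return 0.5f * Shape.Width * Shape.Height;
        case Shape_Circle:    return Pi32 * Shape.Width * Shape.Width;
        case Shape_Count:     break;
    }
    return 0.0f;
}

f32 TotalAreaSwitch4(u32 ShapeCount, shape_union *Shapes) {
    assert(Shapes != nullptr || ShapeCount == 0);
    // Four independent accumulators, as in the unrolled virtual-call loop.
    f32 Accum0 = 0.0f, Accum1 = 0.0f, Accum2 = 0.0f, Accum3 = 0.0f;
    u32 Count = ShapeCount / 4;
    while (Count--) {
        Accum0 += GetAreaSwitch(Shapes[0]);
        Accum1 += GetAreaSwitch(Shapes[1]);
        Accum2 += GetAreaSwitch(Shapes[2]);
        Accum3 += GetAreaSwitch(Shapes[3]);
        Shapes += 4;
    }
    // Tail: the 0-3 shapes left when ShapeCount is not a multiple of 4.
    for (u32 Rest = ShapeCount % 4; Rest > 0; --Rest) {
        Accum0 += GetAreaSwitch(*Shapes++);
    }
    return Accum0 + Accum1 + Accum2 + Accum3;
}
```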

Running the test shows that the switch‑based version is about 1.5× faster than the polymorphic version.

Switch vs virtual performance chart

Table‑Driven Approach

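Every area formula above has the same form, a per‑type constant times width times height, provided squares and circles store their side or radius in both Width and Height. That lets us replace the switch with a lookup into a table of coefficients indexed by shape type: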

/*========================================================================
   LISTING 27
========================================================================*/

f32 const CTable[Shape_Count] = { 1.0f, 1.0f, 0.5f, Pi32 };

f32 GetAreaUnion(shape_union Shape) {
    return CTable[Shape.Type] * Shape.Width * Shape.Height;
}

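Replacing GetAreaSwitch with GetAreaUnion in the summation loops yields a speedup of roughly 10× over the polymorphic version. Since the switch version was only about 1.5× faster than the polymorphic one, the table‑driven version is also roughly 6 to 7× faster than the switch version (10 / 1.5 ≈ 6.7).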
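Because GetAreaUnion always multiplies Width by Height, it agrees with GetAreaSwitch only if squares and circles store their side or radius in both fields. The check below sketches that convention, assuming the same f32, u32 and Pi32 definitions and hypothetical MakeSquare and MakeCircle helpers that enforce it; a square built without that convention shows how the table version silently diverges.

```cpp
#include <cassert>
#include <cstdint>

typedef float         f32;   // assumed alias
typedef std::uint32_t u32;   // assumed alias
static constexpr f32 Pi32 = 3.14159265359f;  // assumed value

enum shape_type : u32 { Shape_Square, Shape_Rectangle, Shape_Triangle, Shape_Circle, Shape_Count };
struct shape_union { shape_type Type; f32 Width; f32 Height; };

f32 GetAreaSwitch(shape_union Shape) {
    switch (Shape.Type) {
        case Shape_Square:    return Shape.Width * Shape.Width;
        case Shape_Rectangle: return Shape.Width * Shape.Height;
        case Shape_Triangle:  return 0.5f * Shape.Width * Shape.Height;
        case Shape_Circle:    return Pi32 * Shape.Width * Shape.Width;
        case Shape_Count:     break;
    }
    return 0.0f;
}

// Table-driven version from Listing 27: one coefficient per shape type.
f32 const CTable[Shape_Count] = { 1.0f, 1.0f, 0.5f, Pi32 };

f32 GetAreaUnion(shape_union Shape) {
    assert(Shape.Type < Shape_Count);  // the table has no entry for Shape_Count
    return CTable[Shape.Type] * Shape.Width * Shape.Height;
}

// Hypothetical constructors enforcing the convention the table relies on:
// squares and circles carry the same value in Width and Height.
shape_union MakeSquare(f32 Side)   { return {Shape_Square, Side, Side}; }
shape_union MakeCircle(f32 Radius) { return {Shape_Circle, Radius, Radius}; }
```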

Table‑driven performance chart

Adding a Second Property (Corner Count)

We extend the hierarchy with a virtual CornerCount() method and repeat the experiments with a new quantity: each shape's area weighted by 1 / (1 + corner count). The switch‑based version now calls a second switch function, while the table‑driven version folds the area coefficient and the corner weighting into a single table entry per shape type:

/*========================================================================
   LISTING 32
========================================================================*/
class shape_base {
public:
    shape_base() {}
    virtual f32 Area() = 0;
    virtual u32 CornerCount() = 0;
};
/* ... square, rectangle, triangle, circle definitions with CornerCount overridden ... */
/*========================================================================
   LISTING 34
========================================================================*/

u32 GetCornerCountSwitch(shape_type Type) {
    switch (Type) {
        case Shape_Square:    return 4;
        case Shape_Rectangle: return 4;
        case Shape_Triangle:  return 3;
        case Shape_Circle:    return 0;
        case Shape_Count:     return 0;
    }
    return 0;
}
/*========================================================================
   LISTING 35
========================================================================*/

f32 CornerAreaSwitch(u32 ShapeCount, shape_union *Shapes) {
    f32 Accum = 0.0f;
    for (u32 i = 0; i < ShapeCount; ++i) {
        // Weight each area by 1 / (1 + corner count)
        Accum += (1.0f / (1.0f + (f32)GetCornerCountSwitch(Shapes[i].Type)))
                  * GetAreaSwitch(Shapes[i]);
    }
    return Accum;
}

The table‑driven version stores the combined factor directly:

/*========================================================================
   LISTING 36
========================================================================*/

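f32 const CTable[Shape_Count] = {
    1.0f / (1.0f + 4.0f),  // square: area coefficient 1, 4 corners
    1.0f / (1.0f + 4.0f),  // rectangle: area coefficient 1, 4 corners
    0.5f / (1.0f + 3.0f),  // triangle: area coefficient 0.5, 3 corners
    Pi32 / (1.0f + 0.0f)   // circle: area coefficient Pi, 0 corners
};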

f32 GetCornerAreaUnion(shape_union Shape) {
    return CTable[Shape.Type] * Shape.Width * Shape.Height;
}
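Each entry of the combined table is the area coefficient divided by one plus the corner count, so it can be derived from the two per-type properties instead of being written out by hand. A sketch assuming C++17 (for constexpr std::array element assignment), the same f32, u32 and Pi32 definitions, and hypothetical names AreaCoeff, CornerCounts, MakeCornerAreaTable, CornerAreaTable and CornerAreaUnion:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

typedef float         f32;   // assumed alias
typedef std::uint32_t u32;   // assumed alias
static constexpr f32 Pi32 = 3.14159265359f;  // assumed value

enum shape_type : u32 { Shape_Square, Shape_Rectangle, Shape_Triangle, Shape_Circle, Shape_Count };
struct shape_union { shape_type Type; f32 Width; f32 Height; };

// Per-type properties indexed by shape_type (same values as the listings).
constexpr f32 AreaCoeff[Shape_Count]    = { 1.0f, 1.0f, 0.5f, Pi32 };
constexpr u32 CornerCounts[Shape_Count] = { 4, 4, 3, 0 };

// Fold both properties into one factor per type: coefficient / (1 + corners).
constexpr std::array<f32, Shape_Count> MakeCornerAreaTable() {
    std::array<f32, Shape_Count> Table{};
    for (u32 Type = 0; Type < Shape_Count; ++Type) {
        Table[Type] = AreaCoeff[Type] / (1.0f + (f32)CornerCounts[Type]);
    }
    return Table;
}

// Evaluated at compile time because the variable is constexpr.
constexpr std::array<f32, Shape_Count> CornerAreaTable = MakeCornerAreaTable();

f32 GetCornerAreaUnion(shape_union Shape) {
    assert(Shape.Type < Shape_Count);
    return CornerAreaTable[Shape.Type] * Shape.Width * Shape.Height;
}

f32 CornerAreaUnion(u32 ShapeCount, shape_union *Shapes) {
    f32 Accum = 0.0f;
    for (u32 i = 0; i < ShapeCount; ++i) {
        Accum += GetCornerAreaUnion(Shapes[i]);
    }
    return Accum;
}
```

Because the table is constexpr, the folding happens at compile time and the per-shape work in the loop stays a table load and two multiplies, as in Listing 36.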

Measurements show that the clean‑code version is again the slowest, while the table‑driven version can be up to 15× faster than it.

Corner‑area performance chart

Conclusions

The experiments demonstrate that strictly adhering to several clean‑code rules, especially replacing switch statements with polymorphism and hiding each shape's data behind small virtual functions, can degrade performance dramatically: in these tests by an order of magnitude or more. While clean code may improve readability, developers must weigh that benefit against its hardware‑level cost, especially in performance‑critical domains such as game engines or high‑throughput services.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Programmer DD, a tinkering programmer and author of "Spring Cloud Microservices in Action".
