Mastering Floating‑Point Computation on Resource‑Constrained MCUs

This article explains how microcontroller units (MCUs) handle floating‑point operations, covering IEEE‑754 representation, hardware FPUs versus software floating‑point libraries, performance and precision challenges, and a range of optimization techniques, from hardware selection and fixed‑point tricks to compiler flags and system‑level power management.

Liangxu Linux

Aug 28, 2025

Basic Principles of MCU Floating‑Point Computation

Floating‑point numbers follow the IEEE‑754 standard and consist of a sign bit, an exponent (8 bits for single precision, 11 bits for double precision) and a mantissa (23 bits for single, 52 bits for double).

1. Floating‑Point Representation

Sign bit (1 bit): indicates positive or negative.

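Exponent (8 bits single / 11 bits double): stored with a bias of 127 (single) or 1023 (double).

Mantissa (23 bits single / 52 bits double): the fraction field; normal numbers carry an implicit leading 1, giving 24 or 53 bits of precision.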
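
To make the three fields concrete, the following minimal sketch splits a single‑precision value into its sign, biased exponent, and mantissa bits. The names float_fields and decode_float are illustrative, and the code assumes float is a 32‑bit IEEE‑754 type, which holds on common MCU toolchains.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative container for the three IEEE-754 single-precision fields. */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 fraction bits (implicit leading 1 not stored) */
} float_fields;

/* Assumes float is 32-bit IEEE-754. */
float_fields decode_float(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);  /* copy the raw bit pattern */

    float_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, -6.5 equals -1.625 × 2^2, so the stored exponent is 2 + 127 = 129 and the mantissa field holds 0.625 × 2^23 = 0x500000.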

2. Hardware vs. Software Floating‑Point

MCUs can perform floating‑point arithmetic in two ways:

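Hardware Floating‑Point Unit (FPU) : Dedicated circuitry that executes floating‑point instructions (e.g., VADD.F32, VMUL.F64). It offers high performance and low energy per operation but requires a chip with an integrated FPU, such as Cortex‑M4F, or a Cortex‑M7 or Cortex‑M33 device configured with the optional FPU. Many of these FPUs are single‑precision only, so double‑precision instructions such as VMUL.F64 need a double‑precision unit.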

Software Floating‑Point Library : Implements floating‑point operations in software, useful for MCUs without an FPU (e.g., Cortex‑M0/M3). It provides flexibility but incurs higher latency and larger code size.
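
Which of the two paths a build actually uses is decided at compile time. As a hedged sketch, the function below inspects the __ARM_FP macro from the Arm C Language Extensions, which Arm compilers define as a bitmask when hardware floating point is targeted (bit value 0x04 for single precision, 0x08 for double). The function name fp_target_description is illustrative, and the returned strings are only labels.

```c
/* Report which floating-point path the compiler is targeting.
 * __ARM_FP is an ACLE macro; when it is undefined on an Arm target,
 * floating-point operations are routed to the software library. */
const char *fp_target_description(void) {
#if defined(__ARM_FP)
  #if (__ARM_FP & 0x08)
    return "hardware FPU: double precision available";
  #elif (__ARM_FP & 0x04)
    return "hardware FPU: single precision only";
  #else
    return "hardware FPU: no single or double precision";
  #endif
#else
    return "__ARM_FP undefined (on Arm targets: software floating point)";
#endif
}
```

On a single‑precision‑only part, any double arithmetic in the source still compiles, but it falls back to library calls, which is worth checking in the map file or disassembly.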

Challenges of MCU Floating‑Point Computation

1. Performance Bottlenecks

Software emulation can be 10–100× slower than hardware.

Complex functions (sin, cos, exp) may take thousands of cycles.

Memory accesses, especially for double‑precision, become a limiting factor.

2. Precision Issues

Single precision provides only about 7 decimal digits.

Accumulated rounding errors can grow in iterative calculations.

Equality comparisons must be performed with tolerance.
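
The last two points can be sketched in code. The helper nearly_equal compares with a combined relative and absolute tolerance, and kahan_sum uses compensated (Kahan) summation to limit error growth in long accumulations. Both names and the tolerance values in the usage are illustrative assumptions, not recommendations for any particular sensor or algorithm. Note that -ffast-math may optimize the compensation term away.

```c
#include <stdbool.h>
#include <stddef.h>

static float abs_f(float x) { return x < 0.0f ? -x : x; }
static float max_f(float a, float b) { return a > b ? a : b; }

/* True when a and b differ by no more than the larger of a relative
 * tolerance (scaled by the bigger magnitude) and an absolute tolerance. */
bool nearly_equal(float a, float b, float rel_tol, float abs_tol) {
    float diff = abs_f(a - b);
    float largest = max_f(abs_f(a), abs_f(b));
    return diff <= max_f(rel_tol * largest, abs_tol);
}

/* Kahan summation: carries the rounding error of each addition forward
 * in a separate compensation term. */
float kahan_sum(const float *values, size_t count) {
    float sum = 0.0f;
    float comp = 0.0f;  /* running compensation for lost low-order bits */
    for (size_t i = 0; i < count; i++) {
        float y = values[i] - comp;
        float t = sum + y;
        comp = (t - sum) - y;
        sum = t;
    }
    return sum;
}
```

A call such as nearly_equal(measured, expected, 1e-5f, 1e-6f) replaces a direct == comparison; the absolute tolerance matters mainly when both values are close to zero.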

3. Resource Consumption

Software floating‑point increases program flash usage.

Additional RAM is needed for intermediate results.

Longer execution times may affect real‑time interrupt latency.

4. Power Considerations

Activating the FPU raises power draw.

Frequent floating‑point operations can shorten battery life.

Effective power‑management strategies are required.

Optimization Strategies

1. Hardware Selection

Choose MCUs with an integrated FPU (e.g., STM32F4 for single precision; STM32H7 and some STM32F7 parts for double precision). Check the datasheet of the exact part, since FPU precision varies within a series.

Leverage DSP extensions such as SIMD instructions on Cortex‑M4/M7.

Consider external co‑processors for heavy mathematical workloads.

2. Algorithm‑Level Optimizations

Fixed‑point substitution : Use Q‑format numbers when the dynamic range is known.

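// Q15 example: 1 sign bit + 15 fractional bits, range [-1, 1)
int16_t q15_a = (int16_t)(0.5 * 32768);  // 0.5 → 16384
int16_t q15_b = (int16_t)(0.25 * 32768); // 0.25 → 8192
// Widen to 32 bits before multiplying, then shift the Q30 product back to Q15
int16_t q15_result = (int16_t)(((int32_t)q15_a * q15_b) >> 15); // 4096 → 0.125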
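
The plain shift above has two edge cases: it truncates instead of rounding, and -1.0 × -1.0 produces +1.0, which Q15 cannot represent. The following sketch adds conversion helpers plus a rounding, saturating multiply. The function names are illustrative, and the code assumes the compiler implements right shift of negative values as an arithmetic shift, as GCC and Clang do.

```c
#include <stdint.h>

/* Convert a float in [-1, 1) to Q15, rounding to nearest and saturating. */
int16_t q15_from_float(float x) {
    float scaled = x * 32768.0f;
    if (scaled >= 32767.0f) return INT16_MAX;
    if (scaled <= -32768.0f) return INT16_MIN;
    return (int16_t)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

/* Convert a Q15 value back to float. */
float q15_to_float(int16_t q) {
    return (float)q / 32768.0f;
}

/* Q15 multiply with rounding and saturation. */
int16_t q15_mul_sat(int16_t a, int16_t b) {
    int32_t product = (int32_t)a * b;            /* Q30 intermediate */
    int32_t rounded = (product + 0x4000) >> 15;  /* add half an LSB, back to Q15 */
    if (rounded > INT16_MAX) return INT16_MAX;   /* only hit by -1.0 * -1.0 */
    if (rounded < INT16_MIN) return INT16_MIN;
    return (int16_t)rounded;
}
```

With these helpers, q15_mul_sat(q15_from_float(0.5f), q15_from_float(0.25f)) yields 4096, the same 0.125 as the example above.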

Lookup tables : Pre‑compute common function values.

const float sin_table[360] = {0, 0.017452, ...};
float fast_sin(uint16_t degree) {
    return sin_table[degree % 360];
}
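
A full 360‑entry float table costs 1440 bytes of flash. Where that is too much, a smaller table can be combined with quadrant symmetry and linear interpolation. The sketch below is an illustration under stated assumptions: the 15‑degree step, the table name quarter_sin, and the function name fast_sin_deg are all hypothetical, and the coarse step is chosen for brevity rather than accuracy.

```c
#include <stdint.h>

/* sin() sampled every 15 degrees over the first quadrant only. */
static const float quarter_sin[7] = {
    0.0f, 0.258819f, 0.5f, 0.707107f, 0.866025f, 0.965926f, 1.0f
};

float fast_sin_deg(uint16_t degree) {
    uint16_t d = (uint16_t)(degree % 360u);
    float sign = 1.0f;

    if (d >= 180u) {            /* sin(x + 180) = -sin(x) */
        d = (uint16_t)(d - 180u);
        sign = -1.0f;
    }
    if (d > 90u) {              /* sin(180 - x) = sin(x) */
        d = (uint16_t)(180u - d);
    }

    uint16_t idx = (uint16_t)(d / 15u);  /* table slot, 0..6 */
    uint16_t rem = (uint16_t)(d % 15u);  /* offset inside the slot */
    if (rem == 0u) {
        return sign * quarter_sin[idx];  /* exact table hit */
    }

    /* Linear interpolation between neighbouring entries (idx <= 5 here). */
    float frac = (float)rem / 15.0f;
    float lo = quarter_sin[idx];
    float hi = quarter_sin[idx + 1];
    return sign * (lo + frac * (hi - lo));
}
```

The trade‑off is accuracy: with a 15‑degree step the interpolation error is in the third decimal place, so a real design would pick the step size from its error budget.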

Approximation algorithms : Use Taylor series or polynomial fits for functions like sqrt or inverse square root.

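// Fast inverse square root (Quake III variant), valid for x > 0
#include <stdint.h>
#include <string.h>

float fast_inv_sqrt(float x) {
    float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i);          // reinterpret the bits without breaking strict aliasing
    i = 0x5f3759dfu - (i >> 1);        // initial estimate from the magic constant
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - (xhalf * x * x));  // one Newton-Raphson refinement step
    return x;
}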
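
The Taylor‑series option mentioned under approximation algorithms can be sketched as follows for sine. This is a minimal illustration that assumes the argument has already been reduced to roughly [-π/2, π/2]; the function name poly_sin is hypothetical. It evaluates the series through the x^7 term using Horner's scheme, which needs only multiplies and adds.

```c
/* sin(x) ≈ x - x^3/6 + x^5/120 - x^7/5040, evaluated with Horner's scheme.
 * Intended for x already reduced to about [-pi/2, pi/2]; the truncation
 * error grows toward the ends of that interval. */
float poly_sin(float x) {
    const float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
              + x2 * (1.0f / 120.0f
              + x2 * (-1.0f / 5040.0f))));
}
```

A fitted (minimax) polynomial of the same degree usually gives a smaller worst‑case error than the truncated Taylor series, at the cost of deriving the coefficients offline.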

3. Code‑Level Optimizations

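Enable compiler optimizations : Use flags like -O2 or -O3 and, with caution, -ffast-math, which relaxes IEEE‑754 rules (for example NaN and infinity handling and the ordering of operations) and can therefore change results.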

Inline functions to reduce call overhead.

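Vectorization : On cores with Helium (MVE), such as Cortex‑M55, exploit floating‑point SIMD instructions. On Cortex‑M4/M7, VADD.F32 and VMLA.F32 are scalar FPU instructions, and the DSP‑extension SIMD instructions operate on packed 8‑ and 16‑bit integers.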

Avoid unnecessary type casts between integer and floating‑point.

// always_inline is a GCC/Clang attribute; static avoids C99 external-inline link errors
static inline __attribute__((always_inline)) float cubic(float x) {
    return x * x * x;
}
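
One conversion that is easy to miss is implicit promotion to double: an unsuffixed literal such as 3.3 has type double, so the whole expression is evaluated in double precision and, on a single‑precision FPU, falls back to software routines. The sketch below contrasts the two forms. The 12‑bit ADC range and 3.3 V reference are assumptions for illustration, and both function names are hypothetical.

```c
#include <stdint.h>

/* Unsuffixed literals are double: this multiplies and divides in double
 * precision, then narrows the result back to float. */
float adc_to_volts_double_math(uint16_t raw) {
    return raw * 3.3 / 4095.0;
}

/* Float literals keep the arithmetic in single precision, and the
 * constant ratio can be folded by the compiler. */
float adc_to_volts(uint16_t raw) {
    return (float)raw * (3.3f / 4095.0f);
}
```

GCC's -Wdouble-promotion warning flags the first form, which helps catch such cases in a larger code base.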

4. System‑Level Optimizations

Dynamic FPU enable : Turn on the FPU only when needed.

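void enable_fpu(void) {
    // Grant full access to coprocessors CP10 and CP11 (the FPU); SCB comes from the CMSIS device header
    SCB->CPACR |= ((3UL << (10 * 2)) | (3UL << (11 * 2)));
    __DSB(); // wait for the register write to complete
    __ISB(); // flush the pipeline so later instructions see the FPU enabled
}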
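
Turning the FPU on "only when needed" also implies a way to turn it off again. As a hedged sketch, the pure helper below computes the CPACR value for either state so the bit manipulation can be checked on a host machine; the name cpacr_set_fpu is hypothetical. It assumes no floating‑point context is live when access is removed, since later FPU instructions would then fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* CP10 occupies CPACR bits 21:20 and CP11 bits 23:22; 0b11 = full access. */
#define CPACR_FPU_FULL_ACCESS ((UINT32_C(3) << 20) | (UINT32_C(3) << 22))

/* Return the CPACR value with FPU access granted or removed,
 * leaving all other bits unchanged. */
uint32_t cpacr_set_fpu(uint32_t cpacr, bool enable) {
    return enable ? (cpacr | CPACR_FPU_FULL_ACCESS)
                  : (cpacr & ~CPACR_FPU_FULL_ACCESS);
}

/* On target (CMSIS), a possible use would be:
 *     SCB->CPACR = cpacr_set_fpu(SCB->CPACR, false);
 *     __DSB();
 *     __ISB();
 */
```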

Batch processing : Group floating‑point operations to reduce state switches.

Use DMA for data movement to keep the CPU free.

Application Cases

1. Industrial PID Controller (Fixed‑Point)

// Q15-based PID implementation
#include <stdint.h>

typedef struct {
    int16_t Kp, Ki, Kd;    // gains in Q15
    int16_t integral_max;  // anti-windup limit, must be >= 0
    int32_t integral;      // accumulated error
    int16_t prev_error;
} PID_Controller;

int16_t PID_Update(PID_Controller* pid, int16_t error) {
    int32_t p_term = (int32_t)pid->Kp * error;

    // Accumulate the error and clamp it to ±integral_max * 32768 (anti-windup)
    int32_t limit = (int32_t)pid->integral_max * 32768;
    pid->integral += error;
    if (pid->integral > limit) {
        pid->integral = limit;
    } else if (pid->integral < -limit) {
        pid->integral = -limit;
    }

    // Ki * integral can exceed 32 bits, so compute it in 64 bits
    int64_t i_term = (int64_t)pid->Ki * pid->integral;

    // error - prev_error spans [-65535, 65535], which does not fit in int16_t
    int32_t deriv = (int32_t)error - pid->prev_error;
    pid->prev_error = error;
    int64_t d_term = (int64_t)pid->Kd * deriv;

    // Shift back to Q15 (arithmetic shift assumed for negatives), then saturate
    int64_t output = (p_term + i_term + d_term) >> 15;
    return (int16_t)(output > 32767 ? 32767 : (output < -32768 ? -32768 : output));
}

2. Sensor Data Processing with Hardware FPU

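// Calibration routine for a hardware FPU
#include <math.h>
#include <stdint.h>

// The FPU must already be enabled (for example via enable_fpu()) before this
// function is called, because the compiler may emit FPU instructions in the
// function prologue. The caller may remove FPU access afterwards to save power.
// Returns 0 on success, -1 if the readings cannot be calibrated.
int calibrate_sensor(const float* readings, uint32_t count, float* offset, float* scale) {
    if (count == 0) {
        return -1; // no data
    }
    float sum = 0.0f;
    float min_val = readings[0], max_val = readings[0];
    for (uint32_t i = 0; i < count; i++) {
        sum += readings[i];
        min_val = fminf(min_val, readings[i]);
        max_val = fmaxf(max_val, readings[i]);
    }
    float range = max_val - min_val;
    if (range <= 0.0f) {
        return -1; // all readings identical: scale would divide by zero
    }
    *offset = sum / (float)count; // mean of the readings
    *scale = 1.0f / range;
    return 0;
}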
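
When the spread of the readings is needed as well, computing it from a running sum of squares can lose most of its significant digits in single precision, because two large, nearly equal quantities are subtracted. A minimal sketch of Welford's single‑pass update, which avoids that subtraction, is shown below. The running_stats type and function names are illustrative, not part of the routine above.

```c
#include <stddef.h>

/* Running mean and variance using Welford's update. */
typedef struct {
    size_t count;
    float mean;
    float m2;  /* sum of squared deviations from the running mean */
} running_stats;

void running_stats_init(running_stats *s) {
    s->count = 0;
    s->mean = 0.0f;
    s->m2 = 0.0f;
}

void running_stats_push(running_stats *s, float x) {
    s->count++;
    float delta = x - s->mean;          /* deviation from the old mean */
    s->mean += delta / (float)s->count;
    s->m2 += delta * (x - s->mean);     /* uses both old and new mean */
}

/* Sample variance (n - 1 denominator); 0 when fewer than two samples. */
float running_stats_variance(const running_stats *s) {
    return s->count > 1 ? s->m2 / (float)(s->count - 1) : 0.0f;
}
```

Taking sqrtf of the returned variance gives the standard deviation; the structure also works for streaming data because it never needs the full array.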

Conclusion

Floating‑point computation on MCUs requires a careful trade‑off among performance, precision, memory usage, and power consumption. By selecting appropriate hardware, applying algorithmic shortcuts, and writing highly optimized code, developers can achieve satisfactory floating‑point performance even on resource‑limited embedded platforms.
