Mastering Floating‑Point Computation on Resource‑Constrained MCUs
This article explains how microcontroller units (MCUs) handle floating‑point operations, covering IEEE‑754 representation, hardware versus software FPU approaches, performance and precision challenges, and a range of optimization techniques—from hardware selection and fixed‑point tricks to compiler flags and system‑level power management.
Basic Principles of MCU Floating‑Point Computation
Floating‑point numbers follow the IEEE‑754 standard and consist of a sign bit, an exponent (8 bits for single precision, 11 bits for double precision) and a mantissa (23 bits for single, 52 bits for double).
1. Floating‑Point Representation
Sign bit (1 bit): indicates positive or negative.
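Exponent (8 bits single / 11 bits double): stored with a bias of 127 (single) or 1023 (double).
Mantissa (23 bits single / 52 bits double): the fraction field; normalized values carry an implicit leading 1 that is not stored.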
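The three fields can be inspected directly on any platform that stores float as IEEE‑754 single precision. The following is a minimal sketch; the type float_fields_t and the function decode_float are hypothetical names introduced here for illustration, not part of any library.

```c
#include <stdint.h>
#include <string.h>

/* Fields of an IEEE-754 single-precision value (illustrative helper type). */
typedef struct {
    uint32_t sign;      /* 1 bit: 0 = positive, 1 = negative */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, implicit leading 1 not stored */
} float_fields_t;

/* Split a float into its raw bit fields. Assumes float is 32-bit IEEE-754. */
static float_fields_t decode_float(float value)
{
    uint32_t bits;
    float_fields_t f;

    /* memcpy is the portable way to reinterpret the bit pattern. */
    memcpy(&bits, &value, sizeof bits);

    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, -0.15625f is -1.01 (binary) × 2^-3, so it decodes to sign 1, exponent 124 (127 - 3) and mantissa 0x200000.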
2. Hardware vs. Software Floating‑Point
MCUs can perform floating‑point arithmetic in two ways:
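Hardware Floating‑Point Unit (FPU) : Dedicated circuitry that executes floating‑point instructions directly (e.g., VADD.F32, or VMUL.F64 on double‑precision units). It is much faster and more energy‑efficient per operation than emulation, but requires a core with an integrated FPU, such as Cortex‑M4F (single precision only), or Cortex‑M7 and Cortex‑M33 parts that include the optional FPU (double precision is available only on some Cortex‑M7 devices).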
Software Floating‑Point Library : Implements floating‑point operations in software, useful for MCUs without an FPU (e.g., Cortex‑M0/M3). It provides flexibility but incurs higher latency and larger code size.
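Which of the two paths a given build takes can be checked at compile time. The sketch below assumes a GCC- or Clang-style ARM toolchain that defines the ACLE macro __ARM_FP (a bitmask in which 0x4 means single precision and 0x8 means double precision) and the GCC macro __SOFTFP__ for software floating point; other compilers may expose this differently. The function name float_build_mode is hypothetical.

```c
/* Report how the current build handles floating point.
 * Assumes GCC/Clang-style predefined macros; on other toolchains
 * the final fallback branch is taken. */
static const char *float_build_mode(void)
{
#if defined(__SOFTFP__)
    /* Floating-point operations are compiled to library calls. */
    return "software floating point";
#elif defined(__ARM_FP) && (__ARM_FP & 0x8)
    /* Hardware instructions exist for both float and double. */
    return "hardware FPU, single and double precision";
#elif defined(__ARM_FP) && (__ARM_FP & 0x4)
    /* float uses the FPU; double still falls back to software. */
    return "hardware FPU, single precision only";
#else
    return "no ARM FPU macros defined (non-ARM target or unknown toolchain)";
#endif
}
```

On a single-precision-only FPU the third branch is the important one to remember: any double arithmetic in the code is still emulated.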
Challenges of MCU Floating‑Point Computation
1. Performance Bottlenecks
Software emulation can be 10–100× slower than hardware.
Complex functions (sin, cos, exp) may take thousands of cycles.
Memory accesses, especially for double‑precision, become a limiting factor.
2. Precision Issues
Single precision provides only about 7 decimal digits.
Accumulated rounding errors can grow in iterative calculations.
Equality comparisons must be performed with tolerance.
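The last two points can be sketched in a few lines. The helper names below (abs_f32, nearly_equal, naive_sum, kahan_sum) are hypothetical, and the tolerances are left to the caller because suitable values depend on the application. Kahan (compensated) summation is shown as one common way to limit accumulated rounding error; note that aggressive flags such as -ffast-math may optimize the compensation away.

```c
#include <stdbool.h>
#include <stddef.h>

static float abs_f32(float x) { return x < 0.0f ? -x : x; }

/* Compare two floats with a relative and an absolute tolerance. */
static bool nearly_equal(float a, float b, float rel_tol, float abs_tol)
{
    float diff    = abs_f32(a - b);
    float largest = abs_f32(a) > abs_f32(b) ? abs_f32(a) : abs_f32(b);
    float limit   = rel_tol * largest;

    if (limit < abs_tol) {
        limit = abs_tol;  /* absolute floor for values near zero */
    }
    return diff <= limit;
}

/* Plain accumulation: rounding error grows with the number of terms. */
static float naive_sum(const float *v, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
    }
    return sum;
}

/* Kahan summation: carries the rounding error of each addition forward. */
static float kahan_sum(const float *v, size_t n)
{
    float sum = 0.0f;
    float c   = 0.0f;  /* running compensation for lost low-order bits */

    for (size_t i = 0; i < n; i++) {
        float y = v[i] - c;
        float t = sum + y;
        c = (t - sum) - y;  /* what was lost when adding y to sum */
        sum = t;
    }
    return sum;
}
```

Summing 0.1f ten times with naive_sum may not compare equal to 1.0f with ==, whereas nearly_equal(sum, 1.0f, 1e-5f, 1e-6f) accepts it.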
3. Resource Consumption
Software floating‑point increases program flash usage.
Additional RAM is needed for intermediate results.
Longer execution times may affect real‑time interrupt latency.
4. Power Considerations
Activating the FPU raises power draw.
Frequent floating‑point operations can shorten battery life.
Effective power‑management strategies are required.
Optimization Strategies
1. Hardware Selection
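Choose MCUs with an integrated FPU (e.g., STM32F4 for single precision; STM32H7 and some STM32F7 parts also support double precision in hardware).
Leverage the DSP extension on Cortex‑M4/M7, whose SIMD instructions operate on packed 8‑ and 16‑bit integers and pair well with fixed‑point data.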
Consider external co‑processors for heavy mathematical workloads.
2. Algorithm‑Level Optimizations
Fixed‑point substitution : Use Q‑format numbers when the dynamic range is known.
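// Q15 example: 1 sign bit + 15 fractional bits, range [-1, 1)
int16_t q15_a = (int16_t)(0.5 * 32768); // 0.5 → 16384
int16_t q15_b = (int16_t)(0.25 * 32768); // 0.25 → 8192
// Widen to 32 bits before multiplying, then shift the Q30 product back to Q15
int16_t q15_result = (int16_t)(((int32_t)q15_a * q15_b) >> 15); // 4096 → 0.125
Lookup tables : Pre‑compute common function values.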
const float sin_table[360] = {0, 0.017452, ...};
float fast_sin(uint16_t degree) {
return sin_table[degree % 360];
}
Approximation algorithms : Use Taylor series, polynomial fits, or Newton iterations for functions like sqrt or inverse square root.
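// Fast inverse square root (Quake III variant); requires <stdint.h> and <string.h>
float fast_inv_sqrt(float x) {
    float xhalf = 0.5f * x;
    uint32_t i;
    memcpy(&i, &x, sizeof i); // read the float's bits without violating strict aliasing
    i = 0x5f3759dfu - (i >> 1); // magic-constant initial estimate
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - (xhalf * x * x)); // one Newton-Raphson refinement step
    return x;
}
3. Code‑Level Optimizations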
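Enable compiler optimizations : Use flags like -O3 and, with caution, -ffast-math. The latter relaxes IEEE‑754 rules (for example, it assumes no NaNs or infinities and allows reassociation), so numerical results can change.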
Inline functions to reduce call overhead.
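Vectorization : Exploit SIMD instructions where the core provides them. On Cortex‑M4/M7, VADD.F32 and VMLA.F32 are scalar instructions and the DSP SIMD instructions apply to packed integers only; floating‑point vectors require a core with the Helium (MVE) extension, such as Cortex‑M55.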
Avoid unnecessary type casts between integer and floating‑point.
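A related pitfall on single-precision FPUs is implicit promotion to double. The sketch below uses two hypothetical functions to show the difference: an unsuffixed literal such as 0.5 has type double, so the multiplication is carried out in double precision (in software on a single-precision-only FPU) and the result is converted back. GCC can flag such cases with -Wdouble-promotion.

```c
/* The literal 0.5 is a double, so x is promoted, multiplied in double
 * precision, and the result is converted back to float. */
static float scale_with_double_literal(float x)
{
    return x * 0.5;
}

/* The f suffix keeps the whole expression in single precision. */
static float scale_with_float_literal(float x)
{
    return x * 0.5f;
}
```

Both functions return the same value for this simple scaling; the difference is in the generated code, not in the result.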
// static avoids an undefined reference if the compiler emits an out-of-line call
static inline __attribute__((always_inline)) float cubic(float x) {
    return x * x * x;
}
4. System‑Level Optimizations
Dynamic FPU enable : Turn on the FPU only when needed.
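// Grant full access to coprocessors CP10 and CP11 (the FPU).
// SCB, __DSB and __ISB come from the CMSIS core header for the device.
void enable_fpu(void) {
    SCB->CPACR |= ((3UL << 10*2) | (3UL << 11*2)); // enable CP10, CP11
    __DSB(); // make sure the register write has completed
    __ISB(); // flush the pipeline so following instructions see the FPU
}
Batch processing : Group floating‑point operations to reduce state switches.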
Use DMA for data movement to keep the CPU free.
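The batch-processing idea can be sketched as follows: raw integer samples are collected elsewhere (for instance by DMA into a buffer), and the floating-point conversion runs once over the whole block rather than once per interrupt. The function name convert_batch and the parameters vref and full_scale are assumptions made for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a block of raw ADC counts to volts in a single pass.
 * raw        : integer samples gathered earlier (e.g., by DMA)
 * volts      : output buffer with room for n floats
 * vref       : reference voltage (illustrative parameter)
 * full_scale : ADC count that corresponds to vref (illustrative parameter) */
static void convert_batch(const uint16_t *raw, float *volts, size_t n,
                          float vref, float full_scale)
{
    /* One division for the whole batch instead of one per sample. */
    const float volts_per_count = vref / full_scale;

    for (size_t i = 0; i < n; i++) {
        volts[i] = (float)raw[i] * volts_per_count;
    }
}
```

Keeping the interrupt path integer-only also means the interrupt handler itself has no floating-point state to save.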
Application Cases
1. Industrial PID Controller (Fixed‑Point)
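// Q15‑based PID implementation (requires <stdint.h>)
typedef struct {
    int16_t Kp, Ki, Kd;   // gains in Q15
    int16_t integral_max; // anti-windup limit; integrator is clamped to ±integral_max * 32768
    int32_t integral;     // running sum of errors
    int16_t prev_error;
} PID_Controller;
int16_t PID_Update(PID_Controller* pid, int16_t error) {
    int32_t integral_limit = (int32_t)pid->integral_max * 32768;
    int64_t p_term = (int64_t)pid->Kp * error;
    pid->integral += error;
    // Clamp the integrator (anti-windup)
    if (pid->integral > integral_limit) {
        pid->integral = integral_limit;
    } else if (pid->integral < -integral_limit) {
        pid->integral = -integral_limit;
    }
    // Ki * integral can exceed 32 bits, so the terms are accumulated in 64 bits
    int64_t i_term = (int64_t)pid->Ki * pid->integral;
    int32_t deriv = (int32_t)error - pid->prev_error; // difference may not fit in int16_t
    pid->prev_error = error;
    int64_t d_term = (int64_t)pid->Kd * deriv;
    // Scale back to Q15; assumes an arithmetic right shift for negative values (as on GCC and Clang)
    int64_t output = (p_term + i_term + d_term) >> 15;
    return (int16_t)(output > 32767 ? 32767 : (output < -32768 ? -32768 : output));
}
2. Sensor Data Processing with Hardware FPU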
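// Calibration routine for a core with a hardware FPU (requires <math.h> and <stdint.h>).
// The caller must enable the FPU first (e.g., with enable_fpu() above) and may revoke
// access afterwards from integer-only code to save power. Toggling FPU access inside
// this function is unsafe because its prologue and epilogue may use FPU registers.
void calibrate_sensor(const float* readings, uint32_t count, float* offset, float* scale) {
    if (count < 2) {
        return; // too few samples for a sample standard deviation; outputs are left unchanged
    }
    float sum = 0.0f, sum_sq = 0.0f;
    float min_val = readings[0], max_val = readings[0];
    for (uint32_t i = 0; i < count; i++) {
        sum += readings[i];
        sum_sq += readings[i] * readings[i];
        min_val = fminf(min_val, readings[i]);
        max_val = fmaxf(max_val, readings[i]);
    }
    float n = (float)count;
    float mean = sum / n;
    float variance = (sum_sq - sum * sum / n) / (n - 1.0f);
    float std_dev = sqrtf(fmaxf(variance, 0.0f)); // rounding can push the variance slightly below zero
    (void)std_dev; // computed for diagnostics; not used by the calibration outputs
    *offset = mean;
    float span = max_val - min_val;
    *scale = (span > 0.0f) ? 1.0f / span : 1.0f; // avoid dividing by zero for constant input
}
Conclusion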
Floating‑point computation on MCUs requires a careful trade‑off among performance, precision, memory usage, and power consumption. By selecting appropriate hardware, applying algorithmic shortcuts, and writing highly optimized code, developers can achieve satisfactory floating‑point performance even on resource‑limited embedded platforms.
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)
