Boost C Performance: Proven Tricks to Speed Up Your Code
This article gathers practical C‑language optimization techniques—ranging from integer declarations and branch reduction to loop unrolling and lookup‑table usage—to help developers improve execution speed and reduce memory consumption on resource‑constrained devices.
Introduction
When building a lightweight JPEG library for mobile devices, I collected a set of methods that make C programs run faster while keeping memory usage low. The goal is to optimise both execution speed and memory footprint without sacrificing readability.
Disclaimer
On ARM platforms I used many low‑level tricks, but not every tip from the internet works everywhere. I therefore filtered the useful ones and adapted them to be portable.
Where to Apply These Methods
Identify the hot spots of your program—functions or loops that consume the most time or memory—using profiling tools such as Visual C++ Profiler or Intel VTune. Focus optimisation on these parts.
Integer Types
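Prefer unsigned int for values that can never be negative; some processors handle unsigned arithmetic, division and modulo in particular, faster than signed. Declare loop counters as register unsigned int variable_name; to hint at register usage, bearing in mind that modern compilers do their own register allocation and may ignore the hint.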
Division and Modulo
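A 32‑bit division can take 20–140 cycles, depending on the processor and the operands. Replace division with multiplication when possible: for non‑negative a and positive b, the test a / b >= c is equivalent to a >= b * c, provided b * c does not overflow. When a division cannot be avoided, unsigned division is usually cheaper than signed.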
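The rewrite can be sketched as follows, assuming unsigned operands and a non-zero divisor. The function names are hypothetical. With truncating integer division it is the `>=` form that is exactly equivalent, and the product is widened to 64 bits here so that it cannot overflow.

```c
#include <stdint.h>

/* Straightforward version: one division per call. b must be non-zero. */
int quotient_at_least_div(uint32_t a, uint32_t b, uint32_t c)
{
    return a / b >= c;
}

/* Rewritten version: for b > 0, a / b >= c  <=>  a >= b * c.
   The product is computed in 64 bits so it cannot wrap around. */
int quotient_at_least_mul(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint64_t)a >= (uint64_t)b * c;
}
```

Whether the multiplication is actually cheaper depends on the target; on cores without a hardware divider the difference tends to be large.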
Combine Division and Modulo
If both x / y and x % y are needed, compute them next to each other on the same operands, so the compiler can obtain both results from a single division:
int func_div_and_mod(int a, int b) { return (a / b) + (a % b); }
Power‑of‑Two Division
When the divisor is a power of two, the compiler can replace division with a shift. Use powers of two (e.g., 64 instead of 66) whenever feasible.
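As an illustration, assuming unsigned values and a hypothetical block size of 64, the division and modulo below are the kind a compiler typically turns into a shift and a bit mask:

```c
#include <stdint.h>

#define BLOCK_SIZE 64u   /* hypothetical block size, a power of two (2^6) */

/* Index of the block containing byte offset pos.
   Typically compiled to pos >> 6. */
uint32_t block_index(uint32_t pos)
{
    return pos / BLOCK_SIZE;
}

/* Offset of pos inside its block.
   Typically compiled to pos & 63. */
uint32_t block_offset(uint32_t pos)
{
    return pos % BLOCK_SIZE;
}
```

For signed operands the compiler has to add a small correction for negative values, so unsigned types give the cleanest result.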
Modulo Alternative
When a counter only needs to wrap around at a fixed limit, an if statement can be faster than the modulo operator:
unsigned int modulo_func2(unsigned int count) { if (++count >= 60) count = 0; return count; }
Array Indexing
Replace chained switch or if‑else statements with a lookup array:
static char *classes = "WSU"; letter = classes[queue];
Global Variables
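The compiler usually cannot keep a global in a register across function calls or writes through pointers, so each access may force a load or store. Copy a global to a local variable inside tight loops to allow register allocation, and write the result back once after the loop.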
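A minimal sketch of the idea, using a hypothetical global accumulator g_checksum and hypothetical function names. Both functions produce the same result as long as out does not point at the global itself.

```c
#include <stddef.h>

int g_checksum = 0;   /* hypothetical global accumulator */

/* The global is updated inside the loop. The compiler must assume the
   write to out[i] could modify g_checksum, so it may reload and store
   the global on every iteration. */
void copy_and_sum_global(int *out, const int *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];
        g_checksum += in[i];
    }
}

/* The running total lives in a local that can stay in a register;
   the global is read once before the loop and written once after it. */
void copy_and_sum_local(int *out, const int *in, size_t n)
{
    int sum = g_checksum;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i];
        sum += in[i];
    }
    g_checksum = sum;
}
```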
Aliases
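The compiler cannot know whether a value reached through a pointer is changed by a function call, so it reloads it each time. If the value does not change, cache it in a local variable:
void func1(int *data) { int i; int localdata = *data; for (i = 0; i < 10; i++) anyfunc(localdata, i); }
Variable Live‑Range Splitting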
Limit the number of simultaneously live variables so the compiler can keep more of them in registers. Split large functions, limit variable usage, or force register allocation where appropriate.
Variable Types
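For data stored in arrays and structures, use the smallest type that can hold the required range to save memory. For scalar variables and function parameters, prefer int or unsigned int over char / short to avoid costly sign/zero extensions.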
Local Variables
Avoid char or short locals; they require extra extension instructions. Use 32‑bit types for better performance.
Pointers
Pass large structures by pointer (or const pointer) to avoid copying large amounts of data onto the stack.
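The difference can be sketched with a hypothetical 256-byte structure; the type and function names below are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block of samples, 256 bytes in size. */
typedef struct {
    uint8_t samples[256];
} Block;

/* By value: the caller copies all 256 bytes for every call. */
uint32_t block_sum_by_value(Block b)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < sizeof b.samples; i++)
        sum += b.samples[i];
    return sum;
}

/* By const pointer: only an address is passed, and const documents
   that the function does not modify the caller's data. */
uint32_t block_sum_by_pointer(const Block *b)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < sizeof b->samples; i++)
        sum += b->samples[i];
    return sum;
}
```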
Pointer Chains
Cache intermediate pointers to reduce repeated dereferencing:
Point3 *pos = p->pos; pos->x = 0; pos->y = 0; pos->z = 0;
Conditional Execution
Group related conditions in a single if expression so the compiler can evaluate them together.
Boolean Range Checks
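Replace a two‑sided range check (xmin <= p.x < xmax) with a single unsigned comparison. After the lower bound is subtracted, values below it wrap around to large unsigned numbers and fail the test:
return ((unsigned)(p.x - r->xmin) < (unsigned)(r->xmax - r->xmin)) && ((unsigned)(p.y - r->ymin) < (unsigned)(r->ymax - r->ymin));
Lazy Evaluation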
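In an && chain, place first the sub‑expression that is most likely to be false (or the cheapest to evaluate), so that short‑circuiting skips the remaining tests as often as possible. In an || chain, lead with the one most likely to be true.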
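A small sketch of short-circuit ordering, under the assumption that the cheap tests reject most inputs. The names is_numeric_tag and all_digits are hypothetical, and the call counter exists only to make the effect observable.

```c
#include <stddef.h>

static unsigned expensive_calls = 0;   /* illustration only: counts costly checks */

/* Stand-in for a costly test: scans the whole string for non-digits. */
static int all_digits(const char *s)
{
    expensive_calls++;
    for (; *s != '\0'; s++) {
        if (*s < '0' || *s > '9')
            return 0;
    }
    return 1;
}

/* The cheap checks come first, so all_digits() only runs for inputs
   that are non-NULL and start with '#'. */
int is_numeric_tag(const char *s)
{
    return s != NULL && s[0] == '#' && all_digits(s + 1);
}

/* Number of times the costly check actually ran. */
unsigned expensive_call_count(void)
{
    return expensive_calls;
}
```

The ordering also matters for correctness here: the NULL test has to precede any dereference of s.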
Switch vs. If‑Else
For multi‑branch decisions, a switch can be faster than a cascade of if‑else statements.
Binary Break
Replace long if‑else ladders with a binary search style hierarchy to reduce the number of comparisons.
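A sketch of the two shapes for a value in the range 1 to 8. The handler codes (10, 20, ...) are placeholders, not taken from any real program. The ladder needs up to eight comparisons, while the binary version needs one range check plus three.

```c
/* Linear ladder: up to eight comparisons for a == 8 or an invalid value. */
int dispatch_linear(int a)
{
    if (a == 1) return 10;
    else if (a == 2) return 20;
    else if (a == 3) return 30;
    else if (a == 4) return 40;
    else if (a == 5) return 50;
    else if (a == 6) return 60;
    else if (a == 7) return 70;
    else if (a == 8) return 80;
    return 0;
}

/* Binary break: each test halves the remaining range. */
int dispatch_binary(int a)
{
    if (a < 1 || a > 8)
        return 0;
    if (a <= 4) {
        if (a <= 2)
            return (a == 1) ? 10 : 20;
        return (a == 3) ? 30 : 40;
    }
    if (a <= 6)
        return (a == 5) ? 50 : 60;
    return (a == 7) ? 70 : 80;
}
```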
Loop Optimisations
Termination: Use count‑down loops (for (i = n; i != 0; i--)) when the iteration order does not matter; the decrement can set the condition flags itself, so no separate comparison against n is needed.
Fast For Loops: For the same reason, decrementing counters (for (i = 10; i; i--)) are often faster than incrementing ones, although an optimising compiler may make the difference negligible.
Merge Loops: Do not use two loops where one will suffice. If the combined body grows too large for the instruction cache, however, two separate loops may be faster.
Function Loops: Move the loop inside the function to avoid repeated call overhead.
Loop Unrolling: Expand small fixed‑iteration loops to eliminate the loop‑control overhead, e.g., processing eight items per iteration.
Early Exit: Break out of a search loop as soon as the target is found to avoid unnecessary iterations.
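Several of the loop techniques above can be sketched together. The functions below are illustrative assumptions, not code from the JPEG library: the first combines count-down termination with four-way unrolling (four rather than eight items, to keep the sketch short), and the second shows an early exit.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums n bytes, four per iteration. Both loops count down to zero,
   so the termination test is a simple non-zero check. */
uint32_t sum_bytes_unrolled(const uint8_t *p, size_t n)
{
    uint32_t sum = 0;
    size_t blocks = n / 4;   /* number of full groups of four */
    size_t rest = n % 4;     /* leftover elements */

    for (; blocks != 0; blocks--) {
        sum += p[0];
        sum += p[1];
        sum += p[2];
        sum += p[3];
        p += 4;
    }
    for (; rest != 0; rest--)
        sum += *p++;
    return sum;
}

/* Early exit: returns the index of the first match, or -1 if absent.
   The loop stops as soon as the target is found. */
long find_byte(const uint8_t *p, size_t n, uint8_t target)
{
    for (size_t i = 0; i < n; i++) {
        if (p[i] == target)
            return (long)i;
    }
    return -1;
}
```

Unrolling trades code size for fewer branches, so it is worth measuring on the target before and after.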
Function Design
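Keep functions small. Leaf functions (those that call no other function) avoid saving the return address and spilling extra registers, so prefer them for hot code when possible. Mark performance‑critical small functions inline (or use the compiler‑specific __inline keyword where the compiler requires it), but beware of code‑size growth.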
Lookup Tables
Replace expensive calculations (e.g., sin/cos) with pre‑computed tables when exact precision is not required.
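A minimal sketch of a sine lookup table, assuming a coarse resolution of 16 steps per full circle and Q15 fixed-point output (values scaled by 32767). The table size, scaling and function names are assumptions chosen for brevity; a real table would usually be finer.

```c
#include <stdint.h>

/* sin(2*pi*k/16) scaled by 32767 and rounded, for k = 0..15. */
static const int16_t sin_table_q15[16] = {
         0,  12539,  23170,  30273,  32767,  30273,  23170,  12539,
         0, -12539, -23170, -30273, -32767, -30273, -23170, -12539
};

/* Sine of step/16 of a full turn; the mask wraps the angle around. */
int16_t sin_q15(unsigned step)
{
    return sin_table_q15[step & 15u];
}

/* Cosine is sine shifted by a quarter turn (4 steps of 16). */
int16_t cos_q15(unsigned step)
{
    return sin_table_q15[(step + 4u) & 15u];
}
```

Keeping the table size a power of two lets the wrap-around be a bit mask rather than a modulo.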
Floating‑Point Arithmetic
float is often faster than double, particularly on processors without double‑precision hardware. Avoid division where possible by multiplying by a pre‑computed reciprocal, and replace costly library functions with integer or fixed‑point equivalents when feasible.
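The reciprocal trick can be sketched as follows; the function names are hypothetical. Because 1/d is itself rounded, results may differ from a true division in the last bit, which is usually acceptable when exact precision is not required.

```c
#include <stddef.h>

/* One floating-point division per element. */
void scale_div(float *v, size_t n, float d)
{
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] / d;
}

/* One division in total: the reciprocal is computed once and the
   loop only multiplies. */
void scale_recip(float *v, size_t n, float d)
{
    const float inv = 1.0f / d;
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * inv;
}
```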
Other Tips
Cache frequently used data instead of recomputing.
Avoid ++/-- in loop conditions when possible.
Minimise global variables; use static for file‑local scope.
Prefer single‑word variables (int, long) over smaller types.
Avoid recursion and heavy use of printf inside performance‑critical loops.
Store binary files instead of text when parsing speed matters.
Enable compiler optimisation flags for target architecture.
By applying these techniques, developers can achieve noticeable speed‑ups and memory savings, especially on embedded or mobile platforms where resources are limited.
Liangxu Linux
