Why Structs Beat Global Variables on Cortex‑A9: A Deep Dive into Assembly Efficiency
The article explains how encapsulating peripheral registers in a struct rather than using separate global variables reduces literal‑pool usage, cuts instruction count, and improves execution speed on Cortex‑A9, providing step‑by‑step assembly analysis, compilation commands, and further optimization techniques.
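Many beginners keep peripheral registers as separate global variables for convenience, but this habit costs performance on Cortex‑A9: ARM load and store instructions address memory through a base register plus an offset, so every separately placed global needs its own address loaded into a register before it can be accessed.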
1. Global Variable Assembly
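Using three independent int globals (xx, yy, zz) costs 8 bytes per variable: 4 bytes of storage in the .bss section plus a 4‑byte literal pool entry that holds the variable's address. Each access requires three instructions (load the address from the literal pool, load the immediate value, store it), totaling twelve instructions for the three writes. The disassembly shows one 4‑byte literal pool entry for each variable's address.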
Startup code (gcd.s):
.text
.global _start
_start:
    ldr sp,=0x70000000 /* get stack top pointer */
    b main

C source (main.c):
int xx=0;
int yy=0;
int zz=0;
int main(void){
    xx=0x11;
    yy=0x22;
    zz=0x33;
    while(1);
    return 0;
}

Linker script (map.lds):
OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_start)
SECTIONS{
    . = 0x40008000;
    . = ALIGN(4);
    .text : { gcd.o(.text) *(.text) }
    . = ALIGN(4);
    .rodata : { *(.rodata) }
    . = ALIGN(4);
    .data : { *(.data) }
    . = ALIGN(4);
    .bss : { *(.bss) }
}

Makefile:
TARGET=gcd
TARGETC=main
all:
	arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGETC).o $(TARGETC).c
	arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGET).o $(TARGET).s
	arm-none-linux-gnueabi-gcc -O1 -g -S -o $(TARGETC).s $(TARGETC).c
	arm-none-linux-gnueabi-ld $(TARGETC).o $(TARGET).o -Tmap.lds -o $(TARGET).elf
	arm-none-linux-gnueabi-objcopy -O binary -S $(TARGET).elf $(TARGET).bin
	arm-none-linux-gnueabi-objdump -D $(TARGET).elf > $(TARGET).dis
clean:
	rm -rf *.o *.elf *.dis *.bin

Each int global therefore costs 8 bytes in total: 4 bytes in the .bss section plus a 4‑byte literal pool entry for its address.
2. Struct‑Based Assembly
Replacing the three globals with a single struct variable, peng, places the whole object in the .bss section at address 0x4000802c. Every member access now goes through the same base address, so once the base has been loaded each store takes only two instructions: load the immediate value, then store it at the base plus the member's offset.
struct {
    int xx;
    int yy;
    int zz;
} peng;
int main(void){
    peng.xx=0x11;
    peng.yy=0x22;
    peng.zz=0x33;
    while(1);
    return 0;
}

Compared with three separate globals, the struct saves 8 bytes in the literal pool and reduces the total instruction count from twelve to seven.
All members share one literal‑pool entry, saving 8 bytes.
Each member store needs only two instructions once the base address is loaded, saving five instructions overall.
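The reasoning behind these two points can be checked with a small host-side sketch. It is an illustration rather than the article's code: struct peng_layout, peng_offsets, ADDR_BYTES and the two pool_bytes_* helpers are hypothetical names, int32_t stands in for the 4‑byte int of 32‑bit ARM, and one 4‑byte literal pool entry per loaded address is assumed.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* int32_t */

/* Same shape as the article's struct; int32_t models the 4-byte ARM int. */
struct peng_layout {
    int32_t xx;
    int32_t yy;
    int32_t zz;
};

/* Byte offset of each member from the struct's single base address. */
const size_t peng_offsets[3] = {
    offsetof(struct peng_layout, xx),
    offsetof(struct peng_layout, yy),
    offsetof(struct peng_layout, zz),
};

/* Assumption: each literal-pool entry is one 32-bit address. */
enum { ADDR_BYTES = 4 };

/* Literal-pool bytes when every global needs its own address entry. */
unsigned pool_bytes_separate(unsigned n_vars)
{
    return n_vars * ADDR_BYTES;
}

/* Literal-pool bytes when one entry holds the struct's base address. */
unsigned pool_bytes_struct(void)
{
    return ADDR_BYTES;
}
```

Because the members sit at fixed offsets 0, 4 and 8 from one base, a single loaded address reaches all of them. Under the stated assumption, three separate globals need 12 bytes of literal pool and the struct needs 4, which matches the 8 bytes saved.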
3. Further Optimization
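Switching the Makefile from -O1 to size optimization (-Os) lets the compiler emit a single stm instruction that stores all three members at once, bringing the instruction count down to five. The Makefile also passes -lto, apparently intending link‑time optimization; GCC spells that flag -flto, whereas -lto is read as a request to link a library named to and should have no effect on these compile‑only commands.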
TARGET=gcd
TARGETC=main
all:
	arm-none-linux-gnueabi-gcc -Os -lto -g -c -o $(TARGETC).o $(TARGETC).c
	arm-none-linux-gnueabi-gcc -Os -lto -g -c -o $(TARGET).o $(TARGET).s
	arm-none-linux-gnueabi-gcc -Os -lto -g -S -o $(TARGETC).s $(TARGETC).c
	arm-none-linux-gnueabi-ld $(TARGETC).o $(TARGET).o -Tmap.lds -o $(TARGET).elf
	arm-none-linux-gnueabi-objcopy -O binary -S $(TARGET).elf $(TARGET).bin
	arm-none-linux-gnueabi-objdump -D $(TARGET).elf > $(TARGET).dis
clean:
	rm -rf *.o *.elf *.dis *.bin

The final sequence loads the base address once, moves the three immediate values into registers, and stores them with a single stm instruction.
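A store‑multiple instruction writes several registers to consecutive ascending addresses, and a struct gives the compiler exactly that layout, whereas separate globals have no guaranteed relative placement. The sketch below writes the same three values in one statement. The names struct peng_t and peng_init are hypothetical, and the compound‑literal assignment is an alternative spelling that the source does not use. Whether a given compiler actually emits stm for it depends on the toolchain version and flags, so the generated listing is worth checking.

```c
#include <stdint.h>

/* Same members as the article's struct, given a tag so it can be named. */
struct peng_t {
    int32_t xx;
    int32_t yy;
    int32_t zz;
};

/* No initializer, so the object is zero-initialized and typically ends up in .bss. */
struct peng_t peng;

/* Assign all three members in one statement with a C99 compound literal.
 * The members are contiguous and in ascending address order, which is
 * the layout a store-multiple instruction requires. */
void peng_init(void)
{
    peng = (struct peng_t){ .xx = 0x11, .yy = 0x22, .zz = 0x33 };
}
```

The article's original three member assignments express the same intent, and the compiler is free to treat both forms identically.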
Conclusion
In this example, wrapping the variables in a struct shrinks the literal pool from 12 bytes to 4 and cuts the instruction count from twelve to seven, or to five with -Os. For performance‑critical low‑level code on Cortex‑A9, such as peripheral register access, struct‑based access is therefore strongly recommended.
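To tie the result back to peripheral registers, here is a minimal sketch of a register block described as a struct. Everything in it is hypothetical: demo_regs_t, its three registers, demo_start and demo_status do not correspond to any real device, and on real hardware the pointer would come from the base address given in the chip's documentation.

```c
#include <stdint.h>

/* Hypothetical three-register peripheral; names are illustrative only. */
typedef struct {
    volatile uint32_t ctrl;    /* base + 0 */
    volatile uint32_t status;  /* base + 4 */
    volatile uint32_t data;    /* base + 8 */
} demo_regs_t;

/* Write a value and start the (imaginary) device. Both stores go through
 * the same base pointer, so only one address has to be held in a register. */
void demo_start(demo_regs_t *regs, uint32_t value)
{
    regs->data = value;
    regs->ctrl = 1u;
}

/* Read the status register through the same base pointer. */
uint32_t demo_status(const demo_regs_t *regs)
{
    return regs->status;
}
```

Because the members are volatile, the compiler is expected to keep each register access as its own load or store rather than merging them, so the gain here comes from the shared base address and not from stm.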
Liangxu Linux
Liangxu, a self‑taught IT professional now working as a Linux development engineer at a Fortune 500 multinational, shares extensive Linux knowledge—fundamentals, applications, tools, plus Git, databases, Raspberry Pi, etc. (Reply “Linux” to receive essential resources.)