Why Structs Beat Global Variables on Cortex‑A9: A Deep Dive into Assembly Efficiency

The article explains how encapsulating peripheral registers in a struct rather than using separate global variables reduces literal‑pool usage, cuts instruction count, and improves execution speed on Cortex‑A9, providing step‑by‑step assembly analysis, compilation commands, and further optimization techniques.

Liangxu Linux

Jul 7, 2024

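Many beginners keep peripheral registers as separate global variables for convenience, but this habit costs performance on Cortex‑A9. ARM is a load/store architecture: a variable in memory is reached through a register that holds its address, so every separately addressed global needs its own address load before it can be read or written.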

1. Global Variable Assembly

With three independent int globals (xx, yy, zz), each variable gets its own 4-byte slot in the .bss section and its own 4-byte literal-pool entry holding its address. Each assignment takes three instructions: load the variable's address from the literal pool, place the immediate in a register, and store it. The article's disassembly totals twelve instructions for this version.

Startup code (gcd.s):

.text
.global _start
_start:
  ldr sp,=0x70000000   /* get stack top pointer */
  b main

C source (main.c):

int xx=0;
int yy=0;
int zz=0;
int main(void){
  xx=0x11;
  yy=0x22;
  zz=0x33;
  while(1);
  return 0;
}

Linker script (map.lds):

OUTPUT_FORMAT("elf32-littlearm", "elf32-littlearm", "elf32-littlearm")
OUTPUT_ARCH(arm)
ENTRY(_start)
SECTIONS
{
  . = 0x40008000;
  . = ALIGN(4);
  .text : { gcd.o(.text) *(.text) }
  . = ALIGN(4);
  .rodata : { *(.rodata) }
  . = ALIGN(4);
  .data : { *(.data) }
  . = ALIGN(4);
  .bss : { *(.bss) }
}

Makefile:

TARGET=gcd
TARGETC=main
all:
  arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGETC).o $(TARGETC).c
  arm-none-linux-gnueabi-gcc -O1 -g -c -o $(TARGET).o $(TARGET).s
  arm-none-linux-gnueabi-gcc -O1 -g -S -o $(TARGETC).s $(TARGETC).c
  arm-none-linux-gnueabi-ld $(TARGETC).o $(TARGET).o -Tmap.lds -o $(TARGET).elf
  arm-none-linux-gnueabi-objcopy -O binary -S $(TARGET).elf $(TARGET).bin
  arm-none-linux-gnueabi-objdump -D $(TARGET).elf > $(TARGET).dis
clean:
  rm -rf *.o *.elf *.dis *.bin

In total, each int global costs 8 bytes: 4 bytes for the variable itself and 4 bytes for the literal‑pool entry that holds its address.
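The byte accounting above can be sketched in C. This is a minimal illustration rather than the article's code: the helper globals_footprint_bytes and the constant LITERAL_POOL_ENTRY_BYTES are hypothetical names, and the 4-byte entry size assumes a 32-bit ARM target where an address fits in one word. The instruction pattern in the comment is the generic load-address, load-immediate, store sequence described in this section; the register names are illustrative, and real output depends on the compiler version and flags.

```c
#include <stddef.h>

/* Assumption: on 32-bit ARM, a literal-pool entry holding an address is one 4-byte word. */
#define LITERAL_POOL_ENTRY_BYTES 4u

int xx = 0;
int yy = 0;
int zz = 0;

/*
 * With separately addressed globals, each store needs that variable's own
 * address first. Illustrative pattern (register names are assumptions):
 *
 *   ldr r3, =xx      @ address of xx comes from a literal-pool entry
 *   mov r2, #0x11    @ immediate value
 *   str r2, [r3]     @ store to xx
 *
 * The same three steps repeat for yy and zz, each with its own pool entry.
 */
void write_globals(void)
{
    xx = 0x11;
    yy = 0x22;
    zz = 0x33;
}

/* Data bytes plus one literal-pool entry per separately addressed variable. */
size_t globals_footprint_bytes(size_t count, size_t var_bytes)
{
    return count * (var_bytes + LITERAL_POOL_ENTRY_BYTES);
}
```

For three 4-byte ints, globals_footprint_bytes(3, 4) comes to 24 bytes: 12 for the variables and 12 for their literal-pool entries.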

2. Struct‑Based Assembly

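Replacing the three globals with a single struct object, peng, places all three members in one block in the .bss section, at address 0x4000802c in the article's build. Every member is then reached from the same base address, so once the base is loaded each assignment needs only two instructions: move the immediate into a register and store it at the member's offset.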

struct {
  int xx;
  int yy;
  int zz;
} peng;
int main(void){
  peng.xx=0x11;
  peng.yy=0x22;
  peng.zz=0x33;
  while(1);
  return 0;
}

Compared with three separate globals, the struct version improves on both counts:

Literal pool: all members share one entry instead of three, saving 8 bytes.

Instructions: one base‑address load plus two instructions per member gives seven instead of twelve, saving five.
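The reason one base address is enough is that each member sits at a fixed, small offset from the start of the struct. The sketch below makes those offsets visible with offsetof. The tag name peng_layout, the array peng_offsets, and the helper write_members are hypothetical additions for illustration; the offsets in the comments assume a 4-byte int with no padding between members, which holds on common ABIs including 32-bit ARM.

```c
#include <stddef.h>

/* Same members as the article's peng object; the tag name is hypothetical. */
struct peng_layout {
    int xx;
    int yy;
    int zz;
};

struct peng_layout peng;

/* Byte offset of each member from the struct's base address. */
const size_t peng_offsets[3] = {
    offsetof(struct peng_layout, xx),  /* 0 */
    offsetof(struct peng_layout, yy),  /* sizeof(int), assuming no padding */
    offsetof(struct peng_layout, zz),  /* 2 * sizeof(int), assuming no padding */
};

/*
 * Writes every member through one base pointer, mirroring how a single base
 * register plus an immediate offset can reach each member.
 */
void write_members(struct peng_layout *base)
{
    base->xx = 0x11;  /* store at base + peng_offsets[0] */
    base->yy = 0x22;  /* store at base + peng_offsets[1] */
    base->zz = 0x33;  /* store at base + peng_offsets[2] */
}
```

Calling write_members(&peng) performs the same three assignments as main, with the base address passed in once.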

3. Further Optimization

By compiling with size optimization (-Os) and requesting link‑time optimization in the Makefile, the compiler can emit a single stm (store multiple) instruction that writes all three members at once, bringing the instruction count down to five.

TARGET=gcd
TARGETC=main
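# -flto is GCC's link-time-optimization flag; -lto would instead ask the
# linker for a library named "to". Because the link step calls ld directly,
# add -ffat-lto-objects if your GCC emits LTO-only objects.
all:
  arm-none-linux-gnueabi-gcc -Os -flto -g -c -o $(TARGETC).o $(TARGETC).c
  arm-none-linux-gnueabi-gcc -Os -flto -g -c -o $(TARGET).o $(TARGET).s
  arm-none-linux-gnueabi-gcc -Os -flto -g -S -o $(TARGETC).s $(TARGETC).c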
  arm-none-linux-gnueabi-ld $(TARGETC).o $(TARGET).o -Tmap.lds -o $(TARGET).elf
  arm-none-linux-gnueabi-objcopy -O binary -S $(TARGET).elf $(TARGET).bin
  arm-none-linux-gnueabi-objdump -D $(TARGET).elf > $(TARGET).dis
clean:
  rm -rf *.o *.elf *.dis *.bin

The final sequence loads the base address once, moves the three immediate values into registers, and stores all of them with a single stm instruction.
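To make the store-multiple step concrete, here is a small host-side model in C. It is a sketch under stated assumptions, not compiler output: stm_model and write_members_with_stm are hypothetical helpers, the array regs stands in for the three registers, and the model covers only the simple case of a store multiple without base writeback, where the listed registers go to consecutive words starting at the base address.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same members as the article's peng object; the tag name is hypothetical. */
struct peng_layout {
    int32_t xx;
    int32_t yy;
    int32_t zz;
};

/* The model relies on the members being three consecutive words. */
_Static_assert(sizeof(struct peng_layout) == 3 * sizeof(int32_t),
               "members must be contiguous");

/*
 * Hypothetical model of a store multiple without writeback: each "register"
 * value is written to the next word in memory, lowest address first.
 */
void stm_model(void *base, const int32_t *regs, size_t count)
{
    unsigned char *dst = base;
    for (size_t i = 0; i < count; ++i) {
        memcpy(dst + i * sizeof(int32_t), &regs[i], sizeof(int32_t));
    }
}

void write_members_with_stm(struct peng_layout *p)
{
    /* The three immediates are placed in "registers" first... */
    const int32_t regs[3] = { 0x11, 0x22, 0x33 };
    /* ...then one store multiple covers xx, yy and zz. */
    stm_model(p, regs, 3);
}
```

Counted the way this section counts, that is one base load, three register moves, and one store: five steps instead of seven.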

Conclusion

Grouping related variables such as peripheral registers in a struct on Cortex‑A9 shrinks the literal pool from three entries to one and cuts the example's instruction count from twelve to seven, or five with -Os, so fewer cycles go to loading addresses. For performance‑critical low‑level code, struct‑based access is strongly recommended.
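Applied to real peripherals, the same idea usually takes the form of a struct whose members mirror the device's register layout. The block below is a minimal sketch, not taken from the article: the type demo_regs_t, its members, and the function demo_start are all hypothetical, and the register order is invented for illustration. The members are volatile so that the compiler performs every access as written.

```c
#include <stdint.h>

/* Hypothetical register block; names and order are illustrative only. */
typedef struct {
    volatile uint32_t ctrl;    /* offset 0 */
    volatile uint32_t status;  /* offset 4 */
    volatile uint32_t data;    /* offset 8 */
} demo_regs_t;

/*
 * Enables the hypothetical device and hands it one data word.
 * Both stores go through the same base pointer, so only one address
 * has to be loaded for the whole register block.
 */
void demo_start(demo_regs_t *regs, uint32_t word)
{
    regs->ctrl = 0x1u;
    regs->data = word;
}
```

On hardware, the pointer would be the device's base address from the SoC manual cast to demo_regs_t *; on a host machine the function can be exercised against an ordinary demo_regs_t variable.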
