
Understanding the Physical Structure of Memory and the Root Cause of Memory Alignment

The article explains how memory chips are organized into banks and matrices, why eight consecutive bytes are distributed across eight banks for parallel I/O, and how this hardware design makes 64‑bit (8‑byte) alignment essential for optimal performance.

Refining Core Development Skills

Memory Physical Structure

Most developers know that memory alignment improves performance, but the underlying reason lies in how memory is physically built. A memory module carries a row of black memory chips, as shown in Figure 1.

Figure 1: Physical appearance of a memory module

Each chip is built from eight banks. Inside each bank is a two‑dimensional matrix where every element stores one byte (8 bits).

Figure 2: Internal structure of a chip

Figure 3: Internal structure of a bank

Memory Addressing Method

When a program accesses eight consecutive bytes, for example 0x0000‑0x0007, it is natural to assume they sit side by side in the first bank. In fact, each byte is stored in a different bank, so bytes that are contiguous in the address space are not contiguous physically. Figure 4 shows the actual distribution.

Figure 4: Physical distribution of eight consecutive bytes

The reason is circuit efficiency: the eight banks can operate in parallel. To read 0x0000‑0x0007, each bank supplies one byte simultaneously, producing the full 8‑byte word in a single I/O operation. If the bytes were all in one bank, the reads would have to be serialized, requiring eight separate accesses and slowing performance.
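The interleaving described above can be sketched in a few lines of Python. This is only an illustration of the article's simplified model, assuming that the bank is selected by the low three bits of the address and that the remaining bits select a position inside the bank; `NUM_BANKS` and `locate` are hypothetical names, not part of any real memory controller interface.

```python
NUM_BANKS = 8  # banks per chip in the article's simplified model


def locate(addr):
    """Map a byte address to (bank, slot) under simple byte interleaving.

    Assumption: consecutive bytes rotate across the banks, so the bank is
    addr modulo 8 and the slot inside that bank is addr divided by 8.
    """
    return addr % NUM_BANKS, addr // NUM_BANKS


# Eight consecutive addresses land in eight different banks, same slot.
for addr in range(0x0000, 0x0008):
    bank, slot = locate(addr)
    print(f"0x{addr:04X} -> bank {bank}, slot {slot}")
```

Because all eight addresses share the same slot but differ in bank, the eight banks can each deliver their byte in the same operation.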

Conclusion

Therefore, the root cause of memory alignment is that memory I/O works in 8‑byte (64‑bit) units. For a 64‑bit wide memory (and a 64‑bit CPU), each I/O operation reads one byte from each of the eight banks and assembles them into one word. Addresses 0‑7 can be fetched in one operation, as can 8‑15, and so on.
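As a small sketch of this rule, the following Python finds the 8‑byte unit that contains a given address and checks whether an address sits on a unit boundary. The names `IO_UNIT`, `io_unit_range`, and `is_aligned` are illustrative assumptions; the only fact taken from the text is the 8‑byte unit size.

```python
IO_UNIT = 8  # bytes moved per memory I/O operation (64-bit wide memory)


def io_unit_range(addr):
    """Return (first, last) byte address of the I/O unit that holds addr."""
    base = addr & ~(IO_UNIT - 1)  # clear the low three bits
    return base, base + IO_UNIT - 1


def is_aligned(addr):
    """True when addr sits exactly on an 8-byte boundary."""
    return addr % IO_UNIT == 0


print(io_unit_range(0x0005))  # (0, 7)
print(io_unit_range(0x000B))  # (8, 15)
print(is_aligned(0x0008), is_aligned(0x0001))  # True False
```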

If you request a range that is not 8‑byte aligned, such as 0x0001‑0x0008, the request straddles two I/O units. The memory controller must read 0x0000‑0x0007, then 0x0008‑0x000F, and combine the results, keeping only the eight bytes that were asked for. That is two I/O operations instead of one, which adds latency. This hardware constraint explains why misaligned accesses are slower.
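The cost of a misaligned request can be counted directly. The sketch below assumes the 8‑byte unit described in this article and simply counts how many units a byte range touches; `io_operations` is a hypothetical helper, and real hardware adds caches and burst transfers on top of this picture.

```python
IO_UNIT = 8  # bytes per memory I/O operation


def io_operations(start, length):
    """Count the 8-byte units touched by the bytes [start, start + length)."""
    if length <= 0:
        return 0
    first_unit = start // IO_UNIT
    last_unit = (start + length - 1) // IO_UNIT
    return last_unit - first_unit + 1


print(io_operations(0x0000, 8))  # 1: aligned, a single unit
print(io_operations(0x0001, 8))  # 2: 0x0001-0x0008 straddles two units
```

The same eight bytes of payload cost one operation when aligned and two when shifted by a single byte.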

Extension 1: Compilers and linkers automatically align variables for developers, but they cannot achieve perfect alignment in every case.

Extension 2: Beyond main memory, the CPU caches also matter, and they are managed by hardware rather than by the operating system. A cache line is typically 64 bytes, eight times the memory I/O unit, so filling one cache line moves eight such units in a row and no I/O cycle is wasted.
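The automatic alignment mentioned in Extension 1 can be observed from Python with the standard `struct` module, which can compute a layout either with no padding (`=`) or with the platform's native C alignment (`@`). The format `cq` (one 1‑byte char followed by one 8‑byte integer) is chosen purely for illustration, and the native size depends on the platform, so only the packed size is stated as a fixed value.

```python
import struct

# "c" is a 1-byte char, "q" is an 8-byte signed integer.
packed = struct.calcsize("=cq")  # standard sizes, no padding inserted
native = struct.calcsize("@cq")  # native sizes and alignment, like a C struct

print("packed size:", packed)  # 9
print("native size:", native)  # platform dependent, commonly 16 on 64-bit systems
print("padding inserted:", native - packed)
```

Any difference between the two sizes is padding placed before the 8‑byte field so that it starts on an aligned address.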
Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
