Understanding CPU‑Memory Interconnects: From FSB to NUMA and Practical Linux Tests

The article explains how the CPUs and memory modules in modern multi‑socket servers are connected, from the legacy Front Side Bus to today's NUMA architectures. It demonstrates Linux commands for inspecting memory topology and presents benchmark results showing the latency difference between intra‑node and inter‑node memory accesses.

Refining Core Development Skills

Nov 7, 2019

Modern servers typically contain many CPU cores and dozens of gigabytes of memory spread across multiple DIMMs, raising the question of how these components are interconnected and whether latency varies when a CPU accesses different memory modules.

Historically, CPUs communicated with memory through the Front Side Bus (FSB) architecture, where the CPU connected to a north‑bridge chip via the FSB, and the north‑bridge housed the memory controller. All traffic between CPU and RAM passed through this single bus.

As CPU frequencies passed 3 GHz, further clock‑speed gains became hard to sustain, so manufacturers shifted to multi‑core and multi‑CPU designs. With every core and socket competing for the same bus, the FSB became a bottleneck. The memory controller therefore moved onto the CPU die, and Intel introduced a new interconnect, QuickPath Interconnect (QPI), so that a CPU can access memory attached to another socket. The result is a Non‑Uniform Memory Access (NUMA) architecture.

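In a NUMA system each CPU belongs to a node that owns a subset of the system’s memory. Two Linux commands expose this layout. The first, which needs root because dmidecode reads the firmware tables, lists the size of the module in each memory slot:

    # dmidecode|grep -P -A5 "Memory\s+Device"|grep Size

The second displays which CPUs and how much memory belong to each node, followed by a node distance matrix:

    # numactl --hardware

The distances are relative access costs reported by the firmware rather than measured latencies: 10 denotes a node's own memory, and larger values denote more expensive accesses to another node's memory.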
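The distance matrix can also be read from a script. The following is a minimal sketch and is not taken from the original article: numa_distance is a hypothetical helper name, and the sample text only mimics the distance section of numactl --hardware output, using the 10 and 21 values quoted in this article. It assumes the layout printed by current numactl versions, which is a header row of node ids followed by one row per node.

```shell
#!/bin/sh
# Sketch: look up one entry of the node distance matrix printed by `numactl --hardware`.
# numa_distance is a hypothetical helper, not part of numactl.

numa_distance() {
    # $1 = source node id, $2 = target node id; expects numactl --hardware text on stdin.
    awk -v from="$1" -v to="$2" '
        /^node distances:/ { in_matrix = 1; next }
        # Header row ("node 0 1"): remember which column belongs to which node id.
        in_matrix && $1 == "node" { for (i = 2; i <= NF; i++) col[$i] = i; next }
        # Data row ("0: 10 21"): print the column of the requested target node.
        in_matrix && $1 == (from ":") { if (to in col) print $(col[to]); exit }
    '
}

# Illustrative two-node sample (assumed layout, distances as quoted in the article).
sample='node distances:
node   0   1
  0:  10  21
  1:  21  10'

printf '%s\n' "$sample" | numa_distance 0 0   # local access
printf '%s\n' "$sample" | numa_distance 0 1   # remote access
```

On a real machine the same helper would be fed live output, for example numactl --hardware | numa_distance 0 1. Mapping node ids to columns through the header row, rather than assuming column n belongs to node n, keeps the lookup correct if node ids are not contiguous.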

Two benchmark scenarios are presented, each launched through numactl: --cpubind restricts the process to the CPUs of one node, and --membind restricts its allocations to the memory of one node. The latency tables show that accesses to memory within the same node (distance 10) are faster than accesses to memory on a remote node (distance 21), confirming the NUMA performance characteristic.
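The pairing of CPU node and memory node can be enumerated with a short script. This is a sketch under stated assumptions, not the author's benchmark harness: ./mem_bench is a hypothetical placeholder for whatever latency benchmark is being run, and numa_bench_cmd and numa_bench_matrix are illustrative helper names. The script only prints the commands, so it can be inspected before anything is executed. It uses the --cpubind spelling from the article; the numactl man page documents --cpunodebind as the current name for binding to a node's CPUs.

```shell
#!/bin/sh
# Sketch: print the numactl invocation for every (CPU node, memory node) pair.
# ./mem_bench is a hypothetical placeholder for the benchmark program.

numa_bench_cmd() {
    # $1 = node whose CPUs run the process, $2 = node that supplies the memory,
    # $3 = benchmark command
    printf 'numactl --cpubind=%s --membind=%s %s\n' "$1" "$2" "$3"
}

numa_bench_matrix() {
    # $1 = benchmark command, remaining arguments = node ids to combine
    prog=$1
    shift
    for cpu_node in "$@"; do
        for mem_node in "$@"; do
            numa_bench_cmd "$cpu_node" "$mem_node" "$prog"
        done
    done
}

# Two nodes give four combinations: two local (0/0, 1/1) and two remote (0/1, 1/0).
numa_bench_matrix ./mem_bench 0 1
```

Piping the output to sh would run the four cases in turn on a machine that has numactl and at least two nodes. Comparing a local pair with a remote pair is what isolates the cost of crossing the interconnect.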

Conclusion: In contemporary servers the CPU‑memory interconnect is a complex NUMA architecture that groups CPUs and memory into nodes; intra‑node memory accesses are noticeably faster than inter‑node accesses because the latter must traverse the QPI bus.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Refining Core Development Skills

Fei has over 10 years of development experience at Tencent and Sogou. Through this account, he shares his deep insights on performance.
