Understanding SMP, NUMA, and MPP: Which Server Architecture Fits Your Needs?

This article explains the three main commercial server architectures—SMP, NUMA, and MPP—detailing their structures, performance characteristics, scalability limits, and suitability for OLTP versus data‑warehouse workloads, while also covering practical considerations such as virtualization and real‑world examples.

MaGe Linux Operations

Jul 3, 2022

1. SMP (Symmetric Multi‑Processor)

An SMP system contains multiple tightly coupled processors that share every resource (bus, memory, I/O) under a single operating system instance. Every CPU has equal access to memory and peripherals, and contention for shared resources is resolved by hardware and software locking. Because all CPUs share one memory bus, each added CPU increases contention on that bus, so memory access becomes the bottleneck as the CPU count grows.

This sharing is what restricts SMP expansion: CPU utilization is typically most efficient with 2‑4 CPUs, and an 8‑socket server marks the practical limit of SMP. Beyond that, NUMA is required.
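The bus bottleneck can be illustrated with a deliberately simple toy model. It is a sketch, not a measurement: the function name `smp_speedup` and the parameter `bus_share_per_cpu` (the fraction of bus capacity one CPU needs to run at full speed) are assumptions introduced here, and the value 0.25 is hypothetical.

```python
def smp_speedup(cpus: int, bus_share_per_cpu: float) -> float:
    """Toy model of SMP scaling on one shared memory bus.

    Assumption: each CPU needs `bus_share_per_cpu` of the bus capacity to
    run at full speed. Once total demand exceeds the bus, extra CPUs only
    wait, so speedup is capped at 1 / bus_share_per_cpu.
    """
    if cpus < 1 or not 0 < bus_share_per_cpu <= 1:
        raise ValueError("need cpus >= 1 and 0 < bus_share_per_cpu <= 1")
    bus_limit = 1.0 / bus_share_per_cpu
    return min(float(cpus), bus_limit)


if __name__ == "__main__":
    # Hypothetical figure: every CPU consumes a quarter of the bus.
    for n in (1, 2, 4, 8, 16):
        print(n, "CPUs ->", smp_speedup(n, 0.25), "x")
```

Under this assumption the speedup stops growing at four CPUs, which is the shape of the limit described above; real systems degrade more gradually because caches absorb part of the memory traffic.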

2. NUMA (Non‑Uniform Memory Access)

NUMA groups CPUs into modules (nodes), each with its own local memory and I/O. The modules are connected by an interconnect such as a crossbar switch, so any CPU can still reach all of the system's memory. Access to local memory is fast, while access to another module's memory incurs higher latency, so applications should keep cross‑module communication to a minimum.
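On Linux, the grouping of CPUs into NUMA nodes is visible under `/sys/devices/system/node`, where each `nodeN/cpulist` file holds a range list such as `0-3,8-11`. The following is a minimal sketch for reading that layout; the helper names `parse_cpulist` and `numa_nodes` are introduced here for illustration.

```python
from pathlib import Path


def parse_cpulist(text: str) -> list[int]:
    """Expand a kernel CPU list such as '0-3,8-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty list, e.g. a memory-only node
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map each NUMA node id to its CPUs; empty dict if sysfs has no nodes."""
    nodes: dict[int, list[int]] = {}
    for node_dir in Path(sysfs_root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        nodes[node_id] = parse_cpulist((node_dir / "cpulist").read_text())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: CPUs {cpus}")
```

A scheduler or application that keeps a thread on the CPUs of the node holding its data avoids the remote access penalty described above.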

NUMA can support hundreds of CPUs in a single server (e.g., HP Superdome, Sun Fire 15K, IBM p690). However, performance does not scale linearly with CPU count because remote memory accesses are slower than local ones; a 64‑CPU NUMA system may deliver only three times the performance of an 8‑CPU SMP system.
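The 64‑CPU versus 8‑CPU figure can be turned into a scaling efficiency with a few lines of arithmetic. The function name `scaling_efficiency` is introduced here for illustration; the inputs are the numbers quoted above.

```python
def scaling_efficiency(base_cpus: int, scaled_cpus: int, speedup: float) -> float:
    """Observed speedup divided by the ideal (linear) speedup."""
    ideal_speedup = scaled_cpus / base_cpus
    return speedup / ideal_speedup


if __name__ == "__main__":
    # 8x the CPUs (8 -> 64) for only 3x the performance.
    efficiency = scaling_efficiency(base_cpus=8, scaled_cpus=64, speedup=3.0)
    print(f"{efficiency:.1%}")  # prints 37.5%
```

In other words, on that example each added CPU contributes well under half of what linear scaling would give.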

Modern CPUs with integrated memory controllers (e.g., Intel Nehalem and later) attach memory directly to each socket, so multi‑socket servers built from them are NUMA systems and software must be NUMA‑aware. Linux distributions such as Ubuntu and SUSE 12 can enable automatic NUMA balancing. For virtualization, limit each KVM VM to the CPUs of a single NUMA node, and use vNUMA in VMware ESX to expose the host NUMA topology to guests.
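Two of these checks can be sketched in Python. The first reads the Linux `kernel.numa_balancing` setting from `/proc/sys/kernel/numa_balancing`; the second picks a NUMA node large enough to hold all of a VM's vCPUs. The helper names and the sample topology are hypothetical, and actually pinning a KVM guest is done through the hypervisor tooling, which this sketch does not cover.

```python
from pathlib import Path
from typing import Optional


def numa_balancing_enabled(
    path: str = "/proc/sys/kernel/numa_balancing",
) -> Optional[bool]:
    """True/False for the kernel setting, None if the file is unavailable."""
    try:
        return Path(path).read_text().strip() != "0"
    except OSError:
        return None  # non-Linux host or kernel built without the feature


def node_that_fits(topology: dict[int, list[int]], vcpus: int) -> Optional[int]:
    """Lowest-numbered node with at least `vcpus` CPUs, or None if none fits."""
    for node_id in sorted(topology):
        if len(topology[node_id]) >= vcpus:
            return node_id
    return None


if __name__ == "__main__":
    # Hypothetical host: node 0 has 4 CPUs, node 1 has 8.
    topology = {0: [0, 1, 2, 3], 1: [4, 5, 6, 7, 8, 9, 10, 11]}
    print("automatic NUMA balancing:", numa_balancing_enabled())
    print("node for a 6-vCPU VM:", node_that_fits(topology, 6))
```

When `node_that_fits` returns `None`, the VM is wider than any single node, which is the case where exposing the topology to the guest (as vNUMA does) becomes relevant.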

3. MPP (Massively Parallel Processing)

MPP builds a large system by interconnecting multiple independent SMP nodes over a high‑speed network. Each node accesses only its own local memory and storage (a shared‑nothing architecture), which allows near‑linear scalability up to hundreds of nodes and thousands of CPUs.

Each MPP node runs its own OS and database instance. Work must therefore be distributed and balanced across nodes and executed in parallel; this complexity is often hidden from applications by the database system (e.g., NCR Teradata).
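The shared‑nothing idea can be sketched as a hash‑partitioned aggregation: rows are assigned to nodes by key, each node sums only its own rows, and a coordinator merges the partial results. This is an illustrative single‑process simulation, not how any particular MPP database is implemented; all function names and the sample rows are hypothetical.

```python
import zlib
from collections import defaultdict

Row = tuple[str, int]  # (key, amount)


def node_for_key(key: str, node_count: int) -> int:
    """Stable hash partitioning (crc32, since str hash() varies per process)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def distribute(rows: list[Row], node_count: int) -> list[list[Row]]:
    """Place each row on exactly one node; nodes share nothing."""
    nodes: list[list[Row]] = [[] for _ in range(node_count)]
    for key, amount in rows:
        nodes[node_for_key(key, node_count)].append((key, amount))
    return nodes


def local_sum(rows: list[Row]) -> dict[str, int]:
    """Work done independently on one node, using only its local rows."""
    totals: dict[str, int] = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum(rows: list[Row], node_count: int) -> dict[str, int]:
    """Coordinator: merge per-node results (keys are disjoint across nodes)."""
    merged: dict[str, int] = {}
    for node_rows in distribute(rows, node_count):
        merged.update(local_sum(node_rows))
    return merged


if __name__ == "__main__":
    sales = [("east", 10), ("west", 5), ("east", 7), ("north", 1)]
    print(parallel_sum(sales, node_count=3))
```

Because every key lives on exactly one node, the per‑node sums need no communication until the final merge, which is why this kind of workload scales well on MPP.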

4. Comparing the Three Architectures

4.1 SMP vs. MPP

The shared bus is SMP's bottleneck and makes it unsuitable for large‑scale decision‑support or data‑warehouse workloads. MPP nodes share no resources, so MPP performs better on such workloads as long as inter‑node communication stays low.

4.2 NUMA vs. MPP

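Both are built from multiple nodes, but a NUMA system is one physical server whose nodes are linked by an internal interconnect: any CPU can reach any node's memory, at the cost of higher latency for remote accesses. MPP nodes are separate servers joined by an external network; each accesses only its own local memory, which avoids remote memory access and allows near‑linear scaling.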

4.3 Performance Differences

SMP scales the least and is typically most efficient with 2‑4 CPUs. NUMA can scale to hundreds of CPUs, but remote memory latency keeps performance from growing linearly. MPP can scale to hundreds of nodes with near‑linear performance growth.

4.4 Expansion Limits

In theory, neither NUMA nor MPP has a fixed expansion limit; in practice, NUMA reaches hundreds of CPUs and MPP thousands. SMP expands poorly, although IBM's BOOK technology can extend an SMP system to 8 CPUs.

4.5 Application Suitability

MPP excels in decision‑support and data‑mining workloads where communication overhead is low.

SMP provides higher efficiency for workloads with heavy inter‑process communication.

NUMA offers strong OLTP transaction processing but degrades for data‑warehouse workloads due to extensive cross‑module data exchange.
