
Decoding Chip Concepts: CPU, GPU, NPU, APU, SoC, HBM & Chiplet (2026)

This article breaks down the core chip concepts—CPU, GPU, NPU, APU, SoC, HBM and Chiplet—explaining their functions, key characteristics, historical evolution, and how they relate to each other, and provides a 2026 mainstream‑chip comparison and selection guide.

Lao Guo's Learning Space

Jun 2, 2026


CPU – General‑purpose processor

Full name: Central Processing Unit.

Key characteristics

Core count: typically 4–24 on consumer desktops and laptops; server CPUs range from tens of cores to well over 100.

Strengths: logic, branch prediction, task scheduling.

Weakness: limited large‑scale parallelism.

Representative products: Intel Core i9, AMD Ryzen 9, Apple M5.

Architectures: x86 (Intel/AMD), ARM (Apple/Qualcomm/MediaTek).

Internal components

ALU – arithmetic‑logic operations.

CU (control unit) – instruction decode and control flow.

Cache hierarchy (L1/L2/L3) – fast, progressively larger storage.

Registers – fastest storage directly attached to execution units.
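To see why the cache hierarchy matters, a rough sketch of average memory access time can help. The latencies and hit rates below are illustrative assumptions, not measurements of any particular CPU, and `average_access_cycles` is a hypothetical helper name.

```python
def average_access_cycles(levels, memory_cycles):
    """Average memory access time for a simple multi-level cache model.

    levels: list of (hit_latency_cycles, hit_rate) from L1 outward.
    memory_cycles: latency of main memory, paid when every level misses.
    """
    # Work from the outermost level inward: the cost of a miss at one level
    # is the average access time of everything behind it.
    cost_behind = memory_cycles
    for hit_latency, hit_rate in reversed(levels):
        cost_behind = hit_latency + (1.0 - hit_rate) * cost_behind
    return cost_behind


# Assumed figures: L1 4 cycles / 95% hits, L2 14 / 80%, L3 40 / 60%, DRAM 200.
hierarchy = [(4, 0.95), (14, 0.80), (40, 0.60)]
with_caches = average_access_cycles(hierarchy, memory_cycles=200)
without_caches = average_access_cycles([], memory_cycles=200)
print(f"with caches: {with_caches:.2f} cycles, without: {without_caches} cycles")
```

Under these assumed numbers, most accesses are served by the small, fast levels, so the average cost stays close to the L1 latency even though main memory is far slower.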

Why a CPU cannot replace a GPU – CPUs follow a “fast‑and‑smart” design: each core handles complex logic but the core count is limited, analogous to a professor solving difficult problems for one student at a time. GPUs follow a “many‑and‑fast” design: thousands of simple cores work in parallel, analogous to many elementary students each doing simple addition simultaneously.
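One way to put numbers on the analogy is Amdahl's law, which bounds the speedup from adding cores when only part of a task can run in parallel. This is a minimal sketch: the parallel fractions are assumed values chosen for illustration, and the model ignores that an individual GPU core is slower than a CPU core.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workloads: branch-heavy logic (50% parallel) vs. matrix-style math (99.9%).
for label, fraction in [("branchy logic", 0.50), ("matrix math", 0.999)]:
    few = amdahl_speedup(fraction, cores=16)
    many = amdahl_speedup(fraction, cores=10_000)
    print(f"{label}: 16 cores -> {few:.1f}x, 10,000 cores -> {many:.1f}x")
```

With these assumptions, the branch-heavy task never reaches a 2x speedup no matter how many cores are added, while the matrix task keeps scaling, which is the gap between the two designs.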

GPU – Parallel‑compute engine

Full name: Graphics Processing Unit.

Key characteristics

Core count ranges from thousands to tens of thousands (CUDA cores / shaders).

Strengths: massive parallel tasks such as matrix multiplication and graphics rendering.

Weakness: complex logical branching.

Representative products: NVIDIA RTX 5090, RTX Spark, Apple M5 GPU.

Key metrics: CUDA core count, memory capacity/bandwidth, TFLOPS.
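The TFLOPS metric can be estimated from the core count and clock speed. The sketch below assumes each core completes one fused multiply-add (counted as 2 floating-point operations) per cycle; the 2.4 GHz clock is an assumed round number, and `peak_fp32_tflops` is a hypothetical helper name.

```python
def peak_fp32_tflops(cores, clock_ghz, flops_per_core_per_cycle=2):
    """Theoretical FP32 peak, assuming one fused multiply-add (2 FLOPs) per core per cycle."""
    return cores * clock_ghz * flops_per_core_per_cycle / 1000.0


# 21760 CUDA cores is the RTX 5090 figure quoted in this article;
# the 2.4 GHz clock is an assumption, not a quoted specification.
print(f"{peak_fp32_tflops(21760, 2.4):.1f} TFLOPS")
```

This is a theoretical ceiling; real workloads usually land below it because of memory bandwidth and scheduling limits.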

GPU evolution

Pre‑1999: Fixed‑function rasterization pipeline.

2001: Programmable shaders (GeForce 3) enable custom rendering.

2006: CUDA released, enabling general‑purpose (GPGPU) computing.

2012: AlexNet uses GPU for training, sparking the AI boom.

2017: Volta introduces Tensor Cores for matrix acceleration.

2018+: RTX series adds real‑time ray tracing and DLSS.

2025: Blackwell architecture adds FP4 precision; halving the bits per value relative to FP8 roughly doubles peak Tensor Core throughput for AI inference.
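To make the FP4 entry concrete, here is a minimal sketch of quantizing a block of weights to a 4-bit floating-point format with one shared scale. It assumes the E2M1 layout (1 sign, 2 exponent, 1 mantissa bit). Hardware formats encode the scale compactly and use small fixed-size blocks, whereas this sketch simply keeps the scale as a Python float. The function names are hypothetical.

```python
# Magnitudes representable in the 4-bit E2M1 format.
FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(values):
    """Quantize a block of floats to FP4 values with one shared scale."""
    largest = max(abs(v) for v in values)
    # Choose the scale so the largest magnitude maps to the top FP4 value (6.0).
    scale = largest / FP4_MAGNITUDES[-1] if largest > 0 else 1.0
    codes = []
    for v in values:
        # Pick the representable magnitude nearest to the scaled value.
        magnitude = min(FP4_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        codes.append(magnitude if v >= 0 else -magnitude)
    return codes, scale


def dequantize_block_fp4(codes, scale):
    """Recover approximate floats from FP4 values and the shared scale."""
    return [c * scale for c in codes]


weights = [0.1, -0.45, 0.02, 0.9, -1.2, 0.64]
codes, scale = quantize_block_fp4(weights)
print(codes, scale)
print(dequantize_block_fp4(codes, scale))
```

Each value now needs only 4 bits plus a share of the block scale, at the cost of the rounding error visible in the dequantized output.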

GPU core units

CUDA cores – general parallel compute units (NVIDIA terminology).

Tensor Cores – matrix‑multiply‑accumulate units optimized for AI training and inference.

RT Cores – ray‑tracing acceleration.

Shaders – generic term for programmable compute units; AMD calls them stream processors, and Intel groups them into Xe-cores.
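The matrix multiply-accumulate operation that Tensor Cores accelerate can be written as D = A × B + C on a small tile. The sketch below does this in plain Python for illustration only; the 4×4 tile size is an assumption, and real hardware performs the whole tile operation in parallel rather than in nested loops.

```python
def mma_tile(a, b, c):
    """Return D = A x B + C for small square tiles given as lists of rows."""
    n = len(a)
    d = [row[:] for row in c]  # start from a copy of the accumulator tile C
    for i in range(n):
        for j in range(n):
            for k in range(n):
                d[i][j] += a[i][k] * b[k][j]  # one multiply-accumulate step
    return d


identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
c = [[1] * 4 for _ in range(4)]
print(mma_tile(a, identity, c))  # every entry of A plus 1
```

A 4×4 tile takes 64 multiply-accumulate steps here, which is the kind of fixed, regular work that dedicated matrix units handle far more efficiently than general-purpose cores.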

NPU – AI‑specific inference accelerator

Full name: Neural Processing Unit.

Key characteristics

Specialized for neural‑network inference (convolution, matrix multiply, activation).

Power consumption: typically from hundreds of milliwatts to a few watts on mobile devices, far below a discrete GPU.

Flexibility limited to AI ops; cannot perform general computation.

Representative products: Apple Neural Engine, Qualcomm Hexagon NPU, Intel NPU.

Typical scenarios: phone face‑unlock, AI photo processing, local PC Copilot.
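The operations listed above (convolution, multiply-accumulate, activation) are usually run in low-precision integer arithmetic on an NPU. The following is a minimal sketch of that pattern, assuming simple per-tensor int8 scales; the function names and scale values are hypothetical and not taken from any vendor's toolkit.

```python
def quantize_int8(values, scale):
    """Map floats to int8 by dividing by a scale and clamping to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def conv1d_relu_int8(signal_q, kernel_q, signal_scale, kernel_scale):
    """1-D sliding-window convolution (no kernel flip, as in ML usage) plus ReLU."""
    out = []
    for start in range(len(signal_q) - len(kernel_q) + 1):
        acc = 0  # wide integer accumulator, so int8 products cannot overflow
        for offset, weight in enumerate(kernel_q):
            acc += signal_q[start + offset] * weight
        real_value = acc * signal_scale * kernel_scale  # rescale back to float
        out.append(max(0.0, real_value))  # ReLU activation
    return out


signal = [0.5, -1.0, 2.0, 1.5, -0.5]
kernel = [1.0, 0.0, -1.0]
s_scale, k_scale = 2.0 / 127, 1.0 / 127  # assumed per-tensor scales
out = conv1d_relu_int8(
    quantize_int8(signal, s_scale), quantize_int8(kernel, k_scale), s_scale, k_scale
)
print([round(x, 2) for x in out])
```

The inner loop uses only integer multiplies and adds, which is why a fixed-function unit can run it at much lower power than a general-purpose core.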

Comparison with GPU (each row lists NPU vs. GPU)

Positioning: AI inference‑only vs. general parallel compute + AI.

Flexibility: Low (only neural nets) vs. high (programmable).

Energy efficiency: Very high vs. medium.

Performance ceiling: Medium vs. very high.

Typical devices: Phones, thin laptops vs. desktops, servers, data‑center rigs.

AI training support: No vs. yes.

APU – AMD’s integrated CPU + GPU solution

Full name: Accelerated Processing Unit.

Key points

Defined as CPU + GPU integrated in a single package.

Introduced by AMD in 2011 to reduce cost and power while meeting everyday workloads.

Branding evolved from the A-series APUs to Ryzen APUs, and was eventually absorbed into the mainstream Ryzen product line.

Because modern CPUs from Intel and AMD now embed GPUs, the term “APU” has largely faded.

SoC – System on Chip

Full name: System on Chip.

Key characteristics

Integrates CPU, GPU, NPU, ISP, memory controller, network baseband, and other modules into a single chip (traditionally one silicon die, increasingly several dies in one package).

Advantages: low power, small form factor, ultra‑low communication latency.

Disadvantages: difficult to upgrade, because the components are part of the chip itself and memory is often soldered or packaged alongside it.

Modern SoC example (Apple M5 Pro)

CPU cores – performance + efficiency clusters.

GPU cores – graphics + parallel compute.

Neural Engine – AI inference (NPU).

ISP – image‑signal processing for cameras.

Media Engine – video encode/decode acceleration.

Memory controller – unified memory management.

Secure Enclave – secure storage/encryption.

Thunderbolt/NIC – external I/O.

Representative SoC products

Apple M5 series (MacBook, iPad, Mac Studio).

Qualcomm Snapdragon X Elite (Windows‑on‑ARM laptops).

NVIDIA RTX Spark – NVIDIA SoC for PCs with CPU + GPU + 128 GB unified memory (NVIDIA's earlier Tegra chips were also SoCs, aimed at mobile and embedded devices).

MediaTek Dimensity 9400 (Android flagship).

Huawei Kirin 9020 (smartphone flagship).

HBM – High‑Bandwidth Memory

Full name: High Bandwidth Memory.

Core innovation – 3‑D stacking of DRAM dies connected by through‑silicon vias (TSV).

Bandwidth

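HBM3E: roughly 1.2 TB/s and 24–36 GB per stack; a GPU with several stacks reaches about 4.8–8 TB/s in aggregate.

HBM4: about 2 TB/s per stack over a 2048-bit interface, with higher capacity per stack than HBM3E.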

Power and cost

More power‑efficient than GDDR.

Very high manufacturing cost.

Main users – AI/HPC GPUs (e.g., NVIDIA H100, B200) and high‑end graphics cards.

HBM vs. traditional memory

GDDR7: 1–2 TB/s, 16–32 GB, targeted at gaming graphics.

LPDDR5X: roughly 0.1–0.5 TB/s depending on bus width, 16–128 GB, used in phones and thin laptops.

Why AI chips need HBM – Large‑model training and inference are often limited by memory bandwidth rather than compute; HBM provides a “wide conveyor belt” so the GPU’s compute capacity is not starved of data.
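A back-of-the-envelope estimate shows how directly bandwidth caps inference speed. The sketch assumes that generating one token requires reading every model weight from memory once and ignores caching, batching, and compute time. The model size and the bandwidth figures are assumed round numbers.

```python
def max_decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_tb_per_s):
    """Upper bound on single-stream decode speed if every weight is read once per token."""
    model_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes = bandwidth_tb_per_s * 1e12
    return bandwidth_bytes / model_bytes


# Assumed example: a 70-billion-parameter model stored as 16-bit weights (2 bytes each).
memory_classes = [
    ("HBM-class, 8 TB/s", 8.0),
    ("GDDR-class, 1.8 TB/s", 1.8),
    ("LPDDR-class, 0.27 TB/s", 0.27),
]
for name, bandwidth in memory_classes:
    rate = max_decode_tokens_per_second(70, 2, bandwidth)
    print(f"{name}: about {rate:.1f} tokens/s")
```

Under these assumptions the ceiling scales linearly with bandwidth, regardless of how many TFLOPS the chip offers.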

HBM stacking principle

Vertically stack multiple DRAM dies.

Connect layers via TSV.

Place the stack next to the GPU die on a silicon interposer, connected through micro-bumps.

Typical configuration: 8–12 layers of 3 GB each for HBM3E, giving 24–36 GB per stack; earlier generations used 4–8 layers of 1–2 GB each, giving 8–16 GB.
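The stacking arithmetic can be sketched as follows. Per-stack bandwidth is the interface width multiplied by the per-pin data rate, and capacity is the number of dies multiplied by the capacity of each die. The 1024-bit width, 8 Gbit/s pin rate, and 8 × 3 GB layout are assumed HBM3E-like figures, and the helper names are hypothetical.

```python
def stack_bandwidth_gb_per_s(bus_width_bits, pin_rate_gbit_per_s):
    """Per-stack bandwidth: interface width times per-pin rate, converted from bits to bytes."""
    return bus_width_bits * pin_rate_gbit_per_s / 8


def stack_capacity_gb(layers, gb_per_die):
    """Per-stack capacity: number of stacked DRAM dies times the capacity of each die."""
    return layers * gb_per_die


# Assumed HBM3E-like figures: 1024-bit interface, 8 Gbit/s per pin, 8 dies of 3 GB.
per_stack = stack_bandwidth_gb_per_s(1024, 8.0)
capacity = stack_capacity_gb(8, 3)
print(f"per stack: {per_stack:.0f} GB/s, {capacity} GB")
print(f"8 stacks: {8 * per_stack / 1000:.1f} TB/s, {8 * capacity} GB")
```

Multiplying by the number of stacks placed around the GPU gives the aggregate figures that accelerator datasheets usually quote.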

Additional related concepts

TPU – Tensor Processing Unit

Vendor: Google (custom).

Specialty: AI training + inference, programmed through the XLA compiler (TensorFlow, JAX, PyTorch/XLA); does not support general compute.

Representative: Google TPU v5p (used for Gemini large‑model training).

LPU – Language Processing Unit

Vendor: Groq.

Specialty: LLM inference with extremely low latency.

Features: Keeps model weights in on-chip SRAM instead of HBM, which enables very fast inference; because SRAM capacity per chip is small, a large model must be spread across many chips, which raises cost.
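The capacity limit can be illustrated with a quick estimate of how many chips are needed just to hold a model's weights in SRAM. The 230 MB per chip and 8-bit weights are assumptions for illustration, and `chips_needed` is a hypothetical helper that ignores activations and any other memory use.

```python
import math


def chips_needed(params_billion, bytes_per_param, sram_mb_per_chip):
    """Minimum number of chips whose combined on-chip SRAM can hold the model weights."""
    model_mb = params_billion * 1e9 * bytes_per_param / 1e6
    return math.ceil(model_mb / sram_mb_per_chip)


# Assumptions: 230 MB of SRAM per chip and 1 byte per weight (8-bit weights).
print(chips_needed(70, 1, 230))
```

Under these assumptions a 70-billion-parameter model needs hundreds of chips, which is where the speed advantage trades off against cost.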

FPGA – Field‑Programmable Gate Array

Programmable hardware circuits, high flexibility.

Typical use cases: prototype verification, low‑latency trading, network acceleration.

Representatives: AMD (Xilinx), Altera (spun out of Intel).

DSP – Digital Signal Processor

Specialty: audio/video codec and communication signal processing.

Typical scenarios: phone call noise reduction, Bluetooth audio, 5G baseband.

Current status: in phones and PCs the functionality is largely integrated into SoCs rather than shipped as a separate component.

Unified Memory

Concept: CPU and GPU share the same physical memory pool.

Benefit: avoids copying data between separate CPU and GPU memories, which removes transfer latency and duplicate buffers.

Representatives: Apple Silicon (128 GB), NVIDIA RTX Spark (128 GB).

Contrast: traditional systems have separate DDR for CPU and dedicated graphics memory, requiring PCIe transfers.
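The cost of the PCIe transfer in a traditional system can be estimated with simple division. The 16 GB buffer and the 64 GB/s link (roughly a PCIe 5.0 x16 link in one direction) are assumptions, and the estimate ignores protocol overhead.

```python
def copy_seconds(size_gb, link_gb_per_s):
    """Time to move a buffer across a link running at its full rated bandwidth."""
    return size_gb / link_gb_per_s


# Assumptions: a 16 GB buffer and a 64 GB/s link (roughly PCIe 5.0 x16, one direction).
discrete = copy_seconds(16, 64)
print(f"copy over the link: {discrete * 1000:.0f} ms; shared pool: no bulk copy needed")
```

With a shared memory pool the CPU and GPU work on the same buffer, so this transfer step does not occur.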

Chiplet

Concept: large chips are divided into multiple smaller dies that are manufactured separately and then packaged together.

Advantages: higher yield, lower cost, flexible composition.

Representatives: AMD Ryzen (CCD + IOD), Intel Meteor Lake.

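Vs SoC: the two terms answer different questions. SoC describes what is integrated (a whole system in one chip), while chiplet describes how the chip is built (several small dies in one package instead of one monolithic die); a SoC can be either monolithic or chiplet-based.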
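The yield advantage of chiplets can be sketched with the simple Poisson yield model, in which the chance of a defect-free die falls exponentially with die area. The defect density and die sizes below are assumed values, and the sketch assumes dies are tested before packaging and ignores packaging cost and packaging yield.

```python
import math


def die_yield(area_mm2, defects_per_cm2):
    """Poisson yield model: probability that a die of the given area has zero defects."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)


def silicon_per_good_product(die_area_mm2, dies_per_product, defects_per_cm2):
    """Wafer area consumed per working product when only good dies are packaged."""
    return dies_per_product * die_area_mm2 / die_yield(die_area_mm2, defects_per_cm2)


# Assumptions: 0.1 defects per cm^2; one 600 mm^2 die vs. four 150 mm^2 chiplets.
mono = silicon_per_good_product(600, 1, 0.1)
split = silicon_per_good_product(150, 4, 0.1)
print(f"monolithic: {mono:.0f} mm^2 per good product, chiplets: {split:.0f} mm^2")
```

A single defect scraps a whole 600 mm² die but only one 150 mm² chiplet, so the same total silicon area yields more working products when it is split.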

2026 mainstream chip comparison (selected highlights)

Apple M5 Pro – Laptop/desktop, ARM 12‑core CPU, 20‑core GPU, 18‑core NPU, 128 GB unified memory, SoC‑based.

Apple M5 Ultra – Desktop, ARM 32‑core CPU, 80‑core GPU, 64‑core NPU, 512 GB unified memory, SoC‑based.

NVIDIA RTX Spark – Laptop/mini‑PC, ARM 20‑core Grace CPU, 6144 CUDA cores, built‑in NPU, 128 GB unified memory, SoC‑based.

NVIDIA RTX 5090 – Desktop GPU, 21760 CUDA cores, 32 GB GDDR7, discrete card (not a SoC).

NVIDIA B200 – Data‑center accelerator, Blackwell GPU, 192 GB HBM3e, no CPU.

Snapdragon X Elite – Notebook, ARM 12‑core CPU, Adreno GPU, 45 TOPS NPU, 64 GB LPDDR5X, SoC‑based.

Intel Core Ultra 9 – Notebook, x86 24‑core CPU, Arc GPU, 48 TOPS NPU, 64 GB LPDDR5X, SoC‑based.

AMD Ryzen 9 9950X – Desktop, x86 16‑core CPU, Radeon GPU, 128 GB DDR5, built with chiplet architecture (not a SoC).

Key finding: The 2026 trend is clear – SoC + large unified memory + integrated NPU is becoming the mainstream architecture across Apple, NVIDIA, Qualcomm, Intel and others.
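For selection, memory capacity is often the first filter when the goal is running a large model locally. The sketch below copies the memory figures for a subset of chips from the comparison above and checks which ones could hold a model's weights. The 20% headroom factor and the `chips_that_fit` helper are assumptions for illustration, and the check ignores bandwidth, software support, and price.

```python
# Memory figures (GB) copied from the comparison list in this article.
CHIPS = {
    "Apple M5 Pro": 128,
    "Apple M5 Ultra": 512,
    "NVIDIA RTX Spark": 128,
    "NVIDIA RTX 5090": 32,
    "NVIDIA B200": 192,
    "Snapdragon X Elite": 64,
    "Intel Core Ultra 9": 64,
}


def chips_that_fit(params_billion, bytes_per_param, headroom=1.2):
    """Names of chips whose listed memory holds the weights plus an assumed 20% headroom."""
    needed_gb = params_billion * bytes_per_param * headroom
    return sorted(name for name, memory_gb in CHIPS.items() if memory_gb >= needed_gb)


print(chips_that_fit(70, 1))    # 70B parameters at 8 bits per weight
print(chips_that_fit(70, 0.5))  # the same model at 4 bits per weight
```

Lowering the precision of the weights widens the set of chips that qualify, which is one reason low-precision formats and large unified memory appear together in this trend.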


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
