FPGA Overview: Architecture, Memory Hierarchy, and NoC Advantages
This article gives an overview of FPGA technology: its programmable logic cells, input/output blocks, and switch matrices; its historical evolution; its flexibility relative to ASICs and GPUs; its memory hierarchy, from on‑chip RAM to in‑package HBM2e; and the benefits of Network‑on‑Chip (NoC) architectures for performance, power, and design modularity.
FPGA (Field‑Programmable Gate Array) is a silicon chip with programmable features that can be reconfigured via software to implement various circuit functions, often called a "universal" chip.
FPGA chips consist of three main parts: programmable logic cells (Logic Cell, LC), input/output blocks (Input Output Block, IO), and switch box arrays (Switch Box, SB).
(1) Logic cells use look‑up tables (LUTs) – small static‑RAM structures – to store the truth table that defines a circuit function. Common sizes are 4‑input (LUT4), 5‑input (LUT5), and 6‑input (LUT6); a k‑input LUT stores 2^k configuration bits, so logic capacity grows with input count, but area grows exponentially.
(2) Input/Output blocks serve as the interface between the chip and external circuits, handling signal driving and matching requirements.
(3) Switch matrices use MOS transistors to control routing paths, enabling flexible interconnections.
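The LUT mechanism described above can be sketched in a few lines: a k‑input LUT is just 2^k stored bits, and evaluating the logic function is an SRAM read indexed by the input combination. This is an illustrative model, not vendor code; the `LUT` class and its methods are hypothetical.

```python
# Minimal sketch (illustrative, not vendor code): a k-input LUT modeled
# as a list of 2**k SRAM bits indexed by the input combination.
class LUT:
    def __init__(self, k, truth_table):
        assert len(truth_table) == 2 ** k, "a k-input LUT stores 2**k bits"
        self.k = k
        self.bits = truth_table  # "configuration" loaded at programming time

    def evaluate(self, *inputs):
        # The inputs select one stored bit, exactly like an SRAM read.
        index = 0
        for i, bit in enumerate(inputs):
            index |= (bit & 1) << i
        return self.bits[index]

# A LUT4 configured as a 4-input AND gate: only entry 0b1111 stores a 1.
and4 = LUT(4, [0] * 15 + [1])
print(and4.evaluate(1, 1, 1, 1))  # 1
print(and4.evaluate(1, 0, 1, 1))  # 0
```

Reprogramming the FPGA amounts to loading different bit patterns into these tables, which is why the same silicon can realize arbitrary logic functions.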
Since Xilinx introduced the first FPGA (XC2064) in 1985, FPGA hardware has progressed through four stages: PROM → PAL/GAL → CPLD/FPGA → modern SoC FPGA/eFPGA, moving toward larger scale, higher flexibility, and better performance.
FPGA belongs to the logic‑chip family, alongside general‑purpose processors (CPU, GPU, DSP) and ASICs; memory chips form a separate category of their own.
FPGA combines flexibility and parallelism. Flexibility allows users to reprogram internal connections for any logic function, reducing investment risk in fast‑evolving domains. Parallelism enables each logic cell to operate simultaneously without the instruction‑decode and shared‑memory bottlenecks of CPUs/GPUs, offering higher efficiency for large‑scale data processing.
Compared with GPUs, an FPGA can consume far less power (on the order of 10 W versus 200 W for a high‑end GPU) while delivering comparable performance on suitable workloads, and its reprogrammable nature offers greater adaptability as AI workloads evolve.
Compared with ASICs, FPGA offers short development cycles, low cost, and the ability to reconfigure hardware, making it ideal for prototyping and rapid design iteration.
The FPGA memory hierarchy (using Intel Agilex M as an example) spans on‑chip memory (MLAB, M20K), in‑package high‑bandwidth memory (HBM2e), and external DDR5/LPDDR5. HBM2e stacks provide 2 GB per die layer, enabling 16–32 GB of total capacity, with bandwidth up to 410 GB/s per stack (820 GB/s with two stacks).
Integrating HBM2e within the package reduces I/O pins, board space, power consumption, and latency.
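The capacity and bandwidth figures above follow from the stack geometry. The sketch below checks them with assumed (illustrative) HBM2e parameters – a 1024‑bit stack interface at 3.2 Gb/s per pin and 8‑high stacks of 2 GB dies – which are typical for the generation but not taken from a specific datasheet.

```python
# Back-of-envelope check of the HBM2e numbers (assumed parameters,
# not vendor-confirmed specs).
pins = 1024                   # assumed bit width of one HBM2e stack interface
rate_gbps_per_pin = 3.2       # assumed per-pin data rate in Gb/s
stack_bw_gbs = pins * rate_gbps_per_pin / 8   # bits -> bytes
print(stack_bw_gbs)           # 409.6 GB/s per stack, ~ the 410 GB/s in the text
print(2 * stack_bw_gbs)       # 819.2 GB/s with two stacks, ~ 820 GB/s

dies_per_stack, gb_per_die = 8, 2
print(dies_per_stack * gb_per_die)  # 16 GB per stack; two stacks -> 32 GB
```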
The on‑chip network (NoC – Network on Chip) interconnects the programmable logic (PL), the processing system (PS), and hard IP blocks, providing high‑speed data transfer over AXI channels (256 bits × 2 GHz ≈ 512 Gb/s, i.e. 64 GB/s, per direction).
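The AXI channel math above also explains why a NoC carries multiple such channels: one channel alone cannot keep an HBM2e stack busy. The sketch below uses the figures from the text (256‑bit channels at 2 GHz, ~410 GB/s per stack) as assumed, illustrative values.

```python
import math

# Linking the two bandwidth figures in the text (illustrative values):
# one 256-bit AXI channel at 2 GHz versus one HBM2e stack at ~410 GB/s.
channel_gbs = 256 * 2.0 / 8    # 64 GB/s per direction per AXI channel
stack_gbs = 409.6              # assumed HBM2e stack bandwidth in GB/s
print(channel_gbs)                         # 64.0
print(math.ceil(stack_gbs / channel_gbs))  # 7 channels to cover one stack
```

This is why NoC-based devices expose several AXI ports between the fabric and each memory subsystem rather than a single wide link.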
NoC offers several advantages for FPGA design: improved performance, reduced routing congestion, lower power consumption, simplified high‑speed interface management, and true modular design.
Intel (Altera) and AMD (Xilinx) both use a NoC to achieve high‑bandwidth transfers between HBM2e and the programmable logic, via subsystems such as UIB and IO96. They also use it to couple AI engines with the programmable fabric; vendors claim up to 8× higher compute density and 40% lower power.
Architects' Tech Alliance
Sharing project experience and insights into cutting‑edge architectures, with a focus on cloud computing, microservices, big data, hyper‑convergence, storage, data protection, artificial intelligence, and industry practices and solutions.