Understanding DMA and the RIFFA Architecture: Block vs Scatter‑Gather
This article explains DMA fundamentals, compares Block DMA and Scatter‑Gather DMA step by step, and evaluates the open‑source RIFFA PCIe framework, including its hardware flow, software components, board‑level performance tests, features, drawbacks, and licensing terms.
DMA Overview
High‑speed peripherals (SSD, TCP/IP offload engines, NICs, GPUs, accelerators) require data transfer rates that exceed CPU‑mediated paths. Direct Memory Access (DMA) moves data between peripherals and host memory without CPU intervention.
DMA Classification
Commercial DMA IP blocks (e.g., Xilinx XDMA) bundle controller logic and driver, reducing development effort but limiting customisation. Two main DMA types are Block DMA and Scatter‑Gather DMA.
Block DMA
Block DMA transfers a single contiguous buffer.
Driver requests a host memory region of the required size; the physical address must be contiguous.
Driver copies application data into the DMA buffer.
Driver programs base‑address, length, and control registers to start the DMA read, handing bus control to the DMA controller.
DMA controller issues a Memory Read (MRd) request to the Root Complex (RC) using the programmed address and length.
RC returns the data in a Completion with Data (CplD) packet.
When the received data volume equals the requested amount, the controller raises a hard‑wired INT interrupt.
Driver reads the interrupt register via a PIO read.
Controller clears the interrupt and its flag.
Driver interprets the flag as DMA‑read‑complete.
Driver releases the DMA buffer, completing the operation.
Consequences: (1) DMA buffers must reside at contiguous physical addresses, making large allocations difficult and often requiring multiple DMA transfers, which reduces PCIe bandwidth utilisation. (2) The driver‑controller interaction is simple but blocking; a new DMA cannot be configured until the current one finishes, limiting parallelism.
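To make the register-programming step above concrete, here is a minimal C sketch of how a driver might write the base-address, length, and control registers to start a block transfer. All register names, offsets, and bit assignments are hypothetical placeholders, not the map of any specific controller.

```c
/* Hypothetical Block DMA programming sequence. Register names, offsets,
 * and bit assignments are illustrative only; a real controller defines
 * its own register map. */
#include <stdint.h>

#define DMA_REG_ADDR_LO   0x00  /* low 32 bits of the contiguous buffer address */
#define DMA_REG_ADDR_HI   0x04  /* high 32 bits of the buffer address */
#define DMA_REG_LENGTH    0x08  /* transfer length in bytes */
#define DMA_REG_CTRL      0x0C  /* bit 0: start DMA read */

static void reg_write32(volatile uint8_t *regs, uint32_t off, uint32_t val)
{
    *(volatile uint32_t *)(regs + off) = val;
}

/* Program base address and length, then start a single block transfer.
 * The driver then waits for the controller's completion interrupt. */
void block_dma_start(volatile uint8_t *regs, uint64_t bus_addr, uint32_t len)
{
    reg_write32(regs, DMA_REG_ADDR_LO, (uint32_t)bus_addr);
    reg_write32(regs, DMA_REG_ADDR_HI, (uint32_t)(bus_addr >> 32));
    reg_write32(regs, DMA_REG_LENGTH,  len);
    reg_write32(regs, DMA_REG_CTRL,    0x1);  /* kick off the DMA read */
}
```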
Scatter‑Gather DMA
Scatter‑Gather DMA (SG DMA) links non‑contiguous memory regions into a descriptor list, improving host‑memory utilisation and reducing CPU interrupt load.
Driver determines the total transfer size, allocates multiple smaller contiguous DMA buffers (e.g., sixteen 4 KB buffers for a 64 KB transfer), and links them into an SG list.
Driver copies data from the upper‑layer protocol stack into the scattered DMA buffers.
Driver allocates an SG buffer to hold the descriptor list; each entry contains the base address and length of one DMA buffer.
Driver writes the SG list into the SG buffer.
Driver programs the SG buffer address and length into the DMA controller registers and starts the DMA read.
Controller reads the SG list from the RC via an MRd request.
RC returns the SG list wrapped in a CplD packet.
Controller raises an INT interrupt after receiving the SG list.
Driver reads the interrupt register.
Controller clears the interrupt flag.
Driver recognises the interrupt as SG‑read‑complete and frees the SG buffer.
Controller parses the SG list and sequentially reads each DMA buffer described by the entries.
For each entry, the controller issues an MRd to fetch the corresponding DMA buffer; after a buffer is exhausted, the next entry is processed.
RC returns each fetched buffer in a CplD packet.
When the total received data matches the SG list total, the controller raises a final INT interrupt.
Driver reads the final interrupt.
Controller clears the interrupt flag.
Driver interprets the flag as DMA‑read‑complete, frees all DMA buffers, and ends the operation.
Compared with Block DMA, SG DMA adds a descriptor‑reading phase, increasing driver‑hardware complexity but improving memory utilisation and lowering CPU interrupt overhead.
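The following C sketch illustrates the descriptor-list idea: each SG entry records the base address and length of one contiguous DMA buffer, and the final entry is flagged so the controller knows where the list ends. The struct layout and flag bit are assumptions for illustration, not a specific controller's descriptor format.

```c
/* Sketch of a scatter-gather (SG) descriptor list. The entry layout and
 * the "last entry" flag bit are illustrative; real controllers define
 * their own descriptor formats. */
#include <stdint.h>
#include <stddef.h>

/* One SG entry: base address and length of one contiguous DMA buffer. */
struct sg_entry {
    uint64_t addr;   /* bus address of the DMA buffer */
    uint32_t len;    /* buffer length in bytes */
    uint32_t flags;  /* bit 0 marks the last entry in the list */
};

/* Fill an SG list describing `count` buffers (e.g. sixteen 4 KB buffers
 * for a 64 KB transfer). Returns the total number of bytes described. */
size_t sg_list_build(struct sg_entry *list, const uint64_t *bufs,
                     const uint32_t *lens, int count)
{
    size_t total = 0;
    for (int i = 0; i < count; i++) {
        list[i].addr  = bufs[i];
        list[i].len   = lens[i];
        list[i].flags = (i == count - 1) ? 0x1 : 0x0;  /* flag last entry */
        total += lens[i];
    }
    return total;
}
```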
RIFFA Architecture
RIFFA (Reusable Integration Framework for FPGA Accelerators) is an open‑source PCIe communication framework that enables real‑time data exchange between a host CPU and FPGA IP cores. It provides Linux and Windows driver layers, supports Xilinx and Altera PCIe IP, and offers user‑space libraries for C/C++, Python, LabVIEW, MATLAB, and Java.
The RIFFA Linux driver directory contains six C source and header files: riffa_driver.c, riffa_driver.h, circ_queue.c, circ_queue.h, riffa.c, and riffa.h. circ_queue implements a kernel-mode circular queue used to synchronise interrupts with user processes; the riffa_driver files form the core kernel driver, while riffa.c/riffa.h provide the user-space API.
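As a usage illustration, the sketch below exercises the RIFFA 2.x user-space C API declared in riffa.h (fpga_open, fpga_send, fpga_recv, fpga_close). The channel number, transfer size, and timeout values are arbitrary examples, and error handling is kept minimal.

```c
/* Minimal RIFFA 2.x user-space example: send a buffer to channel 0 and
 * read the result back. Lengths are counted in 32-bit words and
 * timeouts in milliseconds. */
#include <stdio.h>
#include <stdlib.h>
#include "riffa.h"

int main(void)
{
    int chnl  = 0;                       /* RIFFA channel to use */
    int words = 1024;                    /* transfer size in 32-bit words */
    unsigned int *buf = malloc(words * sizeof(*buf));

    fpga_t *fpga = fpga_open(0);         /* open FPGA with id 0 */
    if (fpga == NULL) {
        fprintf(stderr, "fpga_open failed\n");
        return 1;
    }

    for (int i = 0; i < words; i++)
        buf[i] = i;

    /* Send: data, length (words), destination offset, last flag, timeout (ms) */
    int sent  = fpga_send(fpga, chnl, buf, words, 0, 1, 2500);
    /* Receive the loopback/result data on the same channel */
    int recvd = fpga_recv(fpga, chnl, buf, words, 2500);

    printf("sent %d words, received %d words\n", sent, recvd);

    fpga_close(fpga);
    free(buf);
    return 0;
}
```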
Board‑Level Test
RIFFA’s SG DMA controller was evaluated on Xilinx platforms xc7k325tffg676‑2 (K7) and xcku040‑ffva1156‑2‑i (KU040). Both devices support PCIe Gen3 ×8 (theoretical 64 Gbps), although RIFFA itself does not run at full Gen3 ×8 speed (see Drawbacks below). Measured transfer rate was 3.5 GB/s on each board, roughly 87.5 % of the usable link bandwidth: (3.5 GB/s × 8 bit/B) / (40 Gbps × 0.8) ≈ 87.5 %, where 40 Gbps is the raw Gen2 ×8 line rate and the factor 0.8 accounts for 8b/10b encoding overhead.
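For completeness, the small C snippet below reproduces the 87.5 % figure from the numbers above, under the same Gen2 ×8 / 8b/10b assumption.

```c
/* Reproduces the ~87.5 % utilisation figure, assuming a Gen2 x8 link
 * (40 Gbps raw) with 8b/10b encoding overhead. */
#include <stdio.h>

int main(void)
{
    double measured_gbps = 3.5 * 8.0;   /* 3.5 GB/s expressed in Gbit/s */
    double usable_gbps   = 40.0 * 0.8;  /* 5 GT/s x 8 lanes, minus 8b/10b overhead */

    printf("utilisation = %.1f %%\n", 100.0 * measured_gbps / usable_gbps);
    return 0;
}
```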
RIFFA Features
Supports Xilinx A7/K7 PCIe Integrated IP, Xilinx Ultrascale Integrated IP, and Altera Stratix IV, Cyclone IV, Arria II IPs with 64/128‑bit AXI‑Stream widths.
Up to 5 FPGAs per host and 12 concurrent transfer channels per RIFFA instance.
Driver layer works on Windows and Linux; user‑space libraries for LabVIEW, C/C++, Python, MATLAB, and Java.
Uses DMA and interrupt signalling to achieve high PCIe bandwidth.
Object‑oriented driver implementation.
RIFFA Drawbacks
The project has not been updated since its last open-source release in 2016 and is effectively unmaintained.
Does not support 256‑bit data paths or PCIe Gen3 × 8 speeds.
Lacks a control interface, limiting system‑integration flexibility.
Missing design‑for‑test (DFT) features make IP debugging difficult.
Only supports DWORD‑aligned transfers; byte‑aligned transfers are unavailable.
No support for mmap‑based transfers.
No provided simulation environment.
License
RIFFA is released under a permissive license (© 2016 University of California Board of Regents). Redistribution of source code must retain the copyright notice and disclaimer; binary redistribution must reproduce the notice and disclaimer in the accompanying documentation or materials. Neither the name of the University nor the names of its contributors may be used to endorse or promote products derived from the software without prior written permission.
