Error‑Correcting Code (ECC) Schemes in DDR Memory
The article explains how DDR and LPDDR memory systems use various error‑correcting code (ECC) schemes—including side‑band, inline, on‑die, and link ECC—to provide reliability, availability, and maintainability (RAS) protection against hard and soft memory errors in modern computing devices.
Double‑data‑rate synchronous dynamic random‑access memory (DDR SDRAM) is now the main memory for most applications, from high‑performance computing to power‑constrained mobile devices, thanks to its high density, simple architecture, low latency, and low power consumption. JEDEC defines four DRAM categories: standard DDR, mobile DDR (LPDDR), graphics DDR (GDDR), and high‑bandwidth DRAM (HBM).
In a typical system‑on‑chip (SoC), the memory subsystem consists of a DDR controller, a PHY, the channels, and the DRAM devices or modules. It is exposed to hard errors (permanent faults, such as defective cells or connections) and soft errors (transient bit flips caused by electrical noise or alpha‑particle strikes). End‑to‑end protection with RAS features is therefore essential.
ECC (Error‑Correcting Code) is the most common RAS technique. DDR controllers generate SECDED (single‑error‑correction, double‑error‑detection) codes for each write and store them alongside data. During reads, the controller recomputes ECC and compares it to the stored code, correcting single‑bit errors and detecting double‑bit errors.
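To make the encode, recompute, and correct cycle concrete, here is a minimal Python sketch of a textbook extended Hamming SECDED code over a 64‑bit word, which produces the familiar 72‑bit (72,64) codeword. It is an illustration only: the bit layout and the `encode`/`decode` helpers are hypothetical, and production DDR controllers often use optimized SECDED variants (for example, Hsiao codes) with their own check‑bit assignments.

```python
# Textbook extended-Hamming SECDED sketch (hypothetical layout; real DDR
# controllers often use optimized variants such as Hsiao codes).

def check_bit_count(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (Hamming check bits for k data bits)."""
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    return r


def encode(data: list[int]) -> list[int]:
    """Build a codeword. Index 0 holds the overall parity bit; indices 1..n use the
    classic Hamming layout, with check bits at power-of-two positions."""
    n = len(data) + check_bit_count(len(data))
    code = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(bits)
    p = 1
    while p <= n:                        # check bit p covers every position with bit p set
        code[p] = sum(code[i] for i in range(1, n + 1) if i & p) % 2
        p <<= 1
    code[0] = sum(code) % 2              # make total parity even for double-error detection
    return code


def decode(code: list[int]) -> tuple[str, list[int]]:
    """Recompute the syndrome and return (status, data), where status is
    'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)                    # work on a copy
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos              # XOR of set positions is 0 for a clean word
    odd_parity = sum(code) % 2 == 1
    if syndrome == 0 and not odd_parity:
        status = "ok"
    elif odd_parity and syndrome <= n:
        code[syndrome] ^= 1              # single error (syndrome 0 means the parity bit itself)
        status = "corrected"
    else:
        # even parity with a nonzero syndrome: two flipped bits, detected but not correctable
        return "uncorrectable", []
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


if __name__ == "__main__":
    word = 0x0123456789ABCDEF
    data = [(word >> i) & 1 for i in range(64)]
    cw = encode(data)
    print(len(cw))                              # 72
    one = list(cw)
    one[10] ^= 1                                # single-bit error
    print(decode(one) == ("corrected", data))   # True
    two = list(cw)
    two[10] ^= 1
    two[40] ^= 1                                # double-bit error
    print(decode(two)[0])                       # uncorrectable
```

The overall parity bit is what separates the two cases: a single flip makes total parity odd and the syndrome points at the bad bit, while a double flip leaves parity even with a nonzero syndrome, so the word is flagged rather than miscorrected.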
Two main ECC deployment styles exist:
Side‑band ECC: ECC bits are stored in additional DRAM chips on the module (e.g., DDR4 and DDR5 ECC DIMMs; a 72‑bit DDR4 ECC DIMM carries 8 ECC bits alongside every 64 data bits). The controller reads and writes data and ECC in the same burst without extra commands, so the added latency is minimal.
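Inline ECC: ECC bits are stored in the same DRAM devices as the data, in a portion of memory the controller sets aside for them (common in LPDDR). Each access then needs additional read or write commands for the ECC, which can increase latency, and the reserved space reduces usable capacity, but no separate ECC chips are required.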
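Side‑band ECC needs no extra addressing, because the ECC bits travel on the extra data lines in the same burst. Inline ECC, by contrast, must work out where the ECC for each data block lives. The sketch below assumes a hypothetical layout, not one taken from JEDEC or any vendor: 8 ECC bytes per 64‑byte block, packed into a reserved region placed right after the data region. The names `inline_layout` and `ecc_address` are illustrative.

```python
# Hypothetical inline-ECC address map (illustrative only, not a JEDEC or
# vendor-defined layout). Assumptions: 8 ECC bytes protect each 64-byte data
# block, and all ECC bytes sit in a reserved region right after the data region.

BLOCK_BYTES = 64   # assumed data block size per ECC group
ECC_BYTES = 8      # assumed ECC bytes per block (8/64 = 12.5% overhead)


def inline_layout(total_bytes: int) -> tuple[int, int]:
    """Split a device into (data_bytes, ecc_bytes) so every data block has room for its ECC."""
    blocks = total_bytes // (BLOCK_BYTES + ECC_BYTES)
    return blocks * BLOCK_BYTES, blocks * ECC_BYTES


def ecc_address(data_addr: int, data_bytes: int) -> int:
    """Return the address of the ECC bytes guarding the block that holds data_addr.
    Accessing data_addr therefore also requires an access to this address."""
    if not 0 <= data_addr < data_bytes:
        raise ValueError("address outside the ECC-protected data region")
    block = data_addr // BLOCK_BYTES
    return data_bytes + block * ECC_BYTES


if __name__ == "__main__":
    data_bytes, ecc_bytes = inline_layout(72 * 1024)
    print(data_bytes, ecc_bytes)           # 65536 8192
    print(ecc_address(130, data_bytes))    # 65552
```

With this assumed layout, a 73,728‑byte device splits into 65,536 data bytes and 8,192 ECC bytes, and reading address 130 also means fetching its ECC at address 65,552; that second access is the extra command inline ECC pays for. Real controllers may arrange ECC differently to reduce the cost of that access.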
Newer DDR generations add advanced ECC features:
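On‑die ECC (DDR5): Inside each DRAM device, every 128‑bit data word is protected by 8 check bits held in extra cells of the array, using a single‑error‑correcting (SEC) code that fixes single‑bit errors before data leaves the chip. Because it works entirely inside the DRAM, it does not protect data on the channel.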
Link ECC (LPDDR5): ECC is computed by the sender and transmitted with the data over the memory link, so the receiver (the DRAM on writes, the controller on reads) can correct single‑bit errors introduced on the channel. This complements inline ECC for end‑to‑end protection.
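The check‑bit counts above follow from the Hamming bound: a single‑error‑correcting linear code with r check bits over k data bits needs 2^r >= k + r + 1 distinct syndromes (one per bit position plus "no error"), and SECDED adds one overall‑parity bit. This small Python sketch (the function names are illustrative) shows why 8 check bits give SECDED over 64 data bits but only single‑error correction over 128 data bits.

```python
# Minimum check bits for SEC and SECDED linear codes (illustrative helper names).

def sec_check_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1: one syndrome per bit position plus 'no error'."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


def secded_check_bits(k: int) -> int:
    """SECDED adds one overall-parity bit on top of the SEC check bits."""
    return sec_check_bits(k) + 1


if __name__ == "__main__":
    for k in (64, 128):
        print(k, sec_check_bits(k), secded_check_bits(k))
    # 64 7 8
    # 128 8 9
```

For k = 64, 7 + 1 = 8 check bits yield the (72,64) SECDED arrangement of a 72‑bit ECC DIMM; for k = 128, 8 check bits are already the minimum for SEC alone, and SECDED would need 9.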
Together, these schemes make the memory subsystem more robust: the system keeps running through correctable errors, while uncorrectable errors are detected and logged for diagnosis instead of being silently consumed.
In summary, side‑band ECC is typical of standard DDR in servers, while inline ECC is used in LPDDR systems; DDR5 adds on‑die ECC and LPDDR5 adds link ECC, and together these layers provide comprehensive RAS protection.
Architects' Tech Alliance