In‑Memory Computing: Overcoming the Memory Wall for AI Chips
The article explains how the memory‑wall limitation of traditional von Neumann architectures hampers AI chip performance, describes two in‑memory computing approaches—circuit‑level modifications and new memory devices—highlights recent conference trends, and showcases a Chinese startup’s 8‑bit low‑power in‑memory AI chip that could enable ubiquitous AI on edge devices.
As artificial intelligence moves from research to large‑scale deployment, AI chips have become a distinct class of processors, prized for their high compute density and energy efficiency compared with conventional chips. Early AI chip designs focused on exploiting parallelism, but as that parallel potential has been rapidly tapped out, a new bottleneck has emerged: the "memory wall."
The memory wall originates from the classic von Neumann architecture, where computation and memory are separate units. While transistor scaling has dramatically increased processor speed, DRAM memory has improved far more slowly, creating a disparity where data movement dominates both latency and energy consumption, especially for neural‑network workloads that require massive data transfers.
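To put rough numbers on this imbalance, the snippet below compares the energy of a DRAM access against an on‑chip arithmetic operation. The figures are widely cited 45 nm estimates (Horowitz, ISSCC 2014), not values from the article; exact numbers vary by process and design, but the orders of magnitude hold.

```python
# Rough energy comparison illustrating why data movement, not compute,
# dominates the energy budget of neural-network workloads.
# Per-operation figures are 45 nm estimates (Horowitz, ISSCC 2014).

E_DRAM_READ_PJ = 640.0   # one 32-bit off-chip DRAM access, in picojoules
E_FP32_ADD_PJ = 0.9      # one 32-bit floating-point add, in picojoules

ratio = E_DRAM_READ_PJ / E_FP32_ADD_PJ
print(f"One DRAM access costs roughly {ratio:.0f}x a 32-bit add")
```

With a ratio of several hundred to one, shuttling weights and activations across the memory bus, rather than the arithmetic itself, sets both the power floor and the performance ceiling.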
To mitigate this, researchers propose “in‑memory computing” (IMC), which performs computation directly within the storage array, thereby shortening the distance between data and logic. Two primary technical routes exist: (1) modest circuit‑level changes to existing SRAM or DRAM arrays, exemplified by a 2018 MIT ISSCC paper that accelerates convolution via analog weighting; and (2) the introduction of novel memory devices (e.g., ReRAM, PLRAM) that embed compute capabilities within the memory cell array, allowing the processor to send inputs and receive results without intermediate data shuttling.
Industry interest in IMC has grown steadily. Since 2018, the ISSCC conference has featured a dedicated session on IMC, publishing five papers that year and increasing to seven by 2020. The IEDM conference also hosts multiple IMC sessions, reflecting a broader shift toward post‑Moore’s‑law device innovations.
A notable breakthrough presented at IEDM comes from the Chinese startup Flash Semiconductor (闪亿半导体) in collaboration with Zhejiang University, Peking University, and Huahong Macroelectronics. Their work introduces a new storage device called PLRAM, enabling an 8‑bit precision in‑memory AI accelerator that operates at a peak power of only 9 mW while delivering up to 30 GOPS, sufficient for IoT and wearable voice‑recognition tasks. This precision surpasses the 2‑3 bit limits of many ReRAM‑based solutions and addresses the longstanding accuracy‑vs‑application trade‑off in IMC.
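The reported figures imply a striking energy efficiency, which a quick back‑of‑envelope calculation makes concrete. Note the article gives only peak numbers; the benchmark conditions behind them are not specified.

```python
# Efficiency implied by the reported peak figures: 30 GOPS at 9 mW.
# (Figures from the article; measurement conditions unspecified.)

peak_ops_per_s = 30e9    # 30 GOPS
peak_power_w = 9e-3      # 9 mW

tops_per_watt = peak_ops_per_s / peak_power_w / 1e12
print(f"~{tops_per_watt:.1f} TOPS/W")
```

Roughly 3.3 TOPS/W at 8‑bit precision is comfortably within the envelope of always‑on IoT and wearable voice‑recognition workloads.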
Beyond the device, Flash Semiconductor has developed large‑scale resistive array drivers and a dedicated instruction‑set architecture to reduce digital‑analog conversion overhead and improve overall system efficiency. These advances, combined with a focus on low‑power edge markets, illustrate how IMC can transition from academic research to commercial products, potentially accelerating the adoption of AI everywhere.
Overall, the article underscores that overcoming the memory wall through in‑memory computing is a critical step for the next generation of AI hardware, especially as the semiconductor industry seeks new pathways beyond the end of Moore’s law.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.