Tagged articles

Memory Bandwidth

8 articles · Page 1 of 1

May 16, 2026 · Artificial Intelligence

Why More Compute Can't Fix LLM Inference Lag and Why RL Leads to Overtraining

In a deep interview, former Google TPU architect Reiner Pope explains that low‑concurrency fast‑mode services trade higher fees for faster streaming but are limited by memory‑bandwidth bottlenecks, that optimal concurrency balances compute and memory costs, and that pipeline‑parallel sparse expert models and reinforcement‑learning fine‑tuning introduce new inefficiencies and overtraining risks.

LLM · Memory Bandwidth · Overtraining

7 min read

AI Frontier Lectures

Jan 12, 2026 · Industry Insights

Why LLM Inference Hits a Memory Wall – Four Hardware Research Directions

The article analyses the challenges of large‑language‑model inference, highlighting memory bandwidth and interconnect as the primary bottlenecks, and presents four research opportunities—high‑bandwidth flash, processing‑near‑memory, 3D memory‑logic stacking, and low‑latency interconnect—while evaluating current Nvidia solutions and proposing integrated architectural approaches.

3D stacking · AI hardware research · LLM Inference

22 min read

Architects' Tech Alliance

Mar 31, 2025 · Industry Insights

GPGPU vs ASIC: Who Wins the AI Compute Race?

This article analyzes the trade‑offs between GPGPU and ASIC for AI workloads, covering precision, compute density, power efficiency, memory bandwidth, interconnect technologies like NVLink, and the strategic reasons why leading firms are investing in custom AI chips.

AI chips · ASIC · GPGPU

8 min read

Architects' Tech Alliance

Mar 30, 2025 · Industry Insights

Why Memory, Not Compute, Is the Bottleneck for Next‑Gen AI Chips

The article analyzes the rapid growth of AI model memory and compute demands alongside the slow increase in chip memory capacity. It argues that memory bandwidth and energy consumption, rather than raw compute, will dominate AI chip design, and emphasizes multi‑tenancy, DSA flexibility, and data‑flow optimization.

AI chips · DSA · Memory Bandwidth

7 min read

Architects' Tech Alliance

Mar 13, 2025 · Fundamentals

How Memory Bandwidth and Latency Shape CPU Performance

The article explains how memory speed, bandwidth, and access delays give rise to CPU computation latency and details the relationships among memory, bandwidth, and latency. It also examines key factors such as clock frequency, pipelining, parallelism, cache hit rate, and signal propagation distance that together determine overall system performance.

CPU · Computer Architecture · Latency

9 min read

Architects' Tech Alliance

Sep 23, 2024 · Artificial Intelligence

Venado Supercomputer: Architecture, Performance, and Design Insights

The Venado supercomputer, built for Los Alamos National Laboratory, combines Nvidia Grace CPUs with Hopper GPUs, leverages high‑bandwidth memory and Slingshot interconnects, and targets a balanced 80/20 CPU‑GPU workload split to support demanding AI and HPC applications.

Grace CPU · HPC · Los Alamos

13 min read

Architects' Tech Alliance

Jul 23, 2024 · Industry Insights

Inside Los Alamos’ Venado Supercomputer: Architecture, Performance, and HPC Trends

The Venado supercomputer, unveiled at Los Alamos, combines Nvidia Grace CPUs, Hopper GPUs, HPE Slingshot interconnects, and massive memory bandwidth to achieve a 15.6‑petaflop FP64 peak, illustrating the evolving balance between CPU and GPU workloads in modern high‑performance computing.

CPU · GPU · Grace

14 min read

Alibaba Cloud Infrastructure

Mar 24, 2021 · Cloud Computing

LIBRA and CARE: Memory Bandwidth Management and Fault‑Tolerance Innovations Presented at HPCA 2021

The article reviews two HPCA 2021 papers from Alibaba Cloud: LIBRA, a dynamic memory‑bandwidth management framework that boosts data‑center utilization, and CARE, a cache‑based fault‑tolerance architecture that delivers near‑Chipkill reliability with minimal overhead. It also highlights future research directions in ML systems, quantum computing, and cache computing.

Cloud Computing · HPCA2021 · Memory Bandwidth

4 min read
