Tagged articles
8 articles
Machine Heart
May 16, 2026 · Artificial Intelligence

Why More Compute Can't Fix LLM Inference Lag and Why RL Leads to Overtraining

In an in‑depth interview, former Google TPU architect Reiner Pope explains three points. First, low‑concurrency "fast mode" services charge higher fees for faster token streaming but remain limited by memory bandwidth. Second, the optimal concurrency balances compute costs against memory costs. Third, pipeline‑parallel sparse mixture‑of‑experts models and reinforcement‑learning fine‑tuning introduce new inefficiencies and a risk of overtraining.

Inference · LLM · Memory Bandwidth
7 min read
AI Frontier Lectures
Jan 12, 2026 · Industry Insights

Why LLM Inference Hits a Memory Wall – Four Hardware Research Directions

The article analyzes the challenges of large‑language‑model inference and identifies memory bandwidth and interconnect as the primary bottlenecks. It presents four research opportunities (high‑bandwidth flash, processing‑near‑memory, 3D memory‑logic stacking, and low‑latency interconnect), evaluates current Nvidia solutions, and proposes integrated architectural approaches.

3D stacking · AI hardware research · LLM inference
22 min read
Architects' Tech Alliance
Mar 31, 2025 · Industry Insights

GPGPU vs ASIC: Who Wins the AI Compute Race?

This article analyzes the trade‑offs between GPGPUs and ASICs for AI workloads, covering precision, compute density, power efficiency, memory bandwidth, and interconnect technologies such as NVLink, and explains the strategic reasons leading firms are investing in custom AI chips.

AI chips · ASIC · GPGPU
8 min read
Architects' Tech Alliance
Mar 30, 2025 · Industry Insights

Why Memory, Not Compute, Is the Bottleneck for Next‑Gen AI Chips

The article examines how the memory and compute demands of AI models are growing rapidly while the memory capacity of chips grows only slowly. It argues that memory bandwidth and energy consumption, rather than raw compute, will dominate AI chip design, and emphasizes multi‑tenancy, the flexibility of domain‑specific architectures (DSAs), and data‑flow optimization.

AI chips · DSA · Memory Bandwidth
7 min read
Architects' Tech Alliance
Mar 13, 2025 · Fundamentals

How Memory Bandwidth and Latency Shape CPU Performance

The article explains how CPU computation latency arises from memory speed, bandwidth, and access delays, and details how memory, bandwidth, and latency relate to one another. It then examines the key factors that together determine overall system performance: clock frequency, pipelining, parallelism, cache hit rate, and signal propagation distance.

CPU · Latency · Memory Bandwidth
9 min read
Alibaba Cloud Infrastructure
Mar 24, 2021 · Cloud Computing

LIBRA and CARE: Memory Bandwidth Management and Fault‑Tolerance Innovations Presented at HPCA 2021

The article reviews two HPCA 2021 papers from Alibaba Cloud. LIBRA is a dynamic memory‑bandwidth management framework that improves data‑center utilization, and CARE is a cache‑based fault‑tolerance architecture that delivers near‑Chipkill reliability with minimal overhead. The article also highlights future research directions in ML systems, quantum computing, and cache computing.

HPCA2021 · Memory Bandwidth · cloud computing
4 min read