Tagged articles
49 articles
DeepHub IMBA
Mar 25, 2026 · Artificial Intelligence

TPU Architecture and Pallas Kernels: From Memory Hierarchy to FlashAttention

This article explains why TPU programming differs from GPU programming, describes the explicit HBM‑VMEM‑register data movement required on TPU, introduces the Pallas grid‑BlockSpec‑Ref model, and walks through four progressively more complex kernels (element‑wise add, tiled dot product, fused RMSNorm with scratch memory, and a production‑grade FlashAttention implementation), showing how each kernel maps to the TPU memory hierarchy and leverages Pallas features such as input_output_aliases and PrefetchScalarGridSpec.

FlashAttention · JAX · Memory Hierarchy
0 likes · 20 min read
Deepin Linux
Dec 2, 2025 · Fundamentals

Why CPU Cache Misses Slow Down Your Linux System—and How to Fix Them

CPU caches bridge the speed gap between processors and memory, but cache misses can dramatically degrade performance, especially under high concurrency or big‑data workloads; this article explains cache architecture, common miss causes, diagnostic tools like perf and cachestat, and practical optimization techniques for Linux systems.

CPU cache · Linux performance · MESI Protocol
0 likes · 44 min read
Architects' Tech Alliance
Sep 30, 2025 · Artificial Intelligence

How KV Cache and CachedAttention Revolutionize LLM Inference Efficiency

This article explains how key‑value (KV) caching and the new CachedAttention technique dramatically reduce large‑language‑model inference costs by reusing stored attention data across dialogue turns, leveraging a three‑tier memory hierarchy of HBM, DRAM, and SSD to overcome bandwidth and capacity bottlenecks.

AI Performance · CachedAttention · KV cache
0 likes · 8 min read
Deepin Linux
Jul 21, 2025 · Fundamentals

Unlocking CPU Speed: How Cache Bridges the Gap Between Processor and Memory

This article explains why modern CPUs need cache memory, describes the hierarchy of L1‑L3 caches, the principles of locality, write policies, multi‑core coherence mechanisms such as bus snooping and the MESI protocol, and offers practical code‑level optimizations to improve cache performance.

CPU · Cache · MESI Protocol
0 likes · 31 min read
Deepin Linux
May 2, 2025 · Fundamentals

Understanding CPU Cache, Memory Hierarchy, and Concurrency Control

This article explains the principles of CPU cache, the multi‑level memory hierarchy, virtual memory, data consistency, and concurrency control mechanisms, illustrating how they together bridge the speed gap between the fast processor and slower memory/storage in modern computer systems.

CPU · Memory Hierarchy · Virtual Memory
0 likes · 21 min read
Liangxu Linux
Mar 22, 2025 · Fundamentals

Unveiling CPU Cache: How It Bridges the Speed Gap Between CPU and Memory

CPU cache is multi‑level SRAM positioned between registers and main memory. Absent from early CPUs, it has evolved into sophisticated L1‑L4 hierarchies that address the massive speed disparity between processors and RAM by exploiting spatial and temporal locality, dramatically boosting overall system performance.

CPU · Cache · Memory Hierarchy
0 likes · 10 min read
Linux Kernel Journey
Feb 5, 2025 · Fundamentals

Boost Code Performance by Leveraging CPU Cache Principles

This article explains how CPU caches bridge the speed gap between the processor and main memory, describes cache hierarchy, locality principles, write policies, coherence protocols, and provides concrete C code examples and practical tips such as data alignment and loop restructuring to improve cache hit rates and overall program speed.

CPU cache · MESI Protocol · Memory Hierarchy
0 likes · 30 min read
Liangxu Linux
Sep 17, 2024 · Fundamentals

Why CPU Memory Access Is Far More Complex Than You Think

The article explains how CPUs read and write memory through a hierarchy of caches, virtual memory translation, and coherence protocols, revealing that the seemingly simple operation actually involves multiple hardware and software layers that programmers must understand to write high‑performance code.

Cache · Coherence · Memory Hierarchy
0 likes · 16 min read
Open Source Linux
Jul 30, 2024 · Fundamentals

Understanding CPU Cache: History, Principles, and Design Strategies

This article explains the evolution of CPU cache, its underlying principles of temporal and spatial locality, various cache architectures, implementation details, and practical considerations such as cache line size and replacement policies, providing a comprehensive overview for developers and computer engineers.

CPU · Cache · Memory Hierarchy
0 likes · 20 min read
Liangxu Linux
Jun 19, 2024 · Fundamentals

Understanding CPU Cache: Types, Structure, and Performance Optimization

This article explains why CPU caches are needed, describes the hierarchy and internal structure of L1, L2, and L3 caches, compares direct‑mapped, set‑associative and fully‑associative designs, and shows how cache‑aware coding (row‑major vs column‑major loops) dramatically improves execution speed.

CPU cache · Memory Hierarchy · cache architecture
0 likes · 12 min read
Java Tech Enthusiast
May 17, 2024 · Fundamentals

Understanding Computer Time Units, CPU Cycles and Performance Latency

The article explains that software performance is measured in milliseconds to nanoseconds, describes core hardware components—CPU, caches, and DRAM—shows how cache hierarchy speeds differ, defines a clock cycle as the basic time unit, and provides typical latency figures for operations ranging from a single CPU cycle to a full system reboot.

CPU cycles · Latency · Memory Hierarchy
0 likes · 7 min read
Liangxu Linux
Dec 4, 2023 · Fundamentals

Why Row‑Major Access Beats Column‑Major in C: A Cache‑Level Deep Dive

The article explains why iterating a two‑dimensional array by rows runs dramatically faster than by columns, covering memory‑hierarchy basics, locality principles, cache behavior, cache‑line mechanics, and Linux perf measurements that reveal a 20‑fold speed gap caused by cache‑miss rates.

C Programming · Cache · Memory Hierarchy
0 likes · 11 min read
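
The row‑versus‑column gap described in this summary can be illustrated with a toy model. The sketch below is not the article's code: it assumes a row‑major array of 4‑byte elements, 64‑byte cache lines, and a hypothetical cache that holds only the most recently touched line, and it simply counts how often an access lands in a different line than the previous one. The function name `line_fetches` and its parameters are illustrative.

```python
def line_fetches(rows, cols, order, elem_size=4, line_size=64):
    """Count accesses that touch a different cache line than the previous access.

    Assumes a row-major layout (as in C) and a toy one-line cache.
    """
    if order == "row":
        # Inner loop walks along a row: consecutive addresses.
        indices = ((r, c) for r in range(rows) for c in range(cols))
    elif order == "col":
        # Inner loop walks down a column: stride of cols * elem_size bytes.
        indices = ((r, c) for c in range(cols) for r in range(rows))
    else:
        raise ValueError("order must be 'row' or 'col'")

    fetches = 0
    last_line = None
    for r, c in indices:
        address = (r * cols + c) * elem_size  # byte offset in a row-major array
        line = address // line_size           # cache line that holds this byte
        if line != last_line:
            fetches += 1
            last_line = line
    return fetches


if __name__ == "__main__":
    print("row-major traversal:", line_fetches(64, 64, "row"))
    print("column-major traversal:", line_fetches(64, 64, "col"))
```

Under these assumptions a 64×64 array needs 256 line fetches when walked by rows and 4096 when walked by columns. Real caches hold many lines and prefetch, so measured gaps differ, but the direction of the effect is the same.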
Open Source Linux
Nov 30, 2023 · Fundamentals

Understanding the Four Levels of Computer Storage and Their Roles

This article explains the four main categories of computer storage—Level 1 (directly attached memory), Level 2 (I/O‑connected drives like HDDs and SSDs), Level 3 (large removable media such as tape), and offline storage (manual devices like optical discs and USB drives)—and clarifies how Direct‑Attached Storage (DAS) and RAID technologies fit into this hierarchy.

DAS · Hardware · Memory Hierarchy
0 likes · 5 min read
Liangxu Linux
Nov 20, 2023 · Fundamentals

How Do Computers Really Work? A Deep Dive into CPU, Memory, and Architecture

This article explains the fundamental principles of computer operation, covering the von Neumann model, CPU and RAM interaction, instruction sets, memory hierarchy, caches, endianness, compilers, operating systems, and performance optimizations, while illustrating each concept with diagrams and code examples.

CPU · Memory Hierarchy · compilers
0 likes · 28 min read
Open Source Linux
Oct 19, 2023 · Fundamentals

How Do Computers Really Work? Inside CPU, Memory, and Architecture

This article explains the fundamental principles of how computers operate, covering the basic architecture of CPUs and memory, the role of buses and registers, instruction sets, compiler translation, cache hierarchies, storage layers, and the impact of 32‑bit versus 64‑bit designs.

CPU · Instruction Set · Memory Hierarchy
0 likes · 27 min read
Architects' Tech Alliance
Sep 17, 2023 · Fundamentals

FPGA Overview: Architecture, Memory Hierarchy, and NoC Advantages

This article provides a comprehensive overview of FPGA technology, detailing its programmable logic cells, input/output blocks, switch matrices, historical evolution, flexibility versus ASIC and GPU, memory hierarchy including on‑chip and HBM2e, and the benefits of Network‑on‑Chip architectures for performance, power and design modularity.

ASIC · FPGA · GPU
0 likes · 12 min read
Open Source Linux
Sep 15, 2023 · Fundamentals

Why CPU Cache Matters: Understanding Cache Types and Optimizing Code

This article explains the purpose of CPU caches, describes the three cache levels and their internal structures—including direct‑mapped, set‑associative, and fully‑associative designs—and shows how cache‑aware programming can dramatically improve code performance.

CPU cache · Fully Associative · Memory Hierarchy
0 likes · 11 min read
Open Source Linux
Jul 7, 2023 · Fundamentals

Why CPUs Need Cache Memory and How the MESI Protocol Ensures Consistency

This article explains the purpose of CPU cache memory, the principles of temporal and spatial locality, the multi‑level cache architecture, the MESI cache‑coherence protocol for multi‑core processors, and the optimizations such as store buffers and memory barriers that address performance and consistency challenges.

CPU cache · MESI · Memory Hierarchy
0 likes · 16 min read
AI Cyberspace
Jun 21, 2023 · Fundamentals

Why Virtual Memory Solves the Biggest Problems of Physical Addressing

This article explains how modern computer systems use a hierarchical memory structure and virtual memory to overcome physical memory limitations, address translation challenges, fragmentation, and security issues, detailing concepts such as page tables, TLB caching, multi‑level paging, and practical examples.

Memory Hierarchy · TLB · Virtual Memory
0 likes · 21 min read
Top Architect
Nov 2, 2022 · Fundamentals

Fundamentals of Computer Architecture: CPU, Memory Hierarchy, Caches, and Compilers

This article provides a comprehensive overview of how computers operate, covering CPU instruction cycles, memory organization, endianness, compiler translation, operating‑system interaction, cache levels, storage tiers, and the principles of temporal and spatial locality that drive modern performance optimizations.

CPU · Memory Hierarchy · Operating System
0 likes · 27 min read
Tencent Cloud Developer
Nov 1, 2022 · Fundamentals

Understanding CPU Cache, Memory Hierarchy, and Virtual Memory

The article explains how modern computers use fast SRAM caches (L1‑L3) inside the CPU with various mapping schemes and the MESI coherence protocol to keep data consistent, while DRAM serves as main memory, and virtual memory with multi‑level page tables and a TLB abstracts physical memory, provides isolation, and enables swapping.

CPU cache · MESI Protocol · Memory Hierarchy
0 likes · 16 min read
Open Source Linux
May 23, 2022 · Fundamentals

How Do Computers Really Work? Inside CPU, Memory, and Compilers Explained

This article explores the core principles of computer operation, covering CPU architecture, memory hierarchy, instruction execution, compiler role, cache levels, and the impact of hardware design on performance, while illustrating concepts with diagrams and practical examples to demystify how modern computers process data.

CPU · Memory Hierarchy · compilers
0 likes · 27 min read
Top Architect
Feb 1, 2022 · Fundamentals

Understanding CPU Cache Hierarchy, Cache Coherence, and Performance Optimization

This article explains the structure of modern CPU caches, the principles of cache lines, associativity, and coherence protocols, and demonstrates how these hardware details affect program performance through multiple code examples covering loop stride, matrix traversal, multithreading, and false sharing.

CPU cache · Memory Hierarchy · cache coherence
0 likes · 21 min read
21CTO
Dec 2, 2021 · Fundamentals

Why Caches Matter: A Deep Dive into CPU Memory Hierarchy and Consistency

This article provides a comprehensive overview of CPU caches, covering why they are needed, their classification, placement and lookup mechanisms, replacement and write policies, and coherence protocols such as MESI, illustrating each concept with diagrams and code examples.

CPU · Cache · Memory Hierarchy
0 likes · 11 min read
Programmer DD
Nov 15, 2021 · Fundamentals

Why Cache Matters: Understanding Placement, Replacement, and Consistency

This article explores the role of cache in computer architecture, covering why caches are needed, how data is placed and retrieved, various replacement policies, write strategies, and consistency protocols such as MESI, while illustrating concepts with diagrams and code examples.

Cache · Cache Consistency · LRU
0 likes · 12 min read
Open Source Linux
Nov 7, 2021 · Fundamentals

How Do Computers Really Work? Inside CPU, Memory, and Compilers

This article explains the fundamental principles of computer operation, covering CPU architecture, memory organization, instruction execution, compiler translation, caching strategies, and the role of operating systems, while illustrating concepts with diagrams and code examples.

CPU · Memory Hierarchy · Operating Systems
0 likes · 25 min read
Top Architect
Oct 4, 2021 · Fundamentals

Understanding Cache: Concepts, Mechanisms, and Consistency

This article provides a comprehensive overview of cache memory, explaining why caches are needed, their placement strategies, operation principles, replacement policies, write handling methods, and coherence protocols such as MESI, offering essential knowledge for computer architecture and system design.

Cache · MESI · Memory Hierarchy
0 likes · 12 min read
Liangxu Linux
Aug 16, 2021 · Fundamentals

Why CPUs Need Cache: A Deep Dive into Cache Mechanics and Coherence

This article explains the motivation behind CPU caches, their classification, placement and lookup methods, replacement and write policies, and coherence protocols, providing a comprehensive overview of cache fundamentals for modern computer architectures.

LRU · Memory Hierarchy · cache coherence
0 likes · 14 min read
Liangxu Linux
May 4, 2021 · Fundamentals

Why Computers Use a Memory Hierarchy: Registers, Cache, RAM & Virtual Memory

The article explains the purpose and structure of the memory hierarchy—from ultra‑fast registers and caches inside the CPU, through volatile main memory, to slower non‑volatile disks—showing how programs are loaded, executed, and how virtual memory and locality principles extend usable memory beyond physical limits.

Cache · Memory Hierarchy · Operating Systems
0 likes · 12 min read
Python Crawling & Data Mining
Mar 7, 2021 · Fundamentals

Unlocking System Performance: How Amdahl’s Law and Parallelism Shape Modern Computing

This article explains how computer systems combine hardware and system software, describes the memory hierarchy, OS abstractions, Amdahl's law, and the three levels of parallelism—thread‑level, instruction‑level, and SIMD—showing why understanding these concepts is essential for writing fast, reliable programs.

Amdahl's Law · Memory Hierarchy · Parallelism
0 likes · 16 min read
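
Amdahl's law, mentioned in this summary, bounds the speedup of a program when only part of it can be parallelized: with parallel fraction p and n workers, the speedup is 1 / ((1 − p) + p / n). The snippet below is a minimal sketch of that formula, not code from the article; the function name `amdahl_speedup` and its parameters are illustrative.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Theoretical speedup from Amdahl's law.

    parallel_fraction: share of the runtime that can run in parallel (0.0 to 1.0).
    workers: number of parallel workers (at least 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% parallel work, adding workers approaches but never reaches 10x.
    for n in (2, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction sets the ceiling: with 90% of the work parallelizable, the speedup stays below 10 regardless of the worker count.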
Liangxu Linux
Oct 31, 2020 · Fundamentals

How CPU Cache Works and How to Write Faster Code

Understanding CPU cache hierarchy, its speed advantages over memory, and the mechanics of cache lines, tags, and offsets reveals why code that maximizes cache hit rates—through sequential data access, branch prediction, and core affinity—can run dramatically faster on modern processors.

CPU cache · Cache Hit Rate · Memory Hierarchy
0 likes · 18 min read
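
The "cache lines, tags, and offsets" mechanics named in this summary can be sketched as a bit split of the address. The example below is an illustration under stated assumptions, not the article's code: it assumes a hypothetical cache with 64‑byte lines and 64 sets, both powers of two, and the function name `split_address` is made up for this sketch.

```python
def split_address(address, line_size=64, num_sets=64):
    """Split an address into (tag, set_index, offset) for a set-indexed cache.

    Assumes line_size and num_sets are powers of two.
    """
    for name, value in (("line_size", line_size), ("num_sets", num_sets)):
        if value < 1 or value & (value - 1):
            raise ValueError(f"{name} must be a power of two")

    offset_bits = line_size.bit_length() - 1  # 64-byte line -> 6 bits
    index_bits = num_sets.bit_length() - 1    # 64 sets -> 6 bits

    offset = address & (line_size - 1)                     # byte within the line
    set_index = (address >> offset_bits) & (num_sets - 1)  # which set to look in
    tag = address >> (offset_bits + index_bits)            # identifies the line in that set
    return tag, set_index, offset


if __name__ == "__main__":
    print(split_address(0x12345))  # (18, 13, 5)
```

With these sizes, addresses that are 4096 bytes apart share a set index and differ only in the tag, which is why power‑of‑two strides can cause conflict misses in low‑associativity caches.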
dbaplus Community
Mar 30, 2020 · Fundamentals

Why Cache Memory Matters: From Code Layout to Multi‑Level Caches

This article explains why cache memory is essential for modern CPUs, how different loop orders affect cache hits, the structure of direct‑mapped, set‑associative and fully‑associative caches, multi‑level cache hierarchies, and the policies that govern cache allocation and updates.

CPU performance · Cache · Memory Hierarchy
0 likes · 21 min read
Liangxu Linux
Feb 23, 2020 · Fundamentals

Understanding the Core Components of Modern Operating Systems

This article provides a comprehensive overview of operating system fundamentals, covering hardware components, kernel and user modes, CPU architecture, memory hierarchy, caching strategies, I/O devices, bus systems, boot processes, and various types of operating systems from mainframes to embedded devices.

Boot Process · CPU design · I/O devices
0 likes · 34 min read
Java Captain
Jul 9, 2018 · Fundamentals

Fundamental Computer Concepts and Java JVM Memory Architecture

This article explains basic computer concepts such as storage units, registers, memory hierarchy, kernel and user space, CPU word length, and then details the Java Virtual Machine's runtime data areas, object creation process, and object reference mechanisms.

CPU architecture · JVM · Memory Hierarchy
0 likes · 14 min read
DevOps
Apr 17, 2016 · Fundamentals

CPU “Ah Gan” Explains the Boot Process, Memory Hierarchy, Cache, and Pipelining

Through a whimsical first‑person narrative, the article walks readers through a CPU’s start‑up sequence, BIOS interrupt handling, loading the boot sector, memory access patterns, the principle of locality, cache usage, and the introduction of pipelining to illustrate fundamental computer architecture concepts.

Boot Process · CPU · Cache
0 likes · 11 min read
21CTO
Aug 26, 2015 · Fundamentals

Why Are CPU Registers Faster Than Memory? Three Key Reasons Explained

Registers outrun main memory because they sit closer to the CPU, employ high‑performance hardware designs, and involve far fewer access steps, a distinction illustrated with examples from iPhone 5s architecture and detailed step‑by‑step memory access processes.

CPU architecture · Memory Hierarchy · Registers
0 likes · 6 min read
Qunar Tech Salon
Mar 23, 2015 · Fundamentals

Understanding CPU Cache: Purpose, Multi‑Level Design, Cache Lines, and Optimization Techniques

This article explains why CPU caches are needed, the evolution to multi‑level caches, the concept of cache lines, practical experiments demonstrating their impact, and how different cache organization strategies such as fully associative, direct‑mapped, and N‑way set‑associative affect performance and eviction policies.

Memory Hierarchy · cache architecture · cache line
0 likes · 14 min read