Tagged articles

36 articles

May 18, 2026 · Artificial Intelligence

CCD‑Aware Thread Orchestration Shatters Multi‑Core CPU Vector Search Performance Ceiling

The paper presents a CCD‑level, load‑aware thread orchestration framework for AMD EPYC multi‑chiplet CPUs. It boosts vector ANNS throughput by up to 3.7×, cuts P999 tail latency by 30%‑90%, and reduces L3 cache miss rates by 6%‑30% and CPU stall time by 20%‑80%.

ANNS · CCD · CPU cache

19 min read
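This sketch does not reproduce the paper's framework. It only shows the underlying idea on a Linux host: the Python below groups logical CPUs by the L3 cache they share, using the kernel's sysfs cache‑topology files (`level` and `shared_cpu_list` under each `cpuN/cache/indexM`). On AMD EPYC, CPUs that share an L3 belong to the same core complex on one CCD, so each group is a candidate placement domain for threads that work on the same data. The function names `parse_cpulist` and `l3_groups` are hypothetical.

```python
import glob
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8,10-11" into a sorted list of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return one tuple of CPU ids per distinct shared L3 cache, sorted."""
    groups = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for cache_dir in glob.glob(pattern):
        try:
            # Only keep cache indexes whose level is 3 (the L3 / last-level cache).
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue
            # CPUs listed here share this physical L3 instance.
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                groups.add(tuple(parse_cpulist(f.read())))
        except OSError:
            continue  # offline CPU or missing file: skip it
    return sorted(groups)
```

On a real machine, `os.sched_setaffinity(0, l3_groups()[0])` would pin the calling process to the first L3 domain. The sketch does not model load balancing across domains, which is the paper's actual contribution.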

Deepin Linux

Dec 2, 2025 · Fundamentals

Why CPU Cache Misses Slow Down Your Linux System—and How to Fix Them

CPU caches bridge the speed gap between processors and memory, but cache misses can dramatically degrade performance, especially under high‑concurrency or big‑data workloads. This article explains cache architecture, common causes of misses, diagnostic tools such as perf and cachestat, and practical optimization techniques for Linux systems.

CPU cache · Linux performance · MESI Protocol

44 min read

Linux Kernel Journey

Feb 5, 2025 · Fundamentals

Boost Code Performance by Leveraging CPU Cache Principles

This article explains how CPU caches bridge the speed gap between the processor and main memory, describes cache hierarchy, locality principles, write policies, coherence protocols, and provides concrete C code examples and practical tips such as data alignment and loop restructuring to improve cache hit rates and overall program speed.

CPU cache · MESI Protocol · Memory Hierarchy

30 min read

Deepin Linux

Feb 4, 2025 · Fundamentals

Understanding CPU Cache: Architecture, Hierarchy, and Optimization Techniques

This article explains the fundamental role of CPU cache in bridging the speed gap between processors and memory, covering cache hierarchy, locality principles, write policies, coherence protocols, and practical code optimizations such as data alignment and loop restructuring to improve performance.

CPU cache · Data Alignment · MESI Protocol

31 min read

Linux Kernel Journey

Oct 18, 2024 · Fundamentals

Understanding Linux CPU Caches: From Physical Cores to Cache Coherence

This article explains Linux CPU cache architecture—from physical and logical cores, through L1/L2/L3 hierarchy and cache‑line basics, to write‑through/write‑back policies and coherence mechanisms—while demonstrating practical analysis with Valgrind and perf tools.

CPU cache · Linux · cache coherence

22 min read
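The following Python model makes the write‑through versus write‑back distinction concrete. It is deliberately tiny: the hypothetical function `memory_writes` holds a single cache line, assumes write‑allocate, and counts how many stores reach main memory under each policy. Real caches have many lines and more complex eviction, so only the relative counts are meaningful.

```python
def memory_writes(addresses, policy, line_size=64):
    """Count main-memory writes for a stream of stores through a one-line cache model."""
    writes = 0
    cached_line = None
    dirty = False
    for addr in addresses:
        line = addr // line_size
        if policy == "write-through":
            writes += 1  # every store is propagated to memory immediately
            cached_line = line
        elif policy == "write-back":
            if line != cached_line:
                if dirty:
                    writes += 1  # the old dirty line is written back on eviction
                cached_line, dirty = line, False
            dirty = True  # the store only marks the cached line dirty
        else:
            raise ValueError(f"unknown policy: {policy}")
    if policy == "write-back" and dirty:
        writes += 1  # final flush of the last dirty line
    return writes
```

Sixteen 4‑byte stores into one 64‑byte line, `memory_writes(range(0, 64, 4), ...)`, cost sixteen memory writes under write‑through. Under write‑back they cost a single write when the dirty line is finally flushed.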

Liangxu Linux

Jun 19, 2024 · Fundamentals

Understanding CPU Cache: Types, Structure, and Performance Optimization

This article explains why CPU caches are needed, describes the hierarchy and internal structure of L1, L2, and L3 caches, compares direct‑mapped, set‑associative and fully‑associative designs, and shows how cache‑aware coding (row‑major vs column‑major loops) dramatically improves execution speed.

CPU cache · Memory Hierarchy · cache architecture

12 min read
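The difference between direct‑mapped, set‑associative and fully‑associative designs shows up in conflict misses. The sketch below is a hypothetical `count_misses` simulator with LRU replacement and 64‑byte lines. Total capacity stays fixed at eight lines and only associativity varies.

```python
from collections import OrderedDict


def count_misses(addresses, num_sets, ways, line_size=64):
    """Simulate a set-associative cache with LRU replacement and count misses.

    ways=1 models a direct-mapped cache; num_sets=1 models a fully associative one.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size   # cache-line number
        index = line % num_sets    # the set this line maps to
        tag = line // num_sets     # distinguishes lines that share a set
        s = sets[index]
        if tag in s:
            s.move_to_end(tag)     # hit: mark as most recently used
        else:
            misses += 1
            if len(s) >= ways:
                s.popitem(last=False)  # set full: evict the least recently used line
            s[tag] = True
    return misses


pattern = [0, 512] * 4  # two addresses that collide in an 8-set direct-mapped cache
print(count_misses(pattern, num_sets=8, ways=1))  # direct-mapped, 8 lines
print(count_misses(pattern, num_sets=4, ways=2))  # 2-way, 8 lines
print(count_misses(pattern, num_sets=1, ways=8))  # fully associative, 8 lines
```

The two addresses map to the same set in the direct‑mapped configuration and evict each other on every access, so all eight accesses miss. The two‑way and fully‑associative configurations keep both lines and miss only twice.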

Ops Development & AI Practice

May 16, 2024 · Fundamentals

Boost Go Performance: Harness CPU Cache Locality with Practical Tips

This article explains the CPU cache locality principle, shows how to restructure Go data access patterns—including data structures, field ordering, memory allocation, and false sharing avoidance—and demonstrates measurable performance gains with a matrix‑multiplication benchmark.

CPU cache · Go · Matrix Multiplication

10 min read
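Field ordering matters because each field is padded up to its alignment. The Python model below uses a hypothetical helper, `struct_layout`, and applies natural‑alignment rules in which each field's alignment equals its size. That matches Go's layout for fixed‑size scalar fields such as `bool` and `int64` on 64‑bit platforms. It is an illustration, not Go's layout algorithm for all types.

```python
def struct_layout(fields):
    """Return (field offsets, total size) for naturally aligned scalar fields.

    fields: list of (name, size) pairs; each field's alignment is assumed to equal its size.
    """
    offset = 0
    max_align = 1
    offsets = []
    for name, size in fields:
        align = size
        offset = (offset + align - 1) // align * align  # insert padding up to alignment
        offsets.append((name, offset))
        offset += size
        max_align = max(max_align, align)
    # Tail padding so that arrays of the struct keep every element aligned.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total
```

For `struct { a bool; b int64; c bool }` this gives 24 bytes. Moving the `int64` first shrinks the struct to 16 bytes, the same sizes `unsafe.Sizeof` reports on amd64.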

Tencent Cloud Developer

Apr 25, 2024 · Fundamentals

Cache, Prefetching, False Sharing, Pipeline and Data Dependency: Performance Optimization in Rust

The article uses Rust benchmarks to show how cache layout, prefetching, associativity, false sharing, pipeline stalls, and loop data dependencies impact performance, and demonstrates practical optimizations such as row‑major traversal, proper alignment, avoiding dependent loops, and leveraging sequential access to achieve near‑optimal speed.

CPU cache · Rust · data dependency

17 min read

Tencent Technical Engineering

Apr 2, 2024 · Fundamentals

Cache, Prefetching, False Sharing, Pipeline, and Data Dependency: Benchmarks and Optimizations in Rust

The article shows how column‑major (versus row‑major) traversal, random access, cache‑set conflicts, false sharing, branch mispredictions and loop‑carried data dependencies each degrade performance. It ties Rust (and C++) code patterns to CPU cache behavior, prefetching, pipeline stalls and vectorization limits, and demonstrates each effect with runnable benchmarks.

CPU cache · Pipeline · Rust

15 min read
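The row‑major versus column‑major effect follows directly from address arithmetic. The hypothetical `lines_touched` below counts how many distinct cache lines a traversal touches. It assumes an n × n array of 8‑byte elements stored row‑major (as in a flat Rust `Vec<f64>` or a C 2‑D array) and 64‑byte cache lines. Fewer lines for the same number of accesses means more hits from each line fetched.

```python
def lines_touched(n, count, order, elem_size=8, line_size=64):
    """Distinct cache lines touched by the first `count` accesses when walking
    a row-major n x n array in `order` ("row" or "column")."""
    lines = set()
    for k in range(count):
        if order == "row":
            i, j = divmod(k, n)   # consecutive k -> consecutive j (same row)
        else:
            j, i = divmod(k, n)   # consecutive k -> consecutive i (same column)
        addr = (i * n + j) * elem_size  # byte address of a[i][j] in row-major layout
        lines.add(addr // line_size)
    return len(lines)
```

With n = 1024, the first 1,024 accesses touch 128 lines in row order but 1,024 lines in column order, because each column step jumps 8 KiB.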

Liangxu Linux

Dec 14, 2023 · Fundamentals

Understanding CPU Cache: Types, Structure, and Performance Optimization

This article explains why CPU caches are needed, describes the cache hierarchy and the internal structure of cache lines, compares direct‑mapped, set‑associative and fully‑associative caches, and shows how cache‑aware coding can dramatically improve program execution speed.

C++ · CPU cache · Memory Access

12 min read

Open Source Linux

Sep 15, 2023 · Fundamentals

Why CPU Cache Matters: Understanding Cache Types and Optimizing Code

This article explains the purpose of CPU caches, describes the three cache levels and their internal structures—including direct‑mapped, set‑associative, and fully‑associative designs—and shows how cache‑aware programming can dramatically improve code performance.

CPU cache · Fully Associative · Memory Hierarchy

11 min read

Liangxu Linux

Sep 10, 2023 · Fundamentals

Why CPU Cache Matters: Understanding Cache Types and Optimizing Code Performance

This article explains the purpose of CPU caches, describes their hierarchical structure and internal organization—including direct‑mapped, set‑associative, and fully‑associative designs—and shows how cache‑aware programming can dramatically speed up matrix operations.

CPU cache · Memory Access · cache hierarchy

12 min read

Open Source Linux

Aug 17, 2023 · Fundamentals

Why CPU Cache Matters: Unlock Faster Code Execution

This article explains the purpose of CPU caches, their hierarchical structure and internal designs—including direct‑mapped, set‑associative, and fully‑associative caches—and demonstrates how understanding cache behavior can dramatically improve program performance, illustrated with C++ traversal benchmarks.

CPU cache · cache hierarchy · cache optimization

12 min read

Open Source Linux

Jul 7, 2023 · Fundamentals

Why CPUs Need Cache Memory and How the MESI Protocol Ensures Consistency

This article explains the purpose of CPU cache memory, the principles of temporal and spatial locality, the multi‑level cache architecture and the MESI cache‑coherence protocol for multi‑core processors. It also covers how store buffers and memory barriers address the resulting performance and consistency challenges.

CPU cache · MESI · Memory Hierarchy

16 min read
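The MESI rules summarized above can be written as a small transition function. This Python sketch, `mesi_next`, is a hypothetical name that follows the textbook protocol for one line in one core's cache. It reports whether a dirty line must be written back. It ignores store buffers, invalidation queues, and vendor variants such as MOESI or MESIF.

```python
def mesi_next(state, event, others_have_copy=False):
    """Return (next_state, writes_back) for one cache line in one core's cache.

    state: "M", "E", "S" or "I".
    event: "local_read", "local_write", "remote_read" or "remote_write".
    others_have_copy: whether another cache holds the line (used on a local read miss).
    """
    if event == "local_read":
        if state == "I":
            # Read miss: Exclusive if no other cache has it, otherwise Shared.
            return ("S" if others_have_copy else "E"), False
        return state, False  # M, E and S all hit locally without a state change
    if event == "local_write":
        # From S or I, other copies must be invalidated first (not modeled here);
        # from E the upgrade is silent. Either way the line ends up Modified.
        return "M", False
    if event == "remote_read":
        if state == "M":
            return "S", True   # dirty data is written back / supplied, then shared
        if state == "E":
            return "S", False
        return state, False    # S stays S, I stays I
    if event == "remote_write":
        return "I", state == "M"  # another core takes ownership; a dirty copy is flushed
    raise ValueError(f"unknown event: {event}")
```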

New Oriental Technology

May 25, 2023 · Fundamentals

Deep Dive into Java volatile: CPU Cache Architecture, MESI Protocol, JMM and Happens‑Before

This article thoroughly explains the low‑level implementation of Java's volatile keyword by analysing CPU multi‑level cache design, the MESI cache‑coherency protocol, the Java Memory Model, memory barriers, the happens‑before principle, and the impact on singleton patterns and synchronized blocks.

CPU cache · Happens-before · JMM

36 min read

Alibaba Cloud Developer

Mar 7, 2023 · Backend Development

Why Loop Order Matters: Boost Java Matrix Multiplication Speed by 100×

This article demonstrates how reorganizing Java matrix‑multiplication loops and understanding Java 2‑D array storage and CPU cache hierarchies can turn a naïve implementation into a version that runs up to a hundred times faster, backed by JMH benchmark results.

CPU cache · Matrix Multiplication · benchmark

13 min read
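The loop‑order change the article benchmarks in Java can be sketched as follows. Python lists will not reproduce the speedup. The sketch only shows that swapping the `j` and `k` loops (i‑k‑j order) keeps the result identical. It also makes the innermost loop walk along rows of `b` and `c` instead of down a column of `b`. The function names are hypothetical.

```python
def matmul_ijk(a, b):
    """Naive order: the innermost loop strides down a column of b."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_ikj(a, b):
    """Reordered: the innermost loop walks along one row of b and one row of c."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for k in range(m):
            aik = a[i][k]   # constant for the whole inner loop
            row_b = b[k]
            for j in range(p):
                row_c[j] += aik * row_b[j]
    return c
```

In Java, a `double[][]` is an array of row arrays, so the i‑k‑j inner loop reads each row sequentially. That access pattern is what lets the CPU cache and prefetcher help.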

Tencent Cloud Developer

Nov 1, 2022 · Fundamentals

Understanding CPU Cache, Memory Hierarchy, and Virtual Memory

The article explains how modern computers use fast SRAM caches (L1‑L3) inside the CPU, with various mapping schemes and the MESI coherence protocol keeping data consistent, while DRAM serves as main memory. It also shows how virtual memory, with multi‑level page tables and a TLB, abstracts physical memory, provides isolation and enables swapping.

CPU cache · MESI Protocol · Memory Hierarchy

16 min read
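x86‑64 with 4‑level paging and 4 KiB pages is one concrete case of a multi‑level page table. There, a 48‑bit virtual address splits into four 9‑bit table indexes and a 12‑bit page offset. The hypothetical `split_va` below performs that split. On a TLB hit, the CPU reuses a cached translation and skips this four‑step walk.

```python
def split_va(va):
    """Split a 48-bit x86-64 virtual address (4-level paging, 4 KiB pages)
    into (PML4, PDPT, PD, PT) indexes and the byte offset within the page."""
    offset = va & 0xFFF          # bits 0-11: byte within the 4 KiB page
    pt = (va >> 12) & 0x1FF      # bits 12-20: page-table index
    pd = (va >> 21) & 0x1FF      # bits 21-29: page-directory index
    pdpt = (va >> 30) & 0x1FF    # bits 30-38: page-directory-pointer-table index
    pml4 = (va >> 39) & 0x1FF    # bits 39-47: top-level (PML4) index
    return pml4, pdpt, pd, pt, offset
```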

Selected Java Interview Questions

Jul 27, 2022 · Fundamentals

Understanding Java's volatile Keyword: CPU Cache, Memory Visibility, and the MESI Protocol

This article explains how the volatile keyword ensures visibility in Java multithreaded programs by examining CPU cache architecture, cache‑coherency mechanisms such as the MESI protocol, and the low‑level assembly effects of volatile writes on modern x86 processors.

CPU cache · MESI · concurrency

12 min read

Su San Talks Tech

Jul 4, 2022 · Fundamentals

Understanding CPU Cache False Sharing and How to Eliminate It

This article explains the concept of CPU cache false sharing, how it degrades performance on multi‑core systems, and provides practical techniques—including cache‑line alignment macros and padding strategies—to prevent it and improve multithreaded application efficiency.

CPU cache · cache line · false sharing

10 min read
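The effect of padding can be checked with offset arithmetic. The hypothetical helpers below assume 64‑byte cache lines and a line‑aligned base address (for example, via C11 `alignas(64)`). They test whether two counters share a line and compute padded offsets that give each per‑thread counter its own line.

```python
CACHE_LINE = 64  # assumed line size; common on x86-64 but not universal


def same_line(offset_a, offset_b, line_size=CACHE_LINE):
    """True if two byte offsets from a line-aligned base fall in the same cache line."""
    return offset_a // line_size == offset_b // line_size


def padded_offsets(num_counters, counter_size=8, line_size=CACHE_LINE):
    """Offsets that place each counter in its own cache line,
    i.e. the layout produced by padding every counter out to whole lines."""
    stride = -(-counter_size // line_size) * line_size  # round size up to whole lines
    return [i * stride for i in range(num_counters)]
```

Four packed 8‑byte counters at offsets 0, 8, 16 and 24 all share one line, so writes from different threads keep invalidating each other's copy. Padded to 64‑byte slots, they sit at 0, 64, 128 and 192.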

Top Architect

Feb 1, 2022 · Fundamentals

Understanding CPU Cache Hierarchy, Cache Coherence, and Performance Optimization

This article explains the structure of modern CPU caches, the principles of cache lines, associativity, and coherence protocols, and demonstrates how these hardware details affect program performance through multiple code examples covering loop stride, matrix traversal, multithreading, and false sharing.

CPU cache · Memory Hierarchy · cache coherence

21 min read

Open Source Linux

Dec 15, 2021 · Fundamentals

Unlocking CPU Speed: How Cache Hierarchy and False Sharing Impact Performance

This article explains CPU cache levels, cache line concepts, coherence protocols, and how cache‑friendly or cache‑unfriendly code patterns—including false sharing and stride access—affect program performance on modern multicore processors.

CPU cache · cache hierarchy · false sharing

21 min read

Senior Brother's Insights

Oct 7, 2021 · Fundamentals

Why Does Java’s volatile Keyword Work? Deep Dive into CPU Caches and Memory Barriers

This article explains the hardware origins of CPU caches, bus locks, cache‑coherence protocols such as MESI, store buffers and memory barriers, then shows how the Java Memory Model abstracts these mechanisms and how the volatile keyword guarantees visibility and ordering while not providing atomicity.

CPU cache · MESI · Memory Model

28 min read

Ops Development Stories

Jul 15, 2021 · Operations

Mastering NUMA and Hyper-Threading: Boost CPU Cache Hits and Reduce Latency

This article explains NUMA architecture and hyper‑threading, details CPU cache hierarchies and their access latencies, and presents Linux tools and practical techniques for improving cache‑hit rates and minimizing cross‑NUMA memory latency.

CPU cache · Hyper-threading · Linux

9 min read

ITPUB

Jun 27, 2021 · Fundamentals

How Multi‑Level Caching Boosts Performance and Avoids Common Pitfalls

This article explores the role of multi‑level caching—from distributed and local caches to direct memory and CPU cache—detailing performance gains, cache‑miss handling, consistency challenges, false sharing issues, and practical mitigation techniques such as approximate LRU, random TTL, delayed double‑delete, padding, and lock‑free designs.

CPU cache · Cache Consistency · Memory Management

13 min read
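Random TTL, one of the mitigations listed above, spreads out expirations. Keys cached at the same moment then do not all expire together and send a burst of misses to the database. Below is a minimal sketch, assuming TTLs in seconds and a hypothetical `jittered_ttl` helper.

```python
import random


def jittered_ttl(base_seconds, jitter_fraction=0.1, rng=random):
    """Return the base TTL plus a random extra of up to jitter_fraction * base,
    so that keys written together expire at slightly different times."""
    return base_seconds + rng.uniform(0, base_seconds * jitter_fraction)


# Example: a 10-minute TTL spread over 600-660 seconds.
ttl = jittered_ttl(600)
```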

Alibaba Cloud Developer

Jun 15, 2021 · Backend Development

Mastering Cache Layers: From Distributed to CPU Cache for Performance Gains

This article explores how introducing various cache layers—from database and distributed caches to local and CPU caches—bridges the gap between fast CPUs and slow I/O, detailing performance benefits, cache miss handling, consistency strategies, memory management, and techniques to avoid false sharing.

CPU cache · Memory Management · caching

13 min read

Alibaba Cloud Developer

Jun 9, 2021 · Backend Development

Mastering Cache Optimization: From Distributed to CPU Cache and Beyond

This article explores the fundamentals and advanced techniques of cache optimization, covering multi‑level caching, read/write performance gains, cache miss handling, consistency strategies, heap versus direct memory, CPU cache effects, false sharing, and practical mitigation patterns.

CPU cache · Memory Management · caching

13 min read

vivo Internet Technology

Jan 6, 2021 · Fundamentals

Deep Dive into Java Volatile Keyword: CPU Cache, MESI Protocol, and JMM

The article thoroughly explains how CPU caches and the MESI coherence protocol interact with Java’s Memory Model, detailing the volatile keyword’s role in ensuring visibility and preventing instruction reordering, and illustrates these concepts with examples such as visibility problems and double‑checked locking.

CPU cache · Instruction Reordering · JMM

22 min read

Liangxu Linux

Nov 4, 2020 · Fundamentals

How Much Faster Is CPU L1 Cache Compared to RAM, SSD, and HDD?

This article explains the storage hierarchy from CPU registers and caches to RAM, SSD, and HDD, quantifies their speed differences (L1 cache vs. memory, SSD, HDD) and cost ratios, and provides Linux commands to inspect cache sizes, helping readers understand why each level exists and how they interact.

CPU cache · HDD · SSD

14 min read
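The speed gaps between levels combine through the average memory access time (AMAT). At each level, AMAT is the hit time plus the local miss rate times the cost of going one level further out. The hypothetical `amat` helper below evaluates that recurrence. The latencies and miss rates in the usage line are illustrative assumptions, not measurements.

```python
def amat(levels, memory_latency):
    """Average memory access time for a cache hierarchy.

    levels: list of (hit_time, local_miss_rate) from L1 outward.
    Evaluates hit_1 + m_1 * (hit_2 + m_2 * (... + m_n * memory_latency)).
    """
    time = memory_latency
    for hit_time, miss_rate in reversed(levels):
        time = hit_time + miss_rate * time  # cost seen by the level just inside this one
    return time
```

Take L1/L2/L3 hit times of 1, 4 and 12 ns, local miss rates of 10%, 50% and 50%, and 100 ns to DRAM. Then `amat([(1, 0.1), (4, 0.5), (12, 0.5)], 100)` comes to 4.5 ns, far closer to L1 than to DRAM.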

Liangxu Linux

Oct 31, 2020 · Fundamentals

How CPU Cache Works and How to Write Faster Code

Understanding the CPU cache hierarchy, its speed advantage over main memory, and the mechanics of cache lines, tags and offsets reveals why some code runs dramatically faster on modern processors: code that maximizes cache hit rates through sequential data access, branch prediction and core affinity.

CPU cache · Cache Hit Rate · Memory Hierarchy

18 min read
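Locating data in a cache starts by splitting the address into tag, index and offset fields. The hypothetical `split_address` below extracts the three fields with shifts and masks. It assumes power‑of‑two parameters, here 64‑byte lines and 64 sets.

```python
def split_address(addr, line_size=64, num_sets=64):
    """Split an address into (tag, set index, byte offset) for a cache whose
    line size and set count are powers of two."""
    offset_bits = line_size.bit_length() - 1        # log2(line_size)
    index_bits = num_sets.bit_length() - 1          # log2(num_sets)
    offset = addr & (line_size - 1)                 # byte within the cache line
    index = (addr >> offset_bits) & (num_sets - 1)  # which set to look in
    tag = addr >> (offset_bits + index_bits)        # compared against the stored tag
    return tag, index, offset
```

`split_address(0x12345)` returns `(18, 13, 5)`. Addresses 4 KiB apart, such as `0x0` and `0x1000`, share an index but differ in tag, so in a direct‑mapped cache with these parameters they evict each other.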

Xiaokun's Architecture Exploration Notes

Jan 30, 2020 · Fundamentals

How False Sharing Slows Java Programs and How to Eliminate It

This article explains the concept of false sharing in CPU caches, demonstrates its performance impact with Java code, analyzes the results, and shows how to prevent it using the @Contended annotation and appropriate JVM flags.

CPU cache · Contended annotation · Java performance

9 min read

dbaplus Community

May 8, 2019 · Databases

How False Sharing in CPU Cache Slowed Down Our Database Table Scans—and How We Fixed It

During a client’s upgrade test, a database’s compressed tables exhibited severe slowdown under concurrent full‑table scans, which we traced to CPU cache line false sharing in the decompression code; using Linux perf tools we identified the hotspot, aligned memory, and restored performance.

CPU cache · Code Optimization · Database Performance

13 min read

Beike Product & Technology

Nov 2, 2018 · Fundamentals

Physical Memory Model, Concurrency Concepts, and Java Memory Model Explained

This article explains the modern computer's physical memory hierarchy and introduces three key concurrency concepts (atomicity, ordering and visibility), illustrating them with Java code examples. It then details the Java Memory Model and the eight happens‑before rules that govern multithreaded behavior.

CPU cache · Happen-Before · Memory Model

14 min read

Qunar Tech Salon

Oct 31, 2018 · Backend Development

Understanding Cache: Concepts, Types, and Performance Optimization in High-Concurrency Scenarios

This article explains cache fundamentals—from CPU and local caches to distributed systems—covers design principles, performance‑affecting factors, eviction algorithms, and common high‑concurrency issues such as penetration, stampede, and avalanche, and provides practical solutions for selecting and optimizing cache strategies.

CPU cache · Cache Eviction · caching

16 min read
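As one example of an eviction algorithm, here is a minimal LRU (least recently used) cache. It assumes a hypothetical `LRUCache` class built on `collections.OrderedDict`. LRU is chosen only as a familiar example; the summary does not say which policy the article recommends.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity key-value cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()  # iteration order = least to most recently used

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # a read counts as a use
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least recently used entry
```

With capacity 2, inserting `a` and `b`, reading `a`, then inserting `c` evicts `b`, because `a` was used more recently.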

Architecture Digest

May 5, 2017 · Fundamentals

Understanding False Sharing: CPU Cache, MESI Protocol, and Java Mitigation Techniques

This article explains the concept of false sharing, its roots in CPU cache line contention and the MESI protocol, demonstrates its impact with Java benchmark code, and presents practical padding and alignment techniques to mitigate the performance degradation caused by false sharing.

CPU cache · Java concurrency · MESI Protocol

15 min read

Art of Distributed System Architecture Design

Apr 5, 2015 · Fundamentals

Understanding CPU Caches, Coherency, and Memory Models: A Quick Guide

This article provides a concise introduction to CPU cache hierarchies, read/write policies, cache coherency protocols such as snooping and MESI, and the impact of different memory models on multi‑core systems, helping developers grasp essential hardware concepts for reliable software design.

CPU cache · Cache Coherency · MESI Protocol

19 min read

Qunar Tech Salon

Mar 31, 2015 · Fundamentals

Understanding CPU Caches, Coherency Protocols, and Memory Models

This article provides a concise introduction to CPU cache architecture, explains read/write policies, describes cache coherency protocols such as MESI and its variants, and discusses how different memory models affect multi‑core consistency and performance.

CPU cache · Cache Coherency · MESI Protocol

19 min read