
Understanding Concurrency: From Hardware Mechanisms to Language-Level Barriers

This article explains concurrency fundamentals by examining hardware-level cache coherence, memory barriers, and lock mechanisms, then shows how Go's sync.Mutex and Java's volatile keyword implement these concepts to ensure atomicity, visibility, and ordering across multiple cores.

Liulishuo Tech Team

The article opens by observing that many developers know only the concurrency features their language provides and understand little about the hardware layer that sits beneath the language runtime.

1. Problem and Answers – Concurrency issues revolve around atomicity, ordering, and visibility. Using an SMP architecture diagram, the article poses two questions: how the multi‑level caches of different cores stay synchronized, and how reordering performed for optimization can cause unexpected behavior.

It then distinguishes three sources of reordering: compiler optimization and out‑of‑order instruction execution, both of which genuinely reorder instructions, and cache‑synchronization delay, which only appears to reorder them. The corresponding remedies are bus‑lock/cache‑lock mechanisms and optimization/memory barriers.
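A Dekker-style litmus test, sketched here in Go, makes the stakes concrete: with plain stores and loads, store buffers and reordering would permit both goroutines to read 0, while Go's sync/atomic operations (sequentially consistent in the Go memory model) rule that outcome out. The `litmus` function name is illustrative, not from the article.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Dekker-style store/load litmus test. With plain stores and loads, store
// buffers and reordering would permit the outcome r1 == 0 && r2 == 0.
// Go's sync/atomic operations are sequentially consistent, so the implied
// barriers guarantee at least one goroutine observes the other's store.
func litmus() (r1, r2 int32) {
	var x, y int32
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&x, 1)
		r1 = atomic.LoadInt32(&y)
	}()
	go func() {
		defer wg.Done()
		atomic.StoreInt32(&y, 1)
		r2 = atomic.LoadInt32(&x)
	}()
	wg.Wait() // Wait establishes happens-before for reading r1 and r2
	return r1, r2
}

func main() {
	for i := 0; i < 1000; i++ {
		if r1, r2 := litmus(); r1 == 0 && r2 == 0 {
			fmt.Println("impossible with sequentially consistent atomics")
			return
		}
	}
	fmt.Println("r1 + r2 >= 1 on every run")
}
```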

2. Cache Lock – Describes how modern CPUs execute a LOCK‑prefixed operation on shared data atomically through internal cache coherence (a cache lock) rather than by asserting the external LOCK# bus signal, which would stall every other core's memory access.

3. Cache Coherence – Introduces the MESI protocol with its four states (Modified, Exclusive, Shared, Invalid) and explains the performance costs of state transitions, leading to the addition of store buffers and invalidate queues.

Examples illustrate how store buffers cause visibility problems: core C1's write sits in its store buffer, so core C2 cannot yet see the update and reads a stale value.
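A toy model, not from the article, can make the MESI transitions concrete. The sketch below tracks the state of one cache line across two cores and deliberately ignores store buffers and invalidate queues:

```go
package main

import "fmt"

// A toy model of MESI transitions for one cache line shared by two cores.
// Real protocols also involve bus messages, store buffers, and invalidate
// queues; this only shows the four states and their basic transitions.

type State string

const (
	Modified  State = "M"
	Exclusive State = "E"
	Shared    State = "S"
	Invalid   State = "I"
)

type Line struct{ state [2]State }

// Read by core c: an Invalid line is fetched; if the other core held it
// Modified or Exclusive, both copies drop to Shared.
func (l *Line) Read(c int) {
	other := 1 - c
	if l.state[c] != Invalid {
		return
	}
	switch l.state[other] {
	case Modified, Exclusive:
		l.state[other] = Shared
		l.state[c] = Shared
	case Shared:
		l.state[c] = Shared
	default:
		l.state[c] = Exclusive // no other copy exists
	}
}

// Write by core c: the line becomes Modified locally and the other core's
// copy is invalidated (the "invalidate" message whose acknowledgement real
// CPUs defer via invalidate queues).
func (l *Line) Write(c int) {
	l.state[c] = Modified
	l.state[1-c] = Invalid
}

func main() {
	l := &Line{state: [2]State{Invalid, Invalid}}
	l.Read(0)            // core 0 loads the line: Exclusive
	l.Write(0)           // core 0 writes: Modified, core 1 Invalid
	l.Read(1)            // core 1 reads: both drop to Shared
	fmt.Println(l.state) // [S S]
}
```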

4. Barriers – Defines optimization barriers, which prevent compiler reordering, and memory barriers (store, load, and full), which enforce instruction ordering and make writes visible across cores.

It connects these concepts to familiar concurrency primitives such as Go's sync.Mutex, Java's synchronized and volatile, and CAS operations.
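As a minimal illustration of the mutex case, the sketch below uses sync.Mutex to make a shared read-modify-write atomic; the Unlock/Lock pair also provides the visibility guarantee the article discusses. The `count` helper is illustrative, not from the article.

```go
package main

import (
	"fmt"
	"sync"
)

// Incrementing a shared counter is the canonical atomicity problem:
// counter++ is a load, an add, and a store, and two cores can interleave
// them. sync.Mutex makes the read-modify-write atomic, and each
// Unlock/Lock pair also establishes visibility between goroutines.
func count(n int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			counter++ // safe: only one goroutine at a time
			mu.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(count(1000)) // 1000
}
```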

5. Go – Shows that Go's sync.Mutex ultimately relies on the CPU's CAS instruction (e.g., atomic·Cas64 in asm_amd64.s), which carries the LOCK prefix and therefore acts as a full memory barrier while also preventing compiler reordering.
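A hypothetical spinlock built on Go's public sync/atomic API (rather than the runtime-internal atomic·Cas64) shows the same idea: on amd64, CompareAndSwapInt32 compiles to a LOCK CMPXCHG, giving both atomicity and a full barrier.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// A minimal spinlock on top of CompareAndSwapInt32. The CAS is atomic and
// acts as a full memory barrier, so everything done inside the critical
// section is visible to the next goroutine that acquires the lock.
type spinLock struct{ state int32 }

func (l *spinLock) Lock() {
	for !atomic.CompareAndSwapInt32(&l.state, 0, 1) {
		runtime.Gosched() // yield instead of burning CPU while spinning
	}
}

func (l *spinLock) Unlock() {
	atomic.StoreInt32(&l.state, 0)
}

// countWithSpinLock increments a shared counter from n goroutines.
func countWithSpinLock(n int) int {
	var l spinLock
	var wg sync.WaitGroup
	counter := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			l.Lock()
			counter++
			l.Unlock()
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWithSpinLock(100)) // 100
}
```

Unlike sync.Mutex, this lock never parks a blocked goroutine, so it is only a sketch of the mutex's CAS fast path, not a replacement for it.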

6. Java – Explains how the JVM wraps CPU memory‑barrier instructions into four categories (LoadLoad, StoreStore, LoadStore, StoreLoad). Using the volatile keyword as an example, it details the barriers added before and after volatile reads and writes, guaranteeing visibility and ordering.
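Go has no volatile keyword, but the flag-publishing idiom that volatile supports can be sketched with sync/atomic, with comments marking where the JVM would conceptually place its four barriers. The `worker`/`waitDone` names are illustrative, not from the article.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// The JVM brackets volatile accesses with barriers: StoreStore before and
// StoreLoad after a volatile write; LoadLoad and LoadStore after a
// volatile read. Go's sync/atomic provides comparable ordering, so this
// sketch mimics the status-flag idiom volatile is typically used for.

var done int32 // plays the role of a volatile boolean flag
var result int

func worker() {
	result = 7 // ordinary store
	// StoreStore: the store above must be visible before the flag store.
	atomic.StoreInt32(&done, 1) // the "volatile write"
	// StoreLoad: later loads may not be reordered before the flag store.
}

func waitDone() int {
	for atomic.LoadInt32(&done) == 0 { // the "volatile read"
		// LoadLoad/LoadStore: reads below cannot float above this load.
	}
	return result // guaranteed to observe 7
}

func main() {
	go worker()
	fmt.Println(waitDone()) // 7
}
```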

The article concludes that it has provided a top‑to‑bottom overview of concurrency, from hardware mechanisms to language‑level implementations, and invites discussion on the presented topics.

Tags: Java, Concurrency, Go, Multithreading, Hardware, Cache Coherence, Memory Barrier