Mastering JVM Garbage Collection: Algorithms, Collectors, and Tuning
This article explains the theory behind JVM garbage collection algorithms, details the Serial, ParNew, Parallel, and CMS collectors, compares their strengths and weaknesses, and explores advanced concepts like three-color marking, write barriers, SATB, and memory management parameters.
Garbage Collection Algorithms
HotSpot's collectors apply several garbage-collection (GC) algorithms according to generational collection theory. The heap is divided into a young and an old generation, and each generation uses the algorithm best suited to its objects.
Generational Collection Theory
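Objects are grouped by lifespan into young and old generations.
Most young-generation objects die shortly after allocation, so few survive a collection. A copying algorithm fits well here because its cost grows with the number of live objects, not with the size of the region.
Old-generation objects are long-lived and make up most of their region, so copying them on every collection would be expensive. Mark-sweep or mark-compact is used instead.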
Copying (Mark‑Copy) Algorithm
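In its classic form, the memory is split into two equal halves, and only one half is used for allocation at a time. During GC, live objects are copied from the active half into the other, and then the entire active half is reclaimed in one step. Allocation stays simple (just bump a pointer) and no fragmentation arises, at the cost of reserving half the space. HotSpot's young generation refines this with one Eden space and two smaller Survivor spaces (8:1:1 by default), so only about 10% of the young generation sits idle.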
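The copying step can be sketched as a toy Cheney-style semispace collector. This is a minimal sketch under stated assumptions: the heap is modeled as a `std::vector` of objects whose references are vector indices, and `Obj` and `copy_collect` are hypothetical names, not HotSpot code. Survivors are copied breadth-first, and a forwarding table guarantees that each one is copied exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy object: a payload plus references to other objects,
// expressed as indices into the space the object currently lives in.
struct Obj {
    int value;
    std::vector<int> refs;
};

// Cheney-style copy: evacuate the roots into an empty to-space, then scan the
// to-space linearly, evacuating whatever each copied object references.
// Garbage is never touched; the whole from-space is discarded afterwards.
std::vector<Obj> copy_collect(const std::vector<Obj>& from, std::vector<int>& roots) {
    std::vector<Obj> to;
    std::vector<int> forward(from.size(), -1);  // from-space index -> to-space index

    auto evacuate = [&](int idx) {
        assert(idx >= 0 && static_cast<std::size_t>(idx) < from.size());
        if (forward[idx] < 0) {                  // not copied yet: copy it now
            forward[idx] = static_cast<int>(to.size());
            to.push_back(from[idx]);             // refs still hold from-space indices
        }
        return forward[idx];
    };

    for (int& r : roots) r = evacuate(r);        // roots now point into to-space

    // `to` grows while it is scanned, so index it instead of holding references.
    for (std::size_t scan = 0; scan < to.size(); ++scan) {
        for (std::size_t i = 0; i < to[scan].refs.size(); ++i) {
            int moved = evacuate(to[scan].refs[i]);
            to[scan].refs[i] = moved;            // translate to a to-space index
        }
    }
    return to;  // survivors are now contiguous; `from` can be freed in one step
}
```

Consider a heap where object 3 is a root and object 1 is unreachable. Only three objects get copied, and object 1 is left behind without ever being visited. That is why the cost tracks survivors rather than region size.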
Mark‑Sweep Algorithm
Two phases: mark reachable objects, then sweep away the rest.
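Drawbacks: performance is unpredictable, because both phases do more work as the number of objects grows (the sweep walks the whole heap). Sweeping in place also leaves free memory fragmented into scattered chunks, so a large allocation can fail even when enough total space is free.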
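A toy version of the two phases is sketched below. It assumes a fixed array of heap slots with index-based references; `Slot`, `mark`, and `sweep` are hypothetical names. The sketch shows both drawbacks at once: the sweep visits every slot, and the freed slots end up scattered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap slot.
struct Slot {
    bool live = false;      // holds an allocated object
    bool marked = false;    // set during the mark phase
    std::vector<int> refs;  // indices of referenced slots
};

// Mark phase: depth-first traversal from the roots with an explicit stack.
void mark(std::vector<Slot>& heap, const std::vector<int>& roots) {
    std::vector<int> stack(roots.begin(), roots.end());
    while (!stack.empty()) {
        int i = stack.back();
        stack.pop_back();
        assert(i >= 0 && static_cast<std::size_t>(i) < heap.size());
        if (heap[i].marked) continue;
        heap[i].marked = true;
        for (int r : heap[i].refs) stack.push_back(r);
    }
}

// Sweep phase: walk the entire heap; unmarked objects are freed in place and
// their slots go on a free list. Survivors never move.
std::vector<int> sweep(std::vector<Slot>& heap) {
    std::vector<int> free_list;
    for (std::size_t i = 0; i < heap.size(); ++i) {
        if (!heap[i].live) {                 // already free
            free_list.push_back(static_cast<int>(i));
        } else if (heap[i].marked) {
            heap[i].marked = false;          // survivor: reset for the next cycle
        } else {
            heap[i].live = false;            // garbage: reclaim in place
            heap[i].refs.clear();
            free_list.push_back(static_cast<int>(i));
        }
    }
    return free_list;
}
```

Run the sketch on a five-slot heap where only 0 → 2 → 4 is reachable. Slots 1 and 3 are freed but are not adjacent, so an object that needs two contiguous slots still could not be allocated. This is fragmentation in miniature.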
Mark‑Compact Algorithm
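After marking, all live objects are slid toward one end of the heap, and the space beyond the last one is freed as a single contiguous block, eliminating fragmentation.
Moving live objects and updating every reference to them costs more than sweeping in place. In HotSpot's old-generation collectors (Serial Old, Parallel Old), this moving happens inside an STW pause.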
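The sliding step can be illustrated with a Lisp-2 style compaction, one classic way to implement mark-compact (HotSpot's exact passes differ by collector). This is a minimal sketch under stated assumptions: a prior mark phase has set `marked`, references are indices, and `Cell` and `compact` are hypothetical names. The three linear passes make the extra cost over mark-sweep visible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy heap cell; `marked` is the output of a preceding mark phase.
struct Cell {
    bool marked = false;
    int value = 0;
    std::vector<int> refs;  // indices of referenced cells
};

// Sliding (Lisp-2 style) compaction in three linear passes:
// 1) assign each live cell its new index, 2) rewrite roots and fields,
// 3) slide live cells toward index 0 in address order.
// Returns the number of live cells; indices from there on form one free block.
std::size_t compact(std::vector<Cell>& heap, std::vector<int>& roots) {
    std::vector<int> forward(heap.size(), -1);
    int next = 0;
    for (std::size_t i = 0; i < heap.size(); ++i)      // pass 1: forwarding indices
        if (heap[i].marked) forward[i] = next++;

    for (int& r : roots) {                               // pass 2a: roots
        assert(forward[r] >= 0);                         // roots point only to live cells
        r = forward[r];
    }
    for (std::size_t i = 0; i < heap.size(); ++i) {      // pass 2b: fields of live cells
        if (!heap[i].marked) continue;
        for (int& ref : heap[i].refs) ref = forward[ref];
    }

    for (std::size_t i = 0; i < heap.size(); ++i) {      // pass 3: move
        if (!heap[i].marked) continue;
        Cell moved = heap[i];                            // forward[i] <= i, so no live
        moved.marked = false;                            // cell is overwritten before
        heap[forward[i]] = moved;                        // it has been moved itself
    }
    return static_cast<std::size_t>(next);
}
```

Updating references (pass 2) is the step that mark-sweep never needs. It is also why compacting collectors must stop, or at least coordinate with, every thread that might hold a reference to a moving object.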
Garbage Collectors
Collectors are concrete implementations of the above algorithms.
Serial Collector
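Single-threaded; stops all application threads (stop-the-world, STW) for the entire collection.
Uses copying for the young generation and is paired with Serial Old, which uses mark-compact for the old generation.
JVM flag:
-XX:+UseSerialGC (selects Serial for the young generation and Serial Old for the old generation)
ParNew Collector
Multithreaded version of Serial. It is the young-generation collector that CMS uses by default.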
Also uses copying for the young generation.
JVM flag:
-XX:+UseParNewGC
Parallel Scavenge Collector (JDK 1.8 default)
Multithreaded copying collector focused on maximizing throughput.
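By default, the number of GC threads equals the number of CPU cores on machines with up to 8 cores and grows more slowly beyond that. It can be set explicitly with -XX:ParallelGCThreads=<n>.
JVM flags:
-XX:+UseParallelGC (young) and
-XX:+UseParallelOldGC (old).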
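The default thread count can be sketched as follows. This assumes the JDK 8 HotSpot heuristic: one thread per core up to 8 cores, then 5/8 of a thread per additional core. Other JDK versions and platforms may differ, and `default_parallel_gc_threads` is a hypothetical name.

```cpp
#include <cassert>

// Hypothetical helper approximating the JDK 8 default for -XX:ParallelGCThreads:
// ncpus threads up to 8 CPUs, then 8 plus 5/8 of every CPU beyond the eighth.
unsigned default_parallel_gc_threads(unsigned ncpus) {
    assert(ncpus > 0);
    const unsigned switch_point = 8;
    if (ncpus <= switch_point) return ncpus;
    return switch_point + (ncpus - switch_point) * 5 / 8;
}
```

Under this assumption, a 16-core machine would get 13 GC threads rather than 16. This is one reason to set -XX:ParallelGCThreads explicitly when several JVMs share a large host.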
Serial Old Collector
Single‑threaded mark‑compact collector for the old generation.
Parallel Old Collector (JDK 1.8 default)
Multithreaded mark‑compact collector for the old generation.
Prioritizes throughput; may cause longer STW pauses.
CMS (Concurrent Mark‑Sweep) Collector
Designed for low pause times, CMS uses a mark‑sweep algorithm and performs most work concurrently.
CMS Collection Steps
Initial Mark: a short STW pause that marks only the objects directly reachable from GC roots.
Concurrent Mark: traverses the rest of the object graph while application threads keep running.
Remark: a short STW pause that fixes up marks for references changed during concurrent marking.
Concurrent Sweep: reclaims unmarked objects while the application runs.
CMS Advantages and Drawbacks
Low pause times, suitable for latency‑sensitive services.
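Concurrent phases compete with application threads for CPU. CMS also cannot reclaim floating garbage, meaning objects that become unreachable while the concurrent phases are running; they stay in the heap until the next cycle.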
Mark-sweep can cause memory fragmentation; -XX:+UseCMSCompactAtFullCollection can mitigate this.
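Possible "concurrent mode failure": if the old generation fills up before a concurrent cycle finishes, the JVM falls back to a full STW collection with Serial Old.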
CMS Parameters
-XX:+UseConcMarkSweepGC – enable CMS.
-XX:ConcGCThreads=<n> – number of concurrent GC threads.
-XX:+UseCMSCompactAtFullCollection – compact the old generation during a full GC.
-XX:CMSFullGCsBeforeCompaction=<n> – number of full GCs to run without compaction before one that compacts (default 0: compact every time).
-XX:CMSInitiatingOccupancyFraction=<percent> – start a CMS cycle when old-generation occupancy reaches this percentage (default 92%).
-XX:+UseCMSInitiatingOccupancyOnly – use only the occupancy threshold above to decide when to start a cycle; otherwise the JVM may also apply its own heuristics.
-XX:+CMSScavengeBeforeRemark – run a minor GC before the remark phase, so remark has fewer young-generation references to scan.
-XX:+CMSParallelInitialMarkEnabled – run the initial mark with multiple threads.
-XX:+CMSParallelRemarkEnabled – run the remark with multiple threads.
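The 92% default and the start-of-cycle check can be sketched as follows. This assumes JDK 8 behavior: when CMSInitiatingOccupancyFraction is unset (negative), the threshold is derived from -XX:MinHeapFreeRatio (default 40) and -XX:CMSTriggerRatio (default 80). The function names are hypothetical. Unless UseCMSInitiatingOccupancyOnly is set, the real JVM also weighs other triggers not modeled here.

```cpp
#include <cassert>

// Hypothetical helper: effective CMS initiating occupancy in percent (JDK 8 rule).
// A negative fraction means "not set"; the threshold is then derived from
// MinHeapFreeRatio and CMSTriggerRatio.
double cms_initiating_percent(int occupancy_fraction,   // -XX:CMSInitiatingOccupancyFraction
                              int min_heap_free_ratio,  // -XX:MinHeapFreeRatio (default 40)
                              int cms_trigger_ratio) {  // -XX:CMSTriggerRatio (default 80)
    if (occupancy_fraction >= 0) return occupancy_fraction;
    return (100 - min_heap_free_ratio) + cms_trigger_ratio * min_heap_free_ratio / 100.0;
}

// Hypothetical occupancy check: start a concurrent cycle once the old generation is
// at least threshold_percent full. Starting early leaves headroom for allocations
// made while CMS runs; if that headroom runs out, a concurrent mode failure follows.
bool should_start_cms(double old_used, double old_capacity, double threshold_percent) {
    assert(old_capacity > 0);
    return old_used / old_capacity * 100.0 >= threshold_percent;
}
```

With the defaults, the threshold works out to (100 - 40) + 80 × 40 / 100 = 92. Lowering it starts cycles earlier, trading more frequent CMS work for a smaller risk of concurrent mode failure.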
Three‑Color Marking and Write Barriers
Three‑Color Marking Algorithm
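Black: the object has been reached and all of its references have been scanned.
Gray: the object has been reached, but at least one of its references has not been scanned yet.
White: the object has not been reached; any object still white when marking ends is unreachable and treated as garbage.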
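The marking loop these colors describe can be sketched as follows. It assumes a toy object graph stored as adjacency lists of object indices, and `tricolor_mark` is a hypothetical name. Gray objects form the worklist, and marking ends when no gray object remains.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Color { White, Gray, Black };

// Tri-color marking over a toy graph: refs[obj] lists the objects obj points to.
// Everything starts white; roots are shaded gray; scanning a gray object shades
// its white referents gray and then turns it black.
std::vector<Color> tricolor_mark(const std::vector<std::vector<int>>& refs,
                                 const std::vector<int>& roots) {
    std::vector<Color> color(refs.size(), Color::White);
    std::vector<int> gray;  // worklist of gray objects

    auto shade = [&](int obj) {
        assert(obj >= 0 && static_cast<std::size_t>(obj) < refs.size());
        if (color[obj] == Color::White) {
            color[obj] = Color::Gray;
            gray.push_back(obj);
        }
    };

    for (int r : roots) shade(r);
    while (!gray.empty()) {
        int obj = gray.back();
        gray.pop_back();
        for (int target : refs[obj]) shade(target);  // scan every reference
        color[obj] = Color::Black;                   // fully scanned
    }
    return color;  // objects still White are garbage
}
```

In this example, object 3 points to object 0, but nothing points to object 3, so it stays white even though it references a live object.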
Snapshot‑At‑The‑Beginning (SATB)
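SATB logically takes a snapshot of the object graph at the start of concurrent marking and guarantees that every object reachable in that snapshot gets marked. Objects allocated after marking starts are treated as live. A pre-write barrier records the old value of a reference field before it is overwritten, so deleting a reference during marking cannot hide a live object from the collector. G1 uses SATB, while CMS relies on incremental update.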
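The following toy model shows the problem SATB solves and how a pre-write barrier fixes it. It is a simplified sketch under stated assumptions: objects are indices, the marker is advanced one step at a time, and names such as `Heap::store` are hypothetical. It is not G1's implementation. In the test, black object A gains a reference to white object C while the only gray path to C (through B) is deleted. Without the barrier, C stays white and would be reclaimed even though A still points to it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Color { White, Gray, Black };

// Hypothetical toy model of concurrent marking with an SATB pre-write barrier.
// fields[obj] holds obj's reference fields (-1 = null). The marker advances one
// gray object per step(), so mutator stores can be interleaved between steps.
struct Heap {
    std::vector<std::vector<int>> fields;
    std::vector<Color> color;
    std::vector<int> gray;        // marking worklist
    std::vector<int> satb_queue;  // old values logged by the pre-write barrier
    bool satb_enabled = true;

    void shade(int obj) {
        if (obj >= 0 && color[obj] == Color::White) {
            color[obj] = Color::Gray;
            gray.push_back(obj);
        }
    }
    void start(const std::vector<int>& roots) {    // snapshot point: all white
        color.assign(fields.size(), Color::White);
        for (int r : roots) shade(r);
    }
    bool step() {                                  // scan one gray object
        if (gray.empty()) return false;
        int obj = gray.back();
        gray.pop_back();
        for (int target : fields[obj]) shade(target);
        color[obj] = Color::Black;
        return true;
    }
    // Mutator store. The pre-write barrier logs the value about to be overwritten,
    // so every object reachable in the snapshot is still marked eventually.
    void store(int obj, std::size_t field, int new_value) {
        assert(field < fields[obj].size());
        int old_value = fields[obj][field];
        if (satb_enabled && old_value >= 0) satb_queue.push_back(old_value);
        fields[obj][field] = new_value;
    }
    void finish() {                                // remark-like final phase
        for (int old_value : satb_queue) shade(old_value);
        satb_queue.clear();
        while (step()) {}
    }
};
```

The barrier only records the deleted reference B→C, yet that is enough: C was reachable in the snapshot, so SATB marks it. The cost is that an object which really did die during marking is also kept until the next cycle.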
Write Barrier
<code>void oop_field_store(oop* field, oop new_value) {
// pre-write barrier: SATB (used by G1) logs the old value of *field here
*field = new_value; // actual store
// post-write barrier: card marking or incremental update (e.g. CMS) records the change here
}</code>
The barrier code around the store ensures that reference changes are logged, so the collector stays correct while marking runs concurrently with the application.
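The post-write barrier's "incremental update" role in the snippet above can be made concrete with a toy model. This is a simplified sketch under stated assumptions, not HotSpot code. The hypothetical `Heap::store` records any black object that receives a new reference, and `remark` re-grays and rescans those objects. CMS achieves a similar effect by dirtying cards and rescanning them during remark. The test replays the scenario that breaks unprotected concurrent marking: a black object gains a reference to a white object while the last gray path to it is deleted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Color { White, Gray, Black };

// Hypothetical toy model of concurrent marking with an incremental-update
// post-write barrier. fields[obj] holds reference fields (-1 = null).
struct Heap {
    std::vector<std::vector<int>> fields;
    std::vector<Color> color;
    std::vector<int> gray;        // marking worklist
    std::vector<int> dirty;       // black objects that received a new reference
    bool barrier_enabled = true;

    void shade(int obj) {
        if (obj >= 0 && color[obj] == Color::White) {
            color[obj] = Color::Gray;
            gray.push_back(obj);
        }
    }
    void start(const std::vector<int>& roots) {
        color.assign(fields.size(), Color::White);
        for (int r : roots) shade(r);
    }
    bool step() {                       // scan one gray object
        if (gray.empty()) return false;
        int obj = gray.back();
        gray.pop_back();
        for (int target : fields[obj]) shade(target);
        color[obj] = Color::Black;
        return true;
    }
    // Mutator store followed by the post-write barrier: if an already-scanned
    // (black) object gains a reference, remember it for rescanning at remark.
    void store(int obj, std::size_t field, int new_value) {
        assert(field < fields[obj].size());
        fields[obj][field] = new_value;
        if (barrier_enabled && new_value >= 0 && color[obj] == Color::Black)
            dirty.push_back(obj);
    }
    void remark() {                     // STW in CMS: rescan recorded objects
        for (int obj : dirty) {
            color[obj] = Color::Gray;
            gray.push_back(obj);
        }
        dirty.clear();
        while (step()) {}
    }
};
```

Compared with SATB, incremental update tracks new references rather than deleted ones. Objects that died during marking can therefore be collected in this cycle, but the remark pause has to rescan everything the barrier recorded.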
Read Barrier
<code>oop oop_field_load(oop* field) {
pre_load_barrier(field); // record the read if in concurrent phase
return *field;
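}</code>
Read (load) barriers run every time a reference is loaded from the heap. ZGC relies on them: the barrier checks whether the loaded pointer is still valid for the current GC phase (for example, whether the object has been relocated) and fixes it before the application uses it. During concurrent marking, the same barrier also marks the objects being accessed.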
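A simplified model of a self-healing load barrier, in the spirit of what ZGC does during relocation, is sketched below. It is a toy built on stated assumptions: a forwarding table maps old object indices to new ones, and the hypothetical `load_ref` consults it on every load and repairs the field. Real ZGC detects stale pointers from metadata bits in the pointer itself; none of that is modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical toy model of a self-healing load barrier. Objects are indices;
// fields[obj] lists obj's reference fields (-1 = null). When an object is
// relocated, its old index maps to the new one in `forwarding`.
struct Heap {
    std::vector<std::vector<int>> fields;
    std::unordered_map<int, int> forwarding;  // old index -> new index
    std::size_t slow_path_hits = 0;           // loads that needed fixing

    // Copy an object to a fresh slot and record where it went.
    int relocate(int old_index) {
        std::vector<int> copy = fields[old_index];
        fields.push_back(copy);
        int new_index = static_cast<int>(fields.size()) - 1;
        forwarding[old_index] = new_index;
        return new_index;
    }

    // Every reference load goes through the barrier. A stale pointer to a moved
    // object is translated and written back ("healed"), so later loads of the
    // same field take the fast path.
    int load_ref(int obj, std::size_t field) {
        assert(field < fields[obj].size());
        int ref = fields[obj][field];
        auto it = forwarding.find(ref);
        if (it != forwarding.end()) {
            ref = it->second;
            fields[obj][field] = ref;
            ++slow_path_hits;
        }
        return ref;
    }
};
```

Only the first load of a stale field pays for the lookup; healing the field makes every later load cheap. This is why load barriers can be affordable even though they run on every reference read.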
Remembered Set and Card Table
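During a minor GC, the JVM needs to find old-to-young references without scanning the entire old generation. It does this with a remembered set implemented as a card table. The old generation is divided into cards (512 bytes in HotSpot), and the write barrier marks a card dirty whenever a reference field inside it is updated, since that store may have created an old-to-young reference. A minor GC then scans only the objects on dirty cards as extra roots.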
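Card marking can be sketched as follows. The sketch assumes a contiguous old generation starting at a known base address and HotSpot's 512-byte card size. `CardTable`, `mark_dirty`, and `take_dirty_cards` are hypothetical names, and the 0/1 byte encoding is arbitrary (HotSpot uses its own values).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy card table: one byte per 512-byte card of the old generation.
class CardTable {
public:
    static constexpr std::size_t kCardShift = 9;                       // 2^9 = 512 bytes
    static constexpr std::size_t kCardSize = std::size_t{1} << kCardShift;

    CardTable(std::uintptr_t old_gen_base, std::size_t old_gen_bytes)
        : base_(old_gen_base),
          cards_((old_gen_bytes + kCardSize - 1) >> kCardShift, 0) {}

    // Post-write barrier: dirty the card covering the field that was just written.
    // It does not check what the new reference points to.
    void mark_dirty(std::uintptr_t field_addr) {
        std::size_t card = (field_addr - base_) >> kCardShift;
        assert(card < cards_.size());
        cards_[card] = 1;
    }

    // Cards a minor GC would scan for old-to-young references; resets them to clean.
    std::vector<std::size_t> take_dirty_cards() {
        std::vector<std::size_t> dirty;
        for (std::size_t i = 0; i < cards_.size(); ++i) {
            if (cards_[i] != 0) {
                dirty.push_back(i);
                cards_[i] = 0;
            }
        }
        return dirty;
    }

    std::size_t card_count() const { return cards_.size(); }

private:
    std::uintptr_t base_;
    std::vector<std::uint8_t> cards_;  // 0 = clean, 1 = dirty (arbitrary encoding)
};
```

In the example, a 4 KiB old generation has 8 cards, and three stores dirty only cards 0 and 2. A minor GC would scan just those two 512-byte ranges instead of the whole old generation.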
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.