How the Three‑Color Marking Algorithm Powers Java’s CMS and G1 Garbage Collectors
This article explains the three‑color marking algorithm used by Java’s CMS and G1 garbage collectors, detailing the color semantics, algorithmic steps, common pitfalls such as floating garbage and miss‑marking, and the specific solutions each collector employs to minimize stop‑the‑world pauses and improve memory reclamation efficiency.
Three‑color marking is a garbage‑collection technique that enables the JVM to avoid or greatly shorten Stop‑The‑World (STW) pauses, and it is the core algorithm behind the CMS and G1 collectors.
Three‑Color Marking Algorithm Idea
The algorithm classifies objects into three colors:
White : objects that have not been marked (garbage).
Gray : objects that have been marked but whose referenced fields have not yet been fully processed (GC must continue scanning from these objects).
Black : objects that have been fully marked, including all reachable fields (live objects).
Algorithm Process
Starting from the main method’s root object (GC Root), the collector traverses objects using the black‑gray‑white rules, marking all objects reachable from the GC Root. After the first scan, a short STW pause may occur, followed by a second scan that only needs to process gray objects, because most marking work was already done concurrently. The GC thread then scans the heap to collect remaining white objects.
Specific Steps
Create three sets: white, gray, black.
Place all objects into the white set.
Traverse from the root (non‑recursive) and move visited objects from white to gray.
Iterate over the gray set, moving referenced white objects to gray, then move the processed gray object to black.
Repeat step 4 until the gray set is empty.
Use a write‑barrier to detect object changes and repeat the above steps.
Collect all remaining white objects as garbage.
Problems of Three‑Color Marking
Floating garbage: during concurrent marking, an object that becomes garbage after being colored black or gray may not be rescanned, leaving it uncollected until the next GC cycle.
Object miss‑marking: a white object can be detached from references while a black object still points to it, causing the collector to mistakenly reclaim a live object. Both CMS and G1 implement specific mechanisms (Incremental Update for CMS, SATB for G1) to mitigate these issues.
Solutions
Both CMS (Concurrent Mark Sweep) and G1 (Garbage First) use the three‑color algorithm and have their own strategies to address miss‑marking.
CMS Review
CMS aims for minimal pause times and consists of four phases: Initial Mark (STW), Concurrent Mark, Remark (STW), and Concurrent Sweep. The initial and remark phases quickly mark objects directly reachable from GC Roots, while the concurrent phases perform most of the work without stopping application threads.
To handle miss‑marking, CMS employs Incremental Update: when a previously unmarked (white) object is re‑referenced, the referencing black object becomes gray, ensuring the object is processed in the next remark phase.
CMS Solutions: Incremental Update
When a white object is re‑referenced, any black object that now points to it turns gray, prompting the GC thread to revisit its fields during the next marking cycle.
G1 Review
G1 divides the heap into equal‑sized Region s rather than fixed generations. Regions can serve as Eden, Survivor, or Old spaces, and large objects are placed in special Humongous regions. The collector selects regions for collection based on the amount of reclaimable garbage, forming a CSet (Collection Set) for each cycle.
Key supporting structures include:
Card Table : a bitmap that tracks which old‑generation cards contain references to the young generation, enabling efficient young‑generation GC.
RSet (Remembered Set) : per‑region records of references from other regions, allowing the collector to scan only relevant regions instead of the entire heap.
CSet (Collection Set) : the set of regions chosen for reclamation in a given GC cycle.
G1 Solution: SATB
G1 uses the SATB (Snapshot At The Beginning) technique to handle miss‑marking. At the start of marking, a snapshot of reachable objects is taken. When a reference is cleared, the write barrier pushes the change onto a GC stack so that the object can still be seen as reachable. Combined with the RSet, the collector determines whether an object is still referenced before reclaiming it.
SATB Detailed Process
Take a snapshot of live objects at the beginning of marking.
When a reference is removed, the write barrier records the change so the object remains visible to the GC.
Use the RSet to scan which regions reference the white objects; unreferenced white objects are reclaimed.
Why SATB Is More Efficient Than Incremental Update
SATB only rescans references that were pushed onto the GC stack and leverages the RSet to quickly determine reachability, avoiding full heap scans. G1 also selects regions for collection based on the amount of garbage versus the estimated STW cost, moving live objects to new regions and freeing the old ones.
Does G1 Perform Full GC?
Yes, when the heap is exhausted G1 triggers a Full GC, which in JDK 10 and earlier is single‑threaded. To avoid costly Full GCs, one can increase heap size, improve CPU performance, or lower the threshold for Mixed GC triggers.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
