
Understanding G1 Garbage Collector: Key Concepts, Regions, SATB, RSet, and Pause Prediction Model

The Garbage‑First (G1) collector is a server‑side, region‑based Java GC designed for multi‑processor, large‑memory systems. It marks objects concurrently using SATB, tracks cross‑region references with RSets, and uses a pause‑prediction model to meet user‑specified pause‑time goals while maintaining high throughput.

Meituan Technology Team

Sep 23, 2016


G1 GC (Garbage-First Garbage Collector) can be enabled with the -XX:+UseG1GC flag. It was introduced as an experimental feature in JDK 6u14, became officially available in JDK 7u4, and became the default collector in JDK 9 (JEP 248).

The Garbage-First (G1) collector is a server‑style garbage collector, targeted for multi‑processor machines with large memories. It meets garbage collection (GC) pause time goals with a high probability, while achieving high throughput. The G1 garbage collector is fully supported in Oracle JDK 7 update 4 and later releases. The G1 collector is designed for applications that:

Can operate concurrently with application threads like the CMS collector.

Compact free space without lengthy GC‑induced pause times.

Need more predictable GC pause durations.

Do not want to sacrifice a lot of throughput performance.

Do not require a much larger Java heap.

From the official description, G1 is a server‑side collector aimed at multi‑processor and large‑memory environments that meets pause‑time goals with high probability while sustaining high throughput. Its main characteristics are concurrent execution with application threads, compaction of free space without lengthy pauses, more predictable pause times, and no need for a much larger heap.

G1 was designed to replace CMS. Compared with CMS, G1 offers:

Region‑based memory layout that reduces fragmentation.

More controllable Stop‑The‑World (STW) pauses with a pause‑time prediction mechanism that lets users specify a target pause.

Important Concepts in G1

Region

Traditional collectors divide the heap into contiguous generations (young, old, perm). G1 instead divides the heap into a set of equal‑sized Regions. Each Region occupies a contiguous virtual address range, but the Regions that make up a generation do not have to be adjacent to one another, so a generation is a logical set of Regions rather than one contiguous block.

Some Regions are marked Humongous (H) because they store objects whose size is at least half of a Region.

The size of a Region can be configured with -XX:G1HeapRegionSize (1 MiB – 32 MiB, power of two). If not set, G1 chooses a size based on the total heap.

// share/vm/gc_implementation/g1/heapRegion.cpp
#define MIN_REGION_SIZE  (1024 * 1024)
#define MAX_REGION_SIZE  (32 * 1024 * 1024)
#define TARGET_REGION_NUMBER 2048
void HeapRegion::setup_heap_region_size(size_t initial_heap_size, size_t max_heap_size) {
  uintx region_size = G1HeapRegionSize;
  // No explicit -XX:G1HeapRegionSize: aim for about TARGET_REGION_NUMBER Regions.
  if (FLAG_IS_DEFAULT(G1HeapRegionSize)) {
    size_t average_heap_size = (initial_heap_size + max_heap_size) / 2;
    region_size = MAX2(average_heap_size / TARGET_REGION_NUMBER,
                     (uintx) MIN_REGION_SIZE);
  }
  // Round down to the largest power of two that is <= the computed size.
  int region_size_log = log2_long((jlong) region_size);
  region_size = ((uintx)1 << region_size_log);
  // Clamp to the supported range [MIN_REGION_SIZE, MAX_REGION_SIZE].
  if (region_size < MIN_REGION_SIZE) region_size = MIN_REGION_SIZE;
  else if (region_size > MAX_REGION_SIZE) region_size = MAX_REGION_SIZE;
}
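The sizing arithmetic above can be tried outside HotSpot with a small standalone sketch. The names default_region_size and round_down_to_power_of_two are hypothetical helpers written for this illustration and are not HotSpot functions; the constants simply mirror the excerpt.

```cpp
#include <cstdint>

// Illustrative constants mirroring the HotSpot excerpt.
constexpr std::uint64_t kMinRegionSize = 1024ULL * 1024;       // 1 MiB
constexpr std::uint64_t kMaxRegionSize = 32ULL * 1024 * 1024;  // 32 MiB
constexpr std::uint64_t kTargetRegionNumber = 2048;

// Hypothetical helper: largest power of two that is <= value (value > 0).
std::uint64_t round_down_to_power_of_two(std::uint64_t value) {
  std::uint64_t result = 1;
  while (result <= value / 2) {
    result *= 2;
  }
  return result;
}

// Hypothetical helper: Region size in bytes when -XX:G1HeapRegionSize is not set.
std::uint64_t default_region_size(std::uint64_t initial_heap_size,
                                  std::uint64_t max_heap_size) {
  std::uint64_t average_heap_size = (initial_heap_size + max_heap_size) / 2;
  std::uint64_t size = average_heap_size / kTargetRegionNumber;
  if (size < kMinRegionSize) size = kMinRegionSize;  // same effect as MAX2(...)
  size = round_down_to_power_of_two(size);           // force a power of two
  if (size > kMaxRegionSize) size = kMaxRegionSize;  // upper clamp
  return size;
}
```

Under these assumptions, a fixed 4 GiB heap yields 2 MiB Regions (4 GiB / 2048), a 512 MiB heap is clamped up to 1 MiB, and a 128 GiB heap is clamped down to 32 MiB.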

SATB (Snapshot‑At‑The‑Beginning)

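SATB takes a logical snapshot of the object graph at the start of concurrent marking. Every object that was reachable in that snapshot is treated as live for the current marking cycle, which keeps concurrent marking correct while application (mutator) threads keep modifying references.

Marking uses three colors: white (not yet marked), gray (marked, but fields not yet fully scanned), and black (marked, with all fields scanned). A live white object can be missed if, during marking, a mutator stores a reference to it into a black object and also removes every reference to it from gray objects. SATB prevents this with a pre‑write barrier that records the old value of a reference field before it is overwritten, so the object it pointed to is still marked.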

// share/vm/gc_implementation/g1/g1SATBCardTableModRefBS.hpp
template <class T> static void write_ref_field_pre_static(T* field, oop newVal) {
  // Pre-write barrier: read the value that is about to be overwritten.
  T heap_oop = oopDesc::load_heap_oop(field);
  if (!oopDesc::is_null(heap_oop)) {
    // Record the old referent so that marking still visits it.
    enqueue(oopDesc::decode_heap_oop(heap_oop));
  }
}
// share/vm/gc_implementation/g1/g1SATBCardTableModRefBS.cpp
void G1SATBCardTableModRefBS::enqueue(oop pre_val) {
  assert(pre_val->is_oop(true), "Error");
  // Nothing to record unless concurrent marking is in progress.
  if (!JavaThread::satb_mark_queue_set().is_active()) return;
  Thread* thr = Thread::current();
  if (thr->is_Java_thread()) {
    // Java threads log into their own thread-local SATB queue.
    JavaThread* jt = (JavaThread*)thr;
    jt->satb_mark_queue().enqueue(pre_val);
  } else {
    // Other threads share one queue, protected by a lock.
    MutexLockerEx x(Shared_SATB_Q_lock, Mutex::_no_safepoint_check_flag);
    JavaThread::satb_mark_queue_set().shared_satb_queue()->enqueue(pre_val);
  }
}

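Because SATB keeps every object that was reachable in the snapshot, it can retain objects that become garbage during marking. This "floating garbage" is not reclaimed until a later marking cycle.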
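The effect of the pre-write barrier can be seen in a toy tri-color marker. This is a minimal sketch under stated assumptions, not G1's implementation: ToyMarker, c_is_marked, and the two-slot object layout are all hypothetical, and details of real G1 such as thread-local queues are left out. Three objects are linked as A -> B -> C, A is scanned (A black, B gray, C white), and then the mutator changes references before marking finishes.

```cpp
#include <array>
#include <cstddef>
#include <vector>

enum class Color { White, Gray, Black };
constexpr int kNull = -1;  // stands for a null reference

// Toy heap: objects are indices, each with two reference slots.
class ToyMarker {
 public:
  ToyMarker(std::size_t object_count, bool satb_enabled)
      : fields_(object_count, std::array<int, 2>{{kNull, kNull}}),
        color_(object_count, Color::White),
        satb_enabled_(satb_enabled) {}

  // Reference store; with SATB the old value is logged before the write.
  void store(int obj, int slot, int new_target) {
    int old_target = fields_[obj][slot];
    if (satb_enabled_ && old_target != kNull) {
      satb_queue_.push_back(old_target);  // pre-write barrier
    }
    fields_[obj][slot] = new_target;
  }

  void mark_root(int obj) { shade(obj); }

  // Scan one gray object: shade its referents, then blacken it.
  bool step() {
    if (gray_.empty()) return false;
    int obj = gray_.back();
    gray_.pop_back();
    for (int target : fields_[obj]) {
      if (target != kNull) shade(target);
    }
    color_[obj] = Color::Black;
    return true;
  }

  // Finish marking, draining the SATB queue until no work is left.
  void finish() {
    for (;;) {
      while (step()) {}
      if (satb_queue_.empty()) break;
      for (int obj : satb_queue_) shade(obj);
      satb_queue_.clear();
    }
  }

  bool is_marked(int obj) const { return color_[obj] != Color::White; }

 private:
  void shade(int obj) {
    if (color_[obj] == Color::White) {
      color_[obj] = Color::Gray;
      gray_.push_back(obj);
    }
  }

  std::vector<std::array<int, 2>> fields_;
  std::vector<Color> color_;
  std::vector<int> gray_;
  std::vector<int> satb_queue_;
  bool satb_enabled_;
};

// Runs the A -> B -> C scenario and reports whether C ends up marked.
bool c_is_marked(bool satb_enabled, bool store_c_into_a) {
  const int A = 0, B = 1, C = 2;
  ToyMarker marker(3, satb_enabled);
  marker.store(A, 0, B);
  marker.store(B, 0, C);
  marker.mark_root(A);
  marker.step();                             // A black, B gray, C white
  if (store_c_into_a) marker.store(A, 1, C); // black object now references C
  marker.store(B, 0, kNull);                 // only gray-to-white path removed
  marker.finish();
  return marker.is_marked(C);
}
```

In this sketch, c_is_marked(false, true) returns false even though C is still reachable from A, which is the lost-object problem. With the barrier, c_is_marked(true, true) returns true. c_is_marked(true, false) also returns true although C is no longer reachable: that is the retained garbage described above.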

RSet (Remembered Set)

An RSet records cross‑Region references in the points‑into direction: it answers the question "who references objects in this Region?". Each Region has its own RSet, which maps source Regions to the Cards in them that hold references to objects inside the target Region. The Card Table works in the opposite, points‑out direction. Together, these structures let G1 limit scanning to the relevant Regions during young‑generation and mixed GCs.

void oop_field_store(oop* field, oop new_value) {
  pre_write_barrier(field);          // maintain SATB invariant
  *field = new_value;                // actual store
  post_write_barrier(field, new_value); // track cross‑Region reference
}

During a Young GC, only the RSets of young Regions are scanned for old‑to‑young references, avoiding a full old‑generation scan. In mixed GC, old‑to‑old references are obtained from old‑generation RSets, further reducing work.
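The bookkeeping behind a points-into RSet can be sketched as follows. This is a simplified illustration, not G1's data structure: ToyRememberedSets, region_of, and card_of are hypothetical names, addresses are treated as plain offsets from the heap base, 1 MiB Regions are assumed, and the 512-byte Card size follows HotSpot's card table. The sketch also updates the RSet synchronously inside the barrier, which is a further simplification.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

constexpr unsigned kLogRegionBytes = 20;  // assumption: 1 MiB Regions
constexpr unsigned kLogCardBytes = 9;     // 512-byte Cards

std::size_t region_of(std::uintptr_t addr) { return addr >> kLogRegionBytes; }
std::size_t card_of(std::uintptr_t addr) { return addr >> kLogCardBytes; }

class ToyRememberedSets {
 public:
  // Called after "*field = new_value"; both arguments are heap offsets.
  void post_write_barrier(std::uintptr_t field_addr,
                          std::uintptr_t new_value_addr) {
    if (new_value_addr == 0) return;  // null store: nothing to track
    std::size_t from = region_of(field_addr);
    std::size_t to = region_of(new_value_addr);
    if (from == to) return;           // same Region: not a cross-Region ref
    rsets_[to][from].insert(card_of(field_addr));
  }

  // Cards that must be scanned to find every reference into `region`.
  std::set<std::size_t> cards_pointing_into(std::size_t region) const {
    std::set<std::size_t> cards;
    auto it = rsets_.find(region);
    if (it == rsets_.end()) return cards;
    for (const auto& entry : it->second) {
      cards.insert(entry.second.begin(), entry.second.end());
    }
    return cards;
  }

 private:
  // target Region -> (source Region -> Cards holding the references)
  std::map<std::size_t, std::map<std::size_t, std::set<std::size_t>>> rsets_;
};
```

Collecting a Region then only requires scanning the Cards returned by cards_pointing_into for that Region, instead of scanning every other Region for incoming references.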

Pause Prediction Model

G1 uses a pause‑prediction model to meet a user‑defined pause target (-XX:MaxGCPauseMillis, default 200 ms). Based on measurements from previous collections, the model estimates how many Regions can be collected without exceeding the target.

// share/vm/gc_implementation/g1/g1CollectorPolicy.hpp
double get_new_prediction(TruncatedSeq* seq) {
  return MAX2(seq->davg() + sigma() * seq->dsd(),
               seq->davg() * confidence_factor(seq->num()));
}

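The model relies on a decaying average (davg) and a decaying standard deviation (dsd), both maintained by a TruncatedSeq that keeps the most recent n samples. Each new sample is blended into the running values with weight (1 - alpha), so recent collections influence the prediction more than older ones.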

// src/share/vm/utilities/numberSeq.cpp
void AbsSeq::add(double val) {
  if (_num == 0) {
    _davg = val;
    _dvariance = 0.0;
  } else {
    _davg = (1.0 - _alpha) * val + _alpha * _davg;
    double diff = val - _davg;
    _dvariance = (1.0 - _alpha) * diff * diff + _alpha * _dvariance;
  }
}
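To make the update rule concrete, here is a minimal standalone sketch of the same arithmetic. DecayingSeq and predict are hypothetical names; the alpha of 0.7 and the sigma of 0.5 used in the example are assumptions chosen for illustration; and predict implements only the davg + sigma * dsd branch of get_new_prediction, leaving out the confidence-factor branch. The sample counter is incremented inside add here to keep the sketch self-contained.

```cpp
#include <cmath>

class DecayingSeq {
 public:
  explicit DecayingSeq(double alpha = 0.7) : alpha_(alpha) {}

  // Same update order as the excerpt: davg first, then the variance
  // computed against the already-updated average.
  void add(double val) {
    if (num_ == 0) {
      davg_ = val;
      dvariance_ = 0.0;
    } else {
      davg_ = (1.0 - alpha_) * val + alpha_ * davg_;
      double diff = val - davg_;
      dvariance_ = (1.0 - alpha_) * diff * diff + alpha_ * dvariance_;
    }
    ++num_;
  }

  double davg() const { return davg_; }
  double dsd() const { return std::sqrt(dvariance_); }
  int num() const { return num_; }

 private:
  double alpha_;
  double davg_ = 0.0;
  double dvariance_ = 0.0;
  int num_ = 0;
};

// Simplified prediction: decaying average padded by sigma standard deviations.
double predict(const DecayingSeq& seq, double sigma) {
  return seq.davg() + sigma * seq.dsd();
}
```

With alpha = 0.7, adding the samples 10 and then 20 gives davg = 0.3 * 20 + 0.7 * 10 = 13 and a decaying variance of 0.3 * (20 - 13)^2 = 14.7, so predict with sigma = 0.5 returns 13 + 0.5 * sqrt(14.7), roughly 14.9. The padding above the average is what makes the estimate conservative when recent samples vary a lot.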

Prediction of RSet update time (cost per dirty Card) is performed as:

// share/vm/gc_implementation/g1/g1CollectorPolicy.hpp
double predict_rs_update_time_ms(size_t pending_cards) {
  return (double) pending_cards * predict_cost_per_card_ms();
}
double predict_cost_per_card_ms() {
  return get_new_prediction(_cost_per_card_ms_seq);
}
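One way to picture how such per-item predictions feed the pause target is a greedy budget check. The sketch below is only an illustration under assumptions: predicted_rs_update_ms mirrors the multiplication above, while regions_within_budget, its parameters, and the idea of a single fixed cost are hypothetical simplifications and not G1's actual collection-set policy.

```cpp
#include <cstddef>
#include <vector>

// Mirrors predict_rs_update_time_ms: cost scales with the pending Cards.
double predicted_rs_update_ms(std::size_t pending_cards,
                              double cost_per_card_ms) {
  return static_cast<double>(pending_cards) * cost_per_card_ms;
}

// Hypothetical helper: counts how many candidate Regions, taken in order,
// fit into the pause target once the fixed costs are accounted for.
std::size_t regions_within_budget(const std::vector<double>& predicted_region_ms,
                                  double fixed_cost_ms,
                                  double pause_target_ms) {
  double total_ms = fixed_cost_ms;
  std::size_t chosen = 0;
  for (double region_ms : predicted_region_ms) {
    if (total_ms + region_ms > pause_target_ms) break;  // would miss the target
    total_ms += region_ms;
    ++chosen;
  }
  return chosen;
}
```

For example, with a 200 ms target, a fixed cost of 30 ms, and candidate Regions predicted at 40, 50, 60, and 70 ms, the first three fit (180 ms in total) and the fourth would push the pause to 250 ms, so only three are chosen.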

The article continues with a detailed description of the full G1 GC cycle, but the remaining part is omitted due to platform length limits.
