Can Green Tea GC Revolutionize Go’s Garbage Collection Performance?
This article examines Go’s concurrent mark‑sweep garbage collector, its latency advantages and scalability limits, then evaluates the new Green Tea GC proposal, detailing its span‑based scanning, benchmark results, and where it offers measurable improvements over the existing GC.
Go GC Design and Implementation
Since Go 1.5, the language has used a concurrent mark‑sweep algorithm built on a tri‑color marking model; since Go 1.8 it employs a hybrid write barrier that combines Yuasa‑style deletion and Dijkstra‑style insertion barriers.
In short, Go GC runs in the background, concurrently traversing the heap, marking reachable objects and gradually reclaiming unreachable memory, aiming for low latency and short stop‑the‑world pauses.
Concurrent marking and concurrent sweeping
No object relocation (no compaction)
Span‑based incremental sweeping to reduce each STW pause
This design lets most of the application run in parallel with the collector, keeping typical pause times below a millisecond.
Known Issues of Go GC
Despite good latency, Go GC still suffers from significant CPU time and scalability drawbacks:
Inefficient memory access: during the marking phase the collector chases pointers across scattered objects, causing frequent cache misses; roughly 35% of GC CPU cycles are spent stalled on memory, especially on NUMA or many‑core machines.
Lack of generational collection: all objects are treated equally, which becomes costly under high allocation rates; engineers have observed CPU spikes when memory pressure rises.
High CPU usage from frequent collections: even with modest heap sizes (<450 MiB), systems may trigger 8–10 GCs per second, consuming roughly 30% of CPU time and crowding out application goroutines.
Performance Tests: GC Impact on Go Programs
Benchmark observations:
Go 1.3/1.4 (pre‑concurrent GC): GC pauses on large heaps (10 GB+) measured in seconds.
Go 1.5 (concurrent GC introduced): the same conditions reduced pause times to <1 ms.
Go 1.6–1.8: with heaps up to 200 GB, pauses stayed below 20 ms, often around 1 ms.
While latency is well‑controlled, total pause time and CPU consumption remain noticeable under heavy load.
Green Tea GC: New Optimization Approach
To address these problems, the Go team proposed Green Tea GC, whose core improvements are:
Small objects (≤512 B) are marked at the span level instead of per‑object.
Only the first marked object in a span pushes the entire span onto the scan queue.
The scanning phase processes whole spans in batches, greatly improving memory‑access locality.
Enhanced parallel queue management using a work‑stealing scheduler similar to Go’s runtime, boosting multi‑core scalability.
Green Tea GC Real‑World Performance
Initial benchmarks show selective gains:
Tile38 benchmark (high‑fan‑out tree): GC overhead reduced by ~35%, with overall throughput, latency, and memory usage all improving.
Bleve‑index benchmark (low fan‑out, frequent mutations): objects are scattered and locality is poor; Green Tea GC performs on par with the standard collector, and sometimes slightly worse.
In summary, Green Tea GC is not a universal silver bullet, but it delivers clear advantages for workloads with good memory locality and high allocation intensity, and it lays groundwork for future SIMD‑accelerated optimizations.
Comparison Overview
Marking granularity: Current Go GC – per‑object; Green Tea GC – span‑based batches.
Memory locality: Current – poor, random jumps; Green Tea – high, batched within spans.
Multi‑core scalability: Current – limited; Green Tea – improved via work‑stealing queues.
Performance gain: Current – near the low‑latency ceiling; Green Tea – up to 35% reduction in GC time in certain scenarios.
Applicable workloads: Current – general use; Green Tea – memory‑local, allocation‑intensive workloads.
For developers chasing extreme performance, Green Tea GC offers a promising direction. It remains experimental and is expected to land in Go 1.25 or Go 1.26, but it can be tried today with gotip.
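For those who want to experiment, the commands below show the assumed workflow: `gotip` installs the development toolchain, and the proposal gates the new collector behind the `GOEXPERIMENT=greenteagc` flag.

```shell
# Install and download the development Go toolchain.
go install golang.org/dl/gotip@latest
gotip download

# Build and run a program with the experimental Green Tea collector.
GOEXPERIMENT=greenteagc gotip build ./...
GOEXPERIMENT=greenteagc gotip run .
```

Since the experiment is still evolving, verify the flag name against the tracking issue before relying on it.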
References: GitHub Issue #73581; related StackOverflow discussion.
