Can Go’s GreenTeaGC Beat the Standard GC? Benchmark Results Revealed

A comprehensive benchmark compares Go's experimental GreenTeaGC (enabled via GOEXPERIMENT=greenteagc) against the standard GC using 30,000 and 50,000 long‑lived TCP connections, measuring GC pauses, heap usage, CPU load and scalability, and finds no decisive performance advantage.

Tech Musings

Test Background

Go 1.25 introduces the experimental GreenTeaGC algorithm, which is enabled at build time with GOEXPERIMENT=greenteagc. This report benchmarks GreenTeaGC against the standard GC under a long-lived TCP connection workload.
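Before comparing the two collectors, it helps to confirm which one a given binary was actually built with. The sketch below assumes that a toolchain built with a non-default experiment reports it as an " X:" suffix in runtime.Version() (for example "go1.25.4 X:greenteagc"); the helper names experimentsFromVersion and hasExperiment are hypothetical and not part of the original benchmark.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// experimentsFromVersion extracts experiment names from a Go version string
// such as "go1.25.4 X:greenteagc". It returns nil when there is no " X:"
// suffix. The suffix format is an assumption about how the toolchain
// reports GOEXPERIMENT.
func experimentsFromVersion(version string) []string {
	_, list, found := strings.Cut(version, " X:")
	if !found || list == "" {
		return nil
	}
	return strings.Split(list, ",")
}

// hasExperiment reports whether name appears in the experiment list.
func hasExperiment(version, name string) bool {
	for _, e := range experimentsFromVersion(version) {
		if e == name {
			return true
		}
	}
	return false
}

func main() {
	v := runtime.Version()
	fmt.Printf("runtime version: %q\n", v)
	fmt.Println("greenteagc enabled:", hasExperiment(v, "greenteagc"))
}
```

Logging this once at startup makes it harder to mix up result files from the two builds.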

Test Environment

Hardware

root@VM-16-14-ubuntu:/data/webp# neofetch
                .-/+oossssoo+/.               root@VM-16-14-ubuntu
               `:+ssssssssssssssssss+:`               --------------------
              -+ssssssssssssssssssyyssss+-               OS: Ubuntu 24.04.3 LTS x86_64
            .ossssssssssssssssssdMMMNysssso.               Host: CVM 3.0
           /ssssssssssshdmmNNmmyNMMMMhssssss/              Kernel: 6.8.0-88-generic
          +ssssssssshmydMMMMMMMNddddyssssssss+            Uptime: 1 day, 35 mins
         /sssssssshNMMMyhhyyyyhmNMMMNhssssssss/         Packages: 913 (dpkg), 4 (snap)
        .ssssssssdMMMNhsssssssssshNMMMdssssssss.         Shell: bash 5.2.21
       +sssshhhyNMMNyssssssssssssyNMMMysssssss+         Resolution: 1024x768
       ossyNMMMNyMMhsssssssssssssshmmmhssssssso         Terminal: /dev/pts/0
       ossyNMMMNyMMhsssssssssssssshmmmhssssssso         CPU: AMD EPYC 7K62 (2) @ 2.595GHz
       +sssshhhyNMMNyssssssssssssyNMMMysssssss+         GPU: 00:02.0 Cirrus Logic GD 5446
       .ssssssssdMMMNhsssssssssshNMMMdssssssss.         Memory: 1886MiB / 3659MiB
       /sssssssshNMMMyhhyyyyhdNMMMNhssssssss/

Software

Go version: 1.25.4

GC configurations: standard GC (default build) and GreenTeaGC (GOEXPERIMENT=greenteagc)

Test Application

A custom TCP long‑connection load generator simulates massive IoT device clients. Architecture: TCP client → IoT server. Connections are kept alive, with a heartbeat every 5 minutes and location reports every 10 minutes. All interactions are synchronous (send → wait → reply).
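The synchronous send → wait → reply pattern described above can be sketched as follows. This is an illustration only, not the author's load generator: the newline-delimited wire format, the message names HEARTBEAT and LOCATION, the ACK reply, and the functions exchange, deviceLoop and demoExchange are all assumptions made for the example.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"strings"
	"time"
)

// exchange sends one newline-terminated message and blocks until one
// newline-terminated reply arrives or the timeout expires.
func exchange(conn net.Conn, r *bufio.Reader, msg string, timeout time.Duration) (string, error) {
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		return "", err
	}
	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	reply, err := r.ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSuffix(reply, "\n"), nil
}

// deviceLoop simulates one device: a heartbeat and a location report, each
// on its own ticker, every message handled synchronously.
func deviceLoop(conn net.Conn, heartbeat, report time.Duration, stop <-chan struct{}) error {
	r := bufio.NewReader(conn)
	hb := time.NewTicker(heartbeat)
	defer hb.Stop()
	rp := time.NewTicker(report)
	defer rp.Stop()
	for {
		var msg string
		select {
		case <-stop:
			return nil
		case <-hb.C:
			msg = "HEARTBEAT"
		case <-rp.C:
			msg = "LOCATION"
		}
		if _, err := exchange(conn, r, msg, 30*time.Second); err != nil {
			return err
		}
	}
}

// demoExchange runs a single exchange against an in-memory fake server.
func demoExchange() (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go func() {
		defer server.Close()
		sr := bufio.NewReader(server)
		if _, err := sr.ReadString('\n'); err != nil {
			return
		}
		fmt.Fprint(server, "ACK\n")
	}()
	return exchange(client, bufio.NewReader(client), "HEARTBEAT", time.Second)
}

func main() {
	reply, err := demoExchange()
	fmt.Println("reply:", reply, "err:", err)
}
```

In a real run, deviceLoop would be started once per connection with intervals of 5 and 10 minutes; net.Pipe is used here only so the example runs without a server.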

Test Scenarios

Scenario 1: 30 000 concurrent TCP connections

Scenario 2: 50 000 concurrent TCP connections

Each scenario runs for 1 hour; metrics are sampled every 5 seconds.
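Given the heartbeat and report intervals from the test application, the steady-state message rate of each scenario can be estimated with simple arithmetic. The sketch below assumes every connection sends exactly one message per interval and ignores connection setup; the function names are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// messagesPerSecond estimates the aggregate message rate when every
// connection sends one message per interval, for each interval given.
func messagesPerSecond(connections int, intervals ...time.Duration) float64 {
	var rate float64
	for _, iv := range intervals {
		if iv <= 0 {
			continue // skip invalid intervals instead of dividing by zero
		}
		rate += float64(connections) / iv.Seconds()
	}
	return rate
}

// scenarioRate applies the article's intervals: a heartbeat every
// 5 minutes and a location report every 10 minutes.
func scenarioRate(connections int) float64 {
	return messagesPerSecond(connections, 5*time.Minute, 10*time.Minute)
}

func main() {
	for _, n := range []int{30000, 50000} {
		fmt.Printf("%d connections: %.1f messages/s\n", n, scenarioRate(n))
	}
}
```

This works out to about 150 messages per second at 30 000 connections and about 250 at 50 000, which is a light request load for a server of this size.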

Collected Metrics

For each run the following indicators were recorded: GC total count, GC frequency, average and maximum pause time, heap memory (average and maximum), total CPU usage, and Goroutine count.

Scenario 1 (30 k connections) Results

GC total count: 40 (standard) vs 41 (GreenTeaGC)  (+2.5%)
Average GC frequency: 0.008 times/s (both)
Average pause: 0.007 ms (both)
Maximum pause: 1.970 ms (standard) vs 1.570 ms (GreenTeaGC)  (-20.3%)
Average heap memory: 384.6 MB (both)
Maximum heap memory: 452 MB (standard) vs 449 MB (GreenTeaGC)  (-0.7%)
Average total CPU: 7.03 % (standard) vs 7.18 % (GreenTeaGC)  (+2.1%)
Maximum total CPU: 16.50 % (standard) vs 15.90 % (GreenTeaGC)  (-3.6%)
Final Goroutine count: 89 995 (standard) vs 89 896 (GreenTeaGC)  (-0.1%)

Scenario 2 (50 k connections) Results

GC total count: 41 (standard) vs 40 (GreenTeaGC)  (-2.4%)
Average GC frequency: 0.008 times/s (both)
Average pause: 0.009 ms (standard) vs 0.008 ms (GreenTeaGC)  (-11.1%)
Maximum pause: 2.060 ms (standard) vs 3.060 ms (GreenTeaGC)  (+48.5%)
Average heap memory: 628.3 MB (standard) vs 629.3 MB (GreenTeaGC)  (+0.2%)
Maximum heap memory: 730 MB (standard) vs 728 MB (GreenTeaGC)  (-0.3%)
Average total CPU: 16.35 % (standard) vs 16.80 % (GreenTeaGC)  (+2.8%)
Maximum total CPU: 157.80 % (standard) vs 169.80 % (GreenTeaGC)  (+7.6%)
Final Goroutine count: 149 821 (standard) vs 149 803 (GreenTeaGC)  (0%)
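The percentages in parentheses are relative changes of the GreenTeaGC value against the standard GC value. A minimal sketch of that calculation, using the two maximum-pause rows as input (the function name relativeChange is an assumption):

```go
package main

import "fmt"

// relativeChange returns how much greenTea differs from standard,
// as a percentage of the standard value.
func relativeChange(standard, greenTea float64) float64 {
	return (greenTea - standard) / standard * 100
}

func main() {
	// Maximum pause in ms, taken from the two result tables.
	fmt.Printf("30k max pause: %+.1f%%\n", relativeChange(1.970, 1.570))
	fmt.Printf("50k max pause: %+.1f%%\n", relativeChange(2.060, 3.060))
}
```

Note that both maximum-pause deltas come from a single largest observation per run, so a large percentage here corresponds to an absolute difference of at most 1 ms.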

Data Analysis

GC Performance

GC frequency is virtually identical for both collectors (~0.008 times/s).

In the 30 k scenario GreenTeaGC reduces maximum pause by 20 % while average pause stays the same.

In the 50 k scenario average pause drops 11 % but maximum pause rises 49 %.

Conclusion: No clear advantage in pause time.

CPU Consumption

GreenTeaGC shows 2 to 3 % higher average CPU usage (7.18 % vs 7.03 % at 30 k, 16.80 % vs 16.35 % at 50 k), a difference small enough to be attributed to measurement noise.

Conclusion: CPU consumption is essentially equal.

Memory Usage

Heap memory consumption is almost identical between the two collectors.

Conclusion: No measurable difference in memory usage.

Scalability

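When scaling from 30 k to 50 k connections, both collectors grow in the same way: average heap rises from about 385 MB to about 629 MB (roughly 1.6×), average CPU rises from about 7 % to about 16 to 17 % (roughly 2.3×), and GC frequency stays at about 0.008 times/s.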

Conclusion: Scalability performance is consistent.

Key Findings

Overall performance (GC frequency, memory, CPU, scalability) is comparable.

GreenTeaGC reduces maximum pause by 20 % in the 30 k case but increases it by 49 % in the 50 k case.

Average CPU is 2‑3 % higher with GreenTeaGC, within measurement noise.

Limitations

Only a single low‑frequency TCP‑long‑connection workload was evaluated.

Heartbeat (5 min) and location reports (10 min) generate very few temporary objects, providing little allocation pressure.

Synchronous request handling limits concurrency pressure.

The 2 CPU / 4 GB VM may cap the achievable scale.

Metrics sampled every 5 seconds could miss short spikes.
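For pause data specifically, one way to depend less on the sampling interval is to read the runtime's cumulative pause histogram from the runtime/metrics package instead of point-in-time values. The sketch below assumes a Go version that exposes the /sched/pauses/total/gc:seconds metric and falls back to a message otherwise; the helper highestNonEmptyBucket is hypothetical, and the result is only as precise as the histogram's bucket boundaries.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// highestNonEmptyBucket returns the [lower, upper) bounds of the highest
// bucket that has at least one observation. Bucket i covers
// [buckets[i], buckets[i+1]), so buckets has one more entry than counts.
func highestNonEmptyBucket(counts []uint64, buckets []float64) (lower, upper float64, ok bool) {
	if len(buckets) != len(counts)+1 {
		return 0, 0, false
	}
	for i := len(counts) - 1; i >= 0; i-- {
		if counts[i] > 0 {
			return buckets[i], buckets[i+1], true
		}
	}
	return 0, 0, false
}

func main() {
	runtime.GC() // force one cycle so the histogram is not empty

	const name = "/sched/pauses/total/gc:seconds"
	samples := []metrics.Sample{{Name: name}}
	metrics.Read(samples)
	if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
		fmt.Println("metric not available in this Go version:", name)
		return
	}
	h := samples[0].Value.Float64Histogram()
	lo, hi, ok := highestNonEmptyBucket(h.Counts, h.Buckets)
	if !ok {
		fmt.Println("no GC pauses recorded yet")
		return
	}
	fmt.Printf("longest pause so far falls in [%.3f ms, %.3f ms)\n", lo*1e3, hi*1e3)
}
```

Because the histogram accumulates over the life of the process, reading it once at the end of a run still reflects pauses that occurred between samples.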

Conclusions & Recommendations

In a 2 CPU, 4 GB environment with 30‑50 k long‑lived TCP connections, GreenTeaGC delivers performance essentially indistinguishable from the standard GC. It may become beneficial in scenarios with higher allocation rates (e.g., HTTP short‑lived requests), tighter memory constraints, larger connection counts (100 k+), or real‑time systems that are sensitive to GC latency.

GC Monitoring Implementation (Go)

Core Sampling Logic

func (m *GCMonitor) sample() {
    m.sampleCount++
    now := time.Now()
    sinceStart := now.Sub(m.startTime)

    // Read current GC stats
    var currentGCStats debug.GCStats
    debug.ReadGCStats(&currentGCStats)

    // Read memory stats
    var memStats runtime.MemStats
    runtime.ReadMemStats(&memStats)

    // Compute deltas
    gcDelta := calculateGCDelta(&m.lastGCStats, &currentGCStats)
    timeDelta := now.Sub(m.lastSample).Seconds()

    // Compute CPU metrics
    cpuMetrics := m.calculateCPUMetrics(timeDelta)

    // Build snapshot
    snapshot := MetricSnapshot{
        Timestamp:    now,
        SinceStart:   formatDuration(sinceStart),
        SampleNumber: m.sampleCount,
        GCStats:      m.extractGCMetrics(&currentGCStats, &gcDelta, timeDelta),
        MemStats:     m.extractMemoryMetrics(&memStats),
        NumGoroutine: runtime.NumGoroutine(),
        NumCPU:       runtime.NumCPU(),
        CPUMetrics:   cpuMetrics,
    }

    // Write logs and optional JSON
    m.writeToLog(&snapshot)
    if m.config.EnableJSON {
        m.writeToJSON(&snapshot)
    }

    // Update previous state
    m.lastGCStats = currentGCStats
    m.lastSample = now
}
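The excerpt shows what a single sample does but not what triggers it. A minimal sketch of a driver loop, assuming a ticker-based design with context cancellation (runEvery, countSamples and demoSampleCount are hypothetical names, not taken from the original monitor):

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runEvery calls fn once per interval until ctx is cancelled.
func runEvery(ctx context.Context, interval time.Duration, fn func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			fn()
		}
	}
}

// countSamples runs the loop for the given total duration and reports
// how many times the callback fired.
func countSamples(interval, total time.Duration) int {
	ctx, cancel := context.WithTimeout(context.Background(), total)
	defer cancel()
	n := 0
	runEvery(ctx, interval, func() { n++ })
	return n
}

// demoSampleCount uses millisecond durations so the example finishes fast.
func demoSampleCount() int {
	return countSamples(5*time.Millisecond, 200*time.Millisecond)
}

func main() {
	fmt.Println("samples taken:", demoSampleCount())
}
```

With the article's settings (a 5 second interval over a 1 hour run) such a loop would fire about 720 times, with the callback being the monitor's sample method.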

GC Metric Extraction

func (m *GCMonitor) extractGCMetrics(stats *debug.GCStats, delta *GCDelta, timeDelta float64) GCMetrics {
    metrics := GCMetrics{
        NumGC:      uint32(stats.NumGC),
        NumGCDelta: delta.NumGC,
        PauseTotal: float64(stats.PauseTotal) / 1e6, // ms
        PauseDelta: delta.PauseTotal / 1e6,
    }
    if stats.NumGC > 0 {
        metrics.PauseAvg = metrics.PauseTotal / float64(stats.NumGC)
    }
    if delta.NumGC > 0 {
        metrics.PauseDeltaAvg = metrics.PauseDelta / float64(delta.NumGC)
    }
    if len(stats.Pause) > 0 {
        metrics.LastPause = float64(stats.Pause[0]) / 1e6
    }
    if timeDelta > 0 {
        metrics.GCRate = float64(delta.NumGC) / timeDelta
    }
    return metrics
}
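The extraction above records only the most recent pause, while the results report a maximum pause per run. One way such a maximum could be tracked is to scan the pauses that completed since the previous sample, relying on debug.GCStats.Pause being ordered most recent first. This is a sketch under that assumption; maxRecentPause and demoMaxPauseMicros are hypothetical and the original monitor may compute the maximum differently.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// maxRecentPause returns the longest of the n most recent pauses.
// pauses must be ordered most recent first, as in debug.GCStats.Pause.
func maxRecentPause(pauses []time.Duration, n int) time.Duration {
	if n < 0 {
		n = 0
	}
	if n > len(pauses) {
		n = len(pauses) // the runtime keeps only a bounded history
	}
	var longest time.Duration
	for _, p := range pauses[:n] {
		if p > longest {
			longest = p
		}
	}
	return longest
}

// demoMaxPauseMicros checks the helper on fixed data: only the first
// three entries are considered, so the 9 ms pause is ignored.
func demoMaxPauseMicros() int64 {
	history := []time.Duration{
		1 * time.Millisecond,
		3 * time.Millisecond,
		2 * time.Millisecond,
		9 * time.Millisecond,
	}
	return maxRecentPause(history, 3).Microseconds()
}

func main() {
	runtime.GC()
	runtime.GC()
	var stats debug.GCStats
	debug.ReadGCStats(&stats)
	fmt.Println("longest of the last 2 pauses:", maxRecentPause(stats.Pause, 2))
	fmt.Println("demo result (µs):", demoMaxPauseMicros())
}
```

In the monitor, n would be the NumGC delta for the sample, and the run-wide maximum would be the largest value seen across all samples.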

GC Delta Calculation

func calculateGCDelta(last, current *debug.GCStats) GCDelta {
    delta := GCDelta{NumGC: 0, PauseTotal: 0}
    if current.NumGC > last.NumGC {
        delta.NumGC = uint32(current.NumGC - last.NumGC)
        delta.PauseTotal = float64(current.PauseTotal - last.PauseTotal)
    }
    return delta
}

Memory Metric Extraction

func (m *GCMonitor) extractMemoryMetrics(stats *runtime.MemStats) MemoryMetrics {
    return MemoryMetrics{
        HeapAlloc:    stats.HeapAlloc / 1024 / 1024,
        HeapSys:      stats.HeapSys / 1024 / 1024,
        HeapInuse:    stats.HeapInuse / 1024 / 1024,
        HeapIdle:     stats.HeapIdle / 1024 / 1024,
        HeapReleased: stats.HeapReleased / 1024 / 1024,
        TotalAlloc:   stats.TotalAlloc / 1024 / 1024,
        Sys:          stats.Sys / 1024 / 1024,
        NextGC:       stats.NextGC / 1024 / 1024,
        NumObjects:   stats.Mallocs - stats.Frees,
        StackInuse:   stats.StackInuse / 1024 / 1024,
        StackSys:     stats.StackSys / 1024 / 1024,
    }
}

CPU Monitoring (Unix/Linux)

func (m *CPUMonitor) getCPUTime() (userTime, sysTime float64) {
    var rusage syscall.Rusage
    if err := syscall.Getrusage(syscall.RUSAGE_SELF, &rusage); err != nil {
        return 0, 0
    }
    userSec := float64(rusage.Utime.Sec) + float64(rusage.Utime.Usec)/1e6
    sysSec := float64(rusage.Stime.Sec) + float64(rusage.Stime.Usec)/1e6
    return userSec, sysSec
}

func (m *CPUMonitor) GetUsage() (totalPercent, userPercent, sysPercent float64) {
    currentUser, currentSys := m.getCPUTime()
    currentTotal := currentUser + currentSys
    now := time.Now()
    if !m.lastUpdate.IsZero() {
        elapsed := now.Sub(m.lastUpdate).Seconds()
        if elapsed > 0 {
            userDelta := currentUser - m.lastUserTime
            sysDelta := currentSys - m.lastSysTime
            totalDelta := userDelta + sysDelta
            totalPercent = (totalDelta / elapsed) * 100
            userPercent = (userDelta / elapsed) * 100
            sysPercent = (sysDelta / elapsed) * 100
        }
    }
    m.lastUserTime = currentUser
    m.lastSysTime = currentSys
    m.lastTotalTime = currentTotal
    m.lastUpdate = now
    return
}
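Because the percentages above divide process CPU time by wall-clock time, they are relative to a single core and can exceed 100 % on a multi-core machine, which is why a maximum of 157.80 % is possible on the 2-CPU test VM. A small sketch of the same arithmetic plus a normalization by core count (cpuPercent and percentOfMachine are hypothetical helpers, and the 7.89 CPU-seconds input is an illustrative value chosen to reproduce the reported figure):

```go
package main

import (
	"fmt"
	"runtime"
)

// cpuPercent converts CPU seconds consumed during a wall-clock window
// into a percentage of one core.
func cpuPercent(cpuSeconds, wallSeconds float64) float64 {
	if wallSeconds <= 0 {
		return 0
	}
	return cpuSeconds / wallSeconds * 100
}

// percentOfMachine rescales a per-core percentage to the share of total
// machine capacity, given the number of CPUs.
func percentOfMachine(percent float64, numCPU int) float64 {
	if numCPU <= 0 {
		return 0
	}
	return percent / float64(numCPU)
}

func main() {
	// 7.89 CPU-seconds inside a 5 s sampling window is 157.8 % of one core.
	p := cpuPercent(7.89, 5)
	fmt.Printf("per-core: %.1f%%\n", p)
	fmt.Printf("of a 2-CPU machine: %.1f%%\n", percentOfMachine(p, 2))
	fmt.Println("CPUs on this host:", runtime.NumCPU())
}
```

Read this way, the 157.80 % and 169.80 % peaks in the 50 k scenario correspond to roughly 79 % and 85 % of the VM's total CPU capacity.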

Log Output Format

func (m *GCMonitor) writeToLog(snapshot *MetricSnapshot) {
    line := fmt.Sprintf(
        "[%s] [%10s] #%-4d | GC: %5d (+%-2d) Pause: %8.2fms (+%6.2fms / avg %5.2fms) Rate: %.2f/s | Mem: Heap %5dMB | CPU: %.1f%% | Goroutine: %5d\n",
        snapshot.Timestamp.Format("15:04:05"),
        snapshot.SinceStart,
        snapshot.SampleNumber,
        snapshot.GCStats.NumGC,
        snapshot.GCStats.NumGCDelta,
        snapshot.GCStats.PauseTotal,
        snapshot.GCStats.PauseDelta,
        snapshot.GCStats.PauseDeltaAvg,
        snapshot.GCStats.GCRate,
        snapshot.MemStats.HeapAlloc,
        snapshot.CPUMetrics.TotalUsage,
        snapshot.NumGoroutine,
    )
    m.logFile.WriteString(line)
    m.logFile.Sync()
}

Key Data Structures

type MetricSnapshot struct {
    Timestamp    time.Time
    SinceStart   string
    SampleNumber int
    GCStats      GCMetrics
    MemStats     MemoryMetrics
    NumGoroutine int
    NumCPU       int
    CPUMetrics   CPUMetrics
}

type GCMetrics struct {
    NumGC          uint32
    NumGCDelta     uint32
    PauseTotal     float64
    PauseDelta     float64
    PauseAvg       float64
    PauseDeltaAvg  float64
    LastPause      float64
    GCRate         float64
}

type MemoryMetrics struct {
    HeapAlloc    uint64
    HeapSys      uint64
    HeapInuse    uint64
    HeapIdle     uint64
    HeapReleased uint64
    TotalAlloc   uint64
    Sys          uint64
    NextGC       uint64
    NumObjects   uint64
    StackInuse   uint64
    StackSys     uint64
}

type CPUMetrics struct {
    TotalUsage float64
}
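The listing above omits several types that the functions rely on: GCDelta, GCMonitor and CPUMonitor. The definitions below are inferred from how the fields are used in the code excerpts and are a guess at a compatible shape, not the author's actual declarations. MonitorConfig and newGCMonitor are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"time"
)

// GCDelta holds the change between two consecutive GC stat readings
// (inferred from calculateGCDelta and extractGCMetrics).
type GCDelta struct {
	NumGC      uint32  // collections completed since the previous sample
	PauseTotal float64 // pause time added since the previous sample, in ns
}

// MonitorConfig is a hypothetical name for the type of m.config.
type MonitorConfig struct {
	EnableJSON bool // when true, sample also writes a JSON record
}

// GCMonitor carries the state that sample reads and updates.
type GCMonitor struct {
	config      MonitorConfig
	startTime   time.Time
	lastSample  time.Time
	lastGCStats debug.GCStats
	sampleCount int
	logFile     *os.File
}

// CPUMonitor carries the previous rusage reading used by GetUsage.
type CPUMonitor struct {
	lastUserTime  float64 // seconds
	lastSysTime   float64 // seconds
	lastTotalTime float64 // seconds
	lastUpdate    time.Time
}

// newGCMonitor is a hypothetical constructor that seeds the timestamps
// so the first sample computes a sensible time delta.
func newGCMonitor(cfg MonitorConfig, logFile *os.File) *GCMonitor {
	now := time.Now()
	return &GCMonitor{config: cfg, startTime: now, lastSample: now, logFile: logFile}
}

func main() {
	m := newGCMonitor(MonitorConfig{EnableJSON: false}, os.Stdout)
	var c CPUMonitor
	fmt.Println("samples so far:", m.sampleCount)
	fmt.Println("CPU monitor primed:", !c.lastUpdate.IsZero())
}
```

The helpers calculateCPUMetrics, formatDuration and writeToJSON are also referenced but not shown in the article, so they are left out here rather than guessed.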