Boost Go Performance: Mastering GC with go trace, GOGC & GOMEMLIMIT
This article demonstrates how to analyze and optimize Go's garbage collection using go trace, comparing single‑threaded and concurrent implementations, and shows how tuning GOGC and GOMEMLIMIT can dramatically improve runtime and memory usage, with detailed code samples and performance metrics.
When writing programs in Go, developers often rely on the runtime's garbage collector (GC) and do not focus on memory usage, but performance‑critical scenarios benefit from understanding GC behavior and using go trace to optimize it.
The article is based on Ardan Labs' talk "Evaluating Performance In Go" and serves as a blog version of that presentation.
All examples were run on a MacBook Pro M1 with ten cores.
The goal is to implement a service that processes multiple RSS XML files, extracts items whose titles contain the keyword "go", and simulates load by parsing the same file 100 times. The full source code is available on GitHub.
single
List1: Use a single goroutine to count the keyword.
func freq(docs []string) int {
	var count int
	for _, doc := range docs {
		f, err := os.OpenFile(doc, os.O_RDONLY, 0)
		if err != nil {
			return 0
		}
		data, err := io.ReadAll(f)
		f.Close() // close eagerly: a deferred close would leak descriptors across iterations
		if err != nil {
			return 0
		}
		var d document
		if err := xml.Unmarshal(data, &d); err != nil {
			log.Printf("Decoding Document [Ns] : ERROR :%+v", err)
			return 0
		}
		for _, item := range d.Channel.Items {
			if strings.Contains(strings.ToLower(item.Title), "go") {
				count++
			}
		}
	}
	return count
}
func main() {
	if err := trace.Start(os.Stdout); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()
	files := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		files = append(files, "index.xml")
	}
	count := freq(files)
	log.Printf("find key word go %d count", count)
}

Build and run:
go build
time ./go_trace > trace_single.out
-- result --
2024/08/02 16:17:06 find key word go 2400 count
./go_trace > trace_single.out  1.99s user 0.05s system 102% cpu 1.996 total

Trace analysis shows:
RunTime: 2031 ms
STW: 57 ms
GC Occurrences: 252
GC STW Avg: 0.227 ms
GC time proportion: 57/2031 ≈ 0.03
Peak memory: ~11.28 MB
Figures illustrate runtime, GC time, and max heap.
Only one core is used, resulting in low resource utilization.
concurrent
List2: Use a worker‑pool (fan‑out) approach to count the keyword.
func concurrent(docs []string) int {
	var count int32
	g := runtime.GOMAXPROCS(0) // one worker per available core
	wg := sync.WaitGroup{}
	wg.Add(g)
	ch := make(chan string, 100)
	go func() {
		for _, v := range docs {
			ch <- v
		}
		close(ch)
	}()
	for i := 0; i < g; i++ {
		go func() {
			var iFound int32
			defer func() {
				atomic.AddInt32(&count, iFound)
				wg.Done()
			}()
			for doc := range ch {
				f, err := os.OpenFile(doc, os.O_RDONLY, 0)
				if err != nil {
					return
				}
				data, err := io.ReadAll(f)
				f.Close() // close eagerly to avoid leaking descriptors
				if err != nil {
					return
				}
				var d document
				if err = xml.Unmarshal(data, &d); err != nil {
					log.Printf("Decoding Document [Ns] : ERROR :%+v", err)
					return
				}
				for _, item := range d.Channel.Items {
					if strings.Contains(strings.ToLower(item.Title), "go") {
						iFound++
					}
				}
			}
		}()
	}
	wg.Wait()
	return int(count)
}

Run the same workload:
go build
time ./go_trace > trace_pool.out
2024/08/02 19:27:13 find key word go 2400 count
./go_trace > trace_pool.out  2.83s user 0.13s system 673% cpu 0.439 total

Trace analysis shows:
RunTime: 425 ms
STW: 154 ms
GC Occurrences: 39
GC STW Avg: 3.9 ms
GC time proportion: 154/425 ≈ 0.36
Peak memory: 91.60 MB
Figures illustrate GC time and max heap.
The concurrent version is about five times faster, but GC now consumes roughly 36% of total runtime.
GOGC & GOMEMLIMIT
Two environment variables control the GC: GOGC adjusts how often collection runs, and GOMEMLIMIT, introduced in Go 1.19, sets a soft cap on the program's memory usage.
Refer to the official gc‑guide for details.
GOGC
The GC target heap size follows the formula:
Target heap memory = Live heap + (Live heap + GC roots) * GOGC / 100

Setting GOGC=1000 theoretically reduces GC frequency tenfold at the cost of roughly tenfold higher memory usage.
time GOGC=1000 ./go_trace > trace_gogc_1000.out
2024/08/05 16:57:29 find key word go 2400 count
GOGC=1000 ./go_trace > trace_gogc_1000.out  2.46s user 0.16s system 757% cpu 0.346 total

Results:
RunTime: 314 ms
STW: 9.572 ms
GC Occurrences: 5
GC STW Avg: 1.194 ms
GC time proportion: 9.572/314 ≈ 0.03
Peak memory: 451 MB
Figures show max heap and GC count.
GOMEMLIMIT
GOMEMLIMIT sets a soft upper bound on memory usage; as the heap approaches the limit, the GC runs more aggressively to stay under it. The single‑threaded version peaks at ~11.28 MB; with ten workers that scales to roughly 113 MB, and adding about 10% headroom suggests a limit around 124 MiB.
time GOGC=off GOMEMLIMIT=124MiB ./go_trace > trace_mem_limit.out
GOGC=off GOMEMLIMIT=124MiB ./go_trace > trace_mem_limit.out  2.83s user 0.15s system 766% cpu 0.389 total

Results:
RunTime: 376.455 ms
STW: 41.578 ms
GC Occurrences: 14
GC STW Avg: 2.969 ms
GC time proportion: 41.578/376.455 ≈ 0.11
Peak memory: 120 MB (close to the limit)
Increasing the limit further improves performance:
time GOGC=off GOMEMLIMIT=248MiB ./go_trace > trace_mem_248.out

RunTime: 320.455 ms, STW: 11.429 ms, GC Occurrences: 5, GC STW Avg: 2.285 ms.
Setting GOMEMLIMIT=1024MiB yields a runtime of 406 ms, showing diminishing returns.
Risks
The Suggested_uses section of the gc‑guide advises that these parameters should only be used when the program’s environment and workload are well understood; otherwise they may degrade performance or cause crashes.
Summary
Applying appropriate values for GOGC and GOMEMLIMIT can significantly boost Go program performance and give developers finer control over GC behavior, but they must be tuned in controlled environments to avoid adverse effects in shared or unpredictable settings.
References
[1] Evaluating Performance In Go – https://www.youtube.com/watch?v=PYMs-urosXs&t=2684s
[2] Go: Discovery of the Trace Package – https://medium.com/a-journey-with-go/go-discovery-of-the-trace-package-e5a821743c3c
[3] gc-guide – https://tip.golang.org/doc/gc-guide
[4] Suggested_uses – https://tip.golang.org/doc/gc-guide#Suggested_uses