Boost Go Performance: Mastering GC with go trace, GOGC & GOMEMLIMIT
This article demonstrates how to analyze and optimize Go's garbage collection using go trace, comparing single‑threaded and concurrent implementations, and shows how tuning GOGC and GOMEMLIMIT can dramatically improve runtime and memory usage, with detailed code samples and performance metrics.
When writing programs in Go, developers often rely on the runtime's garbage collector (GC) and do not focus on memory usage, but performance‑critical scenarios benefit from understanding GC behavior and using go trace to optimize it.
The article is based on Ardan Labs' talk "Evaluating Performance In Go" and serves as a blog version of that presentation.
All examples were run on a MacBook Pro M1 with ten cores.
The goal is to implement a service that processes multiple RSS XML files, extracts items whose titles contain the keyword "go", and simulates load by parsing the same file 100 times. The full source code is available on GitHub.
single
List1: Use a single goroutine to count the keyword.
func freq(docs []string) int {
	var count int
	for _, doc := range docs {
		f, err := os.OpenFile(doc, os.O_RDONLY, 0)
		if err != nil {
			return 0
		}
		data, err := io.ReadAll(f)
		f.Close() // close eagerly: a deferred close would leak descriptors across iterations
		if err != nil {
			return 0
		}
		var d document
		if err := xml.Unmarshal(data, &d); err != nil {
			log.Printf("Decoding Document [Ns] : ERROR :%+v", err)
			return 0
		}
		for _, item := range d.Channel.Items {
			if strings.Contains(strings.ToLower(item.Title), "go") {
				count++
			}
		}
	}
	return count
}
func main() {
	if err := trace.Start(os.Stdout); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()
	files := make([]string, 0, 100)
	for i := 0; i < 100; i++ {
		files = append(files, "index.xml")
	}
	count := freq(files)
	log.Printf("find key word go %d count", count)
}

Build and run:
go build
time ./go_trace > trace_single.out
-- result --
2024/08/02 16:17:06 find key word go 2400 count
./go_trace > trace_single.out  1.99s user 0.05s system 102% cpu 1.996 total

Trace analysis shows:
RunTime: 2031 ms
STW: 57 ms
GC Occurrences: 252
GC STW Avg: 0.227 ms
GC time proportion: 57/2031 ≈ 0.03
Peak memory: ~11.28 MB
Figures illustrate runtime, GC time, and max heap.
Only one core is used, resulting in low resource utilization.
concurrent
List2: Use a worker‑pool (fan‑out) approach to count the keyword.
func concurrent(docs []string) int {
	var count int32
	g := runtime.GOMAXPROCS(0) // one worker per available core
	wg := sync.WaitGroup{}
	wg.Add(g)
	ch := make(chan string, 100)
	go func() {
		for _, v := range docs {
			ch <- v
		}
		close(ch)
	}()
	for i := 0; i < g; i++ {
		go func() {
			var iFound int32
			defer func() {
				atomic.AddInt32(&count, iFound)
				wg.Done()
			}()
			for doc := range ch {
				f, err := os.OpenFile(doc, os.O_RDONLY, 0)
				if err != nil {
					return
				}
				data, err := io.ReadAll(f)
				f.Close() // close eagerly to avoid leaking descriptors
				if err != nil {
					return
				}
				var d document
				if err = xml.Unmarshal(data, &d); err != nil {
					log.Printf("Decoding Document [Ns] : ERROR :%+v", err)
					return
				}
				for _, item := range d.Channel.Items {
					if strings.Contains(strings.ToLower(item.Title), "go") {
						iFound++
					}
				}
			}
		}()
	}
	wg.Wait()
	return int(count)
}

Run the same workload:
go build
time ./go_trace > trace_pool.out
2024/08/02 19:27:13 find key word go 2400 count
./go_trace > trace_pool.out  2.83s user 0.13s system 673% cpu 0.439 total

Trace analysis shows:
RunTime: 425 ms
STW: 154 ms
GC Occurrences: 39
GC STW Avg: 3.9 ms
GC time proportion: 154/425 ≈ 0.36
Peak memory: 91.60 MB
Figures illustrate GC time and max heap.
The concurrent version is about five times faster, but GC now consumes roughly 36% of total runtime.
GOGC & GOMEMLIMIT
Two environment variables control the GC: GOGC adjusts how often collection runs, and GOMEMLIMIT, introduced in Go 1.19, sets a soft cap on the program's memory usage.
Refer to the official gc‑guide for details.
GOGC
The GC target heap size follows the formula:
Target heap memory = Live heap + (Live heap + GC roots) * GOGC / 100

Setting GOGC=1000 theoretically reduces GC frequency tenfold at the cost of roughly tenfold higher memory usage.
time GOGC=1000 ./go_trace > trace_gogc_1000.out
2024/08/05 16:57:29 find key word go 2400 count
GOGC=1000 ./go_trace > trace_gogc_1000.out  2.46s user 0.16s system 757% cpu 0.346 total

Results:
RunTime: 314 ms
STW: 9.572 ms
GC Occurrences: 5
GC STW Avg: 1.194 ms
GC time proportion: 9.572/314 ≈ 0.03
Peak memory: 451 MB
Figures show max heap and GC count.
GOMEMLIMIT
GOMEMLIMIT sets a soft upper bound on memory usage; as the heap approaches the limit, the GC runs more aggressively to stay under it. The single‑threaded version peaks at ~11.28 MB; with ten workers that scales to roughly 113 MB, and adding about 10% headroom suggests a limit around 124 MiB.
time GOGC=off GOMEMLIMIT=124MiB ./go_trace > trace_mem_limit.out
GOGC=off GOMEMLIMIT=124MiB ./go_trace > trace_mem_limit.out  2.83s user 0.15s system 766% cpu 0.389 total

Results:
RunTime: 376.455 ms
STW: 41.578 ms
GC Occurrences: 14
GC STW Avg: 2.969 ms
GC time proportion: 41.578/376.455 ≈ 0.11
Peak memory: 120 MB (close to the limit)
Increasing the limit further improves performance:
time GOGC=off GOMEMLIMIT=248MiB ./go_trace > trace_mem_248.out

RunTime: 320.455 ms, STW: 11.429 ms, GC Occurrences: 5, GC STW Avg: 2.285 ms.
Setting GOMEMLIMIT=1024MiB yields a runtime of 406 ms, showing diminishing returns.
Risks
The Suggested_uses section of the gc‑guide advises that these parameters should only be used when the program’s environment and workload are well understood; otherwise they may degrade performance or cause crashes.
Summary
Applying appropriate values for GOGC and GOMEMLIMIT can significantly boost Go program performance and give developers finer control over GC behavior, but they must be tuned in controlled environments to avoid adverse effects in shared or unpredictable settings.
References
[1] Evaluating Performance In Go – https://www.youtube.com/watch?v=PYMs-urosXs&t=2684s
[2] Go: Discovery of the Trace Package – https://medium.com/a-journey-with-go/go-discovery-of-the-trace-package-e5a821743c3c
[3] gc-guide – https://tip.golang.org/doc/gc-guide
[4] Suggested_uses – https://tip.golang.org/doc/gc-guide#Suggested_uses