Performance Optimization Techniques for Go Standard Library
The article surveys a range of Go standard‑library performance tricks—from using sync.Pool and zero‑copy string/byte conversions to reducing lock contention, leveraging go:linkname, caching call‑frame data, optimizing cgo calls, employing custom epoll, SIMD, and occasional JIT—while urging profiling‑first, readability‑preserving optimizations.
This article summarizes a collection of performance‑optimization tricks that were observed while maintaining Go's standard library, covering both conventional and unconventional methods.
1. sync.Pool – A pool of temporary objects costs little in readability and can give significant speedups by cutting allocations and GC pressure. High-performance libraries such as fasthttp rely heavily on sync.Pool, but misuse causes subtle bugs: fasthttp recycles its RequestCtx, so keeping a reference to it after the handler returns (for example, by passing it to another goroutine) leads to data races.
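A minimal sketch of the pattern, assuming a hot path that builds short strings in a bytes.Buffer (bufPool and render are illustrative names, not from any library). The buffer is reset before it goes back into the pool, and the result is copied out with String, so no caller keeps a reference to the pooled object after Put.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values; New runs only when the
// pool has nothing cached.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting in a pooled buffer and returns the buffer to the pool.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear contents before the next user gets it
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	// String copies the bytes, so the result stays valid after the buffer
	// is recycled; returning buf.Bytes() here would be a use-after-Put bug.
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
}
```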
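2. string↔bytes conversion – A normal conversion between string and []byte copies the data; a zero-copy conversion reinterprets the existing memory instead and so avoids the allocation. The runtime uses an internal helper, gostringnocopy, for this, but the trick comes with a strict rule: a byte slice that shares memory with a string must never be mutated, because Go assumes strings are immutable.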
3. Goroutine pool – Generally unnecessary in Go, but can limit goroutine count, reduce stack growth, and reuse resources in high‑frequency creation scenarios. Overuse adds complexity without measurable benefit for most workloads.
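For the cases where a pool does pay off, the core idea fits in a few lines: a fixed set of worker goroutines drains a channel of jobs, which caps concurrency and lets each worker reuse its already-grown stack across jobs. This is a minimal sketch (runPool is a hypothetical helper); real pools usually add panic handling, dynamic sizing and idle timeouts.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool applies work to every job using a fixed number of worker
// goroutines instead of one goroutine per job.
func runPool(workers int, jobs []int, work func(int) int) []int {
	if workers < 1 {
		workers = 1
	}
	results := make([]int, len(jobs))
	indices := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range indices {
				results[i] = work(jobs[i]) // each index is handled by exactly one worker
			}
		}()
	}
	for i := range jobs {
		indices <- i
	}
	close(indices)
	wg.Wait() // all writes to results happen before this returns
	return results
}

func main() {
	squares := runPool(4, []int{1, 2, 3, 4, 5}, func(n int) int { return n * n })
	fmt.Println(squares) // [1 4 9 16 25]
}
```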
4. Reflection – Reflection is slow and hard to read, and now that Go has generics (since Go 1.18) it can often be avoided. Common optimizations include caching reflection results (as json-iterator does), using unsafe.Pointer with precomputed field offsets, or switching to go-reflect, a zero-allocation alternative to the standard reflect package.
5. Reducing lock contention – Use finer-grained locks or lock-free primitives. In the Go version benchmarked here, the top-level functions of math/rand share one mutex-protected global source; replacing them with the lock-free runtime.fastrand yields roughly a 6× speedup (13.98 vs 2.158 ns/op, see benchmark below).
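The finer-grained-locking option can be sketched as lock striping: one hot counter is split into independently locked shards, so concurrent writers rarely wait on the same mutex. This is an illustrative sketch (shardedCounter is a hypothetical type, and 16 shards with 64-byte padding are assumptions to tune); the benchmark lines that follow compare math/rand with runtime.fastrand and do not measure it.

```go
package main

import (
	"fmt"
	"sync"
)

// numShards is an assumed value; roughly the number of concurrent writers is a common choice.
const numShards = 16

// shard pairs a mutex with the value it guards. The padding aims to keep
// neighbouring shards on separate cache lines (assumes 64-byte lines, with
// sync.Mutex and int64 taking 8 bytes each).
type shard struct {
	mu sync.Mutex
	n  int64
	_  [48]byte
}

// shardedCounter spreads one hot counter across independently locked shards.
type shardedCounter struct {
	shards [numShards]shard
}

// Add increments the shard selected by key (for example, a worker ID).
func (c *shardedCounter) Add(key int, delta int64) {
	s := &c.shards[uint(key)%numShards]
	s.mu.Lock()
	s.n += delta
	s.mu.Unlock()
}

// Sum locks each shard in turn; the total is exact once writers have finished.
func (c *shardedCounter) Sum() int64 {
	var total int64
	for i := range c.shards {
		s := &c.shards[i]
		s.mu.Lock()
		total += s.n
		s.mu.Unlock()
	}
	return total
}

func main() {
	var c shardedCounter
	var wg sync.WaitGroup
	for w := 0; w < 8; w++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Add(id, 1)
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(c.Sum()) // 8000
}
```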
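Benchmark_MathRand-12     84419976    13.98 ns/op
Benchmark_Runtime-12     505765551     2.158 ns/op
6. go:linkname – Allows linking to unexported runtime symbols. Example:
import _ "unsafe" // go:linkname requires importing unsafe
//go:linkname FastRand runtime.fastrand
func FastRand() uint32
A body-less declaration like this compiles only if the package also contains a (possibly empty) .s file. The linked symbols are runtime internals that can change between releases, and Go 1.23 and later restrict which runtime symbols may be pulled in this way. Similar tricks can replace time.Now with runtime.walltime1 for faster timestamps, since it reads only the wall clock and skips the monotonic-clock read that time.Now also performs: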
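Benchmark_Time-12        16323418    73.30 ns/op
Benchmark_Runtime-12     29912856    38.10 ns/op
7. Log function name/line retrieval – Getting the caller's file and line takes two steps: walking the stack to obtain a program counter (pc), then resolving pc → funcInfo → file/line. The second step accounts for about 60% of the cost, and because a given call site always yields the same pc, its result can be cached instead of calling runtime.CallersFrames every time.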
var m sync.Map
func Caller(skip int) (pc uintptr, file string, line int, ok bool) { … }
Benchmark after caching:
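BenchmarkCaller-8      2765967     431.7 ns/op
BenchmarkRuntime-8     1000000      1085 ns/op
8. cgo – A regular cgo call goes through runtime.cgocall, which runs the C function on the g0 (system) stack via asmcgocall and wraps it in entersyscall/exitsyscall, so the scheduler can hand the goroutine's P to another thread if the C call blocks. Calling runtime.asmcgocall directly skips this bookkeeping. That is only safe for short, non-blocking C functions, since the P stays occupied for the whole call.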
import "unsafe"
//go:linkname asmcgocall runtime.asmcgocall
func asmcgocall(fn unsafe.Pointer, arg uintptr) int32
Benchmark:
BenchmarkCgo-12          16143393    73.01 ns/op    16 B/op    1 allocs/op
BenchmarkAsmCgoCall-12  119081407     9.505 ns/op    0 B/op    0 allocs/op
9. epoll – Go's runtime multiplexes all network I/O through a single epoll instance (the netpoller). Third-party libraries such as gnet and ByteDance's netpoll run their own epoll-based event loops to improve scalability, but the added complexity often outweighs the marginal gains for typical services.
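10. Package size reduction – With cgo, the final link is handed to the system linker, and older versions of ld that do not support --compress-debug-sections=zlib-gnu leave the DWARF debug sections uncompressed, inflating the binary. Upgrading ld to a version that supports this flag reduced binary size by about 50%.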
11. SIMD – Go's assembler accepts SIMD instructions, but the gc compiler does not generate them on its own (there is no auto-vectorization). Workarounds include hand-written Go assembly, assembly generated by LLVM from C and translated to Go's assembler syntax, or calling SIMD code via cgo. Popular libraries using SIMD include simdjson-go, sonic and sha256-simd. The drawbacks are maintainability, cross-platform support and debugging difficulty.
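12. JIT – A Go program can also generate machine code at runtime and execute it, with the help of hand-written assembly or external tools. Practical use cases are rare; ByteDance's Sonic, which JIT-compiles encoders and decoders for specific types, is a notable example.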
Conclusion – Premature optimization is harmful. Start with profiling and analysis (pprof, the race detector, escape analysis via go build -gcflags=-m), apply well-known techniques first, and only consider exotic tricks when they provide measurable benefits without sacrificing readability, compatibility, or stability.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.