Why Is My Go Health‑Check So Slow? Diagnosing TCP Latency and GC Overhead
This article investigates why a Go‑based service health‑check system experiences high latency, examines differences from Nginx checks, runs experiments on physical machines and Docker, and explores goroutine scheduling, GOMAXPROCS, and garbage‑collection tuning to reduce average response time from 40 ms to under 10 ms.
Background
The health‑check system periodically sends TCP connection requests to target servers and removes a target from the registry after a certain number of consecutive failures. The observed latency ranged from a few milliseconds to several hundred milliseconds, averaging over 40 ms, which is unusually high for an internal TCP handshake.
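For context, here is a minimal sketch of this kind of probe; the address, interval, and failure threshold are illustrative, not the production values:

```go
package main

import (
	"log"
	"net"
	"time"
)

// probeOnce measures how long the TCP handshake to a single target takes.
func probeOnce(addr string, timeout time.Duration) (time.Duration, error) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return 0, err
	}
	conn.Close()
	return time.Since(start), nil
}

func main() {
	const maxFailures = 3 // consecutive failures before removal (illustrative)
	failures := 0
	for range time.Tick(5 * time.Second) { // check interval (illustrative)
		latency, err := probeOnce("10.0.0.1:80", time.Second)
		if err != nil {
			failures++
			if failures >= maxFailures {
				log.Printf("target unhealthy after %d consecutive failures; would be removed", failures)
			}
			continue
		}
		failures = 0
		log.Printf("handshake latency: %v", latency)
	}
}
```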
Why It Matters
Many services that currently rely on Nginx's built‑in active health checks need to migrate to this system. With Nginx's default check timeout of 50‑100 ms, the higher latency would routinely exceed the limit and cause healthy targets to be falsely removed. Simply raising the timeout is not a viable fix: a looser threshold could let real failures hide behind uneven load or node jitter.
Monitoring
All standard metrics (CPU, memory, disk, network) appeared normal; only the health‑check latency was abnormal.
Differences from Nginx
Nginx is written in C; our program is written in Go.
Nginx runs on bare metal; our program runs inside Docker containers.
Nginx checks a relatively small number of services, whereas our program may need to probe tens of thousands of targets per node.
Experiments
Two small experiments were conducted:
1. Deploying the health check on a physical machine and probing the same targets it checks from Docker: the physical machine saw only a few milliseconds of latency.
2. Deploying the service on another Docker host: latency was similarly low.
These results suggest the latency is tied to the scale of concurrent checks rather than to Docker itself, and that the Go implementation can, in principle, match Nginx's performance.
Suspecting Goroutine Scheduling
Each target check spawns its own goroutine, so the sheer number of goroutines might be causing scheduling overhead. Using the Go execution tracer (available since Go 1.5, exposed through the net/http/pprof endpoints), we collected 30 seconds of scheduling data:
```
curl -o trace.dump 'http://127.0.0.1:8600/debug/pprof/trace?seconds=30'
```

and then viewed it with:

```
go tool trace trace.dump
```

The trace showed that a single goroutine spent about 300 ms on scheduling over the 30‑second window.
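For reference, the /debug/pprof/trace endpoint used above becomes available once net/http/pprof is blank‑imported and an HTTP server is running; a minimal sketch (the port matches the curl command above, the rest is illustrative):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/*, including /debug/pprof/trace
)

func main() {
	// Serve the pprof endpoints on the port the curl command targets.
	go func() {
		log.Println(http.ListenAndServe("127.0.0.1:8600", nil))
	}()

	select {} // the health-check logic would run here instead of blocking forever
}
```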
GOMAXPROCS Issue
Inside containers, the Go runtime derives its processor count from the CPUs visible to the process, which reflects the host's core count rather than the container's cgroup CPU quota. The surplus of runtime processors leads to extra find‑runnable work and thread context switches.
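A quick way to see the mismatch is to print both values from inside a quota‑limited container; on, say, a 48‑core host with docker run --cpus=2, both still report 48:

```go
package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Under a CFS quota (e.g. docker run --cpus=2), both values still report
	// the host's core count, because the runtime does not read cgroup limits.
	fmt.Println("NumCPU:    ", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // 0 reads the value without changing it
}
```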
The uber-go/automaxprocs library automatically sets GOMAXPROCS to match the container's CPU quota. After applying it, however, latency did not change noticeably, so the processor count was not the dominant factor here.
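Wiring it in is a one‑line blank import:

```go
package main

import (
	"fmt"
	"runtime"

	// The blank import runs automaxprocs' init(), which sets GOMAXPROCS to
	// match the container's CPU quota at startup.
	_ "go.uber.org/automaxprocs"
)

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```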
Suspecting Garbage Collection
High check volume also drives up memory allocation and, with it, GC pressure. Tracing the goroutine that establishes connections revealed that GC pauses dominated its timeline, at times accounting for nearly all of its blocked time.
Two main contributors were identified:
Debug logging.
Metric reporting.
Disabling debug logs reduced average latency from 40 ms to 30 ms.
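The article's logging code is not shown; as a general sketch of why guarding (or disabling) debug logs relieves GC pressure, using the standard log/slog package:

```go
package main

import (
	"context"
	"fmt"
	"log/slog"
	"os"
)

// logger is configured at Info level, so Debug output is suppressed.
var logger = slog.New(slog.NewTextHandler(os.Stderr, &slog.HandlerOptions{
	Level: slog.LevelInfo,
}))

func checkTarget(addr string, latencyMs float64) {
	// Guarding the call means the message is never formatted when debug
	// logging is off; the fmt.Sprintf allocation simply does not happen,
	// which is what matters at tens of thousands of checks per cycle.
	if logger.Enabled(context.Background(), slog.LevelDebug) {
		logger.Debug(fmt.Sprintf("probed %s in %.1fms", addr, latencyMs))
	}
}

func main() {
	checkTarget("10.0.0.1:80", 2.3)
}
```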
GC Parameter Tuning
The primary GC tuning knob in Go is GOGC, which sets how much the heap may grow relative to the live set before the next collection triggers. With the logging and metric‑reporting fixes in place, average latency stood at 10 ms; experiments with GOGC values from 100 to 1000 showed that GOGC=500 gave the best result, lowering it to 8 ms.
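GOGC can be set through the environment (GOGC=500 ./healthchecker, binary name illustrative) or programmatically at startup; a minimal sketch of the latter:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	// SetGCPercent(500) lets the heap grow to ~5x the live set between
	// collections, trading memory for fewer GC cycles. It returns the
	// previous setting (100 by default, or whatever GOGC was set to).
	old := debug.SetGCPercent(500)
	fmt.Printf("GOGC: %d -> 500\n", old)
}
```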
Conclusion
Through a systematic investigation covering deployment environment (bare metal vs. container), goroutine scheduling, processor count, and GC behavior, the health‑check latency was reduced from an average of 40 ms to 8 ms, with the worst case dropping from 120 ms to 10 ms. Further gains are possible by scaling resources, but the current optimizations already meet the required performance.