Detecting and Preventing Goroutine Leaks in PouchContainer

This article explains what goroutine leaks are, demonstrates how they occur in Alibaba's PouchContainer runtime, and provides practical detection methods and code‑level fixes using net/http/pprof, runtime.NumGoroutine, and CloseNotifier to keep Go services healthy.

Alibaba Cloud Native

0. Introduction

PouchContainer is an open‑source container runtime written in Go. It makes heavy use of goroutines for container, image, and log management. A goroutine leak occurs when a goroutine can never finish, most commonly because it blocks on a channel operation that will never complete. The goroutine then stays alive and keeps consuming memory and any resources it holds.

1. Goroutine Leak

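In Go, a goroutine is started with the go keyword. When its function returns, the goroutine exits and the runtime reclaims its stack. If it instead stays blocked forever (e.g., waiting on a channel that no goroutine will ever send to), it leaks. The goroutine below does not leak, because main receives the value it sends: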

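package main

import "fmt"

func main() {
    waitCh := make(chan struct{})
    go func() {
        fmt.Println("Hi, Pouch. I'm new gopher!")
        // main is blocked on <-waitCh, so this send completes and the goroutine exits.
        waitCh <- struct{}{}
    }()
    <-waitCh
}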

A classic leak appears in an HTTP handler that runs an external command. If the client aborts the request before the command finishes, the handler goroutine keeps running:

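package main

import (
    "log"
    "net/http"
    "os/exec"
    "strings"
)

func main() {
    http.HandleFunc("/exec", func(w http.ResponseWriter, r *http.Request) {
        defer func() { log.Printf("finish %v\n", r.URL) }()
        // CombinedOutput blocks until the child process exits; nothing here
        // notices that the client has gone away.
        out, err := genCmd(r).CombinedOutput()
        if err != nil {
            w.WriteHeader(http.StatusInternalServerError)
            w.Write([]byte(err.Error()))
            return
        }
        w.Write(out)
    })
    log.Fatal(http.ListenAndServe(":8080", nil))
}

// genCmd builds the command from the "cmd" form value and the
// space-separated "args" form value.
func genCmd(r *http.Request) (cmd *exec.Cmd) {
    var args []string
    if got := r.FormValue("args"); got != "" {
        args = strings.Split(got, " ")
    }
    if c := r.FormValue("cmd"); len(args) == 0 {
        cmd = exec.Command(c)
    } else {
        cmd = exec.Command(c, args...)
    }
    return
}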

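Sending the request with a short client timeout, e.g. curl -m 1 "{ip}:8080/exec?cmd=sleep&args=10000", disconnects the client while the command is still sleeping. The handler goroutine is blocked inside CombinedOutput, so it never writes to the connection and never sees a broken‑pipe error. It stays alive, together with the sleep process, until the command exits 10000 seconds later.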

2. Pouch Logs API Practice

2.1 Specific Scenario

The Logs API spawns a goroutine that reads container logs and streams them to the client via a channel. When the client stops following the log, the handler may still be waiting on the channel.

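func logsContainer(ctx context.Context, w http.ResponseWriter, r *http.Request) {
    // ...
    writeLogStream(ctx, w, msgCh)
}

func writeLogStream(ctx context.Context, w http.ResponseWriter, msgCh <-chan Message) {
    for {
        select {
        case <-ctx.Done():
            // The request was canceled, e.g. because the client disconnected.
            return
        case msg, ok := <-msgCh:
            if !ok {
                // The log producer has closed the channel.
                return
            }
            if _, err := w.Write(msg.Byte()); err != nil {
                // e.g. "write: broken pipe" after the client has gone away.
                return
            }
        }
    }
}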
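If the client has disconnected, the next w.Write fails with "write: broken pipe", and checking that error lets the handler return. If no new log lines arrive, however, the handler never writes, never sees the error, and stays blocked in select. Unless ctx is canceled when the client goes away, the goroutine leaks.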

2.2 Detecting Goroutine Leak

Import net/http/pprof and query {ip}:{port}/debug/pprof/goroutine?debug=2 to view stack traces. After stopping a log‑follow request, the logsContainer goroutine should disappear; if it remains, a leak is present.

# step 1: create a background job
pouch run -d busybox sh -c "while true; do sleep 1; done"

# step 2: follow the log and stop after 3 seconds
curl -m 3 "{ip}:{port}/v1.24/containers/{container_id}/logs?stdout=1&follow=1"

# step 3: dump the stack info and look for logsContainer
curl -s "{ip}:{port}/debug/pprof/goroutine?debug=2" | grep -A 10 logsContainer

If the output still contains a stack frame for logsContainer, the goroutine has leaked.
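The steps above rely on the daemon serving the pprof handlers. The minimal sketch below shows that setup and the grep step in Go. It assumes a debug listener separate from the API; the address and the helpers startDebugServer, stackDumpContains, and blockForever are illustrative. The blank import of net/http/pprof registers the /debug/pprof/ handlers on http.DefaultServeMux. A check can then fetch ?debug=2 and search the dump for a function name such as logsContainer.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
	"strings"
)

// startDebugServer serves http.DefaultServeMux (and therefore pprof) on addr
// and returns the address actually bound, which matters when addr uses port 0.
func startDebugServer(addr string) (string, error) {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return "", err
	}
	go func() { log.Println(http.Serve(ln, nil)) }()
	return ln.Addr().String(), nil
}

// stackDumpContains fetches the same text as
// curl "{ip}:{port}/debug/pprof/goroutine?debug=2" and reports whether it
// mentions keyword.
func stackDumpContains(hostport, keyword string) (bool, error) {
	resp, err := http.Get("http://" + hostport + "/debug/pprof/goroutine?debug=2")
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	return strings.Contains(string(body), keyword), nil
}

// blockForever is a deliberately leaked goroutine used as a search target.
func blockForever() { select {} }

func main() {
	addr, err := startDebugServer("127.0.0.1:0")
	if err != nil {
		log.Fatal(err)
	}
	go blockForever()
	// Match on ".blockForever" so the package prefix does not matter.
	found, err := stackDumpContains(addr, ".blockForever")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("blockForever in stack dump:", found)
}
```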

2.3 Fixing the Leak

Use the http.CloseNotifier interface to detect client disconnects and cancel the request’s context, allowing the goroutine to exit.

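// handler is the signature of the closure that withCancelHandler returns.
type handler func(ctx context.Context, rw http.ResponseWriter, req *http.Request) error

// withCancelHandler is an HTTP handler interceptor that cancels the request's
// context when the client closes the connection.
func withCancelHandler(h handler) handler {
    return func(ctx context.Context, rw http.ResponseWriter, req *http.Request) error {
        if notifier, ok := rw.(http.CloseNotifier); ok {
            var cancel context.CancelFunc
            ctx, cancel = context.WithCancel(ctx)
            defer cancel()

            // waitCh is closed when h returns, so the watcher goroutine below
            // exits on either path and cannot leak itself.
            waitCh := make(chan struct{})
            defer close(waitCh)

            closeNotify := notifier.CloseNotify()
            go func() {
                select {
                case <-closeNotify:
                    cancel() // client went away: ctx.Done() fires inside h
                case <-waitCh:
                }
            }()
        }
        return h(ctx, rw, req)
    }
}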
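CloseNotifier does not work for hijacked connections, because once a connection is hijacked the HTTP server no longer manages it. Note also that http.CloseNotifier is deprecated. The Go documentation recommends Request.Context() instead, which the server cancels when the client's connection closes.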

For functions that accept a context.Context, derive a bounded context with context.WithTimeout or context.WithCancel, and always call the returned cancel function. This gives the goroutine's lifetime a known upper limit.

3. Common Analysis Tools

3.1 net/http/pprof

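Enable net/http/pprof in the server and request /debug/pprof/goroutine?debug=2 to obtain the full stack traces of all goroutines as text. Without the debug parameter, the endpoint returns a binary profile. Example snippet from a PouchContainer process: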

goroutine 93 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.NewContainerMonitor.func1(0xc4202ce618)
    /tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container_monitor.go:62 +0x45
created by github.com/alibaba/pouch/daemon/mgr.NewContainerMonitor
    /tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container_monitor.go:60 +0x8d

goroutine 94 [chan receive]:
github.com/alibaba/pouch/daemon/mgr.(*ContainerManager).execProcessGC(0xc42037e090)
    /tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container.go:2177 +0x1a5
created by github.com/alibaba/pouch/daemon/mgr.NewContainerManager
    /tmp/pouchbuild/src/github.com/alibaba/pouch/daemon/mgr/container.go:179 +0x50b

Search the stack output for function names (e.g., (*Server).logsContainer) to detect leaks.

3.2 runtime.NumGoroutine

When the test and the code under test run in the same process, compare the number of goroutines before and after the operation.

func TestXXX(t *testing.T) {
    orgNum := runtime.NumGoroutine()
    defer func() {
        if got := runtime.NumGoroutine(); orgNum != got {
            t.Fatalf("goroutine leak: before %d, after %d", orgNum, got)
        }
    }()
    // ... test logic ...
}

3.3 github.com/google/gops

The gops tool (github.com/google/gops) relies on an agent embedded in the process. Once the agent is running, the CLI command gops stack ${PID} dumps the current goroutine stacks, similar to pprof.

4. Summary

When developing HTTP servers in Go, net/http/pprof is essential for inspecting goroutine states. For code paths that may block indefinitely, annotate suspect functions and add automated checks—either stack‑trace keyword matching or runtime.NumGoroutine comparisons—in CI tests to catch leaks before code review.
