Online Service Alarm Handling and Performance Profiling in Go
This article presents a systematic, SOP‑driven approach to diagnosing online service alarms and performance issues in Go. It walks through a toolbox that includes pprof, trace, goroutine visualizers, perf, and eBPF, and recommends application‑level optimizations, system tuning, and continuous profiling to speed up root‑cause identification and reduce incident frequency.
Background : When an online service triggers an alarm or exhibits mysterious performance issues, systematic diagnosis is essential. This article shares a practical methodology and toolchain for rapid root‑cause identification.
Alarm Investigation Process : Establish a standard SOP to break down incidents, communicate involvement early, and prioritize quick mitigation (restart, rollback) while gathering service ownership and resource metrics.
SOP Documentation : A set of SOPs covering service call exceptions, latency spikes, circuit‑breaker issues, MySQL/Redis latency, CPU/memory anomalies, traffic surges, and common business problems. Each SOP includes owners, tool links, and a “no‑search, no‑ask” principle.
Performance Diagnosis Toolbox :
pprof – Go’s primary CPU and memory profiler. Use runtime/pprof for embedded services or net/http/pprof for HTTP endpoints. Examine cumulative (cum) and flat costs to locate hot functions.
trace – Capture runtime events (goroutine scheduling, GC pauses) via curl "host/debug/pprof/trace?seconds=10" > trace.out (quoting the URL so the shell does not interpret the query string) and analyze with go tool trace trace.out.
Goroutine visualization – Tools like divan/gotrace render execution graphs.
perf – System‑level profiling when pprof fails, showing symbol‑level hotspots.
eBPF – Dynamic, non‑intrusive tracing for kernel‑level insights; useful when Go‑level tools are insufficient.
Example Go code used with eBPF:
package main

import "fmt"

func main() {
	fmt.Println("Hello, BPF!")
}
# funclatency 'go:fmt.Println'
Tracing 1 functions for "go:fmt.Println"... Hit Ctrl-C to end.
^C
Function = fmt.Println [3041]
nsecs : count distribution
0 -> 1 : 0 | |
2 -> 3 : 0 | |
4 -> 7 : 0 | |
8 -> 15 : 0 | |
16 -> 31 : 0 | |
32 -> 63 : 0 | |
64 -> 127 : 0 | |
128 -> 255 : 0 | |
256 -> 511 : 0 | |
512 -> 1023 : 0 | |
1024 -> 2047 : 0 | |
2048 -> 4095 : 0 | |
4096 -> 8191 : 0 | |
8192 -> 16383 : 27 |****************************************|
16384 -> 32767 : 3 |**** |
Detaching...

Optimization Strategies :
Application Layer : Pool resources (sync.Pool), tighten lock scopes, replace heavy JSON libraries, follow fasthttp best practices.
System Layer : Upgrade Go version, tune OS parameters (swap, NUMA), refer to Red Hat tuning guides.
Continuous Profiling : Periodic pprof collection (cron) with archiving enables time‑range analysis and diffing. Tools like Conprof provide a UI for this workflow.
eBPF + Go : Leverage eBPF for low‑overhead tracing of function latency and call stacks when traditional profilers are unavailable.
Conclusion : Effective incident handling combines SOPs, robust tooling, and disciplined benchmarking. Continuous profiling and proactive code reviews further reduce incident frequency.
DeWu Technology