
Online Service Alarm Handling and Performance Profiling in Go

This article outlines a systematic, SOP‑driven approach to diagnosing online service alarms and performance issues in Go. It details a toolbox spanning pprof, trace, goroutine visualizers, perf, and eBPF, and recommends application‑level optimizations, system tuning, and continuous profiling to accelerate root‑cause identification and reduce incident frequency.

DeWu Technology

Background: When an online service triggers an alarm or exhibits mysterious performance issues, systematic diagnosis is essential. This article shares a practical methodology and toolchain for rapid root‑cause identification.

Alarm Investigation Process: Establish a standard SOP to break incidents down, pull in the right people early, and prioritize quick mitigation (restart, rollback) while gathering service ownership information and resource metrics.

SOP Documentation: A set of SOPs covering service call exceptions, latency spikes, circuit‑breaker issues, MySQL/Redis latency, CPU/memory anomalies, traffic surges, and common business problems. Each SOP includes owners, tool links, and a “no‑search, no‑ask” principle.

Performance Diagnosis Toolbox:

pprof – Go’s primary CPU and memory profiler. Use runtime/pprof for embedded services or net/http/pprof for HTTP endpoints. Examine cumulative (cum) and flat costs to locate hot functions.

trace – Capture runtime events (goroutine scheduling, GC pauses) via curl host/debug/pprof/trace?seconds=10 > trace.out and analyze with go tool trace trace.out.

Goroutine visualization – Tools like divan/gotrace render execution graphs.

perf – System‑level profiling when pprof fails, showing symbol‑level hotspots.

eBPF – Dynamic, non‑intrusive tracing for kernel‑level insights; useful when Go‑level tools are insufficient.

Example Go code used with eBPF:

package main

import "fmt"

func main() {
    fmt.Println("Hello, BPF!")
}

Tracing this binary's fmt.Println with bcc's funclatency produces a latency histogram:

# funclatency 'go:fmt.Println'
Tracing 1 functions for "go:fmt.Println"... Hit Ctrl-C to end.
^C
Function = fmt.Println [3041]
     nsecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 0        |                                        |
       512 -> 1023       : 0        |                                        |
      1024 -> 2047       : 0        |                                        |
      2048 -> 4095       : 0        |                                        |
      4096 -> 8191       : 0        |                                        |
      8192 -> 16383      : 27       |****************************************|
     16384 -> 32767      : 3        |****                                    |
Detaching...

Optimization Strategies:

Application Layer : Pool resources (sync.Pool), tighten lock scopes, replace heavy JSON libraries, follow fasthttp best practices.

System Layer : Upgrade Go version, tune OS parameters (swap, NUMA), refer to Red Hat tuning guides.

Continuous Profiling : Periodic pprof collection (cron) with archiving enables time‑range analysis and diffing. Tools like Conprof provide a UI for this workflow.

eBPF + Go : Leverage eBPF for low‑overhead tracing of function latency and call stacks when traditional profilers are unavailable.

Conclusion : Effective incident handling combines SOPs, robust tooling, and disciplined benchmarking. Continuous profiling and proactive code reviews further reduce incident frequency.

Tags: Go, Performance Profiling, pprof, eBPF, online service, SOP
Written by

DeWu Technology

A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
