
Mastering K8s Application Lifecycle: Health Checks, Graceful Shutdown, Metrics & Tracing

This article explains how developers and operators should prepare a Go‑based service for Kubernetes by implementing health‑check endpoints, graceful shutdown handling, metrics exposure, tracing integration, standardized logging, and operational best practices such as stateless design, high availability, self‑healing, and HTTPS configuration.

Ops Development Stories

May 18, 2022


Throughout an application's lifecycle, development and operations are inseparable, and when deploying to Kubernetes, each side has its own responsibilities.

Development Side

From the development perspective, the application should provide the following capabilities:

Health check endpoint

Graceful shutdown

Metrics endpoint

Trace integration

Standardized log output

Define health check endpoint

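Kubernetes readiness and liveness probes call the health check endpoint to determine whether the pod is ready to receive traffic and whether it is still alive. Without such an endpoint, Kubernetes only knows whether the container process is running, not whether the application can actually serve requests.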

Example implementation:

package router

import (
    "github.com/gin-gonic/gin"
    v1 "go-hello-world/app/http/controllers/v1"
)

func SetupRouter(router *gin.Engine) {
    ruc := new(v1.RootController)
    router.GET("/", ruc.Root)

    huc := new(v1.HealthController)
    router.GET("/health", huc.HealthCheck)
}

package v1

import (
    "github.com/gin-gonic/gin"
    "go-hello-world/app/http/controllers"
    "go-hello-world/pkg/response"
    "net/http"
)

type HealthController struct {
    controllers.BaseController
}

// HealthCheck answers liveness and readiness probes with HTTP 200.
func (h *HealthController) HealthCheck(c *gin.Context) {
    response.WriteResponse(c, http.StatusOK, nil, gin.H{"result": "health check page", "status": "OK"})
}

After the container starts, the kubelet calls this endpoint periodically; for an HTTP probe, any status code from 200 to 399 counts as success. In real deployments, the health check may also need to verify dependencies such as Redis, MySQL, or a message queue.

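The corresponding container spec adds a readinessProbe and a livenessProbe. The value port: http refers to a named container port, so the container's ports list must declare a port named http (8080 in this example):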

readinessProbe:
  httpGet:
    path: /health
    port: http
  timeoutSeconds: 3
  initialDelaySeconds: 20
livenessProbe:
  httpGet:
    path: /health
    port: http
  timeoutSeconds: 3
  initialDelaySeconds: 30

Define graceful shutdown

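During a rolling update, the old pod must finish processing in-flight requests before it terminates. When a pod is deleted, the kubelet sends SIGTERM to the main process of each container, waits for the grace period (terminationGracePeriodSeconds, 30 seconds by default), and then sends SIGKILL if the process is still running. SIGKILL cannot be caught, so the application should handle SIGTERM (and SIGINT for local runs): stop accepting new connections and let in-flight requests finish within the grace period.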

Example shutdown package (imported by main as go-hello-world/pkg/shutdown):

package shutdown

import (
    "context"
    "fmt"
    "net/http"
    "os"
    "os/signal"
    "time"
)

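// Shutdown waits for OS signals and then gracefully stops an http.Server.
type Shutdown struct {
    ch      chan os.Signal
    timeout time.Duration // maximum time allowed for in-flight requests to finish
}

// New creates a Shutdown whose drain timeout is given in seconds.
func New(seconds int) *Shutdown {
    // signal.Notify does not block when sending, so the channel must be buffered.
    return &Shutdown{ch: make(chan os.Signal, 1), timeout: time.Duration(seconds) * time.Second}
}

// Add registers the signals (e.g. SIGINT, SIGTERM) that trigger shutdown.
func (s *Shutdown) Add(signals ...os.Signal) { signal.Notify(s.ch, signals...) }

// Start blocks until a registered signal arrives, then stops accepting new
// connections and waits up to s.timeout for active requests to complete.
func (s *Shutdown) Start(server *http.Server) {
    sig := <-s.ch
    fmt.Println("received signal, start exit:", sig)
    ctx, cancel := context.WithTimeout(context.Background(), s.timeout)
    defer cancel()
    if err := server.Shutdown(ctx); err != nil {
        fmt.Println("Graceful exit failed. err:", err)
        return
    }
    fmt.Println("Graceful exit success.")
}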

The main program starts the HTTP server in a goroutine, registers SIGINT and SIGTERM with the shutdown handler, and then blocks in quit.Start(server) until one of those signals arrives.

package main

import (
    "github.com/gin-gonic/gin"
    "go-hello-world/pkg/shutdown"
    "go-hello-world/router"
    "log"
    "net/http"
    "syscall"
)

func main() {
    r := gin.New()
    router.SetupRouter(r)

    server := &http.Server{Addr: ":8080", Handler: r}

    // Serve in a goroutine so main can wait for shutdown signals.
    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server.ListenAndServe err: %v", err)
        }
    }()

    // Allow up to 10 seconds for in-flight requests to finish after SIGINT/SIGTERM.
    quit := shutdown.New(10)
    quit.Add(syscall.SIGINT, syscall.SIGTERM)
    quit.Start(server) // blocks until a signal arrives and shutdown completes
}

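Kubernetes also supports a preStop hook, which runs before SIGTERM is sent to the container. Typical uses are a short sleep, which gives endpoint controllers and load balancers time to stop routing traffic to the pod, or a call that deregisters the instance from a service registry such as Nacos. The hook's run time counts against terminationGracePeriodSeconds, so the grace period should be longer than the preStop sleep plus the application's own shutdown timeout; with the default of 30 seconds, the 30-second sleep below would leave almost no time for the 10-second drain configured in the code above.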

lifecycle:
  preStop:
    exec:
      command:
        - /bin/sh
        - -c
        - sleep 30

Define metrics endpoint

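Metrics expose application statistics for Prometheus to scrape. The example below uses the Prometheus Go client library to define a request counter and a request-duration histogram, and NewMetrics wraps an http.HandlerFunc to record both. The promhttp.Handler served on the metrics port (see the main function in the tracing section) exposes these collectors together with the client's default Go runtime and process metrics.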

package metrics

import (
    "github.com/prometheus/client_golang/prometheus"
    "net/http"
    "time"
)

var (
    // HttpserverRequestTotal counts requests by HTTP method and path.
    HttpserverRequestTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "httpserver_request_total",
            Help: "The total number of httpserver requests",
        },
        []string{"method", "endpoint"},
    )
    // HttpserverRequestDuration records request latency in seconds;
    // requests slower than 1s fall into the implicit +Inf bucket.
    HttpserverRequestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "httpserver_request_duration_seconds",
            Help:    "httpserver request duration distribution",
            Buckets: []float64{0.1, 0.3, 0.5, 0.7, 0.9, 1},
        },
        []string{"method", "endpoint"},
    )
)

// init registers both collectors with the default registry served by promhttp.Handler.
func init() {
    prometheus.MustRegister(HttpserverRequestTotal, HttpserverRequestDuration)
}

// NewMetrics wraps a handler and records its call count and latency.
// Using the raw URL path as a label can create unbounded label cardinality
// for paths that contain IDs; prefer a route template where possible.
func NewMetrics(handler http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        handler(w, r)
        labels := prometheus.Labels{"method": r.Method, "endpoint": r.URL.Path}
        HttpserverRequestTotal.With(labels).Inc()
        HttpserverRequestDuration.With(labels).Observe(time.Since(start).Seconds())
    }
}

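Add the Prometheus annotations to the pod template in the Deployment so that a Prometheus server using the common annotation-based Kubernetes pod discovery job scrapes the metrics endpoint automatically. These annotations are a convention of that scrape configuration rather than a built-in Prometheus feature.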

spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9527"     # numeric port of the metrics listener started in main
        prometheus.io/path: "/metrics"

Define tracing

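Tracing assigns a TraceID to each request and propagates it across service calls, enabling end-to-end request tracking. Open-source tracing systems include Jaeger, Zipkin, and SkyWalking. This article uses SkyWalking through its Go agent, go2sky, and its gin plugin; the minimal integration below also starts a separate Prometheus metrics listener on port 9527.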

package main

import (
    "github.com/SkyAPM/go2sky"
    v3 "github.com/SkyAPM/go2sky-plugins/gin/v3"
    "github.com/SkyAPM/go2sky/reporter"
    "github.com/gin-gonic/gin"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "go-hello-world/pkg/shutdown"
    "go-hello-world/router"
    "log"
    "net/http"
    "syscall"
    "time"
)

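// SKYWALKING_ENABLED toggles SkyWalking tracing; it is hard-coded here for demonstration.
var SKYWALKING_ENABLED = false

func main() {
    r := gin.New()
    if SKYWALKING_ENABLED {
        // Report spans to the SkyWalking OAP server over gRPC.
        rp, err := reporter.NewGRPCReporter("skywalking-oap:11800", reporter.WithCheckInterval(time.Second))
        if err != nil {
            // rp is nil on error, so skip tracing instead of calling rp.Close() later.
            log.Printf("create go2sky reporter failed. err: %s", err)
        } else {
            defer rp.Close()
            tracer, err := go2sky.NewTracer("go-hello-world", go2sky.WithReporter(rp))
            if err != nil {
                log.Printf("create go2sky tracer failed. err: %s", err)
            } else {
                r.Use(v3.Middleware(r, tracer))
            }
        }
    }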
    router.SetupRouter(r)

    server := &http.Server{Addr: ":8080", Handler: r}

    go func() {
        http.Handle("/metrics", promhttp.Handler())
        if err := http.ListenAndServe(":9527", nil); err != nil {
            log.Printf("metrics port listen failed. err: %s", err)
        }
    }()

    go func() {
        if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("server.ListenAndServe err: %v", err)
        }
    }()

    quit := shutdown.New(10)
    quit.Add(syscall.SIGINT, syscall.SIGTERM)
    quit.Start(server)
}

Standardized logging

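Consistent log output simplifies collection and troubleshooting. In Kubernetes, write logs to stdout and stderr: the container runtime captures those streams, making them available through kubectl logs and to node-level log collectors, whereas files written inside the container are lost when the pod is deleted or rescheduled.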

Operations Side

After development, operations deploy the application. To ensure stability, the following points should be considered:

Keep the application stateless

Ensure high availability

Provide graceful rollout capability

Support self‑healing

Expose HTTPS

Stateless design

Prefer stateless services; persist data in databases or external storage rather than inside the pod.

High availability

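Deploy multiple replicas, spread them across nodes with pod anti-affinity, protect them against voluntary disruptions such as node drains with a PodDisruptionBudget, and set resource requests equal to limits so the pods get the Guaranteed QoS class.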

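# Deployment (excerpt): affinity and resources belong in the pod template
spec:
  replicas: 2
  template:
    spec:
      affinity:
        podAntiAffinity:
          # "required" anti-affinity needs at least as many nodes as replicas
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["httpserver"]
            topologyKey: kubernetes.io/hostname
      containers:
      - name: httpserver   # container name assumed for this excerpt
        resources:         # requests == limits gives the Guaranteed QoS class
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: "1"
            memory: 2Gi
---
# PodDisruptionBudget is a separate resource, not a Deployment field
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: httpserver-pdb   # name assumed for this excerpt
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: httpserver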

Graceful rollout

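Kubernetes adds a pod to a Service's endpoints only after its readiness probe succeeds, so traffic reaches only fully started instances. During a rolling update, the Deployment controller likewise waits for new pods to become ready before removing more old ones, within the limits set by maxSurge and maxUnavailable.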

Self‑healing

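When a container's process exits, the kubelet restarts it according to the pod's restartPolicy; a liveness probe additionally catches containers that are still running but unresponsive (for example, deadlocked) and triggers a restart. When a node fails, pods managed by a controller such as a Deployment are recreated on healthy nodes.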

HTTPS access

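Create a TLS secret and reference it in an Ingress resource to expose the service over HTTPS. TLS is terminated at the ingress controller, which by default forwards plain HTTP to the httpserver Service on port 8080.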

# Create TLS secret
kubectl create secret tls httpserver-tls-secret --cert=path/to/tls.cert --key=path/to/tls.key

# Ingress snippet
spec:
  tls:
  - hosts:
    - httpserver.coolops.cn
    secretName: httpserver-tls-secret
  rules:
  - host: httpserver.coolops.cn
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: httpserver
            port:
              number: 8080

Conclusion

The article outlines essential development and operational practices for deploying a Go‑based HTTP service on Kubernetes, covering health checks, graceful shutdown, metrics, tracing, logging, stateless design, high availability, graceful rollout, self‑healing, and HTTPS exposure.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
