
How to Build a Highly Available Nacos + Higress Microservice Gateway on Kubernetes

This guide gives a production-ready, step-by-step plan for deploying a highly available microservice gateway on Kubernetes, with Nacos as the service registry and configuration center and Higress as the cloud-native gateway. It covers architecture, prerequisites, Helm commands, key values.yaml examples, observability, security, backup, upgrade and recovery runbooks, and common troubleshooting.

Ray's Galactic Tech

Overview

The document presents a practical, production‑grade deployment plan for a highly available microservice gateway that combines Nacos (service registration and configuration) with Higress (cloud‑native gateway). It details architecture design, required prerequisites, Helm‑based installation commands, essential values.yaml snippets, observability, security, backup, upgrade, recovery procedures, and troubleshooting.

Core Components

Higress : Cloud‑native gateway (control plane + data plane) that can discover services from multiple registries, including Nacos, and supports routing, WAF, mTLS, Wasm/Lua extensions, Prometheus metrics, and SkyWalking integration.

Nacos : Service discovery and configuration management. Production deployments should use a StatefulSet cluster (≥3 nodes) with an external MySQL database for persistence.

High‑Availability Architecture

The design calls for multiple replicas of both the gateway (at least 2, preferably 3) and the controller (at least 2), plus a Nacos StatefulSet of at least three nodes backed by PVCs and an external MySQL instance.

[Figure: architecture diagram showing the Higress controller and gateway integrated with the Nacos cluster and external MySQL]
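The three-node minimum for Nacos follows from majority-quorum arithmetic, which governs its Raft-based (JRaft) consistency module: a group of n members needs floor(n/2)+1 votes, so it survives floor((n-1)/2) member failures. The small shell sketch below only illustrates that arithmetic; the function names are hypothetical.

```shell
#!/bin/sh
# Majority-quorum arithmetic for a Raft-style group of n members.

# Votes needed to elect a leader or commit a write: floor(n/2) + 1
quorum_size() {
  echo $(( $1 / 2 + 1 ))
}

# Members that may fail while a majority is still reachable: floor((n-1)/2)
tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}

# Compare common cluster sizes.
for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum_size "$n")" "$(tolerated_failures "$n")"
done
```

The table shows that 2 members tolerate no more failures than 1, and 4 no more than 3, which is why odd sizes (3 or 5) are the usual choice.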

Prerequisites

Kubernetes cluster (v1.24+), preferably multi‑AZ.

Helm 3 CLI.

Persistent storage supporting PVCs.

External MySQL (required for the production topology described here), ideally highly available (primary-primary or primary-replica).
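A preflight check for the Kubernetes version requirement can be scripted before installing anything. This is a minimal sketch, assuming GNU-compatible sort -V; the helper names are hypothetical, and vendor suffixes such as "-eks-..." are stripped before comparing.

```shell
#!/bin/sh
# Preflight helper: is a Kubernetes version string at least the required minimum?
MIN_K8S_VERSION="1.24"

version_at_least() {
  local have=${1#v}        # strip a leading "v"
  have=${have%%-*}         # drop a vendor suffix such as "-eks-2d98532"
  local want=$2
  # sort -V orders version strings; if "want" sorts first (or ties), have >= want.
  [ "$(printf '%s\n%s\n' "$want" "$have" | sort -V | head -n 1)" = "$want" ]
}

check_k8s_version() {
  if version_at_least "$1" "$MIN_K8S_VERSION"; then
    echo "ok: Kubernetes $1 >= $MIN_K8S_VERSION"
  else
    echo "fail: Kubernetes $1 < $MIN_K8S_VERSION" >&2
    return 1
  fi
}
```

Usage, assuming jq is installed: check_k8s_version "$(kubectl version -o json | jq -r .serverVersion.gitVersion)". The other prerequisites can be checked with helm version --short and kubectl get storageclass.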

Deployment Steps

4.1 Deploy Nacos

Use audited Helm charts; replace secret references, storageClass, and DB addresses before production.
# Add and update Helm repo (example)
helm repo add nacos https://charts.example.com/nacos
helm repo update

# Install with your own values
helm install nacos nacos/nacos -n nacos --create-namespace -f nacos-values.yaml

Example nacos-values.yaml (key names differ between Nacos charts; check them with helm show values nacos/nacos):

replicaCount: 3
persistence:
  enabled: true
  storageClass: "gp2"    # replace per environment
  size: 20Gi

externalDatabase:
  enabled: true
  host: my-mysql-primary.my-db.svc.cluster.local
  port: 3306
  user: nacos
  passwordSecret: nacos-mysql-secret  # stored in a k8s Secret
  dbName: nacos_config

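# readiness/liveness probes (example; Nacos serves separate readiness and liveness endpoints)
readinessProbe:
  httpGet:
    path: /nacos/v1/console/health/readiness
    port: 8848
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /nacos/v1/console/health/liveness
    port: 8848
  initialDelaySeconds: 30
  periodSeconds: 10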

Deploy Nacos as a StatefulSet with ≥3 replicas and connect to external MySQL. Use PVCs for data persistence.
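A StatefulSet gives each pod a stable DNS name of the form pod.headless-service.namespace.svc.cluster.local, so the Nacos member list can be derived from the replica count. The nacos/nacos-server image reads this list from the NACOS_SERVERS environment variable as space-separated host:port entries, and Helm charts usually generate it for you. The sketch below assumes a StatefulSet named nacos behind a hypothetical headless Service named nacos-headless; adjust both names to match your chart.

```shell
#!/bin/sh
# Build the NACOS_SERVERS value for an N-replica Nacos StatefulSet.
# Arguments: statefulset name, headless service name, namespace, replicas, [port]
nacos_members() {
  local sts=$1 svc=$2 ns=$3 replicas=$4 port=${5:-8848}
  local i=0 out=""
  while [ "$i" -lt "$replicas" ]; do
    # Pods are named <sts>-0, <sts>-1, ... and resolve through the headless Service.
    out="${out:+$out }${sts}-${i}.${svc}.${ns}.svc.cluster.local:${port}"
    i=$((i + 1))
  done
  echo "$out"
}
```

For example, nacos_members nacos nacos-headless nacos 3 prints the three member addresses for nacos-0 to nacos-2 on port 8848.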

4.2 Deploy Higress

Use the official Higress Helm chart for a one‑click deployment of the control plane and gateway.
helm repo add higress https://higress.io/helm-charts
helm repo update
helm install higress higress/higress -n higress-system --create-namespace -f higress-values.yaml

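Example higress-values.yaml (illustrative; key names vary between chart versions, so check them with helm show values higress/higress):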

controller:
  replicaCount: 2
  resources:
    requests:
      cpu: 500m
      memory: 512Mi

gateway:
  replicaCount: 3
  service:
    type: LoadBalancer   # or NodePort/ClusterIP + external SLB
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10

# Enable Nacos service discovery

discovery:
  nacos:
    enabled: true
    serverAddr: "nacos.nacos.svc.cluster.local:8848"
    namespace: "public"
    # type: "nacos2" # for Nacos v2 gRPC mode

# Store Higress configuration in Nacos (optional)
config:
  backend: nacos
  nacos:
    serverAddr: "nacos.nacos.svc.cluster.local:8848"
    namespace: "public"
    dataId: "higress"
    group: "DEFAULT_GROUP"

# SkyWalking integration (example)
observability:
  skywalking:
    enabled: true
    oapAddress: "skywalking.oap.svc.cluster.local:11800"

Expose the gateway through a LoadBalancer Service in cloud environments; on bare metal, use MetalLB, or NodePort behind an external SLB.
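In Higress releases we are aware of, a registry such as Nacos is usually declared through a McpBridge resource (API group networking.higress.io/v1) in the higress-system namespace rather than only through Helm values. The sketch below renders such a manifest from shell parameters. The field names follow the Higress documentation as we understand it, and the registry name my-nacos is hypothetical, so compare the output against the CRD installed in your cluster (for example with kubectl explain mcpbridge.spec) before applying it.

```shell
#!/bin/sh
# Render a Higress McpBridge manifest registering a Nacos server as a service source.
# Field names are an assumption based on the Higress docs; verify against your CRD.
render_mcpbridge() {
  local nacos_host=$1 nacos_port=${2:-8848} group=${3:-DEFAULT_GROUP}
  cat <<EOF
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
  - name: my-nacos            # hypothetical registry name
    type: nacos2              # Nacos 2.x (gRPC); "nacos" for 1.x servers
    domain: ${nacos_host}
    port: ${nacos_port}
    nacosGroups:
    - ${group}
EOF
}
```

Usage: render_mcpbridge nacos.nacos.svc.cluster.local | kubectl apply -f -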

Key Configuration Details

replicas : Gateway ≥3 (or ≥2 across nodes), Controller ≥2.

probes : Liveness and readiness probes to avoid routing traffic to unhealthy pods.

service.type : LoadBalancer recommended for cloud; MetalLB or NodePort + external SLB for on‑prem.

persistence : Nacos PVC + external MySQL for data reliability.

configuration source : Higress can pull routing and policy definitions from Nacos for hot‑reload, or manage them via CRDs.

Observability, Tracing, and Logging

Prometheus : Scrape Higress metrics (Envoy + controller).

Tracing : Integrate SkyWalking (preferred), or Zipkin/Jaeger.

Logging : Collect Envoy access logs and controller pod logs into EFK/ELK stacks.

Alerting : Set alerts for request latency, error rates, Nacos node count, and DB connection issues.
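For the Nacos node-count alert, one option is to poll the cluster member API and compare the number of healthy members with the majority quorum. This sketch assumes the JSON shape returned by GET /nacos/v1/core/cluster/nodes (a data array whose entries carry "state": "UP", "DOWN", and so on). Verify that shape against your Nacos version, and add an access token if authentication is enabled. The exit codes are a hypothetical convention for a cron or exporter wrapper.

```shell
#!/bin/sh
# Count Nacos members reporting state "UP" in a JSON body read from stdin.
count_up_nodes() {
  grep -oE '"state"[[:space:]]*:[[:space:]]*"UP"' | wc -l | tr -d '[:space:]'
}

# Exit status: 0 = healthy, 1 = below quorum (critical), 2 = degraded but serving.
evaluate_cluster() {
  local up=$1 total=$2
  local quorum=$(( total / 2 + 1 ))
  if [ "$up" -lt "$quorum" ]; then
    echo "CRITICAL: ${up}/${total} Nacos nodes UP, quorum is ${quorum}"
    return 1
  elif [ "$up" -lt "$total" ]; then
    echo "WARNING: ${up}/${total} Nacos nodes UP"
    return 2
  fi
  echo "OK: ${up}/${total} Nacos nodes UP"
}
```

Usage: body=$(curl -s http://nacos.nacos.svc.cluster.local:8848/nacos/v1/core/cluster/nodes), then evaluate_cluster "$(printf '%s' "$body" | count_up_nodes)" 3.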

Security Practices

TLS for inbound traffic, with certificates issued and renewed by cert-manager.

mTLS between gateway and backend services.

Never store DB passwords or certificates in plain‑text values.yaml; use Kubernetes Secrets, External Secrets, or Vault.

Use NetworkPolicy to restrict pod-to-pod traffic, allowing only the necessary paths between Higress, Nacos, SkyWalking, and the database, and use RBAC to limit which workloads and users can access the Kubernetes API.
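As a starting point for the NetworkPolicy rule, the sketch below renders a policy that admits traffic to Nacos pods only from the Higress namespace and from other Nacos pods. It assumes namespaces named nacos and higress-system, Nacos pods labeled app: nacos, and the default Nacos 2.x ports (8848 HTTP, 9848 client gRPC, 9849 server gRPC, 7848 Raft). Application namespaces whose services register with Nacos need their own from entries, and the policy only takes effect if the cluster's CNI enforces NetworkPolicy.

```shell
#!/bin/sh
# Render a NetworkPolicy restricting ingress to Nacos pods (names are assumptions).
render_nacos_netpol() {
  local nacos_ns=${1:-nacos} gateway_ns=${2:-higress-system}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nacos-restrict-ingress
  namespace: ${nacos_ns}
spec:
  podSelector:
    matchLabels:
      app: nacos
  policyTypes:
  - Ingress
  ingress:
  - from:                      # gateway: HTTP API and 2.x client gRPC
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ${gateway_ns}
    ports:
    - port: 8848
    - port: 9848
  - from:                      # cluster members talking to each other
    - podSelector:
        matchLabels:
          app: nacos
    ports:
    - port: 8848
    - port: 9848
    - port: 9849
    - port: 7848
EOF
}
```

Usage: render_nacos_netpol | kubectl apply -f -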

Backup, Upgrade, and Recovery Runbook

Backup

MySQL: Regular full‑dump backups with verification.

Nacos configuration export via API scripts.

Store Higress CRDs and routing definitions in Git (GitOps).
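The MySQL part of the backup routine can be automated as dump, compress, verify, and prune. This is a minimal sketch, assuming a database named nacos_config, a backup directory /backups/nacos, and credentials in a client option file at /etc/nacos-backup/my.cnf (all hypothetical). The dump is written to a plain file first so that mysqldump's own exit status can be checked.

```shell
#!/bin/sh
# Nightly MySQL backup for the Nacos database: dump, compress, verify, prune.
BACKUP_DIR=${BACKUP_DIR:-/backups/nacos}
KEEP=${KEEP:-7}

# Lexically sortable name, e.g. nacos_config-20240101T020000Z.sql.gz
backup_name() {
  echo "$1-$(date -u +%Y%m%dT%H%M%SZ).sql.gz"
}

# A backup only counts once the archive passes an integrity check.
verify_backup() {
  gzip -t "$1" 2>/dev/null
}

# Keep the newest $2 archives named "$1-*.sql.gz" in directory $3.
prune_backups() {
  local prefix=$1 keep=$2 dir=$3
  ls -1 "$dir"/"$prefix"-*.sql.gz 2>/dev/null | sort -r | tail -n +$((keep + 1)) |
    while IFS= read -r old; do rm -f -- "$old"; done
}

run_backup() {
  local db=${1:-nacos_config}
  local out="$BACKUP_DIR/$(backup_name "$db")"
  local raw="${out%.gz}"
  mkdir -p "$BACKUP_DIR" || return 1
  # --defaults-extra-file keeps the password off the command line; it must come first.
  if ! mysqldump --defaults-extra-file=/etc/nacos-backup/my.cnf \
      --single-transaction --routines "$db" > "$raw"; then
    rm -f -- "$raw"
    return 1
  fi
  gzip -f "$raw" || return 1     # produces "$out"
  verify_backup "$out" || { echo "backup verification failed: $out" >&2; return 1; }
  prune_backups "$db" "$KEEP" "$BACKUP_DIR"
  echo "$out"
}
```

Because the timestamp is embedded in the file name, sorting by name is enough to find the oldest archives, and no file-time metadata is needed.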

Upgrade

Perform a full upgrade rehearsal in a staging environment (Helm upgrade, controller first, then gateway).

Roll out the Nacos StatefulSet upgrade one pod at a time, confirming after each step that the Raft group still has quorum and a stable leader.

Promote to production only after monitoring shows no anomalies.
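One way to upgrade Nacos one member at a time is the StatefulSet rollingUpdate partition: only pods with an ordinal at or above the partition move to the new revision. The sketch assumes a namespace and StatefulSet both named nacos. It first freezes the rollout (partition equal to replicas), then, after the new image is applied, lowers the partition one ordinal at a time. If your chart templates updateStrategy, set the partition through chart values instead, since helm upgrade may overwrite a manual patch. DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Staged Nacos upgrade via the StatefulSet rolling-update partition.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

set_partition() {
  run kubectl -n "$1" patch statefulset "$2" --type merge \
    -p "{\"spec\":{\"updateStrategy\":{\"type\":\"RollingUpdate\",\"rollingUpdate\":{\"partition\":$3}}}}"
}

# Step 0, before changing the image: hold every pod on the current revision.
freeze_rollout() {
  set_partition "$1" "$2" "$3"
}

# Step 1, after the new image is applied: release one ordinal at a time.
staged_rollout() {
  local ns=$1 sts=$2 replicas=$3
  local p=$((replicas - 1))
  while [ "$p" -ge 0 ]; do
    set_partition "$ns" "$sts" "$p" || return 1
    # Waits until the partitioned rollout (pods with ordinal >= p) is ready.
    run kubectl -n "$ns" rollout status "statefulset/$sts" --timeout=10m || return 1
    # Check Nacos cluster health (all members UP) here before lowering the partition.
    p=$((p - 1))
  done
}
```

Usage: DRY_RUN=1 staged_rollout nacos nacos 3 lists the six commands (patch and wait for partitions 2, 1, 0) so they can be reviewed before a real run.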

Recovery (example: Nacos primary DB failure)

Stop write traffic to Nacos (for example, pause configuration publishing and temporarily drain the gateway).

Restore the latest MySQL backup to a new or original instance.

Restart Nacos nodes and verify cluster health.

Resume traffic and monitor closely.
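The restore and restart steps can be scripted so they are rehearsed the same way during drills. This sketch assumes timestamped archives named like nacos_config-YYYYmmddTHHMMSSZ.sql.gz, a namespace and StatefulSet named nacos, and a client option file at /etc/nacos-backup/my.cnf (all hypothetical). It also assumes the target database already exists, because a dump taken without --databases contains no CREATE DATABASE statement. DRY_RUN=1 prints the commands.

```shell
#!/bin/sh
# Restore the newest Nacos DB backup, then restart Nacos and wait for readiness.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Timestamped names sort lexically, so the last entry is the newest.
latest_backup() {
  ls -1 "$1"/"$2"-*.sql.gz 2>/dev/null | sort | tail -n 1
}

restore_and_restart() {
  local dump=$1 db=${2:-nacos_config} ns=${3:-nacos} sts=${4:-nacos}
  [ -n "$dump" ] || { echo "no backup found" >&2; return 1; }
  gzip -t "$dump" || return 1          # refuse to restore a corrupt archive
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gunzip -c $dump | mysql $db"
  else
    gunzip -c "$dump" | mysql --defaults-extra-file=/etc/nacos-backup/my.cnf "$db" || return 1
  fi
  run kubectl -n "$ns" rollout restart "statefulset/$sts" || return 1
  run kubectl -n "$ns" rollout status "statefulset/$sts" --timeout=15m
}
```

Usage: restore_and_restart "$(latest_backup /backups/nacos nacos_config)"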

Common Issues and Quick Troubleshooting

Nacos pods restart frequently : Check PVC health, disk I/O, MySQL connectivity, and JVM OOM.

Higress cannot discover services : Verify network connectivity to Nacos, correct serverAddr, namespace, and dataId settings.

Gateway 5xx or high latency : Inspect Envoy access logs, backend health, timeout, and connection‑pool parameters.

Configuration changes not taking effect : Ensure changes are written to Nacos (if using Nacos backend) or to the appropriate CRDs.
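When Higress cannot discover services, it helps to ask Nacos directly, from a pod in the gateway's namespace, what it has registered. The helpers below build Nacos v1 Open API URLs for the service list and for healthy instances of a service. The address is an assumption. If authentication is enabled, an accessToken parameter must be appended. The namespace argument is the namespace ID, which for custom namespaces differs from the display name.

```shell
#!/bin/sh
# Build Nacos v1 Open API URLs for checking registrations.
NACOS_ADDR=${NACOS_ADDR:-http://nacos.nacos.svc.cluster.local:8848}

service_list_url() {
  local ns=${1:-public} group=${2:-DEFAULT_GROUP}
  echo "${NACOS_ADDR}/nacos/v1/ns/service/list?pageNo=1&pageSize=100&groupName=${group}&namespaceId=${ns}"
}

instance_list_url() {
  local service=$1 ns=${2:-public} group=${3:-DEFAULT_GROUP}
  echo "${NACOS_ADDR}/nacos/v1/ns/instance/list?serviceName=${service}&groupName=${group}&namespaceId=${ns}&healthyOnly=true"
}
```

Usage: curl -s "$(service_list_url)" lists registered services; curl -s "$(instance_list_url order-service)" (order-service is a hypothetical name) shows its healthy instances. If the service appears here but not in Higress, the problem is on the gateway side (serverAddr, namespace, group); if it is missing here, the problem is on the registering application's side.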

Operational Automation and Best Practices

GitOps (ArgoCD/Flux) for CRDs, routing definitions, and values.yaml files.

External Secrets or HashiCorp Vault for secret management.

Canary releases using Higress route weights, headers, or cookies.

Regular disaster‑recovery drills (MySQL restore, Higress upgrade rollback).

Prefer Nacos v2 gRPC mode when possible to reduce resource consumption and speed up change detection.
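Weight-based canaries can be expressed as a second Ingress for the same host and path. This sketch assumes Higress's nginx-compatible canary annotations (higress.io/canary, higress.io/canary-weight) and the ingress class higress. It renders only the canary side. A primary Ingress for the same host must already exist in the same namespace, and the host and service names used in the usage line are hypothetical.

```shell
#!/bin/sh
# Render a canary Ingress sending "weight" percent of traffic to a new backend.
render_canary_ingress() {
  local host=$1 canary_svc=$2 weight=$3 port=${4:-80}
  # Weight must be an integer percentage between 0 and 100.
  case $weight in
    ''|*[!0-9]*) echo "invalid weight: $weight" >&2; return 1 ;;
  esac
  if [ "$weight" -gt 100 ]; then echo "invalid weight: $weight" >&2; return 1; fi
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ${canary_svc}-canary
  annotations:
    higress.io/canary: "true"
    higress.io/canary-weight: "${weight}"
spec:
  ingressClassName: higress
  rules:
  - host: ${host}
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: ${canary_svc}
            port:
              number: ${port}
EOF
}
```

Usage: render_canary_ingress shop.example.com shop-v2 20 | kubectl apply -n shop -f -, then re-apply with larger weights (for example 20, 50, 100) as metrics stay healthy.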

Minimal Dev‑Environment Demo YAML (for quick smoke‑test)

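# nacos-demo.yaml (dev only): namespace, headless Service, and a single-node
# standalone StatefulSet. The StatefulSet's volumeClaimTemplates creates its own
# PVC (nacos-data-nacos-0), so no separate PVC manifest is needed.
apiVersion: v1
kind: Namespace
metadata:
  name: nacos
---
apiVersion: v1
kind: Service
metadata:
  name: nacos
  namespace: nacos
spec:
  clusterIP: None            # headless Service required by serviceName below
  selector:
    app: nacos
  ports:
  - name: http
    port: 8848
  - name: client-grpc        # Nacos 2.x clients also connect over gRPC (8848 + 1000)
    port: 9848
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nacos
  namespace: nacos
spec:
  serviceName: "nacos"
  replicas: 1
  selector:
    matchLabels:
      app: nacos
  template:
    metadata:
      labels:
        app: nacos
    spec:
      containers:
      - name: nacos
        image: nacos/nacos-server:2.2.0
        env:
        - name: MODE
          value: standalone  # single node with embedded storage; omit for a cluster
        ports:
        - containerPort: 8848
        - containerPort: 9848
        volumeMounts:
        - name: nacos-data
          mountPath: /home/nacos/data
  volumeClaimTemplates:
  - metadata:
      name: nacos-data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: "standard"
      resources:
        requests:
          storage: 5Gi
---
# higress-demo-deployment.yaml (dev only, illustrative)
# A controller Deployment alone does not form a working gateway; for a usable
# dev install, run the Higress Helm chart with replica counts set to 1.
# The image below is a placeholder; use the image reference from the chart.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: higress-controller
  namespace: higress-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: higress-controller
  template:
    metadata:
      labels:
        app: higress-controller
    spec:
      containers:
      - name: controller
        image: higress/controller:latest
        ports:
        - containerPort: 8080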
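Nacos 2.x derives its additional ports from the main HTTP port (server.port, default 8848): client gRPC at +1000, server-to-server gRPC at +1001, and Raft at -1000. Any Service, NetworkPolicy, or load balancer in front of Nacos, including a dev setup like the one above, has to expose the client gRPC port for 2.x clients, and cluster members need the server gRPC and Raft ports between themselves. A small sketch of the offsets:

```shell
#!/bin/sh
# Print the ports a Nacos 2.x node uses for a given main HTTP port.
nacos_ports() {
  local main=${1:-8848}
  printf 'http=%d client-grpc=%d server-grpc=%d raft=%d\n' \
    "$main" $(( main + 1000 )) $(( main + 1001 )) $(( main - 1000 ))
}
```

With the default, nacos_ports prints http=8848 client-grpc=9848 server-grpc=9849 raft=7848.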

FAQ

Q: Does Higress support Nacos? A: Yes. Higress can use Nacos as a service‑discovery source and configuration backend, and it also supports Nacos v2 gRPC mode for faster change detection.

Q: How to achieve high availability for Nacos on K8s? A: Deploy Nacos as a StatefulSet with at least three replicas and use an external MySQL database for persistent metadata.

Q: How to ensure hot configuration updates? A: Store Higress routing and plugin configurations in Nacos (or manage them via CRDs); Higress watches Nacos for changes and applies them without restart.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
