How to Build a Highly Available Nacos + Higress Microservice Gateway on Kubernetes
This guide provides a production‑ready, step‑by‑step solution for deploying a high‑availability microservice gateway using Nacos as a service‑registry and configuration center together with Higress as a cloud‑native gateway on Kubernetes, covering architecture, prerequisites, Helm commands, key values.yaml examples, observability, security, backup, upgrade, recovery runbooks, and common troubleshooting.
Overview
The document presents a practical, production‑grade deployment plan for a highly available microservice gateway that combines Nacos (service registration and configuration) with Higress (cloud‑native gateway). It details architecture design, required prerequisites, Helm‑based installation commands, essential values.yaml snippets, observability, security, backup, upgrade, recovery procedures, and troubleshooting.
Core Components
Higress : Cloud‑native gateway (control plane + data plane) that can discover services from multiple registries, including Nacos, and supports routing, WAF, mTLS, Wasm/Lua extensions, Prometheus metrics, and SkyWalking integration.
Nacos : Service discovery and configuration management. Production deployments should use a StatefulSet cluster (≥3 nodes) with an external MySQL database for persistence.
High‑Availability Architecture
The design runs multiple replicas of both the gateway (at least 2, preferably 3) and the controller (at least 2), alongside a Nacos StatefulSet with at least three nodes backed by PVCs and an external MySQL instance.
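The replica counts above only help if voluntary disruptions (node drains, cluster upgrades) cannot evict too many pods at once. A minimal sketch of a PodDisruptionBudget for the gateway follows; the resource name and the `app: higress-gateway` label are assumptions and should be matched to the labels your chart actually sets.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: higress-gateway-pdb     # hypothetical name
  namespace: higress-system
spec:
  minAvailable: 2               # with 3 replicas, at most one pod is evicted at a time
  selector:
    matchLabels:
      app: higress-gateway      # assumed label; check with `kubectl get pods --show-labels`
```

A similar budget with `minAvailable: 2` on a three-node Nacos StatefulSet would keep a Raft majority available during node maintenance.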
Prerequisites
Kubernetes cluster (v1.24+), preferably multi‑AZ.
Helm 3 CLI.
Persistent storage supporting PVCs.
External MySQL with HA (primary-primary or primary-replica). Optional for dev or test clusters; recommended for production, as noted under Core Components.
Deployment Steps
4.1 Deploy Nacos
Use audited Helm charts; replace secret references, storageClass, and DB addresses before production.
# Add and update Helm repo (example)
helm repo add nacos https://charts.example.com/nacos
helm repo update
# Install with your own values
helm install nacos nacos/nacos -n nacos --create-namespace -f nacos-values.yaml

Example nacos-values.yaml:
replicaCount: 3
persistence:
  enabled: true
  storageClass: "gp2" # replace per environment
  size: 20Gi
externalDatabase:
  enabled: true
  host: my-mysql-primary.my-db.svc.cluster.local
  port: 3306
  user: nacos
  passwordSecret: nacos-mysql-secret # stored in a k8s Secret
  dbName: nacos_config
# readiness/liveness probes (example)
readinessProbe:
  httpGet:
    path: /nacos/v1/console/health/readiness # Nacos console readiness endpoint
    port: 8848
  initialDelaySeconds: 30
  periodSeconds: 10

Deploy Nacos as a StatefulSet with ≥3 replicas and connect it to the external MySQL database. Use PVCs for data persistence.
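The values above reference a Secret named `nacos-mysql-secret`. A minimal sketch of that Secret is shown here for dev use; the key name `mysql-password` is an assumption, since the key a chart expects depends on the chart you audited.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: nacos-mysql-secret
  namespace: nacos
type: Opaque
stringData:
  mysql-password: "CHANGE_ME"   # hypothetical key name and placeholder value
```

In production, this object would normally be generated by External Secrets or Vault rather than committed to Git.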
4.2 Deploy Higress
Use the official Higress Helm chart for a one‑click deployment of the control plane and gateway.
helm repo add higress https://higress.io/helm-charts
helm repo update
helm install higress higress/higress -n higress-system --create-namespace -f higress-values.yaml

Example higress-values.yaml:
# Key names are illustrative; confirm them with `helm show values higress/higress`
controller:
  replicaCount: 2
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
gateway:
  replicaCount: 3
  service:
    type: LoadBalancer # or NodePort/ClusterIP + external SLB
  readinessProbe:
    httpGet:
      path: /healthz
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 10
# Enable Nacos service discovery
discovery:
  nacos:
    enabled: true
    serverAddr: "nacos.nacos.svc.cluster.local:8848"
    namespace: "public"
    # type: "nacos2" # for Nacos v2 gRPC mode
# Store Higress configuration in Nacos (optional)
config:
  backend: nacos
  nacos:
    serverAddr: "nacos.nacos.svc.cluster.local:8848"
    namespace: "public"
    dataId: "higress"
    group: "DEFAULT_GROUP"
# SkyWalking integration (example)
observability:
  skywalking:
    enabled: true
    oapAddress: "skywalking.oap.svc.cluster.local:11800"

Expose the gateway through a LoadBalancer Service in cloud environments, or through MetalLB or NodePort plus an external SLB on bare metal.
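Nacos discovery can also be declared as Kubernetes resources. The sketch below assumes a Higress release that ships the `McpBridge` CRD and accepts the `higress.io/destination` annotation; the registry name, the Nacos service name `service-provider`, and the field values are assumptions to verify against the CRD installed in your cluster.

```yaml
# Register the Nacos cluster as a service source (assumed McpBridge schema)
apiVersion: networking.higress.io/v1
kind: McpBridge
metadata:
  name: default
  namespace: higress-system
spec:
  registries:
    - name: prod-nacos                       # hypothetical registry name
      type: nacos2                           # Nacos v2 gRPC mode
      domain: nacos.nacos.svc.cluster.local
      port: 8848
      nacosNamespaceId: ""                   # empty string selects the public namespace
      nacosGroups:
        - DEFAULT_GROUP
---
# Route all traffic on "/" to a service discovered from Nacos
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: service-provider-route               # hypothetical name
  namespace: higress-system
  annotations:
    # <service>.<group with "_" replaced by "-">.<namespace>.nacos
    higress.io/destination: service-provider.DEFAULT-GROUP.public.nacos
spec:
  ingressClassName: higress
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              resource:
                apiGroup: networking.higress.io
                kind: McpBridge
                name: default
```

Keeping these two objects in Git gives the routing layer the same review and rollback path as the Helm values.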
Key Configuration Details
replicas : Gateway ≥3 (or ≥2 across nodes), Controller ≥2.
probes : Liveness and readiness probes to avoid routing traffic to unhealthy pods.
service.type : LoadBalancer recommended for cloud; MetalLB or NodePort + external SLB for on‑prem.
persistence : Nacos PVC + external MySQL for data reliability.
configuration source : Higress can pull routing and policy definitions from Nacos for hot‑reload, or manage them via CRDs.
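To make "≥2 across nodes" concrete, the pod-spec fragment below spreads gateway pods over zones and nodes using standard Kubernetes topology spread constraints. It is a sketch: whether and where the chart exposes this field is an assumption, and the `app: higress-gateway` label is hypothetical.

```yaml
# Pod-spec fragment, not a complete manifest
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone   # balance across availability zones
    whenUnsatisfiable: ScheduleAnyway          # soft rule: prefer, do not block scheduling
    labelSelector:
      matchLabels:
        app: higress-gateway                   # assumed label
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname        # avoid stacking replicas on one node
    whenUnsatisfiable: DoNotSchedule           # hard rule
    labelSelector:
      matchLabels:
        app: higress-gateway
```

The zone rule is soft so that a single-zone outage does not leave replacement pods unschedulable.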
Observability, Tracing, and Logging
Prometheus : Scrape Higress metrics (Envoy + controller).
Tracing : Integrate SkyWalking (preferred), or Zipkin/Jaeger.
Logging : Collect Envoy access logs and controller pod logs into EFK/ELK stacks.
Alerting : Set alerts for request latency, error rates, Nacos node count, and DB connection issues.
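The alerting item can be sketched as a PrometheusRule, assuming the Prometheus Operator and kube-state-metrics are installed. The rule names, namespace, and thresholds are illustrative, and the Envoy metric name assumes the gateway exposes standard Envoy Prometheus stats; confirm the metric names your deployment actually exports before relying on them.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gateway-stack-alerts          # hypothetical name
  namespace: monitoring               # assumed namespace watched by the operator
spec:
  groups:
    - name: nacos-higress
      rules:
        # Fires when fewer than three Nacos pods are ready (kube-state-metrics)
        - alert: NacosReadyReplicasLow
          expr: kube_statefulset_status_replicas_ready{namespace="nacos", statefulset="nacos"} < 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Fewer than 3 Nacos pods are ready"
        # Fires when more than 5% of upstream responses are 5xx (assumed Envoy metric)
        - alert: GatewayHigh5xxRatio
          expr: |
            sum(rate(envoy_cluster_upstream_rq_xx{envoy_response_code_class="5"}[5m]))
              /
            sum(rate(envoy_cluster_upstream_rq_xx[5m])) > 0.05
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Gateway upstream 5xx ratio above 5%"
```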
Security Practices
External TLS for inbound traffic (managed via cert‑manager).
mTLS between gateway and backend services.
Never store DB passwords or certificates in plain‑text values.yaml; use Kubernetes Secrets, External Secrets, or Vault.
NetworkPolicy/RBAC to restrict pod‑to‑pod communication, allowing only necessary access between Higress, Nacos, SkyWalking, and the database.
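As one illustration of the NetworkPolicy item, the sketch below limits inbound traffic to Nacos pods. It assumes the pods carry an `app: nacos` label and that the CNI plugin enforces NetworkPolicy; the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nacos-allow-gateway           # hypothetical name
  namespace: nacos
spec:
  podSelector:
    matchLabels:
      app: nacos                      # assumed pod label
  policyTypes:
    - Ingress
  ingress:
    # Higress may reach the Nacos HTTP (8848) and gRPC (9848) ports
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: higress-system
      ports:
        - protocol: TCP
          port: 8848
        - protocol: TCP
          port: 9848
    # Nacos peers may talk to each other on any port (cluster and Raft traffic)
    - from:
        - podSelector:
            matchLabels:
              app: nacos
```

Application namespaces that register services with Nacos would need an additional `from` entry of their own.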
Backup, Upgrade, and Recovery Runbook
Backup
MySQL: Regular full‑dump backups with verification.
Nacos configuration export via API scripts.
Store Higress CRDs and routing definitions in Git (GitOps).
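The MySQL backup item could be automated with a CronJob along these lines. This is a sketch under several assumptions: the Secret key `mysql-password`, the PVC `nacos-backup`, and the schedule are hypothetical, and the host and database names are taken from the example values earlier in this guide.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nacos-mysql-backup            # hypothetical name
  namespace: nacos
spec:
  schedule: "0 2 * * *"               # daily at 02:00
  concurrencyPolicy: Forbid           # never run two dumps at once
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: mysqldump
              image: mysql:8.0
              env:
                - name: MYSQL_PWD     # read by the MySQL client tools
                  valueFrom:
                    secretKeyRef:
                      name: nacos-mysql-secret
                      key: mysql-password   # assumed key name
              command: ["/bin/sh", "-c"]
              args:
                - >-
                  mysqldump --single-transaction
                  -h my-mysql-primary.my-db.svc.cluster.local -P 3306 -u nacos
                  nacos_config > /backup/nacos_config-$(date +%F).sql
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: nacos-backup     # hypothetical PVC
```

A dump is only verified once it has been restored somewhere, so a periodic restore into a scratch instance is a reasonable companion job.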
Upgrade
Perform a full upgrade rehearsal in a staging environment (Helm upgrade, controller first, then gateway).
Upgrade the Nacos StatefulSet with a rolling update, one pod at a time, confirming Raft leader stability and quorum before moving to the next pod.
Promote to production only after monitoring shows no anomalies.
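One way to keep the Nacos rolling upgrade under manual control is the StatefulSet `partition` setting, sketched below as a spec fragment. It assumes the chart lets you set or patch `updateStrategy`.

```yaml
# StatefulSet spec fragment, not a complete manifest
spec:
  replicas: 3
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 2   # only pods with ordinal >= 2 (nacos-2) receive the new revision
```

After nacos-2 rejoins the cluster and quorum is confirmed, lowering `partition` to 1 and then 0 rolls the remaining pods one step at a time.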
Recovery (example: Nacos primary DB failure)
Stop write traffic to Nacos (for example, by temporarily draining traffic away from the gateway).
Restore the latest MySQL backup to a new or original instance.
Restart Nacos nodes and verify cluster health.
Resume traffic and monitor closely.
Common Issues and Quick Troubleshooting
Nacos pods restart frequently : Check PVC health, disk I/O, MySQL connectivity, and JVM OOM.
Higress cannot discover services : Verify network connectivity to Nacos, then check that the serverAddr, namespace, and dataId settings are correct.
Gateway 5xx or high latency : Inspect Envoy access logs, backend health, timeout, and connection‑pool parameters.
Configuration changes not taking effect : Ensure changes are written to Nacos (if using Nacos backend) or to the appropriate CRDs.
Operational Automation and Best Practices
GitOps (ArgoCD/Flux) for CRDs, routing definitions, and values.yaml files.
External Secrets or HashiCorp Vault for secret management.
Canary releases using Higress route weights, headers, or cookies.
Regular disaster‑recovery drills (MySQL restore, Higress upgrade rollback).
Prefer Nacos v2 gRPC mode when possible to reduce resource consumption and speed up change detection.
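The weight-based canary item can be sketched as a second Ingress that takes a share of the traffic of an existing route. This assumes Higress honors the nginx-style canary annotations under the `higress.io/` prefix; the host, service name, and 10% weight are hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: orders-canary                 # hypothetical name
  namespace: default
  annotations:
    higress.io/canary: "true"         # marks this Ingress as the canary of the same host/path
    higress.io/canary-weight: "10"    # send roughly 10% of requests to the canary backend
spec:
  ingressClassName: higress
  rules:
    - host: orders.example.com        # must match the host of the primary Ingress
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: orders-v2       # hypothetical canary Service
                port:
                  number: 8080
```

Raising the weight in small steps through Git commits keeps each canary stage reviewable and easy to revert.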
Minimal Dev‑Environment Demo YAML (for quick smoke‑test)
# nacos-demo-pvc.yaml (dev only)
# Note: the StatefulSet below creates its own claim (nacos-data-nacos-0) from
# volumeClaimTemplates, so this standalone PVC is not mounted by it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nacos-data
  namespace: nacos
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: "standard"
  resources:
    requests:
      storage: 5Gi
---
# nacos-demo-statefulset.yaml (dev only)
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: nacos
  namespace: nacos
spec:
  serviceName: "nacos" # expects a headless Service named nacos (not shown)
  replicas: 1
  selector:
    matchLabels:
      app: nacos
  template:
    metadata:
      labels:
        app: nacos
    spec:
      containers:
        - name: nacos
          image: nacos/nacos-server:2.2.0
          env:
            - name: MODE
              value: "standalone" # single-node mode; the image defaults to cluster mode
          ports:
            - containerPort: 8848 # HTTP API and console
            - containerPort: 9848 # gRPC port used by Nacos 2.x clients
          volumeMounts:
            - name: nacos-data
              mountPath: /home/nacos/data
  volumeClaimTemplates:
    - metadata:
        name: nacos-data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "standard"
        resources:
          requests:
            storage: 5Gi
---
# higress-demo-deployment.yaml (dev only)
# Note: a working controller also needs the Higress CRDs and RBAC that the
# Helm chart installs; the image reference below is a placeholder.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: higress-controller
  namespace: higress-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: higress-controller
  template:
    metadata:
      labels:
        app: higress-controller
    spec:
      containers:
        - name: controller
          image: higress/controller:latest
          ports:
            - containerPort: 8080

FAQ
Q: Does Higress support Nacos? A: Yes. Higress can use Nacos as a service‑discovery source and configuration backend, and it also supports Nacos v2 gRPC mode for faster change detection.
Q: How to achieve high availability for Nacos on K8s? A: Deploy Nacos as a StatefulSet with at least three replicas and use an external MySQL database for persistent metadata.
Q: How to ensure hot configuration updates? A: Store Higress routing and plugin configurations in Nacos (or manage them via CRDs); Higress watches Nacos for changes and applies them without restart.
Ray's Galactic Tech
