How a Single Nacos Setting Crashed Our Payment Service—and What It Teaches About Instance Types
A misconfigured Nacos registration flag turned a payment service into a persistent instance, so unhealthy nodes were never removed and the entire payment chain failed. This article explains the fundamental differences between the registration center and the configuration center, and when to use temporary versus persistent instances.
Incident Overview
During a pre‑holiday release, a gray‑scale deployment caused some users to see orders stuck after payment. Investigation revealed that the payment-service registration in Nacos had been changed to ephemeral=false, making the instance persistent.
One node suffered a memory‑leak‑induced GC pause that blocked its heartbeat for over 30 seconds. Because the instance was persistent, Nacos did not evict it, so callers continued to route requests to the unhealthy node, ultimately collapsing the entire payment flow.
Registration Center vs. Configuration Center
Nacos provides two distinct services:
Service Registration Center: designed for highly available (AP) service discovery, so it tolerates brief inconsistencies between nodes in exchange for staying reachable.
Configuration Center: stores static configuration files; it needs strong consistency (CP) because every update must be synchronized to all nodes.
In simple terms, entries in the registration center represent live service nodes, while entries in the configuration center are static configuration files.
Default Temporary Instances (Ephemeral)
Temporary instances are the default mode for Nacos service registration. Spring Cloud, Dubbo, and similar frameworks register services as temporary unless configured otherwise.
Heartbeat: clients send a heartbeat every 5 seconds; if the server receives none for 15 seconds it marks the instance unhealthy, and after 30 seconds without one it removes the instance.
Storage: instance metadata lives only in server memory and disappears on Nacos restart.
Failure behavior: when a node crashes or its heartbeat is blocked (the failure the payment node hit), Nacos automatically removes the instance, so callers are no longer routed to a dead node.
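The heartbeat timeline above can be sketched as a small state function. This is only an illustrative model, not Nacos code: the names `State` and `ephemeral_state` are hypothetical, and the thresholds are the ones quoted in this article, which may differ by Nacos version or server configuration.

```python
from enum import Enum

# Thresholds as quoted in the article (assumption: server defaults are unchanged).
HEARTBEAT_INTERVAL_S = 5   # client sends one heartbeat this often
UNHEALTHY_AFTER_S = 15     # silence longer than this -> marked unhealthy
REMOVE_AFTER_S = 30        # silence longer than this -> removed from the registry


class State(Enum):
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    REMOVED = "removed"


def ephemeral_state(seconds_since_last_heartbeat: float) -> State:
    """Classify a temporary instance by how long its heartbeat has been silent."""
    if seconds_since_last_heartbeat > REMOVE_AFTER_S:
        return State.REMOVED
    if seconds_since_last_heartbeat > UNHEALTHY_AFTER_S:
        return State.UNHEALTHY
    return State.HEALTHY


if __name__ == "__main__":
    # A GC pause that blocks heartbeats for just over 30 seconds ends in removal.
    for silence in (4, 16, 31):
        print(silence, ephemeral_state(silence).value)
```

Under this model, a pause like the one in the incident would have taken a temporary instance out of rotation on its own.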
Persistent Instances
Persistent instances target long‑running, rarely‑changed infrastructure services such as MySQL, Redis, or Elasticsearch. Their health‑check mechanism is active probing by the Nacos server (TCP or HTTP) rather than client heartbeats.
Health probing: Nacos periodically probes the service port or health endpoint.
Storage: instance information is persisted on the Nacos server (written to disk through its Raft-based consistency protocol rather than held only in memory), so it survives server restarts.
Failure behavior: an unhealthy node is marked as such but not removed, allowing operators to see and fix the issue.
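To make the contrast in failure behavior concrete, here is a toy registry sweep. It is a sketch under simplifying assumptions: `Instance`, `sweep`, `routable`, and the `failed_check` flag are all hypothetical names, the addresses are made up, and a temporary instance is dropped immediately instead of passing through an unhealthy phase first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    address: str
    ephemeral: bool
    healthy: bool = True
    failed_check: bool = False  # hypothetical: heartbeat expired or probe failed


def sweep(instances: List[Instance]) -> List[Instance]:
    """Apply the described failure behavior and return what stays registered."""
    survivors = []
    for inst in instances:
        if not inst.failed_check:
            survivors.append(inst)   # nothing wrong: keep as-is
        elif inst.ephemeral:
            continue                 # temporary instance: removed from the registry
        else:
            inst.healthy = False     # persistent instance: kept, flagged unhealthy
            survivors.append(inst)
    return survivors


def routable(instances: List[Instance]) -> List[str]:
    """Caller-side guard: only route to instances still marked healthy."""
    return [inst.address for inst in instances if inst.healthy]


if __name__ == "__main__":
    registry = [
        Instance("10.0.0.1:8080", ephemeral=True, failed_check=True),
        Instance("10.0.0.2:8080", ephemeral=False, failed_check=True),
        Instance("10.0.0.3:8080", ephemeral=True),
    ]
    remaining = sweep(registry)
    print([inst.address for inst in remaining])  # persistent node is still listed
    print(routable(remaining))                   # but a health filter skips it
```

The persistent node stays visible after the sweep, which is useful for operators but means callers have to respect the health flag themselves.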
Switching Instance Types in Spring Cloud
In a Spring Cloud project, the instance type is controlled by a single line in application.yml. The problematic configuration was:
spring:
  cloud:
    nacos:
      discovery:
        server-addr: 192.168.1.100:8848
        ephemeral: false # mistakenly set to false; should be true (default)
        service: payment-service # service name

Changing ephemeral back to true restores the default temporary behavior and resolves the issue.
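One way to catch this kind of change before release is a crude pre-deployment check on the configuration text. The sketch below is an assumption of ours, not part of the original incident response: `registers_as_persistent` is a hypothetical helper, and it does a line-based regex scan instead of real YAML parsing, so it would miss unusual spellings such as a quoted "false".

```python
import re

# Matches an uncommented line such as "ephemeral: false", ignoring indentation
# and anything after the value (for example a trailing comment).
_PERSISTENT_FLAG = re.compile(
    r"^\s*ephemeral\s*:\s*false\b", re.IGNORECASE | re.MULTILINE
)


def registers_as_persistent(yaml_text: str) -> bool:
    """Return True if the text contains an uncommented `ephemeral: false` line."""
    return _PERSISTENT_FLAG.search(yaml_text) is not None


if __name__ == "__main__":
    sample = (
        "spring:\n"
        "  cloud:\n"
        "    nacos:\n"
        "      discovery:\n"
        "        ephemeral: false # mistakenly set to false\n"
    )
    if registers_as_persistent(sample):
        print("warning: service would register as a persistent instance")
```

A check like this could run in CI for business services, where a persistent registration is almost always unintended.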
Key Takeaways
Use temporary (ephemeral) instances for dynamic business services such as payment or order processing.
Use persistent instances for stable infrastructure components like databases or caches.
The configuration center always stores persistent configurations; there is no “temporary configuration” concept.
Understanding these distinctions prevents similar incidents and ensures that service discovery behaves as expected.
Programmer XiaoFu
xiaofucode.com – a programmer learning guide driven by the pursuit of profit
