How a Single Nacos Config Change Crashed an Online Payment System
A misconfigured Nacos registration setting (ephemeral=false) kept a memory‑leaking payment service node in the service registry. Requests kept going to a dead instance, which triggered a cascading failure across the entire online payment flow.
During a pre‑holiday deployment, the payment service's failure rate surged because a junior developer had changed the Nacos registration type for payment-service to ephemeral=false, turning its instances into persistent ones.
Fundamental Difference Between Service Registry and Configuration Center
Nacos provides two distinct functions. The service registry must be highly available (AP) and can tolerate brief inconsistencies, while the configuration center requires strong consistency (CP) so that configuration data is never lost or corrupted.
Service Registry instance: a live service node (e.g., user-service) that is automatically registered by the client.
Configuration Center instance: a static configuration file (e.g., redis-dev.yml) stored in the database.
Service Registry: Default Temporary Instances
Temporary (ephemeral) instances are the default mode. They rely on a heartbeat mechanism: the client sends a heartbeat every 5 seconds; if the server receives no heartbeat for 15 seconds, it marks the instance unhealthy, and after 30 seconds it removes the instance from the registry. Instance information lives only in the server's memory, so restarting the Nacos server clears all temporary instances.
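To make the timing concrete, here is a toy Python model of the ephemeral lifecycle described above. It is a sketch under stated assumptions, not Nacos code: the class and method names (`EphemeralRegistry`, `heartbeat`, `sweep`, `restart`) are hypothetical, time is passed in explicitly as seconds, and the thresholds are the 15-second and 30-second defaults quoted in the text.

```python
from dataclasses import dataclass, field
from typing import Dict

# Server-side thresholds from the text (seconds). Clients beat every 5 s.
UNHEALTHY_AFTER = 15
REMOVE_AFTER = 30


@dataclass
class EphemeralRegistry:
    """Toy in-memory registry for ephemeral instances (hypothetical, not Nacos code)."""
    last_beat: Dict[str, float] = field(default_factory=dict)
    healthy: Dict[str, bool] = field(default_factory=dict)

    def heartbeat(self, instance_id: str, now: float) -> None:
        # A heartbeat registers the instance if needed and marks it healthy.
        self.last_beat[instance_id] = now
        self.healthy[instance_id] = True

    def sweep(self, now: float) -> None:
        # Periodic server-side scan of how long each instance has been silent.
        for instance_id, beat in list(self.last_beat.items()):
            silent_for = now - beat
            if silent_for > REMOVE_AFTER:
                # Silent for more than 30 s: evict from the registry.
                del self.last_beat[instance_id]
                del self.healthy[instance_id]
            elif silent_for > UNHEALTHY_AFTER:
                # Silent for more than 15 s: keep it listed but unhealthy.
                self.healthy[instance_id] = False

    def restart(self) -> None:
        # Ephemeral data lives only in memory, so a server restart loses it all.
        self.last_beat.clear()
        self.healthy.clear()
```

Passing `now` explicitly instead of reading a clock keeps the model deterministic, which makes the 15-second and 30-second boundaries easy to check.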
Failure Scenario
If a node suffers a long‑lasting GC pause (over 30 seconds), its heartbeat stops, the instance is automatically evicted, and callers stop routing traffic to the dead node.
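The failure scenario can be traced on a timeline. The sketch below is hypothetical: it assumes one healthy node (`node-a`) heartbeating every 5 seconds, one node (`node-b`) that stops heartbeating during a GC pause, and callers that only route to instances the registry reports as healthy. The function names are illustrative and not part of any Nacos client.

```python
from typing import Dict, List, Tuple

UNHEALTHY_AFTER = 15  # seconds without a heartbeat before "unhealthy"
REMOVE_AFTER = 30     # seconds without a heartbeat before eviction


def state(silent_for: float) -> str:
    """Registry view of an ephemeral instance, given seconds since its last heartbeat."""
    if silent_for > REMOVE_AFTER:
        return "removed"
    if silent_for > UNHEALTHY_AFTER:
        return "unhealthy"
    return "healthy"


def routable(last_beat: Dict[str, float], now: float) -> List[str]:
    """Instances a caller may pick: still registered and healthy (sorted for stable output)."""
    return sorted(i for i, t in last_beat.items() if state(now - t) == "healthy")


def simulate_gc_pause(
    pause_start: float, pause_end: float, until: float, step: float = 5.0
) -> List[Tuple[float, str, List[str]]]:
    """node-a beats every `step` seconds; node-b skips beats while its GC pause lasts.

    Returns (time, node-b state, routable instances) at each step.
    """
    last_beat = {"node-a": 0.0, "node-b": 0.0}
    timeline = []
    t = 0.0
    while t <= until:
        last_beat["node-a"] = t
        if not (pause_start <= t < pause_end):
            last_beat["node-b"] = t  # heartbeat gets through (re-registers if evicted)
        timeline.append((t, state(t - last_beat["node-b"]), routable(last_beat, t)))
        t += step
    return timeline
```

With a pause from t=10 to t=50, `node-b` drops out of the routable list once it has been silent for more than 15 seconds, is evicted after more than 30, and becomes routable again when its heartbeats resume.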
Persistent Instances
Persistent instances are intended for long‑running, rarely changing infrastructure services such as MySQL, Redis, or Elasticsearch. Nacos actively probes the service (TCP or HTTP health checks) and persists instance metadata to its database (Derby by default, MySQL in production). Even after a restart, the instance remains.
Health‑check mechanism: server‑side active probing instead of client heartbeats.
Storage: instance data is written to the database, so it survives restarts.
Failure handling: the instance is marked unhealthy but not removed, allowing operators to see the faulty node in the console and recover it.
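For contrast, here is a toy model of a persistent instance: the server runs the health check itself, a failed check only flips a health flag, and the instance survives a restart. This is an illustrative sketch, not Nacos code; the `PersistentRegistry` class, its methods, and the plain dict standing in for durable storage are all assumptions, and the `probe` callable stands in for the TCP or HTTP check.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PersistentRegistry:
    """Toy model: the server probes instances itself and never auto-removes them."""
    store: Dict[str, bool] = field(default_factory=dict)  # stands in for durable storage

    def register(self, instance_id: str) -> None:
        self.store[instance_id] = True

    def probe_all(self, probe: Callable[[str], bool]) -> None:
        # Server-side active check; the result only updates the health flag.
        for instance_id in self.store:
            self.store[instance_id] = probe(instance_id)

    def deregister(self, instance_id: str) -> None:
        # Only an explicit action (e.g. by an operator) removes a persistent instance.
        self.store.pop(instance_id, None)

    def restart(self) -> "PersistentRegistry":
        # Simulated restart: a fresh server object reloads the same durable data.
        return PersistentRegistry(store=dict(self.store))
```

The unhealthy entry stays visible after a restart until someone deregisters it, which is what lets operators find and recover the faulty node.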
How the Misconfiguration Happened
In a Spring Cloud project, a single line in application.yml switches the instance type. The erroneous configuration was:

```yaml
spring:
  cloud:
    nacos:
      discovery:
        server-addr: 192.168.1.100:8848
        ephemeral: false  # should be true (the default)
        service: payment-service  # service name
```

Setting ephemeral: false turned the payment service into a persistent instance. One of the payment nodes then got stuck in a GC pause and stopped responding for more than 30 seconds, long enough that an ephemeral instance would have been evicted. Because the node was registered as persistent, Nacos did not remove it. Calls continued to be routed to the unhealthy node, eventually overwhelming the entire payment chain.
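One way to catch this kind of change before it ships is a pre-deployment check on the parsed application.yml. The sketch below is hypothetical: it assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`, not shown), and `lint_ephemeral` and its `allowed_persistent` parameter are illustrative names, not part of Spring Cloud or Nacos.

```python
from typing import Any, Dict, FrozenSet, List


def discovery_section(config: Dict[str, Any]) -> Dict[str, Any]:
    """Walk spring.cloud.nacos.discovery, tolerating missing or empty levels."""
    node: Any = config
    for key in ("spring", "cloud", "nacos", "discovery"):
        node = (node or {}).get(key)
    return node or {}


def lint_ephemeral(
    config: Dict[str, Any], allowed_persistent: FrozenSet[str] = frozenset()
) -> List[str]:
    """Hypothetical pre-deploy check: flag ephemeral=false unless explicitly allowed."""
    discovery = discovery_section(config)
    # A missing key means the default, which is ephemeral (true).
    ephemeral = discovery.get("ephemeral", True)
    service = discovery.get("service", "<unknown service>")
    if ephemeral is False and service not in allowed_persistent:
        return [
            f"{service}: ephemeral=false registers a persistent instance; "
            "a stuck node will stay in the registry instead of being evicted"
        ]
    return []
```

The allow-list keeps the check usable for the rare service that is meant to be persistent, while any unexpected ephemeral=false fails loudly.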
Configuration Center: Always Persistent
All configuration files in Nacos are persisted; there is no concept of a temporary configuration. Configurations are stored in a database, survive server restarts, and are removed only manually. Dynamic updates are delivered via long‑polling (the server holds each client request for up to 30 seconds by default and answers early when the configuration changes), but this does not make the configuration temporary.
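The long-polling pattern can be sketched in-process with a condition variable: the server holds a read until the configuration version changes or a timeout (30 seconds by default here, following the text) expires. This is a toy model, not the Nacos client or server API; `ConfigStore`, `publish`, `current`, and `long_poll` are hypothetical names. Note that `publish` only overwrites the stored content and nothing ever expires it, mirroring the point that dynamic updates do not make configuration temporary.

```python
import threading
from typing import Optional, Tuple


class ConfigStore:
    """Toy config server with long-polling reads (hypothetical, not the Nacos API)."""

    def __init__(self, content: str) -> None:
        self._content = content
        self._version = 0
        self._cond = threading.Condition()

    def current(self) -> str:
        with self._cond:
            return self._content

    def publish(self, content: str) -> None:
        # Overwrite the stored config and wake every poll currently being held.
        with self._cond:
            self._content = content
            self._version += 1
            self._cond.notify_all()

    def long_poll(
        self, known_version: int, timeout: float = 30.0
    ) -> Tuple[int, Optional[str]]:
        """Hold the request until the version differs from `known_version` or
        `timeout` seconds pass. Returns (new_version, content) on change,
        or (known_version, None) on timeout so the client simply polls again.
        """
        with self._cond:
            changed = self._cond.wait_for(
                lambda: self._version != known_version, timeout=timeout
            )
            if changed:
                return self._version, self._content
            return known_version, None
```

A client loops on `long_poll`, passing the last version it saw; a timeout costs one idle round trip, while a change is delivered as soon as it is published.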
Key Takeaways
Use temporary (ephemeral) instances for dynamic business services such as payment or order processing.
Use persistent instances for static infrastructure components like MySQL or Redis.
Configuration data is always persistent; dynamic updates do not imply temporary existence.
Understanding these distinctions prevents simple configuration mistakes from causing large‑scale service outages.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
