Why Misconfiguring Nacos Ephemeral Settings Can Crash Your Payment Service
A misconfigured Nacos registration type turned a temporary service instance into a persistent one, causing heartbeat blockage and a cascading failure in the payment chain, illustrating when to use temporary versus persistent instances in service registries and configuration centers.
Hello everyone, I'm Su San.
Just before a holiday release we hit an incident: after a gray (canary) release, some users reported that their order status did not update after payment, and the failure rate of the payment service skyrocketed.
Investigation revealed a fatal configuration error: during deployment, payment-service had been switched to persistent Nacos registration (ephemeral=false).
One service node suffered a memory leak, causing GC pauses over 30 seconds; because the instance was persistent, Nacos didn't evict it, so callers kept sending requests to the faulty node, eventually collapsing the entire payment chain.
Fundamental Difference Between Service Registry and Configuration Center
We use Nacos for both service registry and configuration center, but they have different design goals: the service registry prioritizes high availability (AP) for service discovery, tolerating brief inconsistencies, while the configuration center requires consistency (CP), ensuring configurations are never lost and updates are synchronized.
In short, a registry instance is a live service node, whereas a configuration instance is a static configuration file.
Service Registry: Default Temporary Instances
The core requirement of a service registry is real‑time awareness of service availability.
Nacos provides two modes: temporary instances and persistent instances, matching dynamic services and static services respectively.
Temporary Instance
Temporary instances are Nacos's default mode.
When Spring Cloud, Dubbo, and similar frameworks start, they register as temporary instances unless configured otherwise. The client sends a heartbeat every 5 seconds; if the server receives no heartbeat for 15 seconds it marks the instance unhealthy, and after 30 seconds it removes the instance.
Heartbeat: client sends a heartbeat every 5 seconds; server removes instance after 30 seconds of silence.
Storage: instance info lives only in server memory, not persisted to disk; a server restart clears all temporary instances.
Failure behavior: if a node crashes or its heartbeat is blocked (e.g., GC pause), the instance is automatically removed, preventing routing to an invalid node.
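The thresholds above can be sketched as a small state check. This is only an illustrative model, not Nacos source code: the names EphemeralHeartbeatModel, InstanceState, and classify are hypothetical, and whether the boundary is inclusive at exactly 15 or 30 seconds is an assumption.

```java
import java.time.Duration;

// Hypothetical model of the heartbeat thresholds described above (not Nacos source code).
enum InstanceState { HEALTHY, UNHEALTHY, REMOVED }

final class EphemeralHeartbeatModel {
    // Values taken from the article: unhealthy after 15 s, removed after 30 s.
    static final Duration UNHEALTHY_AFTER = Duration.ofSeconds(15);
    static final Duration REMOVE_AFTER = Duration.ofSeconds(30);

    // Classify a temporary instance by how long the server has gone without a heartbeat.
    // Treating the boundaries as inclusive is an assumption of this sketch.
    static InstanceState classify(Duration sinceLastHeartbeat) {
        if (sinceLastHeartbeat.compareTo(REMOVE_AFTER) >= 0) {
            return InstanceState.REMOVED;
        }
        if (sinceLastHeartbeat.compareTo(UNHEALTHY_AFTER) >= 0) {
            return InstanceState.UNHEALTHY;
        }
        return InstanceState.HEALTHY;
    }
}
```

Under this model, a node whose heartbeat thread is frozen by a GC pause longer than 30 seconds ends up in the REMOVED state, which is why a temporary instance would have dropped out of routing in our incident.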
Persistent Instance
Persistent instances are the opposite: they target long‑running, rarely changing foundational services such as MySQL, Redis, and Elasticsearch. The server actively probes health (TCP port, HTTP endpoint, or custom protocol) and persists the instance data on the server side.
Health check: server actively probes (e.g., a TCP check on MySQL port 3306 or Redis port 6379) instead of relying on client heartbeats.
Storage: instance information is persisted on the Nacos server (written to disk through its Raft-based consistency protocol rather than kept only in memory), surviving server restarts.
Failure behavior: when a node becomes unhealthy, Nacos marks it as unhealthy but does not delete it, allowing operators to see the faulty node in the console and recover it.
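To make the contrast in failure behavior concrete, here is a minimal sketch of how a registry might treat the two instance types once a failure is detected. RegisteredInstance and RegistryFailureModel are hypothetical names used for illustration, and the addresses are made up; real Nacos internals are considerably more involved.

```java
import java.util.List;

// Hypothetical registry entry: an address, its registration type, and a health flag.
final class RegisteredInstance {
    final String address;
    final boolean ephemeral;
    boolean healthy = true;

    RegisteredInstance(String address, boolean ephemeral) {
        this.address = address;
        this.ephemeral = ephemeral;
    }
}

final class RegistryFailureModel {
    // Apply a detected failure (heartbeat timeout or failed probe) to the registry.
    static void onHealthFailure(List<RegisteredInstance> registry, String address) {
        // Temporary instances are dropped from the registry entirely.
        registry.removeIf(i -> i.ephemeral && i.address.equals(address));
        // Persistent instances stay listed and are only flagged unhealthy.
        for (RegisteredInstance i : registry) {
            if (i.address.equals(address)) {
                i.healthy = false;
            }
        }
    }
}
```

With a temporary payment node and a persistent MySQL node in the list, a failure on each leaves only the MySQL entry behind, flagged unhealthy so operators can still see it.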
In a Spring Cloud project, switching the instance type only requires adding one line to application.yml. The following snippet shows the erroneous configuration that caused the outage:
spring:
  cloud:
    nacos:
      discovery:
        server-addr: 192.168.1.100:8848
        ephemeral: false # should be true (default)
        service: payment-service # registered service name
Configuration Center: Default Persistent
All configuration instances in Nacos's configuration center are persistent; there is no concept of temporary configuration. The center's purpose is centralized management to avoid loss, so every config is stored in a database (MySQL in production) and survives server restarts.
Storage: configs are persisted to the database, never lost on restart.
Lifecycle: configs are only removed or overwritten manually, not automatically when a client disconnects.
Dynamic updates: clients learn about changes through long polling (30-second timeout by default in Nacos 1.x; Nacos 2.x uses a gRPC long connection) and typically receive an update within about a second, but this is about real‑time content change, not temporary existence.
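The point that a dynamic update is a content change rather than a lifecycle event can be illustrated with a small change detector. This is a sketch under assumptions: ConfigChangeDetector is a hypothetical class, and comparing MD5 digests instead of the full text is a choice made for this example, not a description of the exact client protocol.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical client-side helper: remembers a digest of the last seen config content.
final class ConfigChangeDetector {
    private String lastDigest;

    ConfigChangeDetector(String initialContent) {
        this.lastDigest = md5Hex(initialContent);
    }

    // Returns true only when the content differs from what was last seen.
    // The config itself never disappears; only its content is compared.
    boolean hasChanged(String latestContent) {
        String digest = md5Hex(latestContent);
        boolean changed = !digest.equals(lastDigest);
        lastDigest = digest;
        return changed;
    }

    // Hex-encoded MD5 of the UTF-8 bytes of the content.
    static String md5Hex(String content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] hash = md.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

A client that disconnects and later reconnects would simply compare digests again; nothing on the server side is removed in the meantime.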
Conclusion
Service Registry: use temporary instances for dynamic business services (payment, order) and persistent instances for static foundational components (MySQL, Redis).
Configuration Center: all configs are persistent; dynamic updates do not imply temporary existence.
Understanding these distinctions helps prevent incidents like the payment‑service failure described above.
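One way to turn the rule above into a pre-deployment check is sketched below. It assumes the application settings have been flattened into java.util.Properties using the Spring property keys from the snippet earlier; EphemeralGuard and its allow-list of foundational service names are hypothetical and would need to match your own naming.

```java
import java.util.Properties;
import java.util.Set;

// Hypothetical guard: flags business services that register as persistent instances.
final class EphemeralGuard {
    // Assumed allow-list of foundational components that may be persistent.
    static final Set<String> PERSISTENT_ALLOWED = Set.of("mysql", "redis", "elasticsearch");

    // Returns true when the registration type is acceptable for the service.
    static boolean isAllowed(Properties props) {
        String service = props.getProperty("spring.cloud.nacos.discovery.service", "");
        // ephemeral defaults to true when the property is absent.
        boolean ephemeral = Boolean.parseBoolean(
                props.getProperty("spring.cloud.nacos.discovery.ephemeral", "true"));
        return ephemeral || PERSISTENT_ALLOWED.contains(service);
    }
}
```

Run against the erroneous configuration from our incident (service payment-service, ephemeral false), this check would return false and could fail the release pipeline before the change reached production.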
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
