Why Does Dubbo Keep Calling When Nacos Is Down? Uncovering the Cache Mechanism
This article analyzes a real‑world incident where Dubbo services continued to operate despite Nacos registry failures, explains the role of local provider caches, examines the namingLoadCacheAtStart configuration, and provides reproduction steps and best‑practice recommendations for high‑availability service discovery.
Problem Description
Last Thursday the author received a call about an online failure: a client using Dubbo with Nacos as the registry started reporting errors in the afternoon, and Nacos heartbeat requests returned 502.
2019-11-15 03:02:41.973 [com.alibaba.nacos.client.naming454] -ERROR [com.alibaba.nacos.naming.beat.sender] request xx.xx.xx.xx failed.
com.alibaba.nacos.api.exception.NacosException: failed to req API: xx.xx.xx.xx:8848/nacos/v1/ns/instance/beat. code:502
Initially the errors were isolated, but after restarting some machines the failures expanded, showing many "no provider" errors.
Problem Analysis
Nacos heartbeat errors usually have one of two causes:
Client machine problems, such as network disconnection.
Nacos server crash.
In this case the server failed due to an aging disk causing severe I/O slowdown, so it could not respond to client requests and returned 502.
Problem Reproduction
Dubbo version : 2.7.4
Nacos version : 1.1.4
Reproduction goal : Simulate Nacos server crash locally and observe Dubbo call behavior.
1. Start the Nacos server, the provider, and the consumer, then trigger a consumer call.
2. Kill the Nacos server (kill -9) and trigger a consumer call.
3. Restart the consumer and trigger a call.
Expectation : All three steps succeed.
Actual result : Steps 1 and 2 succeed, step 3 fails.
Why does Dubbo still succeed after Nacos crashes?
Dubbo caches the provider list in memory, so it can continue to call providers directly without contacting the registry.
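The in-memory behavior can be illustrated with a small sketch. This is not Dubbo's actual implementation: RegistryClient and CachedProviderDirectory are hypothetical names, and the refresh is written as a pull for simplicity. It only shows the principle that a failed registry lookup leaves the last known provider list in place, and that a freshly started process has nothing to fall back on.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical registry abstraction; not a Dubbo or Nacos API. */
interface RegistryClient {
    List<String> lookup(String serviceName) throws Exception;
}

/** Keeps the last successfully fetched provider list in memory. */
class CachedProviderDirectory {
    private final RegistryClient registry;
    // Starts empty: a restarted process has no in-memory history.
    private volatile List<String> lastKnownProviders = Collections.emptyList();

    CachedProviderDirectory(RegistryClient registry) {
        this.registry = registry;
    }

    /** Refreshes from the registry; on failure the cached list is left untouched. */
    boolean refresh(String serviceName) {
        try {
            List<String> fresh = registry.lookup(serviceName);
            lastKnownProviders = Collections.unmodifiableList(new ArrayList<>(fresh));
            return true;
        } catch (Exception e) {
            // Registry unreachable: keep serving the previous list.
            return false;
        }
    }

    /** Returns the provider addresses that calls should be routed to. */
    List<String> providers() {
        return lastKnownProviders;
    }
}
```

Because the list lives only in memory, a consumer restarted during the outage starts with an empty list, which is consistent with step 3 failing in the reproduction.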
Why do logs still show call errors when Nacos is down?
During a registry outage, some provider nodes may go offline without the consumer being notified, causing calls to stale IPs to fail.
Dubbo performs channel‑level heartbeat detection and disconnects unavailable channels, restoring them when they become reachable again.
Alibaba Cloud’s EDAS offers an “outlier removal” feature to drop problematic nodes at the call layer.
Why does the consumer restart not succeed as expected?
Nacos provides a local cache file that can be used after a restart, but the client does not load this cache by default.
The namingLoadCacheAtStart parameter controls whether the cache file is loaded on startup; its default value is false.
Setting it to true enables the fallback, improving availability when the server is unavailable.
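When the Nacos client is used directly, the switch is passed as an ordinary client property. The sketch below only builds the Properties object using the standard library, so it runs without Nacos on the classpath; NacosCacheFallbackConfig is a hypothetical helper name, and the property keys are written as plain strings on the assumption that they match the client version in use.

```java
import java.util.Properties;

/** Hypothetical helper that assembles Nacos client properties. */
class NacosCacheFallbackConfig {

    static Properties build(String serverAddr, boolean loadCacheAtStart) {
        Properties props = new Properties();
        // Address of the Nacos server, for example "127.0.0.1:8848".
        props.setProperty("serverAddr", serverAddr);
        // Ask the client to load its local cache file on startup (default: false).
        props.setProperty("namingLoadCacheAtStart", String.valueOf(loadCacheAtStart));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("127.0.0.1:8848", true);
        // With nacos-client available, these properties would be handed to
        // NamingFactory.createNamingService(props).
        System.out.println(props.getProperty("namingLoadCacheAtStart"));
    }
}
```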
Passing registry parameters in Dubbo
Dubbo uses a unified URL model for configuration. To enable cache loading, the registry can be configured as:
<dubbo:registry address="nacos://127.0.0.1:8848?namingLoadCacheAtStart=true"/>
However, current Dubbo versions (including the 2.7.4 release tested here) forward only a subset of the registry URL parameters to the Nacos client, so namingLoadCacheAtStart is ignored.
Modifying Dubbo 2.7.5‑SNAPSHOT to pass this parameter makes all three steps succeed, confirming the fix.
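The effect of forwarding only selected parameters can be sketched as follows. This is an illustration of the filtering idea, not Dubbo's source code: RegistryParamForwarder and the allow list passed to it are hypothetical, and the example keys are chosen only for demonstration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

/** Hypothetical illustration of allow-list based parameter forwarding. */
class RegistryParamForwarder {

    /** Extracts key=value pairs from the query part of a registry URL. */
    static Map<String, String> parseQuery(String registryUrl) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = registryUrl.indexOf('?');
        if (q < 0 || q == registryUrl.length() - 1) {
            return params;
        }
        for (String pair : registryUrl.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    /** Copies only the allowed keys into the properties given to the registry client. */
    static Properties forward(String registryUrl, Set<String> allowedKeys) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : parseQuery(registryUrl).entrySet()) {
            if (allowedKeys.contains(e.getKey())) {
                props.setProperty(e.getKey(), e.getValue());
            }
            // Keys outside the allow list are silently dropped.
        }
        return props;
    }
}
```

With an allow list that lacks namingLoadCacheAtStart, the value set in the registry URL never reaches the client; adding the key to the list is the kind of change the patched build makes.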
Problem Summary
The incident highlights how Nacos registry availability affects Dubbo applications and the need for fallback mechanisms such as local caching.
Dubbo recognizes only a subset of the registry parameters, so some user configurations silently have no effect.
Cache-loading switches like namingLoadCacheAtStart should be exposed as JVM system properties (-D) or environment variables for easier troubleshooting.
Enabling the cache fallback can significantly reduce the impact of registry outages.
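One possible shape for such a switch is sketched below. It is a suggestion only, not an existing Dubbo or Nacos feature: CacheSwitchResolver is a hypothetical class, and the environment variable name NAMING_LOAD_CACHE_AT_START is made up for this example. The JVM system property takes precedence, then the environment variable, then the default.

```java
/** Hypothetical resolver for the cache-loading switch. */
class CacheSwitchResolver {

    /** Resolution order: system property value, then environment value, then default. */
    static boolean resolve(String systemProperty, String envValue, boolean defaultValue) {
        if (systemProperty != null) {
            return Boolean.parseBoolean(systemProperty.trim());
        }
        if (envValue != null) {
            return Boolean.parseBoolean(envValue.trim());
        }
        return defaultValue;
    }

    /** Reads -DnamingLoadCacheAtStart=... or the assumed environment variable. */
    static boolean loadCacheAtStart() {
        return resolve(
                System.getProperty("namingLoadCacheAtStart"),
                System.getenv("NAMING_LOAD_CACHE_AT_START"), // assumed name
                false);
    }
}
```

An operator could then start the consumer with -DnamingLoadCacheAtStart=true during an incident without changing the application's registry configuration.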
Programmer DD
A tinkering programmer and author of "Spring Cloud Microservices in Action"
