Why Nacos Client 1.4.1 Crash Causes Full Service Outage and How to Fix It

This article details a real‑world incident in which a brief DNS resolution failure triggered a fatal bug in Nacos client 1.4.1, causing heartbeat loss and a complete service outage. It explains the root cause, walks through the relevant code, and recommends which version to upgrade to.

Programmer DD

Problem Discovery

Earlier this week an MSE Nacos user reported that their production Nacos service was unavailable and all of their services had gone offline. The logs showed numerous errors, so my first suspicion was a platform outage.

The user‑provided error log is shown below:

The business logs also contained many "service address not found" errors, which at first glance suggested that Nacos itself had gone down.

Server-side monitoring, however, showed no anomalies: CPU, memory, and other metrics were normal, which ruled out the server.

Turning to the client, the stack trace pointed to a DNS resolution problem. The errors persisted for about ten minutes, yet ping, dig, and telnet to the Nacos host all succeeded, suggesting that DNS resolution on the client machine had failed only briefly.

Further investigation showed that only machines that had been restarted recovered, while others continued to fail, deepening the mystery.

A second user reported the same issue, and their logs also showed DNS errors for Redis, confirming that the trigger was a DNS resolution failure affecting multiple services.

Both users were using nacos-client 1.4.1, which led us to focus on that version.

Nacos 1.4.1 Introduced Bug

At the time of the incident, the latest Nacos 1.x version was 1.4.2. Checking the source of 1.4.1 around line 595 revealed the problematic code.

The method splitIPPortStr extracts each address from the Nacos server address string and appends the default port 8848 when none is given. The bug lies in its isIPv4 check.
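For reference, a minimal sketch of the kind of parsing splitIPPortStr performs is shown below. It is not the Nacos source: the class ServerAddrSplitter, the method splitHostPort, and the bracketed-IPv6 convention are assumptions made for illustration. It does show that separating host and port is pure string work that needs no DNS lookup.

```java
/**
 * Hypothetical sketch of splitIPPortStr-style parsing (not the Nacos source):
 * split "host[:port]" and fall back to the default port 8848.
 */
final class ServerAddrSplitter {

    static final String DEFAULT_NACOS_PORT = "8848";

    /** Returns {host, port}; IPv6 literals are assumed to be bracketed, e.g. "[::1]:8848". */
    static String[] splitHostPort(String addr) {
        String s = addr.trim();
        if (s.startsWith("[")) {
            int close = s.indexOf(']');
            if (close < 0) {
                throw new IllegalArgumentException("unclosed IPv6 bracket: " + addr);
            }
            String host = s.substring(1, close);
            String rest = s.substring(close + 1);
            // "[::1]" has no port; "[::1]:9000" has one after the colon
            String port = rest.startsWith(":") ? rest.substring(1) : DEFAULT_NACOS_PORT;
            return new String[] {host, port};
        }
        int colon = s.lastIndexOf(':');
        if (colon < 0) {
            // No port given: use the Nacos default
            return new String[] {s, DEFAULT_NACOS_PORT};
        }
        return new String[] {s.substring(0, colon), s.substring(colon + 1)};
    }

    public static void main(String[] args) {
        for (String addr : new String[] {"nacos.example.com", "10.0.0.1:9000", "[::1]"}) {
            String[] hostPort = splitHostPort(addr);
            System.out.println(addr + " -> host=" + hostPort[0] + ", port=" + hostPort[1]);
        }
    }
}
```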

The isIPv4 check is only meant to decide whether addr is a literal IPv4 address, but it calls InetAddress.getByName(addr), which resolves the name through the system's name service, that is, it performs a DNS lookup whenever addr is a host name rather than an IP literal. When that lookup fails, the check throws an IllegalArgumentException.

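The JDK Javadoc for InetAddress.getAllByName, which getByName uses internally, describes the lookup plainly: "Given the name of a host, returns an array of its IP addresses, based on the configured name service on the system."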
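The failure mode can be reproduced with a hypothetical reconstruction like the one below. The article does not reproduce the exact 1.4.1 code, so the class DnsBackedIpCheck and the decision to wrap UnknownHostException in an IllegalArgumentException are assumptions modeled on the behavior described above. An IP literal is merely parsed, while a host name goes to the resolver; the reserved .invalid domain is used to force a failed lookup.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

/**
 * Hypothetical reconstruction of the failure mode (not the Nacos 1.4.1 source):
 * an "is this IPv4?" check built on InetAddress.getByName.
 */
final class DnsBackedIpCheck {

    static boolean isIPv4ViaLookup(String addr) {
        try {
            // An IP literal is only parsed, but a host name is sent to the
            // system's name service, so a transient DNS failure lands here.
            return InetAddress.getByName(addr) instanceof Inet4Address;
        } catch (UnknownHostException e) {
            // Assumption: the lookup failure is rethrown unchecked, matching the
            // IllegalArgumentException described in the article.
            throw new IllegalArgumentException("cannot resolve " + addr, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isIPv4ViaLookup("10.0.0.1")); // true, no DNS query needed
        try {
            // ".invalid" is a reserved TLD, so this lookup is expected to fail
            isIPv4ViaLookup("nacos.invalid");
        } catch (IllegalArgumentException e) {
            System.out.println("check blew up: " + e.getMessage());
        }
    }
}
```

Besides being fragile, a check like this answers the wrong question: a host name that happens to resolve to an IPv4 address would also be reported as IPv4.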

In version 1.4.2 this logic was corrected to use a regular‑expression match for IPv4, eliminating the bug.
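A DNS-free check in the spirit of that fix might look like the sketch below. The regular expression and the class name RegexIpCheck are assumptions for illustration, not copied from Nacos 1.4.2.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical DNS-free IPv4 literal check (pattern chosen for this sketch,
 * not copied from Nacos 1.4.2).
 */
final class RegexIpCheck {

    // One octet, 0-255, written without leading zeros ("01" is rejected).
    private static final String OCTET = "(25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
    private static final Pattern IPV4 = Pattern.compile(OCTET + "(\\." + OCTET + "){3}");

    static boolean isIPv4(String addr) {
        // Pure string matching: the resolver is never consulted, so a DNS outage
        // cannot make this throw.
        return addr != null && IPV4.matcher(addr).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIPv4("10.0.0.1"));          // true
        System.out.println(isIPv4("nacos.example.com")); // false, and no DNS query is made
    }
}
```

Because the check only inspects the string, a resolver outage can no longer break address parsing.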

The more fundamental problem, however, lay in how the exception propagated. The IllegalArgumentException thrown on a failed DNS lookup was not converted into a NacosException, and the heartbeat task in com.alibaba.nacos.client.naming.beat.BeatReactor handled only NacosException. The exception therefore escaped the task before it could schedule the next beat, and heartbeats stopped permanently.

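This explains why a brief DNS outage could lead to a full service outage that persisted even after the network recovered: once heartbeats stop, the Nacos server eventually removes the instance, so consumers can no longer find its address, and only a restart, which registers the instance again and starts a fresh heartbeat task, brings it back.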
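To see why a single unchecked exception is enough to end all heartbeats, consider the simplified model below of a task that sends one beat and then schedules its own next run. The class FragileBeatDemo, the BeatException type standing in for NacosException, and the timing values are assumptions; this is not the BeatReactor source.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model of a self-rescheduling heartbeat (not the BeatReactor source):
 * each run sends one beat, handles only the checked exception, then schedules the next run.
 */
final class FragileBeatDemo {

    /** Checked exception standing in for NacosException in this sketch. */
    static final class BeatException extends Exception {
        BeatException(String message) {
            super(message);
        }
    }

    /** Hypothetical beat sender; attempt is the 1-based beat number. */
    interface BeatSender {
        void sendBeat(int attempt) throws BeatException;
    }

    static final class BeatTask implements Runnable {
        private final ScheduledExecutorService executor;
        private final BeatSender sender;
        private final AtomicInteger attempts;
        private final long intervalMillis;

        BeatTask(ScheduledExecutorService executor, BeatSender sender,
                 AtomicInteger attempts, long intervalMillis) {
            this.executor = executor;
            this.sender = sender;
            this.attempts = attempts;
            this.intervalMillis = intervalMillis;
        }

        @Override
        public void run() {
            try {
                sender.sendBeat(attempts.incrementAndGet());
            } catch (BeatException e) {
                // Only the checked, "expected" failure is handled.
                System.err.println("beat failed, retrying: " + e.getMessage());
            }
            // An unchecked exception (e.g. IllegalArgumentException) skips this line,
            // so the next beat is never scheduled. The executor stores that exception
            // in a ScheduledFuture nobody inspects, so nothing is logged either.
            executor.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
        }
    }

    /** Runs the heartbeat chain for about windowMillis and returns the number of beats attempted. */
    static int runFor(BeatSender sender, long intervalMillis, long windowMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        executor.schedule(new BeatTask(executor, sender, attempts, intervalMillis),
                0, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // Simulate a single transient DNS failure on the third beat.
        int beats = runFor(attempt -> {
            if (attempt == 3) {
                throw new IllegalArgumentException("simulated DNS failure");
            }
        }, 10, 500);
        System.out.println("beats attempted in 500 ms: " + beats);
    }
}
```

In this model a checked BeatException is logged and the chain continues, but the simulated DNS failure on the third beat ends the chain for good, even though every later beat would have succeeded.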

Improvement Suggestions

Replace the isIPv6 and isIPv4 checks with regular‑expression matching (already fixed in 1.4.2).

Ensure the heartbeat thread catches all exceptions so that a single failure does not halt subsequent heartbeats.

The second suggestion has also been addressed in later releases.
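The second suggestion can be sketched by catching every Exception and moving the rescheduling into a finally block. As with the model above, RobustBeatDemo and its parameters are hypothetical and are not taken from any Nacos release.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntConsumer;

/**
 * Hypothetical hardened heartbeat (a sketch of the suggestion, not Nacos code):
 * every Exception is caught and the next beat is scheduled in a finally block.
 */
final class RobustBeatDemo {

    static final class BeatTask implements Runnable {
        private final ScheduledExecutorService executor;
        private final IntConsumer sender; // sends beat number n; may throw unchecked exceptions
        private final AtomicInteger attempts;
        private final long intervalMillis;

        BeatTask(ScheduledExecutorService executor, IntConsumer sender,
                 AtomicInteger attempts, long intervalMillis) {
            this.executor = executor;
            this.sender = sender;
            this.attempts = attempts;
            this.intervalMillis = intervalMillis;
        }

        @Override
        public void run() {
            try {
                sender.accept(attempts.incrementAndGet());
            } catch (Exception e) {
                // Any failure, including an IllegalArgumentException from a failed
                // DNS lookup, is logged and affects only this round.
                System.err.println("beat failed, will retry: " + e);
            } finally {
                // Schedule the next beat however this one ended, unless shutting down.
                if (!executor.isShutdown()) {
                    executor.schedule(this, intervalMillis, TimeUnit.MILLISECONDS);
                }
            }
        }
    }

    /** Runs the heartbeat chain for about windowMillis and returns the number of beats attempted. */
    static int runFor(IntConsumer sender, long intervalMillis, long windowMillis) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger attempts = new AtomicInteger();
        executor.schedule(new BeatTask(executor, sender, attempts, intervalMillis),
                0, TimeUnit.MILLISECONDS);
        try {
            Thread.sleep(windowMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        // Same simulated transient DNS failure on the third beat.
        int beats = runFor(attempt -> {
            if (attempt == 3) {
                throw new IllegalArgumentException("simulated DNS failure");
            }
        }, 10, 500);
        System.out.println("beats attempted in 500 ms: " + beats);
    }
}
```

With the same simulated failure on the third beat, the chain keeps running: the failed beat is logged and the next one is sent on schedule.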

Conclusion

The Nacos client 1.4.1 contains a serious bug: a short DNS resolution failure causes the heartbeat to be lost permanently, leading to a complete service outage even after the network recovers.

DNS failures are common during network jitter or when CoreDNS times out in Kubernetes environments. To avoid this impact, verify the Nacos client version used by your applications and upgrade to at least 1.4.2.

The issue is limited to version 1.4.1; earlier versions are unaffected. Spring Cloud and Dubbo users should pin the Nacos client version explicitly, because Dubbo 2.7.11 pulls in 1.4.1 by default. Upgrading to 1.4.2 is strongly recommended.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Programmer DD

A tinkering programmer and author of "Spring Cloud Microservices in Action"
