How We Optimized Nacos Config Center to Eliminate Timeouts and QPS Limits
This article explains Nacos's role as a dynamic service discovery and configuration platform, describes two real‑world performance problems encountered in production, and details the step‑by‑step code‑level optimizations—memory caching with fallback and pre‑fetching with listeners—that resolved timeout and rate‑limit issues.
Nacos Introduction
Nacos (Dynamic Naming and Configuration Service) is an open‑source platform for service discovery, configuration management, and service governance developed and maintained by Alibaba Group. It helps users discover, configure, and manage microservices with features such as dynamic configuration updates, namespace and group mechanisms for multi‑environment management, support for various configuration formats (Properties, YAML, JSON), and traffic control.
Application of Nacos Config Center in Real Projects
In our projects Nacos is primarily used as a configuration center for managing blacklist/whitelist changes, gray releases, feature switches, etc. During the 2023 migration from a data center to Kingsoft Cloud, we leveraged Nacos to switch Redis instances across services by toggling a shared configuration key, enabling a seamless cut‑over without service disruption.
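The cut-over idea can be sketched in a few lines. This is only an illustration under stated assumptions: the class `RedisEndpointSwitch`, the endpoint strings, and the convention that the value `new` selects the new instance are all hypothetical and do not come from our production code.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical router: picks a Redis endpoint based on a shared configuration value. */
final class RedisEndpointSwitch {
    private final String oldEndpoint;
    private final String newEndpoint;
    private final AtomicReference<String> active;

    RedisEndpointSwitch(String oldEndpoint, String newEndpoint) {
        this.oldEndpoint = oldEndpoint;
        this.newEndpoint = newEndpoint;
        this.active = new AtomicReference<>(oldEndpoint); // start on the old instance
    }

    /** Intended to be called from a config listener with the latest value of the shared key. */
    void onConfigChange(String configValue) {
        // Assumption: "new" selects the new instance; any other value keeps the old one.
        String normalized = configValue == null ? "" : configValue.trim();
        active.set("new".equalsIgnoreCase(normalized) ? newEndpoint : oldEndpoint);
    }

    /** Read on the request path; never blocks and never calls the config server. */
    String activeEndpoint() {
        return active.get();
    }
}
```

Because every service reads the same key, flipping it once moves all of them together, and flipping it back is the rollback path.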
NacosClient Config Retrieval Process
Before optimization the client fetched configuration directly from the Nacos server on each request, which caused timeouts under high load.
Two Optimization Experiences
3.1 First Optimization: Memory Cache + Fallback
We observed that fetching configuration from the server took close to or exceeded the 400 ms timeout, leading to frequent timeout errors. Since configuration items are few and rarely change, we introduced an in‑memory cache and a fallback default value.
public static String getConfig(String dataId) {
    try {
        // Every call goes to the Nacos server, with a 400 ms timeout.
        return configService.getConfig(dataId, NACOS_GROUP, 400);
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}
We then added a listener that updates a cacheMap whenever a configuration changes. The client now checks the cache first and contacts the server only if the entry is missing.
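public static String getConfig(String dataId) {
    try {
        // Fast path: serve the value from local memory.
        String value = cacheMap.get(dataId);
        if (StringUtils.isBlank(value)) {
            // Cache miss: fetch from the server with a 100 ms timeout.
            value = configService.getConfig(dataId, NACOS_GROUP, 100);
            // Cache only real values; a null put would throw if cacheMap is a ConcurrentHashMap.
            if (StringUtils.isNotBlank(value)) {
                cacheMap.put(dataId, value);
            }
        }
        return value;
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}
If the server fetch on a cache miss also fails or times out, getConfig returns null and the caller falls back to a hard‑coded default value.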
public static boolean getSwitch() {
    try {
        String config = getConfig(SWITCH_DATAID);
        if (StringUtils.isNotBlank(config)) {
            return "1".equalsIgnoreCase(config);
        }
    } catch (Exception e) {
        LOGGER.error("getSwitch happens error, dataId = {}", SWITCH_DATAID, e);
    }
    return false; // fallback: treat the switch as off when no value is available
}
3.2 Second Optimization: Pre‑fetch Config and Register Listener
After the first fix, we still observed occasional errors right after a service started. At that point the cache is still empty, so a burst of incoming traffic turns into a burst of cache misses, each of which calls Nacos. The server‑side interceptor then rejects the excess requests (QPS limit of 5). To avoid this, we fetch the configuration once during application initialization and then register a listener for future changes.
public static void addListener(String dataId, String group) {
    try {
        configService.addListener(dataId, group, new PropertiesListener() {
            @Override
            public void innerReceive(Properties properties) {}
            @Override
            public void receiveConfigInfo(String configInfo) {}
        });
    } catch (Throwable e) {
        LOGGER.error("addListener happens error, dataId = {}, group = {} ", dataId, group, e);
    }
}
Using getConfigAndSignListener, we can fetch the config and attach the listener in a single call, storing the result in the cache.
String config = configService.getConfigAndSignListener(dataId, group, 1000,
        new PropertiesListener() {
            @Override
            public void innerReceive(Properties properties) {}
            @Override
            public void receiveConfigInfo(String configInfo) {
                // Keep the local cache in sync with later changes.
                cacheMap.put(dataId, configInfo);
            }
        });
// Seed the cache with the value fetched at startup.
if (StringUtils.isNotEmpty(config)) {
    cacheMap.put(dataId, config);
}
Testing in a UAT environment with QPS > 5 showed no further errors, confirming the effectiveness of this approach.
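The effect of pre‑fetching can be reproduced without a Nacos server. The following is a minimal, self‑contained simulation rather than our production code: `FakeConfigServer` and `PrefetchedConfigCache` are hypothetical names, and the fake server only mimics the fetch‑and‑subscribe behavior of getConfigAndSignListener while counting how many fetches reach it.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical stand-in for the Nacos server: counts fetches and pushes changes. */
class FakeConfigServer {
    private final Map<String, String> store = new ConcurrentHashMap<>();
    private final Map<String, Consumer<String>> listeners = new ConcurrentHashMap<>();
    final AtomicInteger fetchCount = new AtomicInteger();

    /** Mimics a combined fetch-and-subscribe call; returns null if the key is unknown. */
    String getConfigAndSignListener(String dataId, Consumer<String> listener) {
        fetchCount.incrementAndGet();
        listeners.put(dataId, listener);
        return store.get(dataId);
    }

    /** Simulates publishing a new value; notifies the registered listener, if any. */
    void publish(String dataId, String value) {
        store.put(dataId, value);
        Consumer<String> listener = listeners.get(dataId);
        if (listener != null) {
            listener.accept(value);
        }
    }
}

class PrefetchedConfigCache {
    private final Map<String, String> cacheMap = new ConcurrentHashMap<>();
    private final FakeConfigServer server;

    PrefetchedConfigCache(FakeConfigServer server) {
        this.server = server;
    }

    /** Call once per dataId during application initialization, before traffic arrives. */
    void prefetch(String dataId) {
        String config = server.getConfigAndSignListener(
                dataId, updated -> cacheMap.put(dataId, updated));
        if (config != null && !config.isEmpty()) {
            cacheMap.put(dataId, config);
        }
    }

    /** Request path: reads local memory only and never calls the server. */
    String getConfig(String dataId, String fallback) {
        return cacheMap.getOrDefault(dataId, fallback);
    }
}
```

In this simulation the number of server fetches depends only on how many keys are pre‑fetched, not on request volume, which is why a startup burst no longer runs into the rate limit.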
Summary
Nacos plays a crucial role as a configuration center in our microservice ecosystem. We encountered two major issues: request timeouts caused by direct server fetches and QPS throttling by the Nacos interceptor. By introducing an in‑memory cache with fallback and by pre‑fetching configuration before traffic arrives (combined with listeners), we eliminated both problems. The recommended best practices are:
Fetch configuration and register listeners at service startup.
Cache retrieved configurations in local memory.
Adjust the server‑side rate‑limit only if necessary, but avoid large changes.
Tune the configuration fetch timeout to the latency you actually observe (the cached path above uses 100 ms) and enable retries.
Log errors and set up alerts.
Provide a fallback value for each configuration key.
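The retry and fallback recommendations can be combined in one small helper. This is a sketch under assumptions: `ConfigRetry` and `fetchWithRetry` are hypothetical names, not part of the Nacos client, and the `System.err` line stands in for real logging and alerting.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: bounded retries around a config fetch, with a per-key fallback. */
final class ConfigRetry {
    private ConfigRetry() {}

    /**
     * Calls fetch up to maxAttempts times and returns the first non-blank result.
     * Returns fallback if every attempt throws or yields a blank value.
     */
    static String fetchWithRetry(Callable<String> fetch, int maxAttempts, String fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String value = fetch.call();
                if (value != null && !value.trim().isEmpty()) {
                    return value;
                }
            } catch (Exception e) {
                // Placeholder for real logging and alerting.
                System.err.println("config fetch attempt " + attempt + " failed: " + e);
            }
        }
        return fallback;
    }
}
```

A call site might look like `ConfigRetry.fetchWithRetry(() -> configService.getConfig(dataId, NACOS_GROUP, 100), 2, "0")`. Each retry is another request to the server, so the attempt count should stay small when a QPS limit is in place.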
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
