How We Tuned Nacos Config Center to Eliminate Timeouts and QPS Limits
This article explains how Nacos, an open‑source dynamic naming and configuration service, was used in a micro‑service project, the two performance problems encountered—configuration fetch timeouts and server‑side QPS throttling—and the step‑by‑step optimizations (memory caching, fallback values, pre‑fetching and listener registration, and limit adjustments) that resolved them.
Overview of Nacos
Nacos (Dynamic Naming and Configuration Service) is an open‑source platform for service discovery, configuration management, and service governance, originally developed by Alibaba Group. It supports dynamic updates, multi‑environment namespaces, various configuration formats (Properties, YAML, JSON), and traffic control, making it a popular choice for micro‑service configuration.
Real‑World Application in Our Project
In our system Nacos is primarily used as a configuration center for tasks such as managing black‑/white‑list changes, gray releases, and feature toggles. During a data‑center migration (from a legacy data center to Kingsoft Cloud), we leveraged Nacos to switch Redis endpoints via a configuration flag, allowing a seamless cut‑over without service disruption.
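Consuming the cut-over flag was deliberately simple. The sketch below is illustrative only (the class name, flag values, and endpoint addresses are hypothetical, not our production config): the service resolves its Redis endpoint from the flag value pushed through Nacos and defaults to the legacy endpoint, so a bad or missing config push cannot take traffic down.

```java
import java.util.Map;

class RedisEndpointSelector {
    // Hypothetical endpoints for illustration; real addresses lived in deployment config.
    private static final Map<String, String> ENDPOINTS = Map.of(
            "legacy", "redis-legacy.internal:6379",
            "ksyun", "redis-ksyun.internal:6379");

    /**
     * Resolve the Redis endpoint from the Nacos flag value.
     * Unknown or missing flags fall back to the legacy endpoint.
     */
    static String resolve(String flag) {
        if (flag == null) {
            return ENDPOINTS.get("legacy");
        }
        return ENDPOINTS.getOrDefault(flag, ENDPOINTS.get("legacy"));
    }
}
```

Flipping the flag in Nacos then moves new connections to the Kingsoft Cloud endpoint without a redeploy.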
Problem 1 – Configuration Fetch Timeout
When request volume was high, the client‑side getConfig call often timed out (default 400 ms) because each request fetched the configuration directly from the Nacos server without local caching. The relevant code was:
public static String getConfig(String dataId) {
    try {
        return configService.getConfig(dataId, NACOS_GROUP, 400);
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}
To solve this we introduced an in‑memory cache and a fallback value:
Register a listener for each configuration item; when a change occurs, store the new value in a Map<String, String>.
When fetching a config, first check the cache; if absent, retrieve from the server, store it in the cache, and return it.
If the server call still times out, return a predefined fallback value.
Optimized code example:
private static final Map<String, String> cacheMap = new ConcurrentHashMap<>();

public static String getConfig(String dataId) {
    try {
        // Prefer the local cache, which listeners keep up to date
        String value = cacheMap.get(dataId);
        if (StringUtils.isBlank(value)) {
            value = configService.getConfig(dataId, NACOS_GROUP, 100);
            if (value != null) { // ConcurrentHashMap rejects null values
                cacheMap.put(dataId, value);
            }
        }
        return value;
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}
public static boolean getSwitch() {
    try {
        String config = getConfig(SWITCH_DATAID);
        if (StringUtils.isNotBlank(config)) {
            return "1".equalsIgnoreCase(config);
        }
    } catch (Exception e) {
        LOGGER.error("Error", e);
    }
    // Fallback value
    return false;
}
Problem 2 – Server‑Side QPS Throttling
During a gray‑release restart, Nacos returned error code -503 because the request QPS exceeded the server‑side limit (default 5 QPS per accessKeyId). The interceptor logged messages such as:
ERROR [traceId:...] - access_key_id:xxxxxx limited
Investigation showed that the interceptor's isLimit method returned true when the QPS threshold was breached, causing the request to be rejected.
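The throttling check is conceptually a fixed-window counter. The sketch below is our own illustration of what such an isLimit check does, not Nacos's actual server code; the class name and the clock parameter (passed in for testability) are ours.

```java
/** Minimal fixed-window QPS limiter sketch: rejects once a one-second window exceeds maxQps. */
class SimpleQpsLimiter {
    private final int maxQps;
    private long windowStart = -1; // forces a fresh window on the first real call
    private int count = 0;

    SimpleQpsLimiter(int maxQps) {
        this.maxQps = maxQps;
    }

    /** Returns true (reject) when requests in the current one-second window exceed maxQps. */
    synchronized boolean isLimit(long nowMillis) {
        if (nowMillis - windowStart >= 1000) { // roll into a new one-second window
            windowStart = nowMillis;
            count = 0;
        }
        return ++count > maxQps;
    }
}
```

With the default limit of 5, a burst of 6 requests inside one second trips the limiter, which matches the behavior we saw during the gray-release restart.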
Solution 1 – Increase Server Limit
We considered raising the limitTime property (e.g., from 5 to 10) but recognized the risk of overloading the Nacos server and potential side effects, so this approach was deemed sub‑optimal.
Solution 2 – Pre‑Fetch and Listener Registration
After the first optimization, we modified the startup sequence to fetch all required configurations before traffic arrives and to register listeners that keep the cache up‑to‑date. Using getConfigAndSignListener, we combined fetching and listener registration in one call:
String config = configService.getConfigAndSignListener(dataId, group, 1000,
        new PropertiesListener() {
            @Override
            public void innerReceive(Properties properties) {
            }

            @Override
            public void receiveConfigInfo(String configInfo) {
                cacheMap.put(dataId, configInfo);
            }
        });
if (StringUtils.isNotEmpty(config)) {
    cacheMap.put(dataId, config);
}
Load testing in a UAT environment with QPS > 5 showed that the errors disappeared, confirming this as the preferred solution.
Final Recommendations
Fetch configurations and register listeners during service startup.
Cache fetched configurations in local memory for fast access.
Optionally increase the server‑side QPS limit, but do so cautiously.
Set a reasonable timeout for server fetches (e.g., 100 ms) and enable retries.
Implement comprehensive error logging and alerting.
Provide a fallback value for each configuration key.
These practices ensure reliable, low‑latency configuration retrieval in high‑traffic micro‑service environments.
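Putting the recommendations together, the startup pre-fetch can be sketched with the Nacos call abstracted behind a function so the loop stands alone. ConfigWarmup and the fetchAndListen parameter are illustrative, not Nacos API; in real code fetchAndListen would wrap configService.getConfigAndSignListener and wire onChange into the listener.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Pre-fetches every known dataId into a local cache at service startup (sketch). */
class ConfigWarmup {
    private final Map<String, String> cacheMap = new ConcurrentHashMap<>();

    /**
     * @param dataIds        all configuration keys the service depends on
     * @param fetchAndListen stands in for a wrapper around getConfigAndSignListener:
     *                       returns the current value and registers a change listener
     */
    void warmUp(List<String> dataIds, Function<String, String> fetchAndListen) {
        for (String dataId : dataIds) {
            String value = fetchAndListen.apply(dataId);
            if (value != null) { // ConcurrentHashMap rejects null values
                cacheMap.put(dataId, value);
            }
        }
    }

    /** Listener callback: keeps the cache fresh on every config change. */
    void onChange(String dataId, String newValue) {
        cacheMap.put(dataId, newValue);
    }

    String get(String dataId) {
        return cacheMap.get(dataId);
    }
}
```

After warm-up, all reads hit local memory, so traffic spikes never translate into Nacos server requests.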
Sohu Smart Platform Tech Team
The Sohu News app's technical sharing hub, offering deep tech analyses, the latest industry news, and fun developer anecdotes. Follow us to discover the team's daily joys.