
How We Tuned Nacos Config Center to Eliminate Timeouts and QPS Limits

This article explains how Nacos, an open‑source dynamic naming and configuration service, was used in a micro‑service project, the two performance problems encountered—configuration fetch timeouts and server‑side QPS throttling—and the step‑by‑step optimizations (memory caching, fallback values, pre‑fetching and listener registration, and limit adjustments) that resolved them.

Sohu Smart Platform Tech Team

Overview of Nacos

Nacos (Dynamic Naming and Configuration Service) is an open‑source platform for service discovery, configuration management, and service governance, originally developed by Alibaba Group. It supports dynamic updates, multi‑environment namespaces, various configuration formats (Properties, YAML, JSON), and traffic control, making it a popular choice for micro‑service configuration.

Real‑World Application in Our Project

In our system Nacos is primarily used as a configuration center for tasks such as managing black‑/white‑list changes, gray releases, and feature toggles. During a data‑center migration (from a legacy data center to Kingsoft Cloud), we leveraged Nacos to switch Redis endpoints via a configuration flag, allowing a seamless cut‑over without service disruption.

Problem 1 – Configuration Fetch Timeout

When request volume was high, the client‑side getConfig call often timed out (400 ms timeout) because every request fetched the configuration directly from the Nacos server, with no local caching. The relevant code was:

public static String getConfig(String dataId) {
    try {
        return configService.getConfig(dataId, NACOS_GROUP, 400);
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}

To solve this we introduced an in‑memory cache and a fallback value:

Register a listener for each configuration item; when a change occurs, store the new value in a Map<String, String>.

When fetching a config, first check the cache; if absent, retrieve from the server, store it in the cache, and return it.

If the server call still times out, return a predefined fallback value.
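The three steps above can be sketched independently of the Nacos client API. In this sketch, ConfigCacheSketch and remoteFetcher are illustrative names (the fetcher stands in for configService.getConfig); the real code in our project follows below.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative sketch of the cache-then-fetch-then-fallback flow,
// decoupled from the Nacos client so the logic is easy to see.
public class ConfigCacheSketch {
    private final Map<String, String> cacheMap = new ConcurrentHashMap<>();
    private final Function<String, String> remoteFetcher; // stands in for configService.getConfig

    public ConfigCacheSketch(Function<String, String> remoteFetcher) {
        this.remoteFetcher = remoteFetcher;
    }

    // Called by a config-change listener to keep the cache fresh.
    public void onConfigChanged(String dataId, String newValue) {
        cacheMap.put(dataId, newValue);
    }

    public String get(String dataId, String fallback) {
        String value = cacheMap.get(dataId);
        if (value != null) {
            return value; // cache hit: no network call
        }
        try {
            value = remoteFetcher.apply(dataId); // may time out or fail
        } catch (RuntimeException e) {
            value = null; // fall through to the fallback value
        }
        if (value != null) {
            cacheMap.put(dataId, value); // populate the cache for later calls
            return value;
        }
        return fallback; // predefined fallback when the server is unreachable
    }
}
```

Because the fetcher is injected, the fallback path can be exercised without a running Nacos server.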

Optimized code example:

public static String getConfig(String dataId) {
    try {
        // Prefer the local cache
        String value = cacheMap.get(dataId);
        if (StringUtils.isBlank(value)) {
            value = configService.getConfig(dataId, NACOS_GROUP, 100);
            if (StringUtils.isNotBlank(value)) {
                // Cache only non-blank values; a ConcurrentHashMap rejects null values
                cacheMap.put(dataId, value);
            }
        }
        return value;
    } catch (Throwable e) {
        LOGGER.error("getConfig happens error, dataId = {}, group = {} ", dataId, NACOS_GROUP, e);
    }
    return null;
}

public static boolean getSwitch() {
    try {
        String config = getConfig(SWITCH_DATAID);
        if (StringUtils.isNotBlank(config)) {
            return "1".equalsIgnoreCase(config);
        }
    } catch (Exception e) {
        LOGGER.error("Error", e);
    }
    // Fallback value
    return false;
}

Problem 2 – Server‑Side QPS Throttling

During a gray‑release restart, Nacos returned error code -503 because the request QPS exceeded the server‑side limit (default 5 QPS per accessKeyId). The interceptor logged messages such as:

ERROR [traceId:...] - access_key_id:xxxxxx limited

Investigation showed that the interceptor’s isLimit method returned true when the QPS threshold was breached, causing the request to be rejected.
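The throttling behavior can be approximated with a per-key fixed-window counter. This is a sketch of the idea, not Nacos's actual implementation; the class name is invented, the isLimit name mirrors the log above, and the 5‑QPS default comes from the limit described earlier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-accessKeyId fixed-window QPS limiter, approximating
// the server-side check that produced the "limited" log above.
public class QpsLimiterSketch {
    private static class Window {
        long windowStartMs;
        int count;
    }

    private final int maxQps;
    private final Map<String, Window> windows = new ConcurrentHashMap<>();

    public QpsLimiterSketch(int maxQps) {
        this.maxQps = maxQps;
    }

    // Returns true when the caller has exceeded maxQps within the current one-second window.
    public synchronized boolean isLimit(String accessKeyId, long nowMs) {
        Window w = windows.computeIfAbsent(accessKeyId, k -> new Window());
        if (nowMs - w.windowStartMs >= 1000) {
            w.windowStartMs = nowMs; // start a new one-second window
            w.count = 0;
        }
        w.count++;
        return w.count > maxQps;
    }
}
```

With a 5‑QPS limit, the sixth request inside one second is rejected, which is exactly what a burst of getConfig calls during a gray-release restart triggers.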

Solution 1 – Increase Server Limit

We considered raising the limitTime property (e.g., from 5 to 10) but recognized the risk of overloading the Nacos server and potential side effects, so this approach was deemed sub‑optimal.

Solution 2 – Pre‑Fetch and Listener Registration

After the first optimization, we modified the startup sequence to fetch all required configurations before traffic arrives and to register listeners that keep the cache up‑to‑date. Using getConfigAndSignListener, we combined fetching and listener registration in one call:

String config = configService.getConfigAndSignListener(dataId, group, 1000,
    new AbstractListener() {
        @Override
        public void receiveConfigInfo(String configInfo) {
            // Keep the local cache in sync with server-side changes
            cacheMap.put(dataId, configInfo);
        }
    });
if (StringUtils.isNotEmpty(config)) {
    cacheMap.put(dataId, config);
}

Load testing in a UAT environment with QPS > 5 showed that the errors disappeared, confirming this as the preferred solution.

Final Recommendations

Fetch configurations and register listeners during service startup.

Cache fetched configurations in local memory for fast access.

Optionally increase the server‑side QPS limit, but do so cautiously.

Set a reasonable timeout for server fetches (e.g., 100 ms) and enable retries.

Implement comprehensive error logging and alerting.

Provide a fallback value for each configuration key.
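The timeout-and-retry recommendation can be sketched as a small helper. The Supplier-based fetcher and the class name are illustrative, not part of the Nacos API; the fetcher stands in for a short-timeout call like configService.getConfig(dataId, group, 100).

```java
import java.util.function.Supplier;

// Illustrative retry wrapper around a short-timeout config fetch.
public class RetryingFetchSketch {
    public static String fetchWithRetries(Supplier<String> fetcher, int maxAttempts, String fallback) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                String value = fetcher.get(); // each attempt uses its own short timeout
                if (value != null) {
                    return value;
                }
            } catch (RuntimeException e) {
                // swallow and retry; a real implementation would log the error
            }
        }
        return fallback; // all attempts failed: fall back to the predefined value
    }
}
```

Keeping each attempt's timeout short (e.g., 100 ms) bounds the worst-case latency at maxAttempts times the timeout, while the fallback guarantees the caller always gets a usable value.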

These practices ensure reliable, low‑latency configuration retrieval in high‑traffic micro‑service environments.

[Figure: Nacos QPS limit diagram]

Tags: Java, cloud-native, performance optimization, microservices, Nacos
Written by Sohu Smart Platform Tech Team

The Sohu News app's technical sharing hub, offering deep tech analyses, the latest industry news, and fun developer anecdotes. Follow us to discover the team's daily joys.