How I Cut a 20‑Second API Call to Sub‑500 ms with Three Simple Optimizations

This article walks through a real‑world backend performance case, detailing how a batch‑score query API was trimmed from 20 seconds to under 500 milliseconds by adding composite indexes, introducing multithreaded execution, and limiting request batch size.

macrozheng
macrozheng
macrozheng
How I Cut a 20‑Second API Call to Sub‑500 ms with Three Simple Optimizations

Preface

Interface performance is a common pain point for backend developers. Optimizing an API requires multiple angles. Previously I wrote an article “11 Tips for Interface Performance Optimization”. This article continues the discussion by describing how I optimized a slow batch‑score query API.

1. Incident Investigation

Every morning we receive a summary email of slow queries, showing the endpoint, call count, max latency, average latency and traceId. One batch‑score query API showed a max latency of 20 s and an average of 2 s. Using SkyWalking we saw most calls return within 500 ms, but a few exceeded 20 s.

The root cause was that the front‑end page sent a massive list of IDs (hundreds to thousands) to the API, far beyond the intended page size of 10‑100 records. The API then performed a remote call to fetch organization info and executed a database query inside a for loop for each element.

2. Current Situation

Passing hundreds of IDs is still acceptable for a primary‑key lookup, but the query logic is complex. The pseudo‑code is shown below.

public List<ScoreEntity> query(List<SearchEntity> list) {
    List<ScoreEntity> result = Lists.newArrayList();
    List<Long> orgIds = list.stream().map(SearchEntity::getOrgId).collect(Collectors.toList());
    List<OrgEntity> orgList = feignClient.getOrgByIds(orgIds);
    for (SearchEntity entity : list) {
        String orgCode = findOrgCode(orgList, entity.getOrgId());
        ScoreSearchEntity scoreSearchEntity = new ScoreSearchEntity();
        scoreSearchEntity.setOrgCode(orgCode);
        scoreSearchEntity.setCategoryId(entity.getCategoryId());
        scoreSearchEntity.setBusinessId(entity.getBusinessId());
        scoreSearchEntity.setBusinessType(entity.getBusinessType());
        List<ScoreEntity> resultList = scoreMapper.queryScore(scoreSearchEntity);
        if (CollectionUtils.isNotEmpty(resultList)) {
            ScoreEntity scoreEntity = resultList.get(0);
            result.add(scoreEntity);
        }
    }
    return result;
}

The two critical problems are: (1) a remote call inside the API, and (2) a database query inside the loop.

3. First Optimization – Index Tuning

We added a composite index on org_code, category_id, business_id and business_type:

ALTER TABLE user_score ADD INDEX `un_org_category_business` (org_code, category_id, business_id, business_type) USING BTREE;

After applying the index, the maximum latency dropped from ~20 s to about 5 s.

4. Second Optimization – Multithreading

Because each loop iteration is independent, we switched from single‑threaded execution to parallel execution using CompletableFuture and a custom thread pool.

CompletableFuture[] futureArray = dataList.stream()
    .map(data -> CompletableFuture
        .supplyAsync(() -> query(data), asyncExecutor)
        .whenComplete((result, th) -> { }))
    .toArray(CompletableFuture[]::new);
CompletableFuture.allOf(futureArray).join();

The thread pool configuration (core 8, max 10, keep‑alive 60 s, queue 500) is defined as:

ExecutorService threadPool = new ThreadPoolExecutor(
    8, 10, 60, TimeUnit.SECONDS,
    new ArrayBlockingQueue(500),
    new ThreadPoolExecutor.CallerRunsPolicy());

With this change the latency improved from ~5 s to ~1 s.

5. Third Optimization – Limiting Batch Size

We discovered that sending too many IDs at once is the fundamental bottleneck. We introduced a hard limit of 200 records per request; larger requests are rejected.

5.1 Front‑end Pagination

Ideally the front‑end would paginate the settlement‑order list so that each request contains at most 100 orders, keeping the total under 200 records. This requires front‑end development resources, which are currently unavailable.

5.2 Backend Batch Calls

Alternatively, the backend can split a large request into multiple smaller calls (e.g., five calls of 100 records) and execute them in parallel using the same thread‑pool strategy. This reduces the per‑call latency to under 500 ms.

These three rounds of optimization reduced the API’s worst‑case response time from 20 seconds to under 500 milliseconds, but a permanent solution will require redesigning the data model and workflow.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Backendoptimizationmultithreading
macrozheng
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.