How I Cut a 20‑Second API Call to Sub‑500 ms with Three Simple Optimizations

This article walks through a real‑world backend API performance case, describing how the author identified a slow batch‑score query, applied index tuning, introduced CompletableFuture‑based multithreading, and limited batch sizes, ultimately reducing response time from 20 seconds to under 500 milliseconds.

Su San Talks Tech

Introduction

Interface performance is a critical topic for backend developers. Optimizing an API requires a multi‑faceted approach.

The author previously wrote an article on eleven API‑performance tricks; this piece continues the discussion with a concrete slow‑query case.

1. Incident Investigation

Daily monitoring emails showed a batch‑score query endpoint with a maximum latency of 20 seconds and an average latency of 2 seconds. Most calls returned within 500 ms, but occasional requests took as long as 20 seconds and pulled the average up.

The root cause was not data size but the calling pattern: the settlement‑order list page sent a massive parameter list (hundreds to thousands of IDs) to the batch‑score endpoint, far exceeding the intended pagination limits (10‑100 records per page).

2. Current Situation

Even though bulk ID queries can use primary‑key indexes, the endpoint’s logic is complex: it makes a remote service call and runs per‑record database queries inside a for loop. The entry point looks like this:

public List<ScoreEntity> query(List<SearchEntity> list) { ... }

The two main bottlenecks are:

Remote service invocation within the API.

Database queries inside a for loop.

3. First Optimization – Index Tuning

Instead of redesigning the data model, the author added a composite index on org_code, category_id, business_id, and business_type:

alter table user_score add index `un_org_category_business` (`org_code`,`category_id`,`business_id`,`business_type`) USING BTREE;

This reduced the maximum latency from ~20 s to ~5 s.

4. Second Optimization – Multithreaded Queries

To avoid single‑threaded database access, the code was refactored to use Java 8 CompletableFuture with a custom thread pool:

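// Submit one query per item to the custom pool instead of looping on the caller thread.
CompletableFuture[] futureArray = dataList.stream()
    .map(data -> CompletableFuture
        .supplyAsync(() -> query(data), asyncExecutor)
        .whenComplete((result, th) -> {
            // Collect the result, or handle th if the query failed.
        }))
    .toArray(CompletableFuture[]::new);
// Block until every submitted query has completed.
CompletableFuture.allOf(futureArray).join();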

The thread pool configuration (core 8, max 10, keep‑alive 60 s, queue 500) was defined via ThreadPoolExecutor or Spring’s ThreadPoolTaskExecutor. This optimization yielded another 5× speedup, bringing latency down to ~1 s.
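As a rough illustration of that configuration, the sketch below builds a pool with the stated sizes using plain ThreadPoolExecutor. The queue type (ArrayBlockingQueue), the rejection policy (CallerRunsPolicy), and the names ScorePoolSketch, queryOne, and queryAll are assumptions made for the example; the article does not say which ones the original code used.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;

final class ScorePoolSketch {

    // Sizes taken from the article: core 8, max 10, keep-alive 60 s, queue capacity 500.
    static ThreadPoolExecutor newAsyncExecutor() {
        return new ThreadPoolExecutor(
                8, 10, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(500),               // assumption: bounded array queue
                new ThreadPoolExecutor.CallerRunsPolicy());  // assumption: caller runs the task when saturated
    }

    // Hypothetical stand-in for the real per-record query.
    static String queryOne(long id) {
        return "score-" + id;
    }

    // Submits one task per id, then joins the futures in input order.
    static List<String> queryAll(List<Long> ids, ThreadPoolExecutor executor) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> queryOne(id), executor))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ThreadPoolExecutor executor = newAsyncExecutor();
        try {
            System.out.println(queryAll(Arrays.asList(1L, 2L, 3L), executor)); // prints [score-1, score-2, score-3]
        } finally {
            executor.shutdown(); // core threads are non-daemon, so shut down explicitly
        }
    }
}
```

With ThreadPoolExecutor, threads beyond the core size are only created once the queue is full, so with these numbers the pool normally runs 8 threads and grows to 10 only when 500 tasks are already waiting.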

5. Third Optimization – Limiting Batch Size

Even after the first two steps, latency remained above 1 s because each request still fetched too many records at once. The solution was to cap the number of records per request at 200; requests that exceed the cap now return an error.
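A guard for that cap might look like the minimal sketch below. The class name BatchLimitSketch, the method name requireWithinLimit, and the choice of IllegalArgumentException are hypothetical; the article only states that requests over 200 records are rejected with an error.

```java
import java.util.Collections;
import java.util.List;

final class BatchLimitSketch {

    // Cap taken from the article: at most 200 records per request.
    static final int MAX_BATCH_SIZE = 200;

    // Throws if the request carries more records than the cap allows.
    // Assumption: the real endpoint may report the error differently (e.g. an error response body).
    static void requireWithinLimit(List<?> records) {
        if (records.size() > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException(
                    "Too many records: " + records.size() + " (max " + MAX_BATCH_SIZE + ")");
        }
    }

    public static void main(String[] args) {
        requireWithinLimit(Collections.nCopies(200, 1L)); // accepted
        try {
            requireWithinLimit(Collections.nCopies(201, 1L));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Calling the guard at the top of the endpoint keeps oversized requests from ever reaching the remote call or the database.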

Two implementation options were discussed:

5.1 Front‑end Pagination

Modify the settlement‑order list UI to display only one order per settlement and paginate the rest, limiting each backend call to at most 200 records. This requires front‑end development resources, which are currently unavailable.

5.2 Server‑side Batch Calls

Change the back‑end system to split a large request into multiple smaller batches (e.g., five batches of 100 records) and execute them in parallel using the same thread‑pool approach. This further reduced latency from ~1 s to under 500 ms.
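The splitting step can be sketched as follows, assuming the caller holds the full ID list and a function that queries one batch. BatchSplitSketch, partition, and queryInBatches are hypothetical names, and the lambda in main stands in for the real call to the batch‑score endpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BatchSplitSketch {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Runs batchQuery on every batch in parallel and merges the results in batch order.
    static <T, R> List<R> queryInBatches(List<T> items, int batchSize,
                                         Function<List<T>, List<R>> batchQuery,
                                         ExecutorService executor) {
        List<CompletableFuture<List<R>>> futures = partition(items, batchSize).stream()
                .map(batch -> CompletableFuture.supplyAsync(() -> batchQuery.apply(batch), executor))
                .collect(Collectors.toList());
        List<R> merged = new ArrayList<>();
        for (CompletableFuture<List<R>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            ids.add(i);
        }
        ExecutorService executor = Executors.newFixedThreadPool(5); // assumption: pool size for the demo only
        try {
            // Hypothetical batch query: a real caller would invoke the batch-score endpoint here.
            List<Integer> scores = queryInBatches(ids, 100,
                    batch -> batch.stream().map(id -> id * 10).collect(Collectors.toList()),
                    executor);
            System.out.println(partition(ids, 100).size() + " batches, " + scores.size() + " results");
        } finally {
            executor.shutdown();
        }
    }
}
```

Joining the futures in submission order keeps the merged list aligned with the input, which matters if the caller matches scores to orders by position.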

While multithreading is a quick fix, a long‑term solution would involve redesigning the data model and business flow, which is planned for future releases.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
