How I Cut a 20‑Second API Call to Under 500 ms in Three Simple Steps
A backend engineer shares a step‑by‑step case study of diagnosing a 20‑second batch‑score query API, then applying three optimizations—index tuning, multithreaded execution, and request‑size limiting—to bring the response time down to under 500 ms.
Background and Problem
The author receives daily reports on slow API calls; one batch‑score query endpoint shows a maximum latency of 20 seconds and an average of 2 seconds. Although most requests return within 500 ms, occasional outliers take as long as 20 seconds, prompting an investigation.
Investigation
Using SkyWalking, the team discovers that the endpoint is called from a settlement‑order list page, which aggregates many orders per settlement. The request therefore includes hundreds or thousands of IDs, far exceeding the intended pagination limit of 100 records per call.
First Optimization – Index Tuning
The original query performs a remote call to fetch organization data and then iterates over each record, executing a separate SELECT per iteration. The first improvement adds a composite index on org_code, category_id, business_id, and business_type:
ALTER TABLE user_score ADD INDEX `un_org_category_business` (org_code, category_id, business_id, business_type) USING BTREE;
This reduces the maximum latency from 20 seconds to roughly 5 seconds.
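As a rough illustration of why the index helps, here is an in-memory analogy, not a description of what MySQL does internally. It compares a linear scan with a lookup in a sorted map keyed on the same four columns, in the same order. The `Row` class, the sample data, and the method names are assumptions made for this sketch. `TreeMap` is a red-black tree rather than a B-tree, but it shares the relevant property: an ordered structure that locates a key in logarithmic time.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

final class CompositeIndexSketch {

    /** Hypothetical row: the four indexed columns plus a score value. */
    static final class Row {
        final String orgCode;
        final long categoryId;
        final long businessId;
        final int businessType;
        final int score;

        Row(String orgCode, long categoryId, long businessId, int businessType, int score) {
            this.orgCode = orgCode;
            this.categoryId = categoryId;
            this.businessId = businessId;
            this.businessType = businessType;
            this.score = score;
        }
    }

    /** Orders rows column by column, left to right, like the composite index. */
    static final Comparator<Row> INDEX_ORDER =
            Comparator.comparing((Row r) -> r.orgCode)
                    .thenComparingLong(r -> r.categoryId)
                    .thenComparingLong(r -> r.businessId)
                    .thenComparingInt(r -> r.businessType);

    /** No index: examine rows one by one until a match is found, O(n). */
    static Row fullScan(List<Row> table, Row probe) {
        for (Row r : table) {
            if (INDEX_ORDER.compare(r, probe) == 0) {
                return r;
            }
        }
        return null;
    }

    /** Builds the ordered structure once, keyed on the four columns. */
    static TreeMap<Row, Row> buildIndex(List<Row> table) {
        TreeMap<Row, Row> index = new TreeMap<>(INDEX_ORDER);
        for (Row r : table) {
            index.put(r, r);
        }
        return index;
    }

    /** With the ordered structure: descend the tree, O(log n). */
    static Row indexLookup(TreeMap<Row, Row> index, Row probe) {
        return index.get(probe);
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            table.add(new Row("ORG" + (i % 50), i % 7, i, 1, i % 100));
        }
        TreeMap<Row, Row> index = buildIndex(table);
        // Probe for the last row, the worst case for the linear scan.
        Row probe = new Row("ORG49", 99_999 % 7, 99_999, 1, 0);
        System.out.println(fullScan(table, probe).score == indexLookup(index, probe).score);
    }
}
```

Because the endpoint issues one SELECT per record, the cost of each lookup is multiplied by the number of IDs in the request, which is why a cheaper lookup has such a visible effect on the worst case.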
Second Optimization – Multithreaded Query Execution
Because each iteration still performs a database query, the author replaces the single‑threaded loop with Java 8 CompletableFuture combined with a custom thread pool:
CompletableFuture[] futureArray = dataList.stream()
        // Submit one query per record to the custom thread pool.
        .map(data -> CompletableFuture.supplyAsync(() -> query(data), asyncExecutor)
                .whenComplete((result, th) -> { }))
        .toArray(CompletableFuture[]::new);
// Block until every query has finished.
CompletableFuture.allOf(futureArray).join();
The thread pool is configured as:
ExecutorService threadPool = new ThreadPoolExecutor(
        8, 10, 60, TimeUnit.SECONDS,                // core threads, max threads, idle keep-alive
        new ArrayBlockingQueue<>(500),              // bounded work queue
        new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the submitting thread runs the task
After this change, latency drops from about 5 seconds to 1 second, a five‑fold improvement.
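The fan-out pattern above can be tried in isolation with the following minimal sketch. It assumes a stand-in `query` method that sleeps for 20 ms instead of touching a database; the class name, `queryAll`, `newPool`, and the sample IDs are hypothetical. The pool settings are the ones quoted in the article.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class ParallelQuerySketch {

    /** Stand-in for the per-record database query; sleeps to simulate latency. */
    static String query(int id) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "score-" + id;
    }

    /** Same pool settings as in the article. */
    static ExecutorService newPool() {
        return new ThreadPoolExecutor(
                8, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(500),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    /** Runs query() for every id on the pool and returns results in input order. */
    static List<String> queryAll(List<Integer> ids, ExecutorService pool) {
        List<CompletableFuture<String>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> query(id), pool))
                .collect(Collectors.toList());
        // Wait for all queries; join() rethrows the first failure, if any.
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        // Every future is complete here, so these joins return immediately.
        return futures.stream().map(CompletableFuture::join).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = newPool();
        try {
            List<Integer> ids = IntStream.range(0, 80).boxed().collect(Collectors.toList());
            long start = System.nanoTime();
            List<String> results = queryAll(ids, pool);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(results.size() + " results in " + elapsedMs + " ms");
        } finally {
            pool.shutdown();
        }
    }
}
```

Two properties of `ThreadPoolExecutor` matter when reading these settings. With a bounded queue, the pool only grows beyond its 8 core threads once all 500 queue slots are full, so most requests run on 8 threads. And when both the queue and the pool are saturated, `CallerRunsPolicy` executes the task on the submitting thread, which slows the caller down instead of rejecting work.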
Third Optimization – Limiting Batch Size
Even with indexing and multithreading, the endpoint still exceeds 1 second when processing large batches. The final step caps each request at 200 records (down from 2,000) and proposes two ways to stay within that cap:
Front‑end pagination: Show only one order per settlement on the list page, limiting the total records per request to 200.
Back‑end batch calls: Split a large request into multiple 100‑record calls, optionally executed in parallel.
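The cap itself could be enforced on the server with a simple guard, sketched below. The class and method names are hypothetical, and rejecting an oversized request with an exception is an assumption; the article only states the limit of 200.

```java
import java.util.List;

final class BatchSizeGuard {

    /** Limit stated in the article. */
    static final int MAX_IDS_PER_REQUEST = 200;

    /** Returns the ids unchanged, or throws if the request is over the cap. */
    static <T> List<T> requireWithinLimit(List<T> ids) {
        if (ids.size() > MAX_IDS_PER_REQUEST) {
            throw new IllegalArgumentException(
                    "Request contains " + ids.size() + " ids; the limit is " + MAX_IDS_PER_REQUEST);
        }
        return ids;
    }
}
```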
Applying this limit and parallel batch calls reduces the latency further to under 500 ms.
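The back-end batching strategy can be sketched as follows, assuming the caller holds a list of IDs and a function that performs one batch call. `partition`, `callInBatches`, and the `batchCall` parameter are hypothetical names; the article does not show how its callers split requests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class BatchSplitSketch {

    /** Splits items into consecutive batches of at most `size` elements. */
    static <T> List<List<T>> partition(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Issues one call per batch in parallel and merges results in batch order. */
    static <T, R> List<R> callInBatches(List<T> ids, int size,
                                        Function<List<T>, List<R>> batchCall,
                                        ExecutorService pool) {
        List<CompletableFuture<List<R>>> futures = partition(ids, size).stream()
                .map(batch -> CompletableFuture.supplyAsync(() -> batchCall.apply(batch), pool))
                .collect(Collectors.toList());
        List<R> merged = new ArrayList<>();
        for (CompletableFuture<List<R>> future : futures) {
            merged.addAll(future.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            ids.add(i);
        }
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            // Stand-in for the remote batch-score call: echoes each id as a string.
            List<String> scores = BatchSplitSketch.<Integer, String>callInBatches(
                    ids, 100,
                    batch -> batch.stream().map(String::valueOf).collect(Collectors.toList()),
                    pool);
            System.out.println(scores.size());
        } finally {
            pool.shutdown();
        }
    }
}
```

Joining the futures in submission order keeps the merged results aligned with the original ID order, even when batches finish out of order.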
Conclusion
The three‑step approach—adding a composite index, parallelizing database queries with CompletableFuture, and restricting batch size—transforms a 20‑second API into a sub‑second service. The author notes that these are interim fixes; a full redesign of the data model and workflow would be needed for a permanent solution.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
