18 Proven Strategies to Slash API Response Times from Seconds to Milliseconds

This article shares eighteen practical techniques for dramatically reducing backend interface latency and improving overall system performance: batch database operations, asynchronous processing, caching, prefetching, pooling, event callbacks, parallel remote calls, lock granularity, file-based staging, indexing, SQL tuning, transaction management, deep pagination fixes, code restructuring, payload compression, NoSQL adoption, thread-pool tuning, and JVM/IO optimizations.

macrozheng

Preface

In a previous project I encountered a 504 timeout caused by an interface taking longer than the Nginx timeout of 10 seconds. After performance tuning, the response time dropped from 11.3 s to 170 ms. Below are some common optimization approaches.

1. Batch Thinking: Batch Database Operations

Before optimization:

//for loop single insert
for(TransDetail detail : transDetailList){
    insert(detail);
}

After optimization:

batchInsert(transDetailList);

Analogy: Moving 10,000 bricks with an elevator that can carry 500 at a time is far faster than moving one brick per trip.
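Here is a rough sketch of the same idea in plain JDBC. The list is split into chunks, and each chunk is sent with addBatch/executeBatch. The TransDetail fields, the trans_detail table, and the chunk size of 500 are assumptions for illustration and do not come from the original project.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the article's TransDetail entity.
record TransDetail(String transNo, long amountCents) {}

class BatchInsertSketch {
    // Rows per batch: the "elevator capacity" from the analogy (assumed value).
    static final int BATCH_SIZE = 500;

    // Splits a list into consecutive sublists of at most `size` elements.
    static <T> List<List<T>> chunk(List<T> items, int size) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            chunks.add(items.subList(from, Math.min(from + size, items.size())));
        }
        return chunks;
    }

    // Sends each chunk as one JDBC batch instead of one INSERT call per row.
    // The table and column names are hypothetical.
    static void batchInsert(Connection conn, List<TransDetail> details) throws SQLException {
        String sql = "INSERT INTO trans_detail (trans_no, amount_cents) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (List<TransDetail> batch : chunk(details, BATCH_SIZE)) {
                for (TransDetail d : batch) {
                    ps.setString(1, d.transNo());
                    ps.setLong(2, d.amountCents());
                    ps.addBatch();
                }
                // Submits the chunk as one batch; the batch is reset afterwards.
                ps.executeBatch();
            }
        }
    }
}
```

Whether one executeBatch call becomes a single network round trip depends on the driver. With MySQL Connector/J, for example, the rewriteBatchedStatements=true connection property lets the driver rewrite the batch into multi-row INSERT statements. Chunking also limits the size of each statement.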

2. Asynchronous Thinking: Offload Time‑Consuming Operations

Place long‑running tasks, such as bank‑routing number matching, into asynchronous processing to reduce perceived latency.

Once the matching step runs asynchronously, the interface can respond without waiting for it to finish.

User registration notifications (SMS/email) can also be handled asynchronously.

Implementation can use thread pools or message queues.
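Below is a minimal thread-pool version. It assumes a hypothetical registration flow in which saving the user must finish before the response, while the welcome SMS/email can follow later. The method names and the 200 ms delay are illustrative.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncRegistration {
    // Dedicated pool for notifications so they do not compete with request threads.
    private final ExecutorService notifyPool = Executors.newFixedThreadPool(2);
    final List<String> notified = new CopyOnWriteArrayList<>();

    // Persists the user synchronously, then hands the slow notification to the pool.
    // The returned future only lets callers observe completion; a request handler
    // would normally ignore it and respond immediately.
    CompletableFuture<Void> register(String user) {
        saveUser(user);
        return CompletableFuture.runAsync(() -> sendWelcomeMessage(user), notifyPool);
    }

    private void saveUser(String user) {
        // hypothetical database write
    }

    // Hypothetical stand-in for an SMS/email gateway call that takes about 200 ms.
    private void sendWelcomeMessage(String user) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return;
        }
        notified.add(user);
    }

    void shutdown() {
        notifyPool.shutdown();
    }
}
```

An in-process pool loses queued notifications if the instance crashes. When delivery must survive restarts, publishing an event to a message queue is the safer of the two options.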

3. Space‑Time Tradeoff: Caching

Appropriate use of caches (Redis, JVM local cache, Memcached, Map, etc.) stores frequently accessed data in memory, eliminating costly database reads.

In a transfer interface, the old code queried the database for each transaction to compute routing numbers, which was slow. Introducing a cache dramatically reduced latency.

With the cache in place, repeated routing-number lookups are served from memory, and only cache misses reach the database.
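Here is a minimal cache-aside sketch that uses a JVM-local ConcurrentHashMap. The bank-name key and the loadFromDatabase stand-in are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class RoutingNumberCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    // Counts how often the (hypothetical) slow database query actually runs.
    final AtomicInteger dbQueries = new AtomicInteger();

    // Cache-aside lookup: serve from memory, query the database only on a miss.
    String routingNumberFor(String bankName) {
        return cache.computeIfAbsent(bankName, this::loadFromDatabase);
    }

    // Hypothetical stand-in for the per-transaction database query.
    private String loadFromDatabase(String bankName) {
        dbQueries.incrementAndGet();
        return "RN-" + bankName.toUpperCase();
    }
}
```

On a ConcurrentHashMap, computeIfAbsent runs the loader at most once per key, even under concurrent access. This map never expires entries. For data that can change, a cache with a TTL is a better fit, for example Caffeine's expireAfterWrite or Redis keys with EXPIRE.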

4. Prefetch Thinking: Initialize Data in Cache Early

Pre‑compute and store complex query results in cache before they are needed, so later requests can fetch them instantly, greatly improving performance.

During a live‑stream project, we pre‑loaded user and score data into cache at startup, enabling fast list rendering.
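The following sketches the prefetch idea. It assumes the expensive user/score query can be wrapped in a Supplier, and the UserScore type is hypothetical. Both the query and the ranking run in warmUp(), so requests only read the precomputed snapshot.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical row produced by the expensive user/score query.
record UserScore(long userId, String name, int score) {}

class LeaderboardCache {
    private final Supplier<List<UserScore>> loader; // runs the expensive query
    private volatile List<UserScore> ranked = List.of();

    LeaderboardCache(Supplier<List<UserScore>> loader) {
        this.loader = loader;
    }

    // Called at startup (and optionally on a schedule): the query and the sort
    // happen here, so no user request pays for them.
    void warmUp() {
        List<UserScore> sorted = new ArrayList<>(loader.get());
        sorted.sort(Comparator.comparingInt(UserScore::score).reversed());
        ranked = List.copyOf(sorted); // publish an immutable snapshot in one write
    }

    // Request path: reads precomputed data only, no database access.
    List<UserScore> top(int n) {
        List<UserScore> snapshot = ranked;
        return snapshot.subList(0, Math.min(n, snapshot.size()));
    }
}
```

To keep the snapshot fresh, warmUp() could also run periodically, for example via ScheduledExecutorService.scheduleAtFixedRate. Readers always see either the old list or the new one, never a half-built list.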

5. Pooling Thinking: Pre‑allocate and Reuse

Thread pools, connection pools, and keep‑alive sockets avoid the overhead of creating and destroying resources for each request.

Thread pools manage threads, reducing creation cost and preventing resource exhaustion.
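To show the pre-allocate-and-reuse pattern on its own, here is a minimal fixed-size pool built on a BlockingQueue. It is an illustration only. Production connection pools such as HikariCP add connection validation, leak detection, and resizing, none of which appear here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Minimal fixed-size pool: resources are created once up front and then reused.
class SimplePool<T> {
    private final BlockingQueue<T> idle;

    SimplePool(int size, Supplier<T> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // pre-allocate every resource at startup
        }
    }

    // Waits up to timeoutMs for a free resource instead of creating a new one.
    T borrow(long timeoutMs) throws InterruptedException {
        T resource = idle.poll(timeoutMs, TimeUnit.MILLISECONDS);
        if (resource == null) {
            throw new IllegalStateException("pool exhausted");
        }
        return resource;
    }

    // Returns the resource so the next caller can reuse it.
    void release(T resource) {
        idle.offer(resource);
    }
}
```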

6. Event‑Callback Thinking: Avoid Blocking Wait

Instead of blocking on a slow external system B, use an event‑callback model (similar to I/O multiplexing) to continue other work and handle the response when it arrives.
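Here is a small sketch of the callback style with CompletableFuture. callSystemB is a hypothetical stand-in that completes after 300 ms via CompletableFuture.delayedExecutor, so no thread sits blocked while the response is pending.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

class CallbackSketch {
    final List<String> handled = new CopyOnWriteArrayList<>();

    // Hypothetical client for slow system B: returns at once with a future
    // that completes when B "answers" 300 ms later.
    static CompletableFuture<String> callSystemB(String request) {
        Executor afterDelay = CompletableFuture.delayedExecutor(300, TimeUnit.MILLISECONDS);
        return CompletableFuture.supplyAsync(() -> "B:" + request, afterDelay);
    }

    // Registers a callback instead of blocking on B; the calling thread
    // returns immediately and can do other work.
    CompletableFuture<Void> process(String request) {
        return callSystemB(request).thenAccept(handled::add);
    }
}
```

For a real HTTP dependency, java.net.http.HttpClient.sendAsync also returns a CompletableFuture. The callback can then be attached with thenAccept in the same way, without dedicating a thread to the wait.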

7. Remote Call Parallelization

Convert sequential remote calls (e.g., user info, banner, popup) into parallel requests to cut total latency.

When the calls run in parallel, the total takes roughly as long as the slowest call rather than the sum of all of them.
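Below is a sketch with CompletableFuture. It assumes three hypothetical remote calls of about 300 ms each, simulated with sleeps, running on a dedicated pool. Sequentially they would take about 900 ms; in parallel, they take about 300 ms.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical aggregate returned by the home-page interface.
record HomePage(String user, String banner, String popup) {}

class ParallelCalls {
    // Simulates a remote call that takes `ms` milliseconds.
    static String remote(String name, long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    // Starts all three calls at once, then waits for all of them.
    static HomePage load(ExecutorService pool) {
        CompletableFuture<String> user = CompletableFuture.supplyAsync(() -> remote("user", 300), pool);
        CompletableFuture<String> banner = CompletableFuture.supplyAsync(() -> remote("banner", 300), pool);
        CompletableFuture<String> popup = CompletableFuture.supplyAsync(() -> remote("popup", 300), pool);
        CompletableFuture.allOf(user, banner, popup).join();
        return new HomePage(user.join(), banner.join(), popup.join());
    }
}
```

Adding a timeout to each call keeps one slow dependency from holding up the whole page. CompletableFuture.orTimeout, available since Java 9, is one option.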

8. Lock Granularity: Avoid Overly Coarse Locks

Lock only the minimal shared resource (e.g., a specific list) instead of locking an entire class or system, preventing unnecessary contention.

Locking the whole house when you only need to lock the bathroom is wasteful.
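import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class LockGranularity {
    private final List<Integer> data = new ArrayList<>(); // shared and not thread-safe

    // Slow work that touches no shared state (simulated with a 1 ms sleep)
    private void slowNotShare() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Incorrect: coarse lock; the slow, unshared work is serialized as well
    public int wrong() {
        data.clear();
        long begin = System.currentTimeMillis();
        IntStream.rangeClosed(1, 10000).parallel().forEach(i -> {
            synchronized (this) {
                slowNotShare();
                data.add(i);
            }
        });
        System.out.println("consume time: " + (System.currentTimeMillis() - begin) + " ms");
        return data.size();
    }

    // Correct: fine-grained lock; only the shared list is guarded
    public int right() {
        data.clear();
        long begin = System.currentTimeMillis();
        IntStream.rangeClosed(1, 10000).parallel().forEach(i -> {
            slowNotShare(); // no lock needed
            synchronized (data) {
                data.add(i);
            }
        });
        System.out.println("consume time: " + (System.currentTimeMillis() - begin) + " ms");
        return data.size();
    }
}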

9. Switch Storage to File for Temporary Staging

When database inserts become a bottleneck for massive data, write the bulk data to a file first, then asynchronously load it into the database, achieving a ten‑fold speedup.

In a transfer interface handling 1,000 detail records per batch, persisting them to a file and later processing reduced the latency from ~6 s to a fraction of that.
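Here is a minimal sketch of the staging pattern. It assumes rows are already serialized as CSV lines and that a loader consumes them later, such as a batch insert or MySQL's LOAD DATA INFILE. The file location and the drainTo method are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.function.Consumer;

class FileStaging {
    private final Path stagingFile; // hypothetical local staging location

    FileStaging(Path stagingFile) {
        this.stagingFile = stagingFile;
    }

    // Request path: append the batch to a local file (one sequential write,
    // no database round trips), then return to the caller.
    synchronized void stage(List<String> csvRows) throws IOException {
        Files.write(stagingFile, csvRows, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Background path: read what has been staged, hand it to a loader
    // (e.g. a batch insert), then delete the file. Returns the row count.
    synchronized int drainTo(Consumer<List<String>> loader) throws IOException {
        if (!Files.exists(stagingFile)) {
            return 0;
        }
        List<String> rows = Files.readAllLines(stagingFile, StandardCharsets.UTF_8);
        loader.accept(rows);
        Files.delete(stagingFile); // only reached if the loader did not throw
        return rows.size();
    }
}
```

The file is deleted only after the loader returns, so a failed load leaves the rows in place for a retry. For simplicity, this sketch holds the lock during the load. A production version would rename the file first and load it outside the lock, and would also guard against loading the same file twice after a restart.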

10. Index Optimization

Adding appropriate indexes is the cheapest and often most effective way to speed up queries.

Ensure critical SQL statements have indexes.

Verify that indexes are actually used (e.g., via EXPLAIN).

Design indexes reasonably—avoid redundant or overly many indexes, and don’t index low‑cardinality columns.
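The "verify with EXPLAIN" step can be turned into an automated check. The sketch below assumes MySQL's EXPLAIN output and its type and key columns. The rule is only a heuristic: a full scan of a tiny lookup table can be acceptable, and some plans legitimately report no key.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExplainCheck {
    // Pure rule: rejects a plan if any table is read by a full scan
    // (type = ALL) or without a chosen index (key is NULL).
    static boolean usesIndexes(List<Map<String, String>> planRows) {
        for (Map<String, String> row : planRows) {
            if ("ALL".equals(row.get("type")) || row.get("key") == null) {
                return false;
            }
        }
        return true;
    }

    // Runs MySQL's EXPLAIN for a query and collects the columns checked above.
    // Intended for tests against a schema with realistic data volumes.
    static List<Map<String, String>> explain(Connection conn, String sql) throws SQLException {
        List<Map<String, String>> rows = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("EXPLAIN " + sql)) {
            while (rs.next()) {
                Map<String, String> row = new HashMap<>();
                row.put("table", rs.getString("table"));
                row.put("type", rs.getString("type"));
                row.put("key", rs.getString("key"));
                rows.add(row);
            }
        }
        return rows;
    }
}
```

Because explain() concatenates SQL text, it is meant only for trusted, developer-written queries in tests, never for user input.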

10.1 No Index

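Run EXPLAIN on the slow query, e.g. EXPLAIN SELECT * FROM user_info WHERE userId = '123';. If the type column shows ALL and key is NULL, the query is doing a full table scan. In that case, add an index on the filtered column with ALTER TABLE user_info ADD INDEX idx_user_id (userId);. Note that a leading-wildcard pattern such as LIKE '%123' cannot use a B-tree index even after the index exists.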

10.2 Index Not Effective

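Common reasons an index goes unused are listed in the accompanying diagram. Typical ones include a leading wildcard in LIKE (e.g. '%123'), a function or implicit type conversion applied to the indexed column, an OR condition that includes a non-indexed column, and a query that skips the leftmost column of a composite index.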

10.3 Bad Index Design

Remove redundant or duplicate indexes.

As a rule of thumb, keep the number of indexes per table to five or fewer.

Avoid indexing columns with many duplicate values (e.g., gender).

Use covering indexes when appropriate.

If you need to force an index, reconsider its design.

11. SQL Optimization

Beyond indexing, rewrite inefficient SQL, avoid unnecessary columns, and use proper joins. Detailed guidance is available in referenced articles.

12. Avoid Large Transaction Problems

Long‑running transactions hold database connections, causing timeouts, deadlocks, and replication lag. Solutions include:

Do not place RPC calls inside a transaction.

Keep read‑only operations outside the transaction.

Limit the amount of data processed within a transaction.
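Here is a plain-JDBC sketch of those rules. It assumes a hypothetical RiskClient RPC and the account table used later in this article. The remote call happens before any connection is taken, and the transaction contains only the two UPDATE statements.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

class TransferService {
    // Hypothetical remote risk-check service (an RPC in the article's terms).
    interface RiskClient {
        boolean approve(long accountId, long amountCents);
    }

    private final DataSource dataSource;
    private final RiskClient riskClient;

    TransferService(DataSource dataSource, RiskClient riskClient) {
        this.dataSource = dataSource;
        this.riskClient = riskClient;
    }

    void transfer(long fromId, long toId, long amountCents) throws SQLException {
        // 1. The RPC runs BEFORE any connection or transaction is opened.
        if (!riskClient.approve(fromId, amountCents)) {
            throw new IllegalStateException("transfer rejected by risk check");
        }
        // 2. The transaction wraps only the two writes, so locks are held briefly.
        try (Connection conn = dataSource.getConnection()) {
            conn.setAutoCommit(false);
            try (PreparedStatement debit = conn.prepareStatement(
                         "UPDATE account SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = conn.prepareStatement(
                         "UPDATE account SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, amountCents);
                debit.setLong(2, fromId);
                debit.executeUpdate();
                credit.setLong(1, amountCents);
                credit.setLong(2, toId);
                credit.executeUpdate();
                conn.commit();
            } catch (SQLException | RuntimeException e) {
                conn.rollback(); // undo the partial transfer
                throw e;
            }
        }
    }
}
```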

13. Deep Pagination Problem

Using LIMIT 100000,10 forces the database to scan and discard 100,000 rows, which is slow.

13.1 Tag‑Record Method

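Store the largest primary key returned by the previous page (e.g., 100000) and query with WHERE id > 100000 ORDER BY id LIMIT 10. The database then seeks straight to that position in the primary-key index instead of reading and discarding the earlier rows.

select id, name, balance FROM account where id > 100000 order by id limit 10;
This requires a unique, monotonically increasing column to page on, and it supports moving to the next page rather than jumping to an arbitrary page number.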
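The seek logic can be illustrated without a database. In this sketch, an in-memory TreeMap stands in for the primary-key index, which is an assumption for illustration, and each page starts strictly after the last id of the previous page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical row of the article's account table.
record Account(long id, String name, long balance) {}

class KeysetPager {
    // Stands in for the primary-key index: entries sorted by id.
    private final NavigableMap<Long, Account> byId = new TreeMap<>();

    void add(Account a) {
        byId.put(a.id(), a);
    }

    // Equivalent of: SELECT ... WHERE id > :lastId ORDER BY id LIMIT :size.
    // The scan starts right after lastId instead of skipping an offset.
    List<Account> nextPage(long lastId, int size) {
        List<Account> page = new ArrayList<>(size);
        for (Account a : byId.tailMap(lastId, false).values()) {
            if (page.size() == size) {
                break;
            }
            page.add(a);
        }
        return page;
    }
}
```

A caller pages through the data by passing the id of the last row it received, starting from 0. Against MySQL, the same loop would bind that value into the query above.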

13.2 Delayed‑Join Method

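First use a secondary index (here, on create_time) to find only the primary keys of the requested page, then join back to the main table for the full rows. The offset is still skipped, but over compact index entries rather than full table rows, and only the final 10 rows are read from the table.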

select acct1.id, acct1.name, acct1.balance FROM account acct1
INNER JOIN (
    SELECT a.id FROM account a WHERE a.create_time > '2020-09-19' limit 100000,10
) AS acct2 ON acct1.id = acct2.id;

14. Optimize Program Structure

Eliminate unnecessary object creation, redundant database calls, and inefficient algorithms. Simple logic reordering can cut the number of condition checks.

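Original: if(isUserVip() && isFirstLogin()){ sendSmsMsg(); } runs the expensive VIP check on every login.
Optimized: test the cheap, rarely true first-login condition first. Because && short-circuits, the VIP check then runs only for first-time logins:
if(isFirstLogin() && isUserVip()){
    sendSmsMsg();
}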

This reduces the number of expensive checks.
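Here is a small simulation with hypothetical isUserVip and isFirstLogin methods. It counts how often the expensive check runs under each ordering.

```java
class LoginHook {
    int vipChecks = 0; // how many times the expensive check ran
    int smsSent = 0;

    // Hypothetical expensive check, e.g. a remote membership lookup.
    boolean isUserVip(long userId) {
        vipChecks++;
        return userId % 2 == 0;
    }

    // Hypothetical cheap check on data already in memory; true for ~1% of users here.
    boolean isFirstLogin(long userId) {
        return userId % 100 == 0;
    }

    void onLoginOriginal(long userId) {
        if (isUserVip(userId) && isFirstLogin(userId)) {
            sendSmsMsg();
        }
    }

    void onLoginOptimized(long userId) {
        // && stops at the first false operand, so the VIP lookup
        // runs only for first-time logins.
        if (isFirstLogin(userId) && isUserVip(userId)) {
            sendSmsMsg();
        }
    }

    private void sendSmsMsg() {
        smsSent++;
    }
}
```

Over 1,000 simulated logins (ids 1 to 1000), the original order calls isUserVip 1,000 times. The reordered version calls it only for the 10 first-time logins, and both versions send the same messages.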

15. Compress Transmission Content

Compressing payloads (e.g., with gzip) reduces bandwidth usage and speeds up transfer, especially for large text payloads such as JSON. Media that is already compressed, such as video, gains little.

A horse carrying 10 kg travels faster than one carrying 100 kg.
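Here is a minimal gzip round trip with java.util.zip. It shows what a server does to a text response body and what the client undoes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipPayload {
    // Compresses a response body before it goes on the wire.
    static byte[] gzip(String body) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        } // closing the stream writes the gzip trailer
        return out.toByteArray();
    }

    // What the client does on receipt (browsers do this automatically
    // for responses sent with Content-Encoding: gzip).
    static String gunzip(byte[] compressed) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

In practice, this is usually switched on in the web server or framework rather than hand-written. Examples are gzip on; in Nginx or server.compression.enabled=true in Spring Boot.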

16. Massive Data Handling – Consider NoSQL

When relational databases become a bottleneck for huge datasets, switch to NoSQL solutions such as Elasticsearch or HBase, or employ sharding and partitioning.

17. Reasonable Thread‑Pool Design

Key parameters to tune are core size, maximum size, and work queue. An undersized core pool limits parallelism; an unbounded queue can cause OOM; mixing business‑critical and background tasks can degrade core services.
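One way to make those three parameters explicit is shown below as a ThreadPoolExecutor sketch. The pool names and sizes in the usage note are illustrative assumptions, not recommendations.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class Pools {
    // Builds a pool with explicit, bounded settings instead of Executors' defaults
    // (newFixedThreadPool uses an unbounded queue, which can exhaust the heap).
    static ThreadPoolExecutor bounded(String name, int core, int max, int queueCapacity) {
        AtomicInteger seq = new AtomicInteger();
        // Named threads make thread dumps and logs readable.
        ThreadFactory named = r -> new Thread(r, name + "-" + seq.incrementAndGet());
        return new ThreadPoolExecutor(
                core, max,
                60, TimeUnit.SECONDS,                       // idle non-core threads exit after 60 s
                new ArrayBlockingQueue<>(queueCapacity),    // bounded queue: backlog cannot grow forever
                named,
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the submitter runs the task
    }
}
```

For example, Pools.bounded("order-core", 8, 16, 200) could serve request-critical work, with a separate Pools.bounded("report-bg", 2, 4, 1000) for background jobs. That way, a backlog of reports cannot starve order processing. Note that ThreadPoolExecutor starts threads beyond the core size only once the queue is full. A very large queue therefore effectively caps concurrency at the core size.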

18. Machine‑Level Issues (Full GC, Thread Saturation, Unclosed IO)

Full GC pauses, thread exhaustion, and leaked file handles also increase latency. Monitoring and proper resource cleanup are essential.
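For the unclosed-IO case, the usual Java fix is try-with-resources. Here is a small before/after sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class ResourceCleanup {
    // Leaky version: the reader is never closed, so each call leaves a file
    // handle open; under load these accumulate until the process hits its
    // open-file limit ("Too many open files").
    static String firstLineLeaky(Path file) throws IOException {
        BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8);
        return reader.readLine();
    }

    // try-with-resources closes the reader on every exit path, including exceptions.
    static String firstLine(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return reader.readLine();
        }
    }
}
```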

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
