10 Proven Techniques to Optimize API Latency from 11 s to 170 ms – A Meituan Interview Case

The article presents a step‑by‑step analysis of how to shrink an 11‑second API response time to 170 ms by applying batch database writes, Redis pipelining, asynchronous processing, thread‑pool design, local‑memory buffering, MQ integration, and other performance‑tuning patterns, backed by real‑world benchmarks and code samples.

Problem

During a Meituan interview, the candidate was asked how to locate and resolve an interface performance bottleneck that caused a 504 Gateway Timeout because the request exceeded Nginx's 10‑second proxy timeout.

Optimization checklist

The following dimensions can be used to reduce latency from seconds to sub‑second levels.

1. Data‑processing optimization

Batch database operations – inserting 10 rows individually takes 50 ms, 100 rows 500 ms, 1 000 rows 5 000 ms; the same workloads with batch inserts take 15 ms, 20 ms and 50 ms respectively (3.3×‑100× speed‑up). Enabling rewriteBatchedStatements=true rewrites the batch into a single multi‑value INSERT, further reducing overhead.

public void batchInsert(List<User> users) throws SQLException {
    String sql = "INSERT INTO users (name, email, age) VALUES (?, ?, ?)";
    try (Connection conn = dataSource.getConnection()) {
        conn.setAutoCommit(false);
        try (PreparedStatement pstmt = conn.prepareStatement(sql)) {
            int count = 0;
            for (User user : users) {
                pstmt.setString(1, user.getName());
                pstmt.setString(2, user.getEmail());
                pstmt.setInt(3, user.getAge());
                pstmt.addBatch();
                // Flush every 100 rows to keep the batch buffer bounded
                if (++count % 100 == 0) {
                    pstmt.executeBatch();
                }
            }
            pstmt.executeBatch(); // flush the remaining rows
            conn.commit();
        } catch (SQLException e) {
            conn.rollback(); // undo the partial batch before propagating
            throw e;
        }
    }
}

Redis pipeline – combine local buffering with batch execution to eliminate per‑command network latency. The pattern yields order‑of‑magnitude throughput gains in high‑concurrency write paths.
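A minimal sketch with the Jedis client, assuming a configured JedisPool; the key naming and the counter field are illustrative:

private final JedisPool jedisPool = new JedisPool("localhost", 6379);

// Queue all commands locally, then flush them in a single network round trip.
public void recordVisits(List<Long> userIds) {
    try (Jedis jedis = jedisPool.getResource()) {
        Pipeline pipeline = jedis.pipelined();
        for (long userId : userIds) {
            pipeline.hincrBy("counter:" + userId, "visits", 1);
        }
        pipeline.sync(); // one round trip sends everything and reads the replies
    }
}

Without the pipeline, each hincrBy pays a full network round trip; with it, N commands share one.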

2. Asynchronous design

Asynchronous calls return immediately, allowing the caller to continue while the callee processes in the background. A typical RabbitMQ consumer using CompletableFuture:

@RabbitListener(queues = "order.queue")
public void processOrder(Order order, Channel channel,
                         @Header(AmqpHeaders.DELIVERY_TAG) long tag) {
    CompletableFuture.runAsync(() -> {
        try {
            orderService.process(order);
            channel.basicAck(tag, false); // confirm only after successful processing
        } catch (Exception e) {
            try {
                channel.basicNack(tag, false, true); // requeue for retry
            } catch (IOException ioEx) {
                log.error("Failed to nack delivery {}", tag, ioEx); // assumes an Slf4j logger
            }
        }
    });
}

Common async patterns, ordered by increasing complexity and decoupling: in‑process thread pool, local memory buffer + scheduled task (sketched below), message queue, and agent + MQ.
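A minimal sketch of the local‑memory‑buffer‑plus‑scheduled‑task pattern, assuming Spring's @Scheduled and a hypothetical TrackMapper batch DAO; queue capacity and flush cadence are illustrative:

@Component
public class TrackBuffer {
    private final BlockingQueue<TrackEvent> buffer = new LinkedBlockingQueue<>(10_000);
    private final TrackMapper trackMapper; // hypothetical batch-insert DAO

    public TrackBuffer(TrackMapper trackMapper) {
        this.trackMapper = trackMapper;
    }

    // Callers return immediately; a rejected offer signals back-pressure.
    public boolean offer(TrackEvent event) {
        return buffer.offer(event);
    }

    // Every 200 ms, drain up to 500 buffered events and write them in one batch.
    @Scheduled(fixedDelay = 200)
    public void flush() {
        List<TrackEvent> batch = new ArrayList<>();
        buffer.drainTo(batch, 500);
        if (!batch.isEmpty()) {
            trackMapper.batchInsert(batch);
        }
    }
}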

3. Parallel execution

Parallelism splits work into independent subtasks and runs them concurrently. The total time can be approximated as:

TotalTime = max(SubtaskTime) + CoordinationOverhead

Java CompletableFuture can be used to process a list of tracks in parallel:

public CompletableFuture<Void> processTracksAsync(List<Track> tracks) {
    List<CompletableFuture<Void>> futures = tracks.stream()
        .map(track -> CompletableFuture.runAsync(() -> processSingleTrack(track), executor))
        .toList();
    return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]));
}

4. Resource pooling & compression

Pooling (thread pools, connection pools, object pools) eliminates repeated creation overhead: establishing a new database connection costs roughly 100–200 ms, while borrowing one from a pool takes under 1 ms (a 100–200× improvement). Compressing payloads with algorithms such as pzstd or ISA‑L dramatically reduces network transfer time.
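A minimal HikariCP configuration sketch; pool sizes and credentials are illustrative, and the JDBC URL also enables the rewriteBatchedStatements flag from section 1:

@Bean
public DataSource dataSource() {
    HikariConfig config = new HikariConfig();
    config.setJdbcUrl("jdbc:mysql://localhost:3306/shop?rewriteBatchedStatements=true");
    config.setUsername("app");
    config.setPassword("secret");
    config.setMaximumPoolSize(20);      // upper bound on concurrent connections
    config.setMinimumIdle(5);           // keep warm connections ready
    config.setConnectionTimeout(3_000); // fail fast instead of queuing forever
    return new HikariDataSource(config);
}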

5. Database access tuning

Indexing is the cheapest optimization. Common patterns that prevent index usage include functions on indexed columns, leading wildcards, !=, OR with unindexed columns, etc. Use EXPLAIN and SHOW CREATE TABLE to verify index usage.

SQL best practices:

Avoid SELECT * and retrieve only required columns.

Prefer covering indexes and small‑table‑driven joins.

Paginate with LIMIT and avoid large OFFSET (see deep‑pagination section).

Replace OR with UNION when it improves index usage.

Choose IN when the subquery's result set is small, and EXISTS when the outer table is small.

Keep transactions short and batch updates.

6. Deep pagination

A large OFFSET forces the database to read and discard every skipped row, so deep pages degrade toward full scans. Mitigation strategies include covering indexes, sub‑queries, the tag‑record (keyset) method sketched below, and partitioned tables.
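A minimal keyset ("tag‑record") sketch, assuming a users table with an auto‑increment primary key; the User constructor is illustrative:

public List<User> nextPage(long lastSeenId, int pageSize) throws SQLException {
    // Seek past the last id we returned instead of skipping OFFSET rows.
    String sql = "SELECT id, name, email FROM users WHERE id > ? ORDER BY id LIMIT ?";
    List<User> page = new ArrayList<>();
    try (Connection conn = dataSource.getConnection();
         PreparedStatement pstmt = conn.prepareStatement(sql)) {
        pstmt.setLong(1, lastSeenId); // last row of the previous page
        pstmt.setInt(2, pageSize);
        try (ResultSet rs = pstmt.executeQuery()) {
            while (rs.next()) {
                page.add(new User(rs.getLong("id"), rs.getString("name"),
                                  rs.getString("email")));
            }
        }
    }
    return page;
}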

7. NoSQL for massive data

Four families are commonly used:

Key‑Value (Redis) – fast atomic counters, session cache, inventory.

Document (MongoDB) – flexible schema for product details.

Wide‑column (HBase/Cassandra) – time‑series logs, IoT data.

Graph (Neo4j) – relationship queries such as friend‑of‑friend recommendations.

Example hybrid architecture for an e‑commerce platform:

MySQL for core transactions.

MongoDB for product metadata.

HBase for clickstream logs.

Redis for real‑time inventory (see the sketch after this list).

Kafka for asynchronous event processing.
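To illustrate the Redis real‑time‑inventory piece, here is a check‑and‑decrement sketch using a Lua script so concurrent orders cannot oversell; the key naming and the Jedis client are assumptions:

public long decrementStock(long productId, long quantity) {
    // The script runs atomically inside Redis: it refuses the decrement
    // (returns -1) when stock is insufficient.
    String script =
        "local stock = tonumber(redis.call('GET', KEYS[1]) or '0') " +
        "if stock >= tonumber(ARGV[1]) then " +
        "  return redis.call('DECRBY', KEYS[1], ARGV[1]) " +
        "else return -1 end";
    try (Jedis jedis = jedisPool.getResource()) {
        Object result = jedis.eval(script,
                List.of("stock:" + productId),
                List.of(String.valueOf(quantity)));
        return (Long) result;
    }
}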

8. Avoiding large transactions

Large transactions hold locks for a long time, invite deadlocks, increase memory pressure, and add replication lag. Splitting a monolithic order‑creation transaction into a core transactional part plus asynchronous side effects keeps lock time short.

@Transactional
public Long createOrderCore(OrderDTO order) {
    // Only the operations that must be atomic stay inside the transaction.
    orderMapper.insert(order);
    inventoryMapper.decrease(order.getProductId(), order.getNum());
    return order.getId();
}

public void handleOrderAsync(Long orderId) {
    // Side effects that tolerate eventual consistency run outside the
    // transaction (arguments elided in the original).
    asyncService.execute(() -> logisticsMapper.insert(...));
    smsService.sendAsync(...);
    pointsService.addAsync(...);
}
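One way to wire the two halves together, sketched with a hypothetical OrderFacade; calling the @Transactional method from a separate bean keeps Spring's transaction proxy in play:

@Service
public class OrderFacade {
    private final OrderService orderService;

    public OrderFacade(OrderService orderService) {
        this.orderService = orderService;
    }

    public Long createOrder(OrderDTO order) {
        Long orderId = orderService.createOrderCore(order); // commits here
        orderService.handleOrderAsync(orderId); // side effects start after commit
        return orderId;
    }
}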

9. Lock granularity, GC, thread exhaustion, resource leaks

Over‑broad synchronized blocks serialize unrelated work; fine‑grained locking restores concurrency (see the sketch below).

Frequent Full GC, often triggered by building large Excel exports in memory, can be mitigated with streaming writers such as EasyExcel or with tuned heap settings.

Mis‑sized thread pools let requests pile up in the queue; size pools to the workload and add rate limiting (e.g., Sentinel).

Unreleased I/O resources exhaust file descriptors; use try‑with‑resources and monitor descriptor counts.
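A minimal sketch of per‑key lock granularity, assuming a hypothetical accountDao; the lock map and method names are illustrative:

private final ConcurrentHashMap<Long, Object> accountLocks = new ConcurrentHashMap<>();

public void credit(long accountId, long amount) {
    // One lock per account: only operations on the same account are serialized,
    // so unrelated accounts proceed in parallel.
    Object lock = accountLocks.computeIfAbsent(accountId, id -> new Object());
    synchronized (lock) {
        accountDao.addBalance(accountId, amount); // hypothetical DAO call
    }
}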

Conclusion

Applying the checklist—batching, async, parallelism, pooling, compression, indexing, pagination, appropriate NoSQL selection, and disciplined transaction design—can reduce API latency from seconds to sub‑second levels, avoid common pitfalls, and enable systems to scale under high concurrency.

Tags: Java, Performance, database, concurrency, Async, API optimization, NoSQL
Written by Tech Freedom Circle

Crazy Maker Circle (Tech Freedom Architecture Circle) is a community of tech enthusiasts, experts, and high‑performance fans; many of its architects and hobbyists have achieved technical freedom, and many more are working hard toward it.
