Boosting a Logistics Service from 190 to 2,200 QPS: Performance Testing Insights

Through systematic performance testing, caching strategy analysis, server configuration tuning, Elasticsearch shard optimization, and validator refactoring, the team transformed the recommendation service’s throughput from roughly 190 QPS to over 2,200 QPS, demonstrating how targeted backend optimizations can dramatically improve high‑concurrency applications.

Huolala Tech

Aug 20, 2020

1. Background

To reduce communication costs and help drivers and cargo meet up more efficiently, the product introduced a "recommended loading/unloading point" feature (also called the "small orange point"). The feature mines driver trajectories to identify real loading/unloading points and recommends them to users placing orders, which shortens the distance between the order address and the actual loading location. Once development was complete, the map team needed to confirm that the service could handle highly concurrent map queries without degradation, and to estimate the server capacity required.

2. Main Performance Testing Process

A typical performance testing workflow covers six steps: planning, environment setup, data preparation, script development, execution, and reporting. (The original diagram of these steps is omitted here.)

3. Performance Testing Plan

The plan analyzes performance requirements and defines strategy, schedule, methods, data preparation, and environment setup. For this Java‑based three‑tier service using Elasticsearch, several factors affect results.

3.1 Cache – Should Tests Hit It?

Caches are ubiquitous, and whether a test should hit them depends on the scenario. For a Redis cache keyed by driver ID, a full hit rate is realistic. For GPS‑based point recommendation, however, coordinates vary with every request, so test scripts should send dynamic coordinates to avoid unrealistic cache hits.
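One way to keep requests off the cache is to draw a fresh coordinate pair for every request. The sketch below is a minimal illustration, not the team's actual load script: the bounding box, the class name DynamicCoordinates, and the lat/lon parameter names are all assumptions made for the example.

```java
import java.util.Locale;
import java.util.Random;

final class DynamicCoordinates {
    // Hypothetical bounding box; replace with the region covered by the base data.
    static final double MIN_LAT = 22.45, MAX_LAT = 22.85;
    static final double MIN_LON = 113.75, MAX_LON = 114.35;

    /** Returns {lat, lon} drawn uniformly from the bounding box. */
    static double[] nextPoint(Random rng) {
        double lat = MIN_LAT + rng.nextDouble() * (MAX_LAT - MIN_LAT);
        double lon = MIN_LON + rng.nextDouble() * (MAX_LON - MIN_LON);
        return new double[] {lat, lon};
    }

    /** Formats a point as query parameters; the parameter names are hypothetical. */
    static String toQuery(double[] point) {
        return String.format(Locale.ROOT, "lat=%.6f&lon=%.6f", point[0], point[1]);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        // Each iteration stands for one simulated request with its own coordinates.
        for (int i = 0; i < 3; i++) {
            System.out.println(toQuery(nextPoint(rng)));
        }
    }
}
```

With six decimal places over an area this size, repeated coordinates are very unlikely, so a cache keyed on the coordinates would rarely be hit. The bounding box should probably cover an area where base data exists; otherwise many queries may return empty results and distort the measurement.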

3.2 Server Configuration Impact

The performance environment's application servers mirror production hardware (8C16G, that is, 8 cores and 16 GB of RAM), but Elasticsearch runs on larger instances in production (16C64G) than in the performance environment. The data store often becomes the bottleneck, so this gap can skew results; if the application reaches its limit first, the difference matters less.

3.3 Base Data Influence

Accurate testing requires production‑scale data. The test used 38 million records (~24 GB) to reflect real conditions.

4. Test Execution

Execution covers script preparation, data loading, load execution, regression testing, and issue localization.

4.1 Performance Environment Degradation

Initial tests in the performance environment yielded only 195 QPS, far below the required capacity. The first suspicion was the environment's smaller Elasticsearch resources.

4.2 Production Environment Results

Running the same tests in production produced similar or only slightly better results, confirming a genuine performance issue in the service rather than a limitation of the test environment.

4.3 Returning to Performance Environment for Diagnosis

Due to security and deployment constraints in production, further debugging was performed in the performance environment.

5. Performance Issue Localization

Different issues require different diagnostic focuses.

5.1 Low‑QPS Issues

Using a diagnostic tool such as Arthas (with a command of the form trace cn.xxx.xxx.class method), the team found that the slowest method on the request path was cn.huolala.bizp.map.controller.UserRecController.validate, which took ~300 ms per call. Disabling this validation raised QPS from ~195 to ~580.

5.2 General Performance Issues

After fixing the validation, QPS reached ~600, but Elasticsearch queries remained a bottleneck. The RestHighLevelClient.search() call could not be optimized further, so attention shifted to the ES configuration. Monitoring showed ~17,000 ES queries per second, roughly 30 times the service's request rate. That is consistent with each search fanning out to all 30 shards of the index.

Reducing the shard count to 3 and rebuilding the index increased QPS to ~3,230. Later, setting the shard count to 5 stabilized performance at around 2,200 QPS.
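The amplification figure can be checked with simple arithmetic. The sketch below assumes that each service request issues one Elasticsearch search and that the search fans out to every shard of the index. The class and method names are hypothetical, and the request rate is held at ~580 QPS purely for comparison across shard counts.

```java
final class ShardFanOut {
    /** Shard-level searches per second when every request queries every shard once. */
    static long shardQueriesPerSecond(long requestsPerSecond, int shards) {
        return requestsPerSecond * shards;
    }

    public static void main(String[] args) {
        long qps = 580; // approximate service QPS after validation was disabled, from the text
        for (int shards : new int[] {30, 5, 3}) {
            System.out.println(shards + " shards -> "
                    + shardQueriesPerSecond(qps, shards) + " shard queries/s");
        }
    }
}
```

Run as is, this prints 17400, 2900, and 1740 shard queries per second for 30, 5, and 3 shards. The first figure lines up with the rate seen in monitoring, which is what points to the shard count as the source of the extra load.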

6. Code Optimization

The original validator was instantiated on each call, causing heavy object creation and I/O. Refactoring to a static singleton reduced validation time from ~28 ms per call to ~2.8 ms, yielding a ten‑fold improvement.

6.1 Reproducing the Issue

private static ResultModel validate(Object object) {
    // Anti-pattern: bootstraps a new ValidatorFactory (and Validator) on every call
    Validator validator = Validation.buildDefaultValidatorFactory().getValidator();
    // Run all constraint checks declared on the object
    Set<ConstraintViolation<Object>> constraintViolations = validator.validate(object);
    // Report only the first violation, if there is one
    for (ConstraintViolation<Object> constraintViolation : constraintViolations) {
        return ResultModel.error(constraintViolation.getMessage());
    }
    // null signals that validation passed
    return null;
}

6.2 Source Inspection

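Inspecting the library source shows that every call to Validation.buildDefaultValidatorFactory() builds a new ValidatorFactory and opens input streams along the way, so each request pays an I/O cost that only needs to be paid once.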

6.3 Optimized Validation

Making the validator a static variable (or using Spring’s @Validated) cut the 100‑iteration test time from 2,772 ms to 278 ms, that is, from ~27.7 ms to ~2.8 ms per call.
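The difference between the two patterns can be illustrated without the validation library. In the sketch below, CostlyValidator is a hypothetical stand-in whose constructor represents the expensive factory bootstrap, and a counter records how often it runs. In the real service the shared field would instead hold the result of Validation.buildDefaultValidatorFactory().getValidator(). This illustrates the pattern under those assumptions; it is not the team's code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an expensive-to-build validator (not the Bean Validation API).
final class CostlyValidator {
    static final AtomicInteger BUILDS = new AtomicInteger();

    CostlyValidator() {
        BUILDS.incrementAndGet(); // the real factory bootstrap would happen here
    }

    /** Returns the first violation message, or null when the input is valid. */
    String firstViolation(String address) {
        return (address == null || address.isEmpty()) ? "address must not be empty" : null;
    }
}

final class ValidationSketch {
    // Built once when the class is initialized and shared by all callers.
    private static final CostlyValidator SHARED = new CostlyValidator();

    /** Original pattern: a fresh validator on every call. */
    static String validatePerCall(String address) {
        return new CostlyValidator().firstViolation(address);
    }

    /** Optimized pattern: reuse the shared instance. */
    static String validateShared(String address) {
        return SHARED.firstViolation(address);
    }

    public static void main(String[] args) {
        int before = CostlyValidator.BUILDS.get();
        for (int i = 0; i < 100; i++) validateShared("warehouse gate 3");
        System.out.println("extra builds, shared: " + (CostlyValidator.BUILDS.get() - before));

        before = CostlyValidator.BUILDS.get();
        for (int i = 0; i < 100; i++) validatePerCall("warehouse gate 3");
        System.out.println("extra builds, per call: " + (CostlyValidator.BUILDS.get() - before));
    }
}
```

The shared variant triggers no further builds across 100 calls, while the per-call variant triggers 100. Sharing is safe in this sketch because the stand-in holds no per-request state. The Bean Validation API likewise documents Validator implementations as thread-safe, which is what makes a static field a reasonable choice.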

7. Summary

The recommended loading/unloading point service improved from ~190 QPS to ~2,200 QPS after systematic performance testing, cache strategy refinement, Elasticsearch shard tuning, and validator refactoring. The case highlights the importance of holistic backend knowledge (load tools, OS, application code, middleware, and databases) for achieving reliable high‑concurrency performance.

8. Postscript

The two issues highlighted here (per‑call creation of objects that should be singletons, and query amplification caused by too many shards) are common across projects. Teams should proactively audit code and configurations to prevent hidden performance risks.

