Optimizing a High‑Concurrency Backend Interface: Reducing Response Time from 30 seconds to 0.8 seconds
This article presents a real‑world case study of a high‑traffic backend API that originally took 30 seconds to process over two million records, detailing problem diagnosis, SQL and Java code analysis, and a series of optimizations—including SQL rewrites, database‑side aggregation, and Caffeine caching—that ultimately cut the response time to under one second.
In a high‑concurrency data‑processing scenario, an interface that queried more than 2 million rows originally required about 30 seconds to respond, which was unacceptable for a transaction‑interception system.
Problem Diagnosis
The latency was traced to the SQL execution layer after ruling out network and server hardware issues. The MyBatis call below took about 30 seconds when fetching 2 million rows:

List<Map<String, Object>> list = transhandleFlowMapper.selectDataTransHandleFlowAdd(selectSql);
SELECT programhandleidlist
FROM anti_transhandle
WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0';

Running the raw SQL directly in PostgreSQL took only ~800 ms, indicating the bottleneck was in the Java-side processing of the result set, not in the query itself.
SQL‑Level Analysis
Using EXPLAIN ANALYZE revealed that the query itself was efficient, but the way the result set was handled caused excessive overhead.
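For reference, the plan can be inspected by prefixing the article's query with EXPLAIN ANALYZE (table and column names taken from the query shown above):

```sql
EXPLAIN ANALYZE
SELECT programhandleidlist
FROM anti_transhandle
WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0';
```

The output reports actual execution time per plan node, which is how one can confirm that the scan itself is cheap and the cost lies elsewhere.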
Code‑Level Analysis
Each row was materialized into a Map object, creating millions of temporary objects and exhausting CPU time.
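To make the cost concrete, here is a minimal sketch of what row materialization looks like: every fetched row becomes a freshly allocated HashMap, so 2 million rows mean millions of short-lived objects. The method and column name below are illustrative, not the article's actual mapper code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowMaterialization {
    // Simulates what the ORM does for each fetched row: every column
    // becomes an entry in a newly allocated HashMap. At 2 million rows
    // this floods the young generation with temporary objects.
    static List<Map<String, Object>> materialize(int rows) {
        List<Map<String, Object>> list = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            Map<String, Object> row = new HashMap<>();
            row.put("programhandleidlist", "id-" + i);  // one entry per column in reality
            list.add(row);
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(materialize(1000).size()); // prints 1000
    }
}
```

The allocations themselves are cheap individually; it is the aggregate garbage-collection and CPU pressure across millions of rows that dominates the 30 seconds.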
Optimization Measures
1. SQL Optimization
Transform the 2 million rows into a single aggregated result using PostgreSQL array functions:
SELECT array_agg(elem) AS concatenated_array
FROM (
SELECT unnest(programhandleidlist) AS elem
FROM anti_transhandle
WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0'
) sub;

This reduces the result set to a single row, drastically cutting the volume of data transferred to the application.
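The effect of the rewrite can be pictured in plain Java: instead of 2 million rows each carrying its own array, the database hands back one flattened array. This sketch (with made-up sample data) mirrors what unnest followed by array_agg does:

```java
import java.util.Arrays;
import java.util.List;

public class ArrayAggSketch {
    // Java-side picture of SELECT array_agg(elem) FROM (SELECT unnest(...)):
    // many per-row arrays are collapsed into one flat array.
    static String[] flatten(List<String[]> rows) {
        return rows.stream()
                .flatMap(Arrays::stream)      // unnest: one element per array entry
                .toArray(String[]::new);      // array_agg: collect into a single array
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(new String[]{"A", "B"}, new String[]{"C"});
        System.out.println(flatten(rows).length); // prints 3
    }
}
```

Doing this collapse inside PostgreSQL means the application receives one row instead of millions.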
2. Move Business Logic to the Database
Count occurrences of each element directly in SQL, avoiding post‑processing in Java:
SELECT elem, COUNT(*) AS count
FROM (
SELECT unnest(programhandleidlist) AS elem
FROM anti_transhandle
WHERE create_time BETWEEN '2024-01-08 00:00:00.0' AND '2024-01-09 00:00:00.0'
) sub
GROUP BY elem;

This query completes in roughly 0.7 seconds, close to the minimum imposed by scanning this volume of data.
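For comparison, here is what the same unnest-plus-GROUP BY logic would look like if it stayed in Java (with hypothetical sample data). Pushing this into SQL avoids shipping the raw rows to the application at all:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ElemCount {
    // Java-side equivalent of:
    //   SELECT elem, COUNT(*) FROM (SELECT unnest(programhandleidlist) AS elem ...) GROUP BY elem
    static Map<String, Long> countElems(List<List<String>> rows) {
        return rows.stream()
                .flatMap(List::stream)   // unnest(programhandleidlist)
                .collect(Collectors.groupingBy(e -> e, Collectors.counting())); // GROUP BY elem
    }

    public static void main(String[] args) {
        List<List<String>> rows = List.of(
                List.of("A", "B"), List.of("A"), List.of("B", "A"));
        System.out.println(countElems(rows).get("A")); // prints 3
    }
}
```

The stream version is fine for small inputs, but at 2 million rows it recreates exactly the per-row materialization problem the article diagnosed.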
3. Introduce Caching
Use Caffeine as a local cache to store historical statistics, reducing repeated database hits. Add the dependency:
<dependency>
<groupId>com.github.ben-manes.caffeine</groupId>
<artifactId>caffeine</artifactId>
<version>3.1.8</version>
</dependency>

Configure the cache:
import com.github.benmanes.caffeine.cache.Caffeine;
import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.cache.caffeine.CaffeineCacheManager;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import java.util.concurrent.TimeUnit;
@Configuration
@EnableCaching
public class CacheConfig {
@Bean
public CacheManager cacheManager() {
CaffeineCacheManager cacheManager = new CaffeineCacheManager();
cacheManager.setCaffeine(Caffeine.newBuilder()
.maximumSize(500)
.expireAfterWrite(60, TimeUnit.MINUTES));
return cacheManager;
}
}

Apply the cache in the service method:
@Autowired
private CacheManager cacheManager;
private static final String YESTERDAY_HIT_RATE_CACHE = "hitRateCache";
@Override
public RuleHitRateResponse ruleHitRate(LocalDate currentDate) {
    LocalDate yesterday = currentDate.minusDays(1);
    Double hitRate = cacheManager.getCache(YESTERDAY_HIT_RATE_CACHE)
            .get(yesterday, () -> {
                var hitRateList = dataTunnelClient.selectTransHandleFlowByTime(yesterday);
                // compute hit rate from the raw rows
                return compute(hitRateList);
            });
    // wrap the cached value in the response object
    return new RuleHitRateResponse(hitRate);
}

Result
After applying the SQL rewrite, moving aggregation to PostgreSQL, and adding Caffeine caching, the interface response time dropped from 30 seconds to under 0.8 seconds, meeting the performance requirement.
Conclusion
The case demonstrates that for massive data‑intensive queries, relational databases may become a bottleneck; leveraging set‑based SQL operations, database‑side aggregation, and appropriate caching can dramatically improve latency. For even larger scales, column‑oriented stores such as ClickHouse or Hive may be more suitable.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.