How to Diagnose and Fix API Performance Bottlenecks in a Java Backend
This article walks through the background of a production Java service that received many performance complaints, enumerates common causes such as slow MySQL queries, complex business logic, thread‑pool misconfiguration, lock contention and machine issues, and provides concrete diagnostic steps and code‑level solutions including pagination fixes, indexing strategies, async processing, thread‑pool tuning, lock refinement and caching techniques.
Background
After the system entered the promotion phase, API latency became a major complaint. Monitoring for a week showed more than 20 slow endpoints, five with response times > 5 s, one > 10 s, and overall availability below 99.8 %.
Typical Causes of API Performance Problems
Database slow queries
Complex business logic
Improper thread‑pool configuration
Poor lock design
Machine‑level issues (full GC, restarts, thread exhaustion)
Solutions
1. Slow Queries (MySQL)
Deep pagination
With standard limit offset, count pagination, MySQL reads and discards every row before the offset, which is inefficient for large offsets.
select name, code from student limit 100, 20;
When the offset reaches 1,000,000, MySQL must scan over a million rows to return 20.
Solution: use the primary key to jump directly to the desired range.
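select id, name, code from student where id > 1000000 order by id limit 20;
The caller must pass the largest id of the previous page as a parameter, so the query has to return id as well. The order by id clause ensures that the 20 rows returned are the ones that directly follow that id.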
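The loop a caller runs around this query can be sketched as follows. This is a minimal illustration, not the original service's code: the Student class, the readAll helper and the fetchPage function (which stands in for the actual database call) are all hypothetical names, and the sketch assumes ids are positive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Keyset pagination sketch. Student and the page fetcher are hypothetical stand-ins.
final class KeysetPager {
    static final class Student {
        final long id;
        final String name;
        final String code;

        Student(long id, String name, String code) {
            this.id = id;
            this.name = name;
            this.code = code;
        }
    }

    // fetchPage.apply(lastId, pageSize) stands in for a query such as:
    //   select id, name, code from student where id > ? order by id limit ?
    static List<Student> readAll(BiFunction<Long, Integer, List<Student>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<Student> all = new ArrayList<>();
        long lastId = 0L; // assumption: ids are positive, so 0 means "from the start"
        while (true) {
            List<Student> page = fetchPage.apply(lastId, pageSize);
            all.addAll(page);
            if (page.size() < pageSize) {
                return all; // a short page means there are no more rows
            }
            lastId = page.get(page.size() - 1).id; // cursor for the next page
        }
    }
}
```

Each request costs roughly the same regardless of how deep the caller has paged, because the primary key index locates the starting row directly. The trade-off is that callers can only move to the next page, not jump to an arbitrary page number.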
Missing index
Check the existing indexes with show create table <table_name>; Add an index only when the column has sufficient selectivity, because an index on a low‑cardinality column filters out too few rows to help. Adding an index may lock the table, so do it during low‑traffic periods.
Index not used
MySQL may ignore an index if the optimizer estimates a higher cost. Force its use:
select name, code from student force index (idx_name) where name = '天才';
Excessive JOINs or subqueries
Prefer JOINs over subqueries and keep the number of joined tables to 2‑3 unless the data volume is tiny. Large joins can cause temporary tables on disk, dramatically slowing queries. Split the work in application code: fetch one table, build a map, then fetch related tables.
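Splitting a join in application code can be sketched like this. The Student and ClassRoom classes and their fields are hypothetical and only illustrate the shape of the approach: query the first table, collect the foreign keys, query the second table by those keys, then stitch the rows together through a map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Application-side join sketch. Both row classes are hypothetical.
final class AppSideJoin {
    static final class Student {
        final long id;
        final long classId;
        final String name;

        Student(long id, long classId, String name) {
            this.id = id;
            this.classId = classId;
            this.name = name;
        }
    }

    static final class ClassRoom {
        final long id;
        final String title;

        ClassRoom(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Step 1: collect the distinct foreign keys from the first result set.
    // These ids would feed the second query (for example an IN list).
    static Set<Long> classIds(List<Student> students) {
        Set<Long> ids = new LinkedHashSet<>();
        for (Student s : students) {
            ids.add(s.classId);
        }
        return ids;
    }

    // Step 2: index the second result set by primary key and combine the rows.
    static List<String> join(List<Student> students, List<ClassRoom> rooms) {
        Map<Long, ClassRoom> roomById = new HashMap<>();
        for (ClassRoom r : rooms) {
            roomById.put(r.id, r);
        }
        List<String> out = new ArrayList<>();
        for (Student s : students) {
            ClassRoom r = roomById.get(s.classId);
            out.add(s.name + " -> " + (r == null ? "unknown" : r.title));
        }
        return out;
    }
}
```

Each query stays a simple single-table lookup on an indexed column, at the cost of one extra round trip per related table.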
IN clause with many elements
When the IN list is large, even with an index the query can be slow. Split the list into smaller batches or use multithreading.
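Splitting the list into batches could look like the sketch below. The class and method names are hypothetical; each batch would be bound to its own prepared statement, and the batches could be run sequentially or handed to a thread pool.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: split a large id list into fixed-size batches for separate IN queries.
final class InListBatcher {
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    // Builds the "?, ?, ?" placeholder list for a PreparedStatement IN clause.
    static String placeholders(int count) {
        return String.join(", ", Collections.nCopies(count, "?"));
    }
}
```

For 450 ids and a batch size of 200 this yields three queries of 200, 200 and 50 ids.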
select id from student where id in (1,2,3,...,1000) limit 200;
Guard against overly large batches in code:
if (ids.size() > 200) {
    throw new IllegalArgumentException("Batch size cannot exceed 200");
}
Large data volume
If a single table grows to billions of rows, simple tuning is insufficient. Consider sharding, partitioning, or migrating to a database designed for big data.
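As one illustration of the sharding option, the routing step can be as small as a modulo on the primary key. This is only a sketch under assumptions: the student_N table naming and the fixed shard count are hypothetical, and a real migration also has to handle resharding and cross-shard queries.

```java
// Sketch: route a row to one of N physical tables by primary key.
final class ShardRouter {
    static int shardFor(long studentId, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        // floorMod keeps the result in [0, shardCount) even for negative ids.
        return (int) Math.floorMod(studentId, (long) shardCount);
    }

    // Hypothetical naming scheme: student_0 ... student_(N-1).
    static String tableFor(long studentId, int shardCount) {
        return "student_" + shardFor(studentId, shardCount);
    }
}
```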
2. Complex Business Logic
Loop calls
When the same calculation is performed independently for multiple months, parallelize the work.
List<Model> list = new ArrayList<>();
for (int i = 0; i < 12; i++) {
    Model model = calOneMonthData(i);
    list.add(model);
}
Convert to a thread‑pool execution:
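// Needs java.util.* and java.util.concurrent.*; create the pool once and reuse it.
ExecutorService pool = new ThreadPoolExecutor(
    5, 5, 300L, TimeUnit.SECONDS,
    new LinkedBlockingQueue<>(10),
    commonThreadFactory,
    // DiscardPolicy would silently drop a rejected task and leave f.get() blocked forever,
    // so run rejected tasks on the calling thread instead.
    new ThreadPoolExecutor.CallerRunsPolicy()
);
List<Future<Model>> futures = new ArrayList<>();
for (int i = 0; i < 12; i++) {
    final int month = i; // a lambda can only capture effectively final variables
    futures.add(pool.submit(() -> calOneMonthData(month)));
}
List<Model> result = new ArrayList<>();
for (Future<Model> f : futures) {
    // get() throws InterruptedException and ExecutionException; handle or declare them
    result.add(f.get());
}
Sequential calls without dependencies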
Independent calls can be executed in parallel using CompletableFuture:
CompletableFuture<A> fA = CompletableFuture.supplyAsync(() -> doA());
CompletableFuture<B> fB = CompletableFuture.supplyAsync(() -> doB());
CompletableFuture.allOf(fA, fB).join();
C c = doC(fA.join(), fB.join());
CompletableFuture<D> fD = CompletableFuture.supplyAsync(() -> doD(c));
CompletableFuture<E> fE = CompletableFuture.supplyAsync(() -> doE(c));
CompletableFuture.allOf(fD, fE).join();
return doResult(fD.join(), fE.join());
3. Thread‑Pool Design Issues
The key parameters are core pool size, maximum pool size, and the work queue. If there are too few core threads, parallelism suffers. A pool shared with other services can be saturated by them, leaving tasks waiting in the queue. Once a bounded queue is full, the pool creates threads up to the maximum size and then rejects further tasks. Tune the parameters per service, or give latency‑sensitive work a dedicated pool.
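A dedicated pool per workload might be built along these lines. The factory name, the 60 second keep-alive and the choice of CallerRunsPolicy are illustrative assumptions rather than settings from the original system; the sizes should come from measurement.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: one bounded, named pool per workload instead of one shared pool.
final class DedicatedPools {
    static ThreadPoolExecutor newPool(String name, int core, int max, int queueCapacity) {
        AtomicInteger seq = new AtomicInteger();
        // Named threads make thread dumps and monitoring easier to read.
        ThreadFactory factory = r -> {
            Thread t = new Thread(r, name + "-" + seq.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
        return new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity), // bounded: overload shows up early
                factory,
                // When the queue and all threads are busy, the submitting thread runs
                // the task itself, which slows producers down instead of dropping work.
                new ThreadPoolExecutor.CallerRunsPolicy());
    }
}
```

With a bounded queue the maximum pool size actually takes effect: extra threads are only created after the queue fills up, so an unbounded queue would keep the pool at its core size forever.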
4. Lock Design Problems
Two common pitfalls: using an inappropriate lock type (e.g., a mutex where a read‑write lock would suffice) and locking a scope that is too large.
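For the first pitfall, a read-mostly structure can use a read-write lock so that readers no longer block each other. The RateTable class below is a hypothetical example, not code from the original service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch: many concurrent readers, occasional writers.
final class RateTable {
    private final Map<String, Double> rates = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Any number of threads may hold the read lock at the same time.
    Double get(String currency) {
        lock.readLock().lock();
        try {
            return rates.get(currency);
        } finally {
            lock.readLock().unlock();
        }
    }

    // The write lock is exclusive: it waits for readers and blocks new ones.
    void put(String currency, double rate) {
        lock.writeLock().lock();
        try {
            rates.put(currency, rate);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

This only pays off when reads clearly outnumber writes; for a plain key-value map, ConcurrentHashMap is often the simpler choice.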
Coarse‑grained lock example
public synchronized void doSome() {
    File f = calData();
    uploadToS3(f);
    sendSuccessMessage();
}
Only the data calculation needs synchronization:
public void doSome() {
    File f;
    synchronized (this) {
        f = calData();
    }
    uploadToS3(f);
    sendSuccessMessage();
}
5. Machine‑Level Issues
Full GC, frequent restarts, or thread exhaustion degrade performance. Identify these via monitoring and mitigate by splitting large tasks, redesigning thread pools, or tuning JVM parameters.
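Where no monitoring agent is available, the JVM's own management beans give a rough view of GC activity. The sketch below uses the standard java.lang.management API; the GcSnapshot class name is hypothetical, and the counters cover all collectors, not only full GCs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: read cumulative GC counters from inside the running JVM.
final class GcSnapshot {
    // Total number of collections since JVM start, across all collectors.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Approximate accumulated collection time in milliseconds.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

Sampling both values periodically and comparing consecutive readings shows whether slow requests coincide with bursts of collection time.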
6. Generic “Quick‑Fix” Strategies
Caching
Cache frequently read, rarely changed data in memory, SSD, or external caches (Redis, Tair, Memcached). Design keys to maximize hit rate.
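For the in-memory case, a minimal cache with time-based expiry could look like this. The TtlCache class is a hypothetical sketch: it has no size limit or eviction, which a production cache (or a library such as Caffeine) would need.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Sketch: in-memory cache whose entries expire after a fixed time to live.
final class TtlCache<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final ConcurrentHashMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable, e.g. System::currentTimeMillis

    TtlCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns the cached value, calling the loader only on a miss or after expiry.
    // The loader must not touch this cache, because compute() holds the key's lock.
    V get(K key, Function<K, V> loader) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.compute(key, (k, old) ->
                (old != null && old.expiresAt > now)
                        ? old
                        : new Entry<V>(loader.apply(k), now + ttlMillis));
        return entry.value;
    }

    // Call after the underlying data changes so readers do not see stale values.
    void invalidate(K key) {
        entries.remove(key);
    }
}
```

Because compute() runs atomically per key, concurrent requests for the same missing key trigger a single load instead of one load per request.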
Asynchronous callback / fast‑success response
For slow downstream calls (e.g., bank APIs), return a quick success with a “payment in progress” state, invoke the external service asynchronously, and notify the caller via a callback or message queue (Kafka).
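The fast-success pattern can be sketched with CompletableFuture. Everything named here is an assumption for illustration: PaymentFacade, the Status values and the bankCall function stand in for the real service, and in production the final status would be persisted and published (for example to a message queue) rather than kept in a map.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Sketch: answer immediately with IN_PROGRESS, finish the slow call in the background.
final class PaymentFacade {
    enum Status { IN_PROGRESS, SUCCEEDED, FAILED }

    private final Map<String, Status> statuses = new ConcurrentHashMap<>();
    private final Function<String, Boolean> bankCall; // hypothetical slow downstream call
    private final Executor executor;

    PaymentFacade(Function<String, Boolean> bankCall, Executor executor) {
        this.bankCall = bankCall;
        this.executor = executor;
    }

    // Returns without waiting for the bank. The future completes with the final
    // status and is the hook for a callback or a message to the caller.
    CompletableFuture<Status> submit(String orderId) {
        statuses.put(orderId, Status.IN_PROGRESS);
        return CompletableFuture
                .supplyAsync(() -> bankCall.apply(orderId), executor)
                .handle((ok, error) -> {
                    Status result = (error == null && Boolean.TRUE.equals(ok))
                            ? Status.SUCCEEDED
                            : Status.FAILED;
                    statuses.put(orderId, result);
                    return result;
                });
    }

    // What the API returns while the downstream call is still running.
    Status statusOf(String orderId) {
        return statuses.get(orderId);
    }
}
```

The endpoint calls submit, responds with the IN_PROGRESS status right away, and attaches the notification step to the returned future with thenAccept.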
IT Architects Alliance
