Backend Development · 13 min read

10 Proven Techniques to Supercharge Backend API Performance

This article presents a comprehensive guide for backend developers, covering ten practical techniques—including local and distributed caching, parallel execution, asynchronous processing, connection pooling, sharding, SQL tuning, pre‑computation, batch operations, lock granularity, and context propagation—to dramatically improve API performance.

Sanyou's Java Diary

1. Local Cache

Local cache runs in the same process as the application, providing ultra‑fast access without network overhead. It is suitable for single‑instance deployments or clusters where nodes do not need to share cache state. The downside is memory waste and lack of sharing across services.

Common local cache libraries include Guava and Caffeine, both distributed as simple JAR dependencies.

Typical scenarios for local cache:

Data with low freshness requirements where a short TTL is acceptable.

Immutable mappings such as order‑id to user‑id.

Local cache is easy to adopt; you can find tutorials online.
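To make the idea concrete, here is a minimal TTL-based local cache sketched on top of `ConcurrentHashMap`. It only illustrates the concept; a production system would use Guava or Caffeine, which add size-based eviction, refresh, and statistics:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal in-process cache with a per-entry TTL.
class LocalCache<K, V> {
    private static class Entry<V> {
        final V value;
        final long expiresAt;
        Entry(V value, long expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final Map<K, Entry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMillis;

    LocalCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Return the cached value, or invoke the loader and cache the result
    // when the entry is missing or expired.
    V get(K key, Function<K, V> loader) {
        Entry<V> e = store.get(key);
        if (e == null || System.currentTimeMillis() >= e.expiresAt) {
            V v = loader.apply(key);
            store.put(key, new Entry<>(v, System.currentTimeMillis() + ttlMillis));
            return v;
        }
        return e.value;
    }
}
```

An immutable mapping such as order-id to user-id fits this pattern well: once loaded, repeated reads never leave the process.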

2. Distributed Cache

Distributed caches remove the per-node state of local caches by providing an independently deployed, clustered service with effectively unlimited capacity. The added network hop (typically 1–2 ms) is negligible compared with the benefits.

Popular distributed cache systems include Memcached and Redis. They can handle tens of thousands of QPS per node, offloading read/write pressure from relational databases.

When adopting a distributed cache, pay attention to the following:

Ensure a high hit rate to achieve effective load reduction.

Size the cache according to business needs to avoid eviction of hot data.

Maintain data consistency.

Plan for rapid scaling.

Monitor average, max, and min response times, QPS, network traffic, and client connections.
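The read path usually follows the cache-aside pattern: try the cache, fall back to the database on a miss, then populate the cache. A sketch below uses a plain `Map` as a stand-in for the Redis client (the `CacheAside` class and loader are illustrative; in real code you would call Jedis or Lettuce):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Cache-aside read path: cache hit avoids the database entirely;
// a miss loads from the database and populates the cache for next time.
class CacheAside {
    private final Map<String, String> cache = new HashMap<>();

    String get(String key, Function<String, String> dbLoader) {
        String v = cache.get(key);
        if (v != null) {
            return v;                  // cache hit: no database round trip
        }
        v = dbLoader.apply(key);       // cache miss: query the database
        cache.put(key, v);             // populate so subsequent reads hit
        return v;
    }
}
```

The hit rate mentioned above is exactly the fraction of calls that return on the first branch; monitoring it tells you whether the cache is actually shielding the database.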

3. Parallelization

Analyze business flows, draw sequence diagrams, and identify which steps can run in parallel. Leverage multi‑core CPUs to execute independent tasks concurrently.

Java's CompletableFuture offers around 50 APIs for serial, parallel, composition, and error handling.
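For instance, two independent lookups that would otherwise run back-to-back can be fanned out and combined (the lookup values here are placeholders for real remote calls):

```java
import java.util.concurrent.CompletableFuture;

// Fan out two independent lookups in parallel, then merge the results.
class ParallelCalls {
    static String buildProfile() {
        CompletableFuture<String> user =
            CompletableFuture.supplyAsync(() -> "alice");        // e.g. user service
        CompletableFuture<String> orders =
            CompletableFuture.supplyAsync(() -> "3 orders");     // e.g. order service
        // thenCombine waits for both futures and merges their values
        return user.thenCombine(orders, (u, o) -> u + ": " + o).join();
    }
}
```

With two 100 ms calls, the combined latency drops from roughly 200 ms to roughly 100 ms, since the slower call dominates instead of the sum.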

4. Asynchronization

Separate non‑core logic from the main request path and execute it asynchronously. For example, after creating an order, publish a message to an MQ for downstream notifications (SMS, email) without blocking the primary flow.
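A minimal sketch of the hand-off: the request thread returns as soon as the order is saved, and the notification runs on another thread. Here `runAsync` on the common pool stands in for publishing to an MQ, and the class and field names are illustrative:

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// The caller gets a response without waiting for notifications to be sent.
class OrderService {
    final List<String> sent = new CopyOnWriteArrayList<>();
    CompletableFuture<Void> lastNotification;

    String createOrder(String orderId) {
        // core path: persist the order (omitted)
        lastNotification =
            CompletableFuture.runAsync(() -> sent.add("sms:" + orderId));
        return orderId;   // respond immediately; notification completes later
    }
}
```

In production the asynchronous step should be durable (an MQ with retries), since a fire-and-forget thread loses the notification if the process dies.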

5. Pooling

Keep reusable objects (threads, connections, buffers) in a pool to avoid the cost of repeated creation and destruction. Connection pools are typically tuned with minimum, idle, and maximum connection counts; thread pools are configured with a core size, maximum size, keep‑alive time, work queue, and rejection policy, as below.
<code>// core pool 3, max 15 threads; idle non-core threads reclaimed after 5 minutes
ThreadPoolExecutor executor = new ThreadPoolExecutor(3, 15, 5, TimeUnit.MINUTES,
    new ArrayBlockingQueue<>(10),   // bounded queue: back-pressure instead of OOM
    new ThreadFactoryBuilder().setNameFormat("data-thread-%d").build(), // Guava: named threads ease debugging
    (r, pool) -> {                  // custom rejection policy
        if (r instanceof BaseRunnable) {
            ((BaseRunnable) r).rejectedExecute();  // let the task decide how to degrade
        }
    });</code>

6. Sharding (Database Partitioning)

MySQL InnoDB uses B+‑tree storage; a single table can hold millions of rows, but massive user bases often require horizontal partitioning into multiple identical physical tables to relieve storage and access pressure.

Sharding introduces challenges such as data skew, global unique ID generation, and routing logic.
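The routing logic is usually a hash of the shard key modulo the table count. A sketch, assuming tables named `order_0` … `order_N-1` (the naming scheme is illustrative):

```java
// Route a record to one of N physical tables by hashing the shard key.
class ShardRouter {
    private final int tableCount;

    ShardRouter(int tableCount) { this.tableCount = tableCount; }

    String tableFor(long userId) {
        int index = (int) (Long.hashCode(userId) % tableCount);
        if (index < 0) index += tableCount;   // hashCode may be negative
        return "order_" + index;
    }
}
```

Note that the same key must always map to the same table, which is why resharding (changing `tableCount`) is painful and is often solved with consistent hashing or a lookup service.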

7. SQL Optimization

Poorly written SQL dramatically degrades API latency. Common pitfalls include deep pagination, missing indexes, and full‑table scans.

Avoid SELECT *; select only the required columns.

Use LIMIT 1 when only one row is needed.

Keep the number of indexes reasonable (typically ≤ 5).

Prefer AND over OR in WHERE clauses to preserve index usage.

Do not index low‑cardinality columns (e.g., gender).

Index columns used in WHERE and ORDER BY to prevent full scans.
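The deep-pagination pitfall deserves a concrete fix: `LIMIT offset, n` forces MySQL to scan and discard `offset` rows, while a keyset (seek) predicate on an indexed column starts exactly where the previous page ended. The table and column names below are illustrative, and the strings are concatenated only for readability; real code should use `PreparedStatement` placeholders:

```java
// Two ways to fetch page after page from a large table.
class PageQueries {
    // Slow once offset grows large: scans and discards offset rows each call.
    static String offsetPage(int offset, int limit) {
        return "SELECT id, user_id, amount FROM orders ORDER BY id LIMIT "
             + offset + ", " + limit;
    }

    // Fast at any depth: the index on id seeks directly past the last seen row.
    static String keysetPage(long lastSeenId, int limit) {
        return "SELECT id, user_id, amount FROM orders WHERE id > " + lastSeenId
             + " ORDER BY id LIMIT " + limit;
    }
}
```

Both queries also follow the earlier rules: explicit columns instead of `SELECT *`, and an `ORDER BY` on an indexed column.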

8. Pre‑computation

Complex business calculations (e.g., site PV, red‑packet statistics) are performed ahead of time and cached, allowing API calls to read from cache instantly.
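A sketch of the pattern: an expensive aggregate is recomputed on a schedule (cron or `@Scheduled`), and the hot API path is reduced to a single in-memory read. The class and method names are illustrative:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Pre-compute an expensive aggregate so API calls never pay for it.
class PageViewStats {
    private final AtomicLong cachedTotal = new AtomicLong();

    // Run on a schedule, not per request.
    void recompute(List<Long> perPageViews) {
        long total = perPageViews.stream().mapToLong(Long::longValue).sum();
        cachedTotal.set(total);       // atomically publish the new value
    }

    // Hot path: a single in-memory read.
    long totalViews() { return cachedTotal.get(); }
}
```

The trade-off is freshness: the API serves a value as old as the recompute interval, which is acceptable for statistics like site PV.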

9. Batch Read/Write

IO is often the bottleneck. Instead of issuing 100 individual queries, expose a batch endpoint that retrieves all data in a single call, and use batch updates for writes.
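A sketch of a batch read, with an in-memory `Map` standing in for the database (`findByIds` is a hypothetical endpoint backed by `SELECT ... WHERE id IN (...)`):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// One round trip for all ids instead of N single-row queries.
class UserBatchClient {
    private final Map<Long, String> db = Map.of(1L, "alice", 2L, "bob", 3L, "carol");
    int roundTrips = 0;

    Map<Long, String> findByIds(List<Long> ids) {
        roundTrips++;   // a single IO regardless of how many ids are requested
        return ids.stream().collect(Collectors.toMap(id -> id, db::get));
    }
}
```

With a 1 ms round trip, 100 individual lookups cost at least 100 ms of IO; the batch version costs roughly one round trip plus the larger payload.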

10. Lock Granularity

Use locks only around truly contended resources. Over‑locking reduces concurrency; keep the lock scope minimal.
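One way to shrink lock scope is to lock per key rather than globally: updates to different accounts then proceed in parallel, and only same-account updates serialize. A sketch (class and method names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;

// A lock per account instead of one global lock.
class AccountBalances {
    private final ConcurrentHashMap<String, Object> locks = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Long> balances = new ConcurrentHashMap<>();

    void deposit(String account, long amount) {
        Object lock = locks.computeIfAbsent(account, k -> new Object());
        synchronized (lock) {   // narrow scope: only this account serializes
            balances.merge(account, amount, Long::sum);
        }
    }

    long balance(String account) { return balances.getOrDefault(account, 0L); }
}
```

The same principle applies inside a method: acquire the lock immediately before the contended update and release it immediately after, keeping slow work (IO, remote calls) outside the critical section.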

11. Context Propagation

Pass a reusable context object through the call chain to avoid repeated remote lookups for the same data.
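For example, user data fetched once at the entry point can ride along the call chain so that no downstream step repeats the remote lookup. The classes below are a hypothetical sketch of that shape:

```java
// Loaded once at the entry point, then handed to every step in the chain.
class RequestContext {
    final long userId;
    final String userName;   // fetched once from the user service, reused below

    RequestContext(long userId, String userName) {
        this.userId = userId;
        this.userName = userName;
    }
}

class CheckoutFlow {
    // Both steps read from the context; neither re-calls the user service.
    String buildReceipt(RequestContext ctx) {
        return greet(ctx) + ", total charged to " + ctx.userName;
    }

    private String greet(RequestContext ctx) { return "Hello " + ctx.userName; }
}
```

If three steps each needed the user record, the context turns three remote calls into one.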

12. Collection Size Planning

When the expected number of elements is known, pre‑size collections to avoid costly resizing. For example, ArrayList grows to 1.5× its current capacity when full, copying every element into a new backing array.

<code>// pre-size with the expected element count to avoid resize copies
List<String> lists = Lists.newArrayListWithCapacity(expectedSize);</code>

13. Query Optimization

Avoid loading massive result sets into memory; prefer pagination or batch fetching to keep memory usage under control.
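A sketch of chunked reading: memory stays bounded at one batch of rows no matter how large the table is. `fetchPage` below is a hypothetical data-access call (for example, a keyset-paginated SELECT like the one in the SQL section):

```java
import java.util.List;
import java.util.function.BiFunction;

// Process a large result set in fixed-size chunks instead of loading it all.
class ChunkedReader {
    // fetchPage(lastId, batchSize) returns up to batchSize [id, amount] rows
    // with id > lastId, in ascending id order; empty list means done.
    static long sumAll(BiFunction<Long, Integer, List<long[]>> fetchPage, int batchSize) {
        long total = 0;
        long lastId = 0;
        while (true) {
            List<long[]> rows = fetchPage.apply(lastId, batchSize);
            if (rows.isEmpty()) break;
            for (long[] row : rows) {
                total += row[1];
                lastId = row[0];   // advance the keyset cursor
            }
        }
        return total;
    }
}
```

Only `batchSize` rows are ever resident at once, so the same code handles a thousand rows or a hundred million.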

Tags: backend · performance optimization · caching · database sharding · asynchronous processing · SQL tuning
Written by

Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
