Nine Common Techniques for Service Performance Optimization
The article outlines nine broadly applicable techniques (caching, parallel processing, batch processing, data compression, lock-free design, sharding, eliminating unnecessary requests, resource pooling, and asynchronous processing) that together can dramatically cut service latency and improve throughput, as demonstrated by an 80% latency reduction in a real-world project.
The author recently optimized the performance of a service project by addressing several design flaws such as JSON payloads between services, monitoring logic embedded in the main flow, repeated downstream requests, and serial execution of time‑consuming operations. The optimizations reduced average latency and p99 latency of Service A by 80% and cut the underlying service latency by 50%.
This article summarizes nine widely applicable methods for improving service performance.
1. Caching
Caching is essential because even a simple request involves at least two network I/Os (client → service → database). On the browser side, caching is controlled via the Expires, Cache-Control, Last-Modified, and ETag headers. On the server side, you can use in-memory stores such as Redis, which keeps data in RAM for fast access, or rely on MySQL's buffer pool, which caches data pages with an LRU-based policy.
When choosing a cache, consider consistency vs. speed. Redis guarantees cross‑machine consistency but adds an extra I/O hop; local memory caches are faster but may become inconsistent across instances. A hybrid approach (Redis + local memory) can combine the benefits.
key val => key1 val, key2 val, key3 val, key4 val

2. Parallel Processing
Earlier versions of Redis (pre‑6.0) used a single‑threaded event loop, which limited CPU utilization. Redis 6.0 introduced a multithreaded I/O model that parallelizes socket reads/writes while keeping command execution single‑threaded, improving throughput on multi‑core machines.
MySQL replication also benefits from parallelism: on a replica, the I/O (receiver) thread pulls binlog events from the source while multiple SQL applier threads (multi-threaded replication) apply the changes concurrently, reducing replication lag.
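The same idea applies inside a service: independent, slow downstream calls executed serially add their latencies, while running them concurrently costs roughly the slowest one. A minimal Go sketch (the `fetchA`/`fetchB` stand-ins are hypothetical downstream calls):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchA and fetchB stand in for two independent, slow downstream calls.
func fetchA() string { time.Sleep(50 * time.Millisecond); return "A" }
func fetchB() string { time.Sleep(50 * time.Millisecond); return "B" }

// fetchBoth runs both calls concurrently, so total latency is roughly
// max(A, B) instead of A + B.
func fetchBoth() (string, string) {
	var a, b string
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); a = fetchA() }()
	go func() { defer wg.Done(); b = fetchB() }()
	wg.Wait()
	return a, b
}

func main() {
	start := time.Now()
	a, b := fetchBoth()
	fmt.Println(a, b, "took", time.Since(start)) // ~50ms, not ~100ms
}
```

In real code, prefer passing a context so one failed call can cancel the other instead of wasting the work.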
3. Batch Processing
Kafka batches messages per partition before sending them to the broker, reducing network overhead. Similarly, services can batch database reads or writes (e.g., using Redis pipelines or Lua scripts) to minimize round‑trips.
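The batching pattern can be sketched generically: accumulate items and flush them in one "network call" once a threshold is reached. The `Batcher` type below is an illustrative sketch, not any library's API:

```go
package main

import "fmt"

// Batcher groups writes and flushes them in one round-trip once the
// batch reaches size n, mimicking how Kafka producers or Redis
// pipelines amortize per-request network overhead.
type Batcher struct {
	n     int
	buf   []string
	flush func([]string) // one "network call" per batch
}

// Add buffers an item and flushes automatically when the batch is full.
func (b *Batcher) Add(item string) {
	b.buf = append(b.buf, item)
	if len(b.buf) >= b.n {
		b.Flush()
	}
}

// Flush sends whatever is buffered, including a trailing partial batch.
func (b *Batcher) Flush() {
	if len(b.buf) == 0 {
		return
	}
	b.flush(b.buf)
	b.buf = nil
}

func main() {
	calls := 0
	b := &Batcher{n: 3, flush: func(items []string) {
		calls++
		fmt.Printf("flush %d: %v\n", calls, items)
	}}
	for i := 1; i <= 7; i++ {
		b.Add(fmt.Sprintf("msg%d", i))
	}
	b.Flush() // flush the trailing partial batch
	fmt.Println("network calls:", calls) // 3 instead of 7
}
```

Production batchers (Kafka's producer included) also flush on a time limit (e.g. linger.ms) so a slow trickle of items is not delayed indefinitely.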
4. Data Compression
Redis AOF rewrite (via BGREWRITEAOF) compacts the append-only file, preventing unbounded growth. In Kafka, enabling compression on the producer (compression.type, e.g. lz4 or zstd) shrinks batches over the network and on disk; the broker stores them compressed and consumers decompress transparently.
5. Lock‑Free Design
Redis’s single‑threaded model avoids lock contention. In Go, the sync/atomic package provides lock‑free primitives, and the GMP scheduler reduces lock overhead by assigning goroutines (G) to processors (P) with local run queues.
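A minimal example of the sync/atomic primitives mentioned above: many goroutines update a shared counter with a lock-free read-modify-write instead of taking a mutex around every increment:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countParallel has `workers` goroutines each add `perWorker` to a
// shared counter using atomic.AddInt64, so no mutex is needed and the
// result is still exact.
func countParallel(workers, perWorker int) int64 {
	var hits int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				atomic.AddInt64(&hits, 1) // lock-free increment
			}
		}()
	}
	wg.Wait()
	return hits
}

func main() {
	fmt.Println(countParallel(8, 1000)) // 8000
}
```

Atomics suit simple shared state like counters and flags; once an update spans multiple fields, a mutex (or a redesign that avoids sharing) is usually clearer and safer.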
6. Sharding
Redis clusters and Codis split data across multiple nodes, overcoming single‑machine storage and throughput limits. Kafka partitions similarly distribute load across brokers, enabling horizontal scaling.
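The routing step behind all of these is a stable hash from key to shard. A sketch using Go's hash/fnv (Redis Cluster actually uses CRC16 hash slots and Kafka uses its own partitioners, but the principle is the same):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards via a stable hash, so the
// same key always lands on the same node.
func shardFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // hashing in-memory bytes never fails
	return h.Sum32() % n
}

func main() {
	for _, k := range []string{"user:1", "user:2", "order:99"} {
		fmt.Printf("%s -> shard %d of 4\n", k, shardFor(k, 4))
	}
}
```

Note that plain modulo hashing reshuffles most keys when n changes; systems that resize often use consistent hashing or fixed slot tables (as Redis Cluster does) to limit data movement.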
7. Avoid Unnecessary Requests
Reduce I/O by selecting only the columns you need instead of SELECT * (and indexing queries so they avoid full table scans), merging static assets, lazy-loading tab content on mobile, and dropping unused fields from API responses.
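Another form of request elimination, relevant to the "repeated downstream requests" flaw mentioned at the top of the article, is coalescing concurrent identical requests so only one actually hits the downstream service. A minimal sketch, similar in spirit to golang.org/x/sync/singleflight (the `Group` type here is hand-rolled for illustration):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Group coalesces concurrent requests for the same key into a single
// downstream call; latecomers wait for and share the first result.
type Group struct {
	mu sync.Mutex
	m  map[string]*call // in-flight calls keyed by request key
}

type call struct {
	wg  sync.WaitGroup
	val string
}

// Do returns fn's result for key, running fn at most once among all
// callers that arrive while it is in flight.
func (g *Group) Do(key string, fn func() string) string {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		// Someone is already fetching this key: wait and share the result.
		g.mu.Unlock()
		c.wg.Wait()
		return c.val
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val = fn() // the single real downstream request
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key)
	g.mu.Unlock()
	return c.val
}

func main() {
	var g Group
	var fetches int32
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			g.Do("profile:42", func() string {
				atomic.AddInt32(&fetches, 1)
				time.Sleep(10 * time.Millisecond) // simulate a slow downstream call
				return "data"
			})
		}()
	}
	wg.Wait()
	fmt.Println("downstream fetches:", atomic.LoadInt32(&fetches))
}
```

For production use, prefer the battle-tested singleflight package, which also handles errors and panics in fn.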
8. Pooling
Go keeps goroutine startup cheap by reusing scheduler resources (P-local run queues), and sync.Pool reuses short-lived objects to cut allocation and GC overhead. Database connection pools and thread pools likewise avoid paying the cost of creating a resource per request.
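A small sync.Pool example: reusing byte buffers across requests instead of allocating a fresh one per call (the `render` helper is illustrative):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable byte buffers, avoiding an allocation
// (and later GC work) on every request.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a response using a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear contents before returning to the pool
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "hello, %s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("alice")) // hello, alice
	fmt.Println(render("bob"))   // likely reuses the first buffer
}
```

sync.Pool gives no guarantees about retention (the GC may drop pooled objects), so it suits transient scratch space, not connection-style resources, which belong in a proper connection pool.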
9. Summary
Understanding the underlying design of middleware (Redis, MySQL, Kafka, etc.) helps apply these techniques effectively. Analyzing service call chains, CPU usage, and I/O hotspots guides where to apply caching, parallelism, batching, compression, lock‑free structures, sharding, request reduction, pooling, and async processing for maximum performance gains.
For reference, here are the two structures behind the LRU caching discussed above: a Go skeleton of an LRU cache, and Redis's object header, whose lru field stores the LRU/LFU metadata used for eviction.

```go
// LRUCache pairs a hash map (O(1) lookup) with a doubly linked list
// whose head holds the most recently used entry and whose tail is
// evicted first when capacity is reached.
type LRUCache struct {
	sync.Mutex
	size       int
	capacity   int
	cache      map[int]*DLinkNode
	head, tail *DLinkNode
}

type DLinkNode struct {
	key, value int
	pre, next  *DLinkNode
}
```

```c
#define LRU_BITS 24

typedef struct redisObject {
	unsigned type:4;       /* value type (string, list, hash, ...) */
	unsigned encoding:4;   /* internal encoding of the value */
	unsigned lru:LRU_BITS; /* LRU time or LFU data */
	int refcount;
	void *ptr;             /* pointer to the underlying value */
} robj;
```