9 Proven Techniques to Supercharge Backend Service Performance
This article outlines nine practical methods—caching, parallel processing, batch handling, data compression, lock‑free design, sharding, request avoidance, pooling, and asynchronous processing—illustrated with Redis, MySQL, Go, and Kafka examples, showing how they collectively cut latency and improve throughput.
The author recently optimized a project's service performance, achieving an 80% reduction in average and p99 latency for service A and a 50% reduction for underlying services, and shares the nine most common techniques for improving service architecture.
1. Caching
Caching is essential at every layer. Browser caching can be controlled via the Expires, Cache‑Control, Last‑Modified, and ETag headers. Server‑side caching can use in‑memory stores like Redis, which is fast because data resides in RAM, or MySQL's buffer pool, which caches data pages with an LRU algorithm. When using caches, watch for common pitfalls such as cache avalanche, cache penetration, cache breakdown, and hot keys, and apply strategies like random expiration, Bloom filters, empty‑value caching, and key sharding.
A hand-rolled LRU cache in Go typically pairs a hash map with a doubly linked list, giving O(1) lookup and O(1) recency updates:

package main

import "sync"

// LRUCache: the map gives O(1) lookup by key, while the linked
// list keeps entries in recency order so eviction is O(1).
type LRUCache struct {
    sync.Mutex                    // guards all fields for concurrent use
    size       int                // current number of entries
    capacity   int                // entry limit before eviction kicks in
    cache      map[int]*DLinkNode // key -> node
    head, tail *DLinkNode         // sentinels bounding the recency list
}

// DLinkNode is one entry in the doubly linked recency list.
type DLinkNode struct {
    key, value int
    pre, next  *DLinkNode
}

For comparison, Redis tracks recency per object in the redisObject header from its C source:

typedef struct redisObject {
    unsigned type:4;       /* value type (string, list, hash, ...) */
    unsigned encoding:4;   /* internal encoding of the value */
    unsigned lru:LRU_BITS; /* LRU time or LFU data */
    int refcount;          /* reference count for memory reclamation */
    void *ptr;             /* pointer to the underlying value */
} robj;
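Of the mitigation strategies listed above, random expiration is the simplest to sketch: adding jitter to TTLs keeps a burst of keys written together from expiring together and causing a cache avalanche. A minimal sketch in Go (the jitterTTL helper and the 10% jitter fraction are illustrative choices, not from the original article):

package main

import (
    "fmt"
    "math/rand"
    "time"
)

// jitterTTL adds up to `fraction` of random extra time to the
// base TTL, spreading expirations so keys cached in the same
// burst do not all expire in the same instant.
func jitterTTL(base time.Duration, fraction float64) time.Duration {
    extra := time.Duration(rand.Float64() * fraction * float64(base))
    return base + extra
}

func main() {
    base := 10 * time.Minute
    for i := 0; i < 3; i++ {
        fmt.Println(jitterTTL(base, 0.1)) // somewhere in 10m0s..11m0s
    }
}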
2. Parallel Processing
Redis 6.0 introduced a multithreaded model that offloads socket I/O to multiple threads while keeping command execution single‑threaded, improving CPU utilization. MySQL’s master‑slave sync also uses parallel threads for binlog replay. In Go, the GMP scheduler reduces lock contention by binding goroutines to processors (P) and using local run queues, and DAGs can be employed for complex parallel workflows.
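At the application level, the same idea is a simple fan-out: independent downstream calls run as goroutines and a WaitGroup joins them. A minimal sketch, with sleeps standing in for RPC calls:

package main

import (
    "fmt"
    "sync"
    "time"
)

// Two independent lookups run concurrently instead of back to
// back, so total latency approaches the slower call, not the sum.
func main() {
    var wg sync.WaitGroup
    var user, orders string

    wg.Add(2)
    go func() {
        defer wg.Done()
        time.Sleep(50 * time.Millisecond) // stand-in for an RPC
        user = "user-42"
    }()
    go func() {
        defer wg.Done()
        time.Sleep(80 * time.Millisecond) // stand-in for another RPC
        orders = "3 open orders"
    }()
    wg.Wait() // join point: both results are ready here

    fmt.Println(user, orders)
}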
3. Batch Processing
Kafka batches messages per partition, reducing network overhead. Redis pipelines or Lua scripts can batch multiple commands to improve read/write throughput. Front‑end assets (JS/CSS) can be concatenated to lower the HTTP request count. Take care to avoid oversized batches, which inflate per‑request latency and memory pressure.
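The in-process pattern behind such batching is to buffer items and flush either when the batch is full or when a timer fires, whichever comes first. A minimal sketch (the batch size and interval are illustrative):

package main

import (
    "fmt"
    "time"
)

// batcher collects items from `in` and flushes them in groups,
// either when maxSize is reached or on every tick, so many small
// writes amortize into fewer downstream calls.
func batcher(in <-chan int, maxSize int, interval time.Duration, flush func([]int)) {
    batch := make([]int, 0, maxSize)
    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for {
        select {
        case v, ok := <-in:
            if !ok { // input closed: flush the remainder and stop
                if len(batch) > 0 {
                    flush(batch)
                }
                return
            }
            batch = append(batch, v)
            if len(batch) >= maxSize {
                flush(batch)
                batch = batch[:0]
            }
        case <-ticker.C: // timeout: flush whatever has accumulated
            if len(batch) > 0 {
                flush(batch)
                batch = batch[:0]
            }
        }
    }
}

func main() {
    in := make(chan int)
    done := make(chan struct{})
    go func() {
        batcher(in, 3, 100*time.Millisecond, func(b []int) { fmt.Println("flush:", b) })
        close(done)
    }()
    for i := 1; i <= 7; i++ {
        in <- i
    }
    close(in)
    <-done
}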
4. Data Compression
Redis AOF rewrite shrinks the log by keeping only the minimal set of commands needed to rebuild each key. NoSQL stores like HBase and Cassandra use LSM trees, which rely on background compaction to merge segments and shrink storage. Kafka can compress messages on the producer side, saving bandwidth and disk space; with Snappy, stored data size can drop by roughly half for typical payloads.
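As an illustration of producer-side compression, here is a sketch using gzip from Go's standard library (standing in for Snappy, which requires a third-party package):

package main

import (
    "bytes"
    "compress/gzip"
    "fmt"
    "strings"
)

func main() {
    // Repetitive payloads, like batched log lines, compress well.
    payload := []byte(strings.Repeat("GET /api/items 200 12ms\n", 100))

    var buf bytes.Buffer
    zw := gzip.NewWriter(&buf)
    if _, err := zw.Write(payload); err != nil {
        panic(err)
    }
    if err := zw.Close(); err != nil { // flush remaining compressed bytes
        panic(err)
    }

    fmt.Printf("raw: %d bytes, compressed: %d bytes\n", len(payload), buf.Len())
}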
5. Lock‑Free Design
Go’s sync/atomic package provides lock‑free primitives, and the P‑based scheduler replaced the old global run‑queue lock with per‑P local queues, removing much contention. MySQL's InnoDB uses MVCC so that consistent reads proceed without taking row‑level locks, letting reads and writes run concurrently, and multiple buffer pool instances further reduce lock granularity. In read‑heavy scenarios, atomic.Value and sync.Map outperform mutex‑guarded structures.
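For the read-heavy case, a sketch of lock-free config hot-swapping with atomic.Value: readers never take a lock, and the writer replaces the whole value atomically (the Config type here is illustrative):

package main

import (
    "fmt"
    "sync/atomic"
)

// Config is replaced wholesale on update; readers always observe
// either the old or the new value, never a partial write.
type Config struct {
    Timeout int
}

func main() {
    var current atomic.Value
    current.Store(&Config{Timeout: 100})

    // Hot path: a lock-free read, safe from many goroutines.
    cfg := current.Load().(*Config)
    fmt.Println("timeout:", cfg.Timeout)

    // Update path: build a new Config and swap it in atomically.
    current.Store(&Config{Timeout: 200})
    fmt.Println("timeout:", current.Load().(*Config).Timeout)
}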
6. Sharding
Redis Cluster automatically shards data across nodes; Codis and similar proxies achieve the same effect. Kafka partitions spread load across brokers, and increasing partition count raises consumer parallelism. Database sharding (e.g., splitting tables by media type) and hot‑cold data separation also improve scalability.
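The routing step behind all of these is the same: hash the key and map it to a shard. A minimal sketch (the fixed shard list and the choice of CRC32 are illustrative; Redis Cluster itself uses CRC16 over 16384 slots):

package main

import (
    "fmt"
    "hash/crc32"
)

// shardFor maps a key deterministically onto one of n shards,
// so the same key always lands on the same node.
func shardFor(key string, n uint32) uint32 {
    return crc32.ChecksumIEEE([]byte(key)) % n
}

func main() {
    shards := []string{"node-0", "node-1", "node-2"}
    for _, key := range []string{"user:1", "user:2", "order:99"} {
        idx := shardFor(key, uint32(len(shards)))
        fmt.Printf("%s -> %s\n", key, shards[idx])
    }
}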
7. Request Avoidance
Eliminate unnecessary I/O by avoiding redundant downstream calls, selecting only required fields in queries, lazy‑loading tabs on the client, and validating request parameters early. Reducing HTTP requests through asset merging and caching also speeds up web pages.
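Validating early is the cheapest of these wins: reject bad requests before any downstream I/O happens. A minimal HTTP-handler sketch (the handler and its validation rules are illustrative):

package main

import (
    "fmt"
    "net/http"
    "strconv"
)

// getItem fails fast on bad input, so invalid requests never
// reach the cache, the database, or downstream services.
func getItem(w http.ResponseWriter, r *http.Request) {
    id, err := strconv.Atoi(r.URL.Query().Get("id"))
    if err != nil || id <= 0 {
        http.Error(w, "id must be a positive integer", http.StatusBadRequest)
        return // no downstream call was made
    }
    // ...only now hit the cache/database for the item...
    fmt.Fprintf(w, "item %d\n", id)
}

func main() {
    http.HandleFunc("/item", getItem)
    http.ListenAndServe(":8080", nil)
}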
8. Pooling
Connection pools for MySQL, thread pools for request handling, and Go’s sync.Pool for object reuse reduce creation overhead and GC pressure. Goroutine pools reuse idle goroutines, and M‑P binding in the scheduler further cuts lock contention.
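A minimal sketch of object reuse with sync.Pool: buffers are recycled across requests instead of being reallocated each time, cutting allocation churn and GC pressure:

package main

import (
    "bytes"
    "fmt"
    "sync"
)

// bufPool hands out reusable buffers; New runs only when the
// pool is empty, so steady-state traffic allocates rarely.
var bufPool = sync.Pool{
    New: func() any { return new(bytes.Buffer) },
}

func render(name string) string {
    buf := bufPool.Get().(*bytes.Buffer)
    defer bufPool.Put(buf) // return the buffer for reuse
    buf.Reset()            // clear any leftover contents
    fmt.Fprintf(buf, "hello, %s", name)
    return buf.String()
}

func main() {
    fmt.Println(render("world"))
    fmt.Println(render("again")) // likely reuses the same buffer
}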
9. Asynchronous Processing
Redis uses background threads for RDB/AOF persistence; MySQL offers async, semi‑sync, and sync replication. Kafka producers/consumers can operate asynchronously, with callbacks for failure handling. Service‑level tasks such as monitoring, reporting, or post‑publish processing are moved to message queues to decouple latency‑critical paths.
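The same decoupling can be sketched in-process with a buffered channel standing in for a message queue: the request path enqueues and returns immediately, and a background worker does the slow part (the report type and timings are illustrative):

package main

import (
    "fmt"
    "time"
)

type report struct{ userID int }

// handleRequest stays fast: it enqueues the report and returns,
// instead of blocking on the slow reporting work.
func handleRequest(userID int, queue chan<- report) {
    queue <- report{userID: userID}
    fmt.Println("request done for", userID)
}

func main() {
    queue := make(chan report, 128) // buffered: producers rarely block

    // Background worker drains the queue off the critical path.
    go func() {
        for r := range queue {
            time.Sleep(20 * time.Millisecond) // stand-in for slow reporting I/O
            fmt.Println("reported user", r.userID)
        }
    }()

    for id := 1; id <= 3; id++ {
        handleRequest(id, queue)
    }
    time.Sleep(200 * time.Millisecond) // let the worker finish (demo only)
}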
In summary, each of these techniques appears in common middleware; understanding the underlying design rationale helps when selecting technologies or tuning services for better performance.
Sanyou's Java Diary
Passionate about technology, though not great at solving problems; eager to share, never tire of learning!
