
Boost Server Performance: CPU, Memory, Disk, Network & Concurrency Optimizations

This article summarizes Tao Hui's 2020 GOPS Global Operations Conference talk, covering practical techniques for optimizing basic resources, improving network efficiency, reducing request latency, and scaling system concurrency to achieve higher throughput and lower latency in modern distributed services.

This article is based on Tao Hui's talk at the 2020 GOPS Global Operations Conference (Shenzhen).

The presentation is divided into four main parts:

1. Basic resource optimization
2. Network efficiency optimization
3. Reducing request latency
4. Improving system concurrency

1. Basic Resource Optimization

Optimizing basic resources focuses on increasing utilization across four areas: CPU cache, memory, disk, and scheduling.

CPU cache: Improve the hit rate across the three cache levels (L1–L3) for broad performance gains.

Memory: Use memory pools (e.g., in C, the JVM, Python, Go, Lua) to speed up allocation, reduce fragmentation, and increase utilization.

Disk: For HDDs, optimize PageCache, I/O scheduling, zero-copy, and Direct IO; for SSDs, adopt different programming and caching strategies.

Scheduling: Enhance request dispatch among processes, threads, or coroutines to improve synchronization speed.

CPU

CPU cache optimization examples include adjusting Nginx hash table bucket sizes for domain and variable hashes to align with cache line sizes (typically 64 bytes) and using padding to avoid false sharing in multi‑core environments.

Modern CPU clock speeds have plateaued at around 3–4 GHz, so further performance gains come mainly from better cache utilization and from avoiding contention between cores.

Memory

Memory pool choices (e.g., TCMalloc, ptmalloc2) affect allocation speed and fragmentation. Specialized pools for small allocations avoid locking, while larger pools may trade speed for simplicity.

Languages such as Lua, Java (JVM), and Go provide their own memory pool implementations, which are crucial for high‑performance server development.
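As a concrete example of the pooling idea, Go's standard library ships `sync.Pool`, which recycles objects across requests instead of allocating fresh ones. The buffer size and `handle` function below are illustrative choices, not part of the talk:

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool recycles 4 KB buffers instead of allocating a new slice
// per request, reducing allocation latency and GC pressure.
var bufPool = sync.Pool{
	New: func() any { return make([]byte, 4096) },
}

// handle simulates per-request work that borrows a pooled buffer.
func handle(payload string) int {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf) // return the buffer for reuse
	return copy(buf, payload)
}

func main() {
	fmt.Println(handle("hello")) // 5 bytes copied into a pooled buffer
}
```

Under steady load, most `Get` calls return a recycled buffer, so the allocator and garbage collector stay out of the hot path.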

Disk

For HDDs, improving PageCache hit rates and I/O scheduling (e.g., elevator algorithm) can boost throughput. SSDs introduce write amplification and wear‑leveling concerns; careful write patterns and garbage‑collection awareness are needed.

SSD advantages include higher IOPS, lower latency, and true concurrent access, but they require strategies to mitigate write amplification and wear.

2. Network Efficiency Optimization

Network efficiency improvements target three layers: system‑level transport, application‑level encoding, and application‑level transmission.

System layer: Optimize the TCP handshake, buffer sizes, and congestion control based on network conditions.

Application encoding: Use efficient codecs and compression.

Application transmission: Leverage HTTP/2 and HTTP/3 (QUIC) and related features such as multiplexing, connection migration, and QPACK header compression.

HTTP/3, built on QUIC over UDP, eliminates head‑of‑line blocking by providing independent streams, connection IDs for migration, and efficient header compression.

3. Reducing Request Latency

Latency reduction focuses on four techniques: caching, asynchronous processing, MapReduce, and stream processing.

Cache read/write strategies (e.g., write‑through vs. write‑back, CAP considerations).

Asynchronous pipelines to avoid blocking.

MapReduce for parallel data aggregation.

Stream processing with time windows for real‑time analytics.
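The MapReduce item above can be sketched in-process with goroutines: split the input into chunks (map), aggregate each chunk in parallel, and merge the partial results over a channel (reduce). The function name and chunking scheme are choices made for this sketch:

```go
package main

import (
	"fmt"
	"sync"
)

// mapReduceSum splits work across goroutines (map) and merges the
// partial sums over a channel (reduce): a miniature MapReduce.
func mapReduceSum(nums []int, workers int) int {
	chunk := (len(nums) + workers - 1) / workers // ceil division
	parts := make(chan int, workers)
	var wg sync.WaitGroup
	for i := 0; i < len(nums); i += chunk {
		end := i + chunk
		if end > len(nums) {
			end = len(nums)
		}
		wg.Add(1)
		go func(slice []int) { // map phase: local aggregation
			defer wg.Done()
			s := 0
			for _, v := range slice {
				s += v
			}
			parts <- s
		}(nums[i:end])
	}
	wg.Wait()
	close(parts)
	total := 0
	for p := range parts { // reduce phase: merge partial sums
		total += p
	}
	return total
}

func main() {
	nums := make([]int, 100)
	for i := range nums {
		nums[i] = i + 1
	}
	fmt.Println(mapReduceSum(nums, 4)) // 5050
}
```

The same split/aggregate/merge shape scales from goroutines on one machine to tasks across a cluster.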

4. Improving System Concurrency

Scaling concurrency involves load-balancing strategies across the three scaling dimensions (the X, Y, and Z axes of the AKF scale cube) and consistent hashing with virtual nodes to avoid hot-spot overload and allow graceful degradation.

Techniques include round‑robin or least‑connections upstream selection, read/write splitting for databases, API‑gateway segregation, and sharding with consistent hashing enhanced by virtual nodes.
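Consistent hashing with virtual nodes can be sketched as follows. This is a minimal ring, assuming CRC32 as the hash and invented node names; production rings typically use a stronger hash and replication:

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// Ring maps keys to nodes via consistent hashing. Each physical
// node contributes many virtual points, so load spreads evenly and
// only ~1/N of keys move when a node joins or leaves.
type Ring struct {
	points []uint32          // sorted virtual-node hashes
	owner  map[uint32]string // virtual point -> physical node
}

func NewRing(nodes []string, vnodes int) *Ring {
	r := &Ring{owner: map[uint32]string{}}
	for _, n := range nodes {
		for v := 0; v < vnodes; v++ {
			h := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(v)))
			r.points = append(r.points, h)
			r.owner[h] = n
		}
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
	return r
}

// Locate returns the first virtual point clockwise from the key's
// hash, wrapping around to the start of the ring.
func (r *Ring) Locate(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0
	}
	return r.owner[r.points[i]]
}

func main() {
	ring := NewRing([]string{"cache-a", "cache-b", "cache-c"}, 100)
	fmt.Println(ring.Locate("user:42")) // deterministic node for this key
}
```

Raising the virtual-node count smooths the distribution at the cost of a larger ring to search.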

Persistent storage can use quorum-based protocols (NWR), where the read quorum plus the write quorum exceeds the replication factor (R + W > N), so every read quorum overlaps the latest write and strong consistency holds despite individual node failures.
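The NWR overlap condition reduces to a one-line check; the function name here is an invented label for the rule, not an API from the talk:

```go
package main

import "fmt"

// strongConsistency reports whether an NWR configuration guarantees
// that every read quorum overlaps every write quorum: R + W > N.
func strongConsistency(n, w, r int) bool {
	return r+w > n
}

func main() {
	fmt.Println(strongConsistency(3, 2, 2)) // true: any 2 readers overlap any 2 writers
	fmt.Println(strongConsistency(3, 1, 1)) // false: a read may miss the latest write
}
```

Typical deployments pick N=3 with W=2, R=2 to survive one node failure on both the read and write paths.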

In summary, the talk covered a bottom‑up view of performance tuning, from CPU cache and memory pools to network protocols and distributed consistency mechanisms.

Tags: performance optimization, operations, concurrency, system resources, network efficiency
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together.