
Optimizing Web Servers for High Throughput and Low Latency – Insights from Dropbox’s Edge Network

This article is a comprehensive, data‑driven guide to reducing latency and increasing throughput in Nginx‑based web servers. It covers hardware selection, low‑level OS tuning, network‑stack adjustments, TLS optimizations, and application‑level configuration, all illustrated with real‑world experience from Dropbox's edge network.

High Availability Architecture

Overview – The author, an SRE on Dropbox’s Traffic Team, shares an expanded version of a 2017 NginxConf talk that details systematic latency‑sensitive optimizations spanning hardware, kernel, networking, and application layers.

Hardware – Choose CPUs with AVX2 and AES‑NI support, and preferably AVX‑512; Intel Haswell/Broadwell/Skylake or AMD EPYC are recommended. Use 10–25 Gbit/s NICs, fast DDR memory for latency‑critical workloads, and SSD/flash storage for high cache demands. Keep firmware, drivers, and CPU microcode up to date.
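On Linux, the instruction‑set extensions mentioned above can be checked directly from /proc/cpuinfo; a minimal sketch (flag names follow the kernel's x86 conventions):

```shell
# Report whether the CPU advertises AES-NI, AVX2, and AVX-512 Foundation
# (Linux: supported features are listed as flags in /proc/cpuinfo).
for flag in aes avx2 avx512f; do
    if grep -qw "$flag" /proc/cpuinfo 2>/dev/null; then
        echo "$flag: supported"
    else
        echo "$flag: not found"
    fi
done
```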

Low‑level OS tuning – Keep firmware recent, decouple kernel and driver updates via DKMS, and use tools like cpupower, turbostat, and the intel_pstate driver's sysfs interface to verify CPU frequency and power states. Set the CPU governor to performance, bias the hardware toward performance with x86_energy_perf_policy, and consider /dev/cpu_dma_latency or busy polling for ultra‑low‑latency traffic. Manage NUMA by binding processes to nodes with numactl --cpunodebind and monitor placement with numastat.
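As a sketch, the governor can be inspected via sysfs and set with cpupower; the write operations below are commented out because they require root, and the nginx pinning example is purely illustrative:

```shell
# Read the current frequency governor for CPU 0 (Linux sysfs).
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor 2>/dev/null \
    || echo "cpufreq interface not exposed on this host"

# Commented out: these require root plus the cpupower/linux-tools packages.
# cpupower frequency-set -g performance        # pin the governor to performance
# x86_energy_perf_policy performance           # bias the HW energy/perf trade-off
# turbostat --quiet sleep 5                    # observe C-/P-state residency
# numactl --cpunodebind=0 --membind=0 nginx    # bind to NUMA node 0 (illustrative)
```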

PCIe and NIC – Verify link width/speed with lspci and disable ASPM if it adds latency. Use ethtool to tune ring‑buffer sizes, interrupt coalescing, and offloads (prefer GRO over LRO; be cautious with TSO/GSO), and steer IRQ affinity via /proc/irq/*/smp_affinity. Spread interrupts across all NUMA nodes for maximum throughput, or concentrate them on a single node for minimal latency.
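A hedged sketch of that workflow for a hypothetical interface eth0 (the tuning commands need root, and which options are supported varies by driver):

```shell
# Inspect PCIe link status for attached devices (LnkSta shows negotiated width/speed).
lspci -vv 2>/dev/null | grep -m1 'LnkSta:' || echo "no PCIe link info available"

# Commented out: root-only NIC tuning for a hypothetical interface eth0.
# ethtool -g eth0                   # show ring buffer sizes
# ethtool -G eth0 rx 4096 tx 4096   # larger rings absorb bursts (cost: latency, memory)
# ethtool -c eth0                   # show interrupt-coalescing settings
# ethtool -C eth0 rx-usecs 50       # fewer IRQs per packet: throughput over latency
# ethtool -K eth0 gro on lro off    # prefer GRO; LRO can mangle forwarded traffic
```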

Network stack – Modern kernels already enable many TCP/IP improvements, but explicit tuning is still valuable. Collect metrics via /proc/net/snmp , /proc/net/netstat , ss , and tools like tcptrace . Use BBR or other congestion control algorithms, enable fair queuing (fq), and consider pacing. Adjust sysctls such as net.ipv4.tcp_slow_start_after_idle , net.ipv4.tcp_mtu_probing , and memory settings to match the BDP.
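The buffer sizing follows from the bandwidth‑delay product; a sketch with assumed link figures (10 Gbit/s and 40 ms RTT are placeholders — substitute your own measurements):

```shell
# Compute the bandwidth-delay product for an assumed 10 Gbit/s path with 40 ms RTT.
BANDWIDTH_BPS=10000000000       # link bandwidth in bits per second
RTT_MS=40                       # round-trip time in milliseconds
BDP_BYTES=$(( BANDWIDTH_BPS / 8 * RTT_MS / 1000 ))
echo "BDP: ${BDP_BYTES} bytes"  # 50000000 bytes, i.e. ~48 MiB

# Commented out: apply matching limits (root; values illustrative, not universal).
# sysctl -w net.ipv4.tcp_congestion_control=bbr
# sysctl -w net.core.default_qdisc=fq            # fair queuing enables pacing
# sysctl -w net.ipv4.tcp_slow_start_after_idle=0
# sysctl -w net.ipv4.tcp_mtu_probing=1
# sysctl -w net.ipv4.tcp_rmem="4096 131072 ${BDP_BYTES}"
# sysctl -w net.ipv4.tcp_wmem="4096 131072 ${BDP_BYTES}"
```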

TLS – Choose a modern library (OpenSSL, LibreSSL, or BoringSSL) that leverages AES‑NI, AVX2/AVX‑512, or ChaCha20‑Poly1305. Prefer ECDSA certificates over large RSA keys, enable session tickets or a shared session cache, and tune ssl_buffer_size (4 KB for latency, 16 KB for throughput). Use OCSP stapling and appropriate offload settings.
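These settings map onto Nginx roughly as follows; a hypothetical fragment (certificate paths and values are placeholders, not Dropbox's actual configuration):

```nginx
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/example-ecdsa.crt;  # ECDSA: smaller, cheaper handshakes
    ssl_certificate_key /etc/nginx/certs/example-ecdsa.key;
    ssl_protocols       TLSv1.2 TLSv1.3;
    ssl_session_tickets on;          # or: ssl_session_cache shared:SSL:10m;
    ssl_buffer_size     4k;          # 4k favors time-to-first-byte; 16k favors throughput
    ssl_stapling        on;          # OCSP stapling
    ssl_stapling_verify on;
}
```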

Application‑level – Optimize Nginx compression (gzip vs. brotli), buffer settings (proxy_buffering, client_body_buffer_size), and enable AIO or thread pools to offload disk I/O (aio threads; aio_write on;). Cache open files with open_file_cache and buffer logs to reduce event‑loop stalls.
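A sketch of the corresponding directives (values are illustrative starting points, and brotli requires the third‑party ngx_brotli module):

```nginx
# http{} context unless noted otherwise.
gzip on;
gzip_comp_level 4;                       # for dynamic content, low levels trade ratio for CPU
proxy_buffering on;                      # shield upstreams from slow clients
client_body_buffer_size 16k;
aio threads;                             # offload blocking disk reads to a thread pool
aio_write on;                            # requires nginx built with --with-threads
open_file_cache max=10000 inactive=30s;  # cache open fds and stat() metadata
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;  # buffered logging
```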

Tooling – Keep perf, bcc, and other observability tools up‑to‑date. Use flame graphs, funclatency, and opensnoop to pinpoint hotspots. Upgrade compilers and system libraries (glibc, zlib‑ng) to benefit from recent micro‑optimizations.
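A typical workflow, sketched below; the bcc tool names shown are the Ubuntu -bpfcc variants, and on other distros they may be installed without the suffix:

```shell
# Confirm the profiler is available before relying on it.
perf --version 2>/dev/null || echo "perf not installed"

# Commented out: root-only examples of the workflow described above.
# perf record -F 99 -g -p "$(pgrep -o nginx)" -- sleep 30        # sample on-CPU stacks
# perf script | stackcollapse-perf.pl | flamegraph.pl > out.svg  # FlameGraph scripts
# funclatency-bpfcc vfs_read                # latency histogram for a kernel function
# opensnoop-bpfcc -p "$(pgrep -o nginx)"    # trace open() calls from nginx workers
```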

Conclusion – Most latency gains come from low‑level server and kernel tweaks, but the biggest user‑visible improvements are achieved by higher‑level traffic engineering and load‑balancing across the edge network.

Written by High Availability Architecture