How to Measure and Optimize System Performance: From Nanoseconds to Profiling
This article explains the fundamental time units of system performance, outlines various measurement dimensions and tools, and guides readers through analysis, profiling, and improvement techniques to achieve high‑performance, reliable applications.
Overview
When an application has fewer than ten thousand users and daily traffic below one million page views, the focus is on building a functional product with reliable, high‑performance code: optimizing database queries, keeping the code simple, and making the front end fast.
Early‑stage startups often have only a few engineers who follow basic coding standards and deliver core features quickly. Even if the code later becomes messy, the business can still succeed as long as the site remains usable.
As traffic grows (Ganji, for example, reached 6 million PV, or page views, per day), teams shift toward performance optimization and system refactoring, adopting static pages, database sharding, load balancers, and dozens of servers.
System Performance Measurement Dimensions
Understanding time units is essential before optimizing:
ns (nanoseconds): CPU‑level operations such as cache access and branch mispredictions (~5 ns penalty).
µs (microseconds): memory access (e.g., 0.1 µs for a main‑memory reference) and local high‑speed network latency (~10‑250 µs).
ms (milliseconds): disk I/O (an SSD sequential read of 1 MB takes ~1 ms) and network round trips across continents (~150 ms).
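When response times reach the 100 ms range, network latency between the user and the server, rather than server‑side processing, tends to become the primary bottleneck. This is why static content is served from a CDN and dynamic APIs are deployed across multiple IDCs (Internet data centers).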
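To make the gap between these units concrete, the sketch below takes the rounded figures quoted above and counts how many of each operation fit into one ~150 ms cross‑continent round trip. These are order‑of‑magnitude values, not measurements, and the names `LATENCY_NS` and `ops_per_budget` are illustrative only.

```python
# Approximate latencies quoted in this article, in nanoseconds (rounded, not measured).
LATENCY_NS = {
    "branch misprediction": 5,
    "main memory reference": 100,               # 0.1 µs
    "SSD sequential read, 1 MB": 1_000_000,     # ~1 ms
    "cross-continent round trip": 150_000_000,  # ~150 ms
}


def ops_per_budget(budget_ns: int) -> dict:
    """How many of each operation fit into a time budget (integer division)."""
    return {name: budget_ns // cost for name, cost in LATENCY_NS.items()}


rtt = LATENCY_NS["cross-continent round trip"]
for name, count in ops_per_budget(rtt).items():
    print(f"{name:30s} {count:>12,}")
```

One round trip costs as much as about 1.5 million main‑memory references or 150 one‑megabyte SSD reads. At that scale, removing network distance usually matters more than micro‑optimizing server code.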
How to Measure System Performance
Performance measurement can be performed at different levels:
System Level
Tools covering the entire Linux stack (as shown in the referenced diagram) include vmstat, iostat, perf, and many others. For example, vmstat 3 5 prints process, memory, swap, I/O, and CPU statistics every 3 seconds, five times. Note that the first report gives averages since the last boot.
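Tools like vmstat derive their CPU columns from kernel tick counters in /proc/stat. The sketch below computes a vmstat‑style "busy" percentage from two samples of the aggregate `cpu` line. It assumes a Linux /proc layout, and the helper names are hypothetical. Guest time is skipped because the kernel already counts it inside user/nice.

```python
import time


def parse_cpu_line(line: str) -> tuple:
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:9]]  # user, nice, system, idle, iowait, irq, softirq, steal
    idle = values[3] + values[4]            # idle + iowait count as not busy
    return idle, sum(values)


def cpu_busy_percent(before: str, after: str) -> float:
    """CPU busy percentage between two /proc/stat 'cpu' line samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    delta_total = total1 - total0
    if delta_total <= 0:
        return 0.0
    return 100.0 * (1 - (idle1 - idle0) / delta_total)


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate 'cpu' line


def sample(interval: float = 3.0, count: int = 5) -> None:
    """Print CPU busy % every `interval` seconds, `count` times (Linux only)."""
    prev = read_cpu_line()
    for _ in range(count):
        time.sleep(interval)
        cur = read_cpu_line()
        print(f"cpu busy: {cpu_busy_percent(prev, cur):5.1f}%")
        prev = cur
```

On a Linux host, `sample(3, 5)` mirrors the cadence of `vmstat 3 5`. Unlike vmstat's first line, each value here covers only the preceding interval.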
VM Level
Collect metrics such as CPU utilization, GC metrics, allocator statistics, scheduler data, and I/O metrics, which may vary by language runtime.
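The article's VM‑level examples target the Erlang VM. As a stand‑in, here is a minimal sketch of collecting one runtime metric, garbage‑collection pause time, through CPython's `gc.callbacks` hook. `GCPauseRecorder` is a hypothetical helper. CPython's `gc` module covers only the cyclic collector (reference counting frees most objects immediately), so these pauses reflect cycle collection only.

```python
import gc
import time


class GCPauseRecorder:
    """Records the wall-clock duration of each CPython cyclic-GC pass."""

    def __init__(self):
        self.pauses_ms = []  # one entry per completed collection
        self._start = None

    def _callback(self, phase, info):
        # CPython calls this with phase "start" before and "stop" after each collection.
        if phase == "start":
            self._start = time.perf_counter()
        elif phase == "stop" and self._start is not None:
            self.pauses_ms.append((time.perf_counter() - self._start) * 1000)
            self._start = None

    def __enter__(self):
        gc.callbacks.append(self._callback)
        return self

    def __exit__(self, *exc):
        gc.callbacks.remove(self._callback)


with GCPauseRecorder() as rec:
    junk = [[i] for i in range(100_000)]  # allocate many container objects
    del junk
    gc.collect()                          # force at least one full collection
print(f"{len(rec.pauses_ms)} GC passes, longest {max(rec.pauses_ms):.2f} ms")
```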
Application Level
Gather event counts, message rates, critical function call frequencies, error rates, and response times. This often requires custom instrumentation.
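Custom instrumentation can be as small as a decorator that counts calls and errors and records latency per critical function. The sketch below keeps everything in memory. `Metrics`, `instrument`, and `parse_order` are hypothetical names; in production these numbers would typically be exported to a monitoring service such as Datadog rather than held in a dict.

```python
import functools
import time
from collections import defaultdict


class Metrics:
    """Minimal in-process registry of call counts, error counts, and latencies."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.latencies_ms = defaultdict(list)

    def instrument(self, name):
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                start = time.perf_counter()
                self.calls[name] += 1
                try:
                    return func(*args, **kwargs)
                except Exception:
                    self.errors[name] += 1
                    raise  # record the error but never swallow it
                finally:
                    self.latencies_ms[name].append((time.perf_counter() - start) * 1000)
            return wrapper
        return decorator

    def error_rate(self, name):
        calls = self.calls[name]
        return self.errors[name] / calls if calls else 0.0


metrics = Metrics()


@metrics.instrument("parse_order")  # hypothetical critical function
def parse_order(payload):
    return int(payload)
```

Calling `parse_order("42")` and then `parse_order("x")` (which raises `ValueError`) leaves two calls, one error, and an error rate of 0.5 in the registry.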
System Analysis (Analyze)
System‑level analysis can be visualized with comprehensive diagrams (see image). Different runtimes have their own tools: for the Erlang VM, etop, pman, observer, and WombatOAM; for applications, Datadog, Sentry, Crashlytics, Periscope, and similar services.
Effective visualization of collected metrics is crucial; poor visual choices can lead to misinterpretation.
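A common way a chart misleads is by plotting only average latency. The sketch below uses hypothetical data and the nearest‑rank percentile definition, chosen for simplicity. It shows how a mean can hide the slow tail that users actually notice.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile (p in (0, 100]) of a non-empty list of samples."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


def summarize(samples):
    return {
        "mean": sum(samples) / len(samples),
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
    }


# Hypothetical latencies in ms: 98 fast requests and 2 very slow ones.
latencies = [10] * 98 + [1000, 1200]
print(summarize(latencies))
```

Here the mean is 31.8 ms, which suggests uniformly "okay" performance. In fact p50 is 10 ms and p99 is 1000 ms, so plotting percentiles alongside (or instead of) the mean makes the tail visible.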
Profiling
After basic measurement, profiling pinpoints time‑consuming code paths. It answers questions like “which functions dominate the call stack?” and reveals unexpected call frequencies.
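The article's profiling examples are Erlang tools. As a Python stand‑in, the standard library's `cProfile` and `pstats` answer the same two questions: which functions dominate cumulative time, and how often each is called. `handle_request` and `helper` are hypothetical functions.

```python
import cProfile
import io
import pstats


def helper(n):
    return sum(i * i for i in range(n))


def handle_request():
    # Hypothetical hot path: calls helper() far more often than one might expect.
    return [helper(200) for _ in range(500)]


profiler = cProfile.Profile()
profiler.enable()
handle_request()
profiler.disable()

out = io.StringIO()
stats = pstats.Stats(profiler, stream=out).sort_stats("cumulative")
stats.print_stats(5)  # top 5 entries by cumulative time
print(out.getvalue())
```

In the printed table, the `ncalls` column for `helper` reads 500. That column is where unexpected call frequencies tend to surface.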
Common profiling tools include gprof and SystemTap; for the Erlang VM there are cprof, eprof, fprof, lcnt, and percept. Erlang's profiling tools are often praised as easier to use than C‑based profilers.
Flame graphs provide a visual representation of profiling results; an example flame graph from an Elixir service is shown.
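Flame graph renderers such as Brendan Gregg's FlameGraph scripts take "folded" stacks as input. Each line holds one unique stack, with frames joined by semicolons from root to leaf, followed by its sample count. A frame's width in the graph is proportional to that count. A minimal sketch of the folding step follows. The frame names are hypothetical stand‑ins for stacks a sampling profiler would collect.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse sampled call stacks (root first) into 'a;b;c count' lines."""
    counts = Counter(";".join(stack) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


samples = [
    ["main", "handle_request", "parse"],
    ["main", "handle_request", "parse"],
    ["main", "handle_request", "db_query"],
    ["main", "gc"],
]
for line in fold_stacks(samples):
    print(line)
```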
Improve
Once the bottlenecks are identified, most of the optimization work is effectively defined. A slide (image) summarizes typical improvement strategies.
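Whatever the fix, it is worth re‑measuring to confirm it actually helped. A minimal sketch using the standard library's `timeit` follows. The list‑versus‑set lookup is a hypothetical bottleneck, not one taken from the original slide. Taking the minimum of several repeats follows the `timeit` documentation's advice, since background noise only ever adds time.

```python
import timeit


def compare(baseline, candidate, number=1_000, repeat=5):
    """Return (baseline_s, candidate_s, speedup) using the best of `repeat` runs."""
    base = min(timeit.repeat(baseline, number=number, repeat=repeat))
    cand = min(timeit.repeat(candidate, number=number, repeat=repeat))
    return base, cand, base / cand


# Hypothetical bottleneck: membership tests against a list inside a hot loop.
ids_list = list(range(10_000))
ids_set = set(ids_list)

base, cand, speedup = compare(
    lambda: 9_999 in ids_list,  # O(n) linear scan
    lambda: 9_999 in ids_set,   # O(1) average hash lookup
)
print(f"list: {base:.4f}s  set: {cand:.4f}s  speedup: {speedup:.1f}x")
```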
The article serves as an introduction to performance awareness and basic methods for enhancement.
References
Chen Tian, “Service Performance 101” (partial content)
Linux performance tools diagram
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.