Mastering ssar: A Deep Dive into Alibaba’s Open‑Source System Performance Tool
ssar is Alibaba’s open-source system performance monitoring tool. It extends the traditional sar with comprehensive machine-level, process-level, and load metrics. It also lets new metrics be added quickly through configuration and offers advanced diagnostics such as load5s, thread analysis, and custom Python query extensions for detailed OS troubleshooting.
1. Positioning of the System Performance Analysis Tool ssar
Performance analysis tools can be classified as counter, tracing, profiling, and monitoring tools. By data acquisition method and real-time nature, they fall into four quadrants: A – direct counter commands (top, ps); B – historical counter-based monitoring tools (sar, tsar, atop); C – tracing and sampling tools (tracepoint, kprobe, perf); D – combined use of B and C.
The focus of this article is on quadrant B, the system performance monitoring tools.
2. Introduction to ssar
ssar is an open-source system performance monitoring tool (source link at the end of the article). It covers all the functionality of the classic sar tool and adds many machine-level and process-level metrics, including a unique load5s indicator for load diagnosis.
It is comparable to the open‑source atop tool, which is also widely deployed in the industry.
3. Rapid Development and Iteration of ssar
Traditional sar tools require code changes and long release cycles to add new metrics. ssar uses file‑based collection, allowing new metrics to be added by simply updating the sys.conf configuration and restarting the collector.
The general-purpose query option (-o) lets you specify the source file, the line, the column, the metric type (c for the raw value, d for the delta), and an alias. This makes it possible to add and query a new metric within minutes.
ssar -o 'metric=c|cfile=meminfo|line_begin=MemFree:|column=2|alias=free'
ssar -o 'metric=d:cfile=snmp:line=8:column=13:alias=retranssegs'
ssar -o 'metric=d|cfile=stat|line=2-17|column=5|alias=idle_{line};' -f +100
Python-based query wrappers (tsar2, ssar+) provide compatibility with the original tsar command while allowing complex logic to be implemented in Python.
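To make the file/line/column/metric idea concrete, here is a minimal Python sketch of how such a query could be evaluated against two snapshots of a /proc-style file. It is not ssar's implementation. The function names `extract` and `query` are hypothetical. The sketch also assumes 1-based line and column numbers and whitespace-separated columns. Under those assumptions, column=2 of the MemFree line in /proc/meminfo is the numeric value, matching the first command above.

```python
"""Hypothetical sketch of an ssar -o style query (not ssar's real code).

Assumptions: a line is chosen by prefix (line_begin) or by 1-based line
number (line); a column is a 1-based whitespace-separated field;
metric 'c' returns the raw value, 'd' the difference between snapshots.
"""
from typing import Optional


def extract(snapshot: str, column: int, line: Optional[int] = None,
            line_begin: Optional[str] = None) -> int:
    """Return the integer found at `column` of the selected line."""
    rows = snapshot.splitlines()
    if line_begin is not None:
        selected = next((r for r in rows if r.startswith(line_begin)), None)
        if selected is None:
            raise ValueError(f"no line starts with {line_begin!r}")
    elif line is not None:
        selected = rows[line - 1]
    else:
        raise ValueError("need either line or line_begin")
    return int(selected.split()[column - 1])


def query(metric: str, prev: str, curr: str, column: int, **where) -> int:
    """metric='c': raw value in curr; metric='d': curr minus prev."""
    value = extract(curr, column, **where)
    if metric == "c":
        return value
    if metric == "d":
        return value - extract(prev, column, **where)
    raise ValueError(f"unknown metric type: {metric}")
```

For example, `query("d", prev, curr, column=2, line_begin="MemFree:")` returns how much free memory changed between two meminfo snapshots. This is the same shape of calculation a delta metric such as retranssegs needs.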
4. Using ssar for Whole‑Machine Load Metrics
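The load5s metric provides load information at 5-second granularity, computed as the number of threads in the R (running/runnable) and D (uninterruptible sleep) states. It is more precise than the traditional load1 metric. load1 is an exponentially damped one-minute average, so it rises slowly when load appears and keeps reporting load for a while after the load has disappeared.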
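The definition above (R plus D threads) can be illustrated with a short Python sketch that takes one instantaneous sample from /proc/<pid>/task/<tid>/stat. This only mirrors the stated definition; the article does not describe how ssar itself collects load5s, and the helper names here are hypothetical. Note that the thread state is the field after the *last* `)`, because a thread name may contain spaces or parentheses.

```python
import os
from collections import Counter
from typing import Iterable, Iterator


def thread_state(stat_line: str) -> str:
    """Return the one-letter state (R, S, D, ...) from a /proc stat line."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain ')'.
    return stat_line[stat_line.rindex(")") + 2]


def count_states(stat_lines: Iterable[str]) -> Counter:
    """Count threads per state letter."""
    return Counter(thread_state(line) for line in stat_lines)


def load_sample(stat_lines: Iterable[str]) -> int:
    """One load5s-style sample: threads in R state plus threads in D state."""
    states = count_states(stat_lines)
    return states["R"] + states["D"]


def read_thread_stats(proc: str = "/proc") -> Iterator[str]:
    """Yield every thread's stat line; threads may exit during the scan."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    yield f.read()
            except OSError:
                continue
```

On Linux, `load_sample(read_thread_stats())` gives one sample, and `count_states(...)["R"]` corresponds to a runq-style count. The scanning process itself is counted as R.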
Additional load-related metrics include runq (R-state threads), threads (total threads), and detailed views (load2p) that show per-CPU and per-process breakdowns.
5. Process‑Level Metrics with ssar
The procs subcommand displays historical process information similar to ps. Options -f, -b, and -r control time range; -o selects output fields; -H hides headers; --api outputs JSON.
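Because --api emits JSON, procs output can be post-processed in a script. The sketch below assumes the output is a JSON array of flat objects, one per process, and that a numeric field such as "cpu" exists. Both the schema and the field names are hypothetical, since the article does not document them. The function `top_processes` is likewise an illustrative helper, not part of ssar.

```python
import json
from typing import Any, Dict, List


def top_processes(api_output: str, field: str, n: int = 5) -> List[Dict[str, Any]]:
    """Return the n records with the largest numeric value of `field`.

    Assumption (hypothetical schema): api_output is a JSON array of
    objects, one per process, keyed by output field name.
    """
    records = json.loads(api_output)
    numeric = [r for r in records if isinstance(r.get(field), (int, float))]
    return sorted(numeric, key=lambda r: r[field], reverse=True)[:n]
```

Records missing the field, or holding a non-numeric value, are skipped rather than raising an error. This is a deliberate choice for ad-hoc triage scripts.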
Special options --job and --sched help diagnose job grouping and scheduling issues.
6. Load5s Diagnosis Example
An experiment shows that load5s rises sharply with stress‑induced load, while load1 lags, demonstrating the superiority of load5s for timely load detection.
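The lag of load1 follows from how Linux computes it. Every 5 seconds the kernel updates an exponentially weighted average, `load = load * e + active * (1 - e)`, with `e = exp(-5/60)` for the one-minute average. The Python sketch below simulates a step load and compares the instantaneous R+D count (a load5s-like value) with load1. It uses floating point, whereas the kernel uses fixed-point arithmetic, so real values differ slightly. The trace of 8 busy threads for one minute is an assumed scenario, not the article's experiment data.

```python
import math
from typing import List, Tuple

SAMPLE_SECONDS = 5
# Decay factor for the 1-minute average (the kernel uses fixed-point EXP_1).
DECAY_1MIN = math.exp(-SAMPLE_SECONDS / 60.0)


def simulate(active_counts: List[int]) -> List[Tuple[int, float]]:
    """Return (instantaneous R+D count, load1) for each 5-second sample."""
    load1 = 0.0
    out = []
    for n in active_counts:
        load1 = load1 * DECAY_1MIN + n * (1.0 - DECAY_1MIN)
        out.append((n, load1))
    return out


if __name__ == "__main__":
    # Assumed scenario: 1 min idle, 1 min with 8 busy threads, 2 min idle.
    trace = [0] * 12 + [8] * 12 + [0] * 24
    for i, (n, l1) in enumerate(simulate(trace)):
        print(f"t={i * SAMPLE_SECONDS:3d}s  load5s-like={n}  load1={l1:5.2f}")
```

After a full minute of 8 active threads, load1 has only reached 8 × (1 − e⁻¹) ≈ 5.06. Once the load stops, load1 still reports more than 4 on the next sample.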
7. CPU Usage Analysis
ssar reports CPU usage as kernel ticks per second (e.g., user/s). Converting ticks to percentages aligns ssar’s values with those reported by top and tsar2, making it possible to correlate whole-machine and per-process CPU usage.
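The conversion can be written out as a small Python sketch. One CPU fully busy for one second accumulates USER_HZ ticks (usually 100; `os.sysconf("SC_CLK_TCK")` reports it). A whole-machine percentage therefore divides by USER_HZ times the number of CPUs. A per-process percentage in top's default mode is relative to a single CPU and can exceed 100%. The helper names are illustrative, and the default `hz=100` is an assumption to check on the target system.

```python
import os


def user_hz() -> int:
    """Clock ticks per second as seen from user space (commonly 100)."""
    return os.sysconf("SC_CLK_TCK")


def machine_cpu_percent(ticks_per_sec: float, ncpu: int, hz: int = 100) -> float:
    """Share of total capacity across all CPUs, in the range 0-100."""
    return ticks_per_sec / (hz * ncpu) * 100.0


def process_cpu_percent(ticks_per_sec: float, hz: int = 100) -> float:
    """Percentage relative to one CPU, as top shows per process by default."""
    return ticks_per_sec / hz * 100.0
```

For example, user/s = 800 on a 16-CPU machine is 50% of total capacity. A process consuming 250 ticks/s shows as 250% in top.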
8. Memory Reclamation Case Study
The article walks through a real‑world scenario where massive Java memory allocation triggers kswapd, direct memory reclamation, high sys CPU usage, and load spikes. ssar’s detailed metrics (free memory, pgscan_kswapd/s, pgscan_direct/s, network throughput, order‑3 memory fragmentation) expose the full chain of events.
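Memory fragmentation by order can be inspected through the standard /proc/buddyinfo file. Each row lists free-block counts for orders 0, 1, 2, and so on, where an order-k block is 2^k contiguous pages. The Python sketch below parses that format and counts how many order-3 allocations the free lists could satisfy. A free block of order k ≥ 3 can be split into 2^(k-3) order-3 blocks. Whether ssar derives its order-3 metric this way is not stated in the article, so treat this as an illustrative assumption.

```python
from typing import Dict, List, Tuple


def parse_buddyinfo(text: str) -> Dict[Tuple[str, str], List[int]]:
    """Map (node, zone) to free-block counts indexed by order."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: "Node 0, zone Normal c0 c1 c2 ..."
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = parts[1].rstrip(",")
        zone = parts[3]
        result[(node, zone)] = [int(x) for x in parts[4:]]
    return result


def free_blocks_at_least(counts: List[int], order: int) -> int:
    """Number of 2**order-page allocations the free lists could serve."""
    # Each free block of order k >= order splits into 2**(k - order) blocks.
    return sum(c << (k - order) for k, c in enumerate(counts) if k >= order)
```

On Linux, `parse_buddyinfo(open("/proc/buddyinfo").read())` returns the current state. A Normal zone whose order-3 count drops to 0 forces such allocations into compaction or reclaim.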
9. Configuration Files
ssar uses /etc/ssar/ssar.conf (sections [main], [load], [proc]) and /etc/ssar/sys.conf for file‑level collection. Key options include duration_threshold, load5s_flag, proc_flag, scatter_second, and load5s_threshold, which control data retention, collection toggles, and clustering behavior.
10. Resources
ssar source code: https://gitee.com/anolis/tracing-ssar.git
Video replay and documentation links are provided at the end of the original article.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
