Mastering ssar: A Deep Dive into Alibaba’s Open‑Source System Performance Tool

ssar is Alibaba's open-source system performance monitoring tool. It extends the traditional sar feature set with machine-level, process-level, and load metrics, and it supports rapid metric development through flexible configuration. For detailed OS troubleshooting it adds diagnostics such as the load5s indicator, thread analysis, and custom Python query extensions.

Efficient Ops

1. Positioning of the System Performance Analysis Tool ssar

Performance analysis tools are commonly classified into counters, tracing, profiling, and monitoring. Sorted by how they acquire data and by whether they work in real time or on history, they fall into four quadrants: A, direct counter commands (top, ps); B, historical counter-based monitoring tools (sar, tsar, atop); C, tracing and sampling tools (tracepoint, kprobe, perf); D, the combined use of B and C.

The focus of this article is on quadrant B, the system performance monitoring tools.

2. Introduction to ssar

ssar is an open‑source system performance monitoring tool (source at the end of the article). It covers all functions of the classic sar tool and adds many machine‑level and process‑level metrics, including a unique load5s indicator for load diagnosis.

It is comparable to the open‑source atop tool, which is also widely deployed in the industry.

3. Rapid Development and Iteration of ssar

Traditional sar tools require code changes and long release cycles to add new metrics. ssar uses file‑based collection, allowing new metrics to be added by simply updating the sys.conf configuration and restarting the collector.

The general-purpose query option (-o) specifies the file, line, column, metric type (c for the raw value, d for the delta between samples), and alias, so a new metric can be developed in minutes.

ssar -o 'metric=c|cfile=meminfo|line_begin=MemFree:|column=2|alias=free'
ssar -o 'metric=d:cfile=snmp:line=8:column=13:alias=retranssegs'
ssar -o 'metric=d|cfile=stat|line=2-17|column=5|alias=idle_{line};' -f +100
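
To make the field-selection idea concrete, here is a minimal Python sketch of what such a query appears to describe: choose a line (by number or by prefix), choose a whitespace-separated column, and report either the raw value (c) or the difference between two samples (d). The helper names `pick_field` and `delta`, the 1-based indexing, and the sample text are assumptions made for illustration; this is not ssar's implementation.

```python
def pick_field(text, column, line=None, line_begin=None):
    """Return one whitespace-separated field from a snapshot of a /proc-style file.

    `line` and `column` are treated as 1-based (an assumption of this sketch).
    """
    rows = text.splitlines()
    if line is not None:
        row = rows[line - 1]
    elif line_begin is not None:
        # First row whose text starts with the given prefix, e.g. "MemFree:".
        row = next(r for r in rows if r.startswith(line_begin))
    else:
        raise ValueError("either line or line_begin is required")
    return row.split()[column - 1]


def delta(prev_text, curr_text, **selector):
    """Difference of the selected counter between two snapshots (the 'd' type)."""
    return int(pick_field(curr_text, **selector)) - int(pick_field(prev_text, **selector))


# Two made-up snapshots in /proc/meminfo layout.
earlier = "MemTotal: 16000000 kB\nMemFree: 4000000 kB\n"
later = "MemTotal: 16000000 kB\nMemFree: 3500000 kB\n"

print(pick_field(later, column=2, line_begin="MemFree:"))      # raw value
print(delta(earlier, later, column=2, line_begin="MemFree:"))  # change between samples
```

In this sketch a raw metric simply echoes the selected field, while a delta metric needs two snapshots of the same file, which is why a collector that stores periodic file copies can serve both kinds of query.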

Python‑based query wrappers (tsar2, ssar+) provide compatibility with the original tsar command while allowing complex logic to be implemented in Python.

4. Using ssar for Whole‑Machine Load Metrics

The load5s metric provides 5‑second‑level load information, representing the sum of R and D state threads. It is more precise than the traditional load1 metric, which lags after load disappears.

Additional load‑related metrics include runq (R‑state threads), threads (total threads), and detailed views (load2p) that show per‑CPU and per‑process breakdowns.
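
The relationship between these counts can be sketched in Python. The example below derives runq-like, load-like, and threads-like figures from thread state letters as they appear in `/proc/<pid>/task/<tid>/stat`. The function names and the sample lines are hypothetical, and the sketch only illustrates the "R plus D" definition given above rather than how ssar itself samples threads.

```python
from collections import Counter


def thread_state(stat_line):
    """Extract the state letter from one /proc/<pid>/task/<tid>/stat line.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so split after the last ')'.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def load_snapshot(stat_lines):
    """Count threads per state and derive runq-, load- and threads-like figures."""
    states = Counter(thread_state(line) for line in stat_lines)
    return {
        "runq": states["R"],                # runnable threads
        "load": states["R"] + states["D"],  # runnable + uninterruptible
        "threads": sum(states.values()),    # all threads seen
    }


# Made-up stat lines, truncated after the fields this sketch needs.
sample = [
    "101 (java) R 1 101 101",
    "102 (java) D 1 101 101",
    "103 (my (odd) name) S 1 103 103",
    "104 (stress) R 1 104 104",
]
print(load_snapshot(sample))
```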

5. Process‑Level Metrics with ssar

The procs subcommand displays historical process information similar to ps. Options -f, -b, and -r control time range; -o selects output fields; -H hides headers; --api outputs JSON.

Special options --job and --sched help diagnose job grouping and scheduling issues.

6. Load5s Diagnosis Example

An experiment shows that load5s rises sharply with stress‑induced load, while load1 lags, demonstrating the superiority of load5s for timely load detection.
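
The lag of load1 follows from how a 1-minute load average is smoothed. The Python sketch below is an idealized floating-point model, not the kernel's fixed-point code: it assumes one sample roughly every 5 seconds and an exponential decay factor of exp(-5/60), and it feeds in a made-up burst of 10 active threads for 60 seconds followed by 60 idle seconds.

```python
import math

SAMPLE_INTERVAL = 5  # assumed seconds between load-average updates
DECAY_1MIN = math.exp(-SAMPLE_INTERVAL / 60)


def simulate_load1(active_counts, start=0.0):
    """Apply a 1-minute exponential moving average to per-sample active counts."""
    load1, history = start, []
    for active in active_counts:
        load1 = load1 * DECAY_1MIN + active * (1 - DECAY_1MIN)
        history.append(load1)
    return history


# 60 s with 10 active threads, then 60 s with none (12 samples each).
instantaneous = [10] * 12 + [0] * 12
smoothed = simulate_load1(instantaneous)

for step, (now, avg) in enumerate(zip(instantaneous, smoothed), start=1):
    print(f"t={step * SAMPLE_INTERVAL:>3}s  instantaneous={now:>2}  load1={avg:5.2f}")
```

In this model the smoothed value is still above 2 a full minute after the instantaneous count has dropped to zero, and it under-reports the burst while it is happening. That is the behavior a 5-second-level count such as load5s avoids.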

7. CPU Usage Analysis

ssar reports CPU usage in kernel ticks (e.g., user/s). Converting ticks to percentages aligns ssar’s values with top and tsar2 metrics, enabling correlation between whole‑machine and per‑process CPU usage.
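
The conversion is simple arithmetic, sketched here in Python. The function name, the default of 100 ticks per second (the usual USER_HZ on Linux, which `os.sysconf("SC_CLK_TCK")` reports), and the sample rates are assumptions for illustration. Whether a given tool divides by the number of CPUs is also an assumption to check against that tool's documentation.

```python
def ticks_to_percent(ticks_per_second, hz=100, cpus=1):
    """Convert a tick rate into a percentage of `cpus` fully busy CPUs."""
    return 100.0 * ticks_per_second / (hz * cpus)


# Whole machine: 200 user ticks per second spread over 4 CPUs.
machine_user_pct = ticks_to_percent(200, hz=100, cpus=4)

# Single process, expressed relative to one CPU as top usually does:
# 150 ticks per second means one and a half CPUs are busy.
process_pct = ticks_to_percent(150, hz=100, cpus=1)

print(machine_user_pct, process_pct)
```

With both figures on the same scale, per-process tick rates can be summed and compared against the machine-wide rate for the same interval.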

8. Memory Reclamation Case Study

The article walks through a real‑world scenario where massive Java memory allocation triggers kswapd, direct memory reclamation, high sys CPU usage, and load spikes. ssar’s detailed metrics (free memory, pgscan_kswapd/s, pgscan_direct/s, network throughput, order‑3 memory fragmentation) expose the full chain of events.
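
As an illustration of the fragmentation signal in that chain, the Python sketch below parses text in the `/proc/buddyinfo` layout, where each row lists free block counts per allocation order for one zone. It assumes the order-3 metric is derived from data of this kind. The function names and the sample row are made up for this example.

```python
def free_blocks_by_order(buddyinfo_text):
    """Map (node, zone) -> list of free block counts, indexed by allocation order."""
    result = {}
    for line in buddyinfo_text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue  # skip blank or unexpected lines
        node = int(parts[1].rstrip(","))
        zone = parts[3]
        result[(node, zone)] = [int(n) for n in parts[4:]]
    return result


def free_at_or_above(counts, order):
    """Free blocks large enough to satisfy an allocation of the given order."""
    return sum(counts[order:])


# Made-up row: plenty of small blocks, nothing free at order 3 or higher.
sample = "Node 0, zone   Normal    900    400    120      0      0      0      0      0      0      0      0\n"

zones = free_blocks_by_order(sample)
normal = zones[(0, "Normal")]
print(normal[3], free_at_or_above(normal, 3))
```

A zone that has many free pages but no blocks at order 3 or above, as in this sample, cannot satisfy an order-3 request without compaction or reclaim, which fits the kind of pressure described in the case study.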

9. Configuration Files

ssar uses /etc/ssar/ssar.conf (sections [main], [load], [proc]) and /etc/ssar/sys.conf for file‑level collection. Key options include duration_threshold, load5s_flag, proc_flag, scatter_second, and load5s_threshold, which control data retention, collection toggles, and clustering behavior.
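
For orientation only, the Python sketch below reads a configuration of this general shape. The key and section names come from the paragraph above, but every value, and the section each key is placed under, is a placeholder assumed for this example. It also assumes simple `key = value` text that `configparser` can read. If the real file turns out to be TOML, a TOML parser would be the better fit.

```python
import configparser

# Placeholder content: key names come from the text above, but the values and
# the section each key is placed in are assumptions made for this sketch.
SAMPLE_CONF = """
[main]
duration_threshold = 168
scatter_second = 0
load5s_flag = true
proc_flag = true

[load]
load5s_threshold = 4
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)

main = parser["main"]
settings = {
    "duration_threshold": main.getint("duration_threshold"),
    "scatter_second": main.getint("scatter_second"),
    "load5s_flag": main.getboolean("load5s_flag"),
    "proc_flag": main.getboolean("proc_flag"),
    "load5s_threshold": parser["load"].getint("load5s_threshold"),
}
print(settings)
```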

10. Resources

ssar source code: https://gitee.com/anolis/tracing-ssar.git

Video replay and documentation links are provided at the end of the original article.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
