
Master Linux Observability: Quick Guide to BCC Tools for Performance Debugging

This tutorial introduces the BPF Compiler Collection (BCC) suite, explains how to install it, lists essential Linux commands, and provides step‑by‑step examples of each BCC tool for fast performance analysis, fault isolation, and network troubleshooting on Linux systems.

Big Data Technology Tribe

In the previous article we introduced the revolutionary eBPF technology in Linux. Writing raw eBPF programs is complex, so developers created the BPF Compiler Collection (BCC) toolkit to let us stand on the shoulders of giants.

BCC provides many useful tools and examples for efficient kernel tracing and program manipulation. This article gives an overall guide on using BCC tools to quickly solve performance, fault‑diagnosis, and network problems (the principles of eBPF and BCC are omitted here; they will be covered later).

The tutorial assumes BCC is already installed and that tools such as <code>execsnoop</code> run successfully. For installation instructions, refer to the previous article (or the Lima article for macOS).

0. Before Using BCC

Before using BCC you should be familiar with basic Linux commands. The following commands are essential; if you are unsure of their meaning, ask ChatGPT.

<code>uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1
top</code>

1. General Performance Analysis

Below is a checklist of BCC tools for performance inspection. These tools are located in the <code>tools</code> directory of the BCC git repository.

1.1 execsnoop

<code>execsnoop</code> prints a line for each new process. It helps identify short-lived processes that may consume CPU but are invisible to periodic monitoring tools such as <code>top</code>. Because it traces <code>exec()</code> rather than <code>fork()</code>, it captures most new processes but misses those that only fork.

<code># ./execsnoop
PCOMM            PID    RET ARGS
supervise        9660     0 ./run
supervise        9661     0 ./run
mkdir            9662     0 /bin/mkdir -p ./main
run              9663     0 ./run
[...]</code>
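As a quick sanity check while <code>execsnoop</code> is running, any fork-then-exec pair produces a line. A minimal Python sketch (assuming <code>/bin/true</code> exists, as on most Linux systems):

```python
import subprocess

# subprocess.run() forks and then exec()s /bin/true, so execsnoop
# would print one line for it. A worker created by fork() alone
# (no exec) would not appear, since execsnoop hooks exec(), not fork().
result = subprocess.run(["/bin/true"])
print(result.returncode)  # 0
```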

1.2 opensnoop

<code>opensnoop</code> prints a line for each <code>open()</code> system call, with details. The files an application opens reveal how it works (data files, config files, logs, and so on), and frequent attempts to open non-existent files can degrade performance.

<code># ./opensnoop
PID   COMM          FD ERR PATH
1565  redis-server   5   0 /proc/1565/stat
1603  snmpd          9   0 /proc/net/dev
1603  snmpd         11   0 /proc/net/if_inet6
[...]</code>
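A failed open shows up with a nonzero ERR column: the errno of the failure. A missing file yields ENOENT (errno 2), which is what repeated failed-open patterns look like in the output. A small sketch with a hypothetical path:

```python
import errno

# Opening a missing file fails with ENOENT; opensnoop would show this
# attempt with ERR = 2. The path below is hypothetical.
try:
    open("/no/such/config.conf")
except FileNotFoundError as e:
    print(e.errno == errno.ENOENT)  # True
```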

1.3 ext4slower (or btrfsslower, xfsslower, zfsslower)

<code>ext4slower</code> traces and times ext4 file-system operations, printing only those that exceed a threshold (10 ms by default). It is useful for identifying slow disk I/O at the file-system layer, which is otherwise hard to correlate with application-level latency. Similar tools exist for other file systems (<code>btrfsslower</code>, <code>xfsslower</code>, <code>zfsslower</code>), and <code>fileslower</code> traces all VFS operations, at higher overhead.

<code># ./ext4slower
Tracing ext4 operations slower than 10 ms
TIME     COMM  PID    T BYTES  OFF_KB  LAT(ms) FILENAME
06:35:01 cron  16464  R  1249       0    16.05 common-auth
06:35:01 cron  16463  R  1249       0    16.04 common-auth
06:35:01 cron  16465  R  1249       0    16.03 common-auth
06:35:01 cron  16465  R  4096       0    10.62 login.defs
[...]</code>
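Conceptually the tool is a latency filter: time each operation, keep only those above the threshold. A plain-Python sketch of that filtering step, with made-up operation names and latencies:

```python
# Keep only operations slower than the threshold, as ext4slower does
# in-kernel. Operation names and latencies here are illustrative.
THRESHOLD_MS = 10
ops = [("read", 16.05), ("write", 0.40), ("fsync", 10.62)]
slow = [(name, lat) for name, lat in ops if lat > THRESHOLD_MS]
print(slow)  # [('read', 16.05), ('fsync', 10.62)]
```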

1.4 biolatency

<code>biolatency</code> tracks disk I/O latency (time from device issue to completion) and prints a histogram when the tool exits (Ctrl-C or a timeout). It reveals the full latency distribution, exposing outliers and multimodal patterns that average-only tools such as <code>iostat</code> hide.

<code># ./biolatency
Tracing block device I/O... Hit Ctrl-C to end.
^C
     usecs       : count distribution
       0 -> 1    : 0     |                                        |
       2 -> 3    : 0     |                                        |
       4 -> 7    : 0     |                                        |
       8 -> 15   : 0     |                                        |
      16 -> 31   : 0     |                                        |
      32 -> 63   : 0     |                                        |
      64 -> 127  : 1     |                                        |
     128 -> 255  : 12    |********                                |
[...]</code>
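The histogram rows are power-of-two buckets. How a latency value lands in a row can be sketched in plain Python (this mirrors the output format only, not the in-kernel implementation):

```python
def log2_bucket(usecs: int) -> str:
    """Label of the power-of-two bucket a latency falls into."""
    if usecs <= 1:
        return "0 -> 1"
    lo = 1
    while lo * 2 <= usecs:
        lo *= 2  # largest power of two <= usecs
    return f"{lo} -> {lo * 2 - 1}"

print(log2_bucket(90))   # 64 -> 127
print(log2_bucket(200))  # 128 -> 255
```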

1.5 biosnoop

<code>biosnoop</code> prints a line for each disk I/O, including its latency. It allows detailed inspection of I/O patterns, such as reads queuing behind writes. When the system performs many I/Os, the output can be very verbose.

<code># ./biosnoop
TIME(s)      COMM       PID   DISK   T SECTOR    BYTES LAT(ms)
0.000004001  supervise  1950  xvda1  W 13092560   4096    0.74
0.000178002  supervise  1950  xvda1  W 13092432   4096    0.61
0.001469001  supervise  1956  xvda1  W 13092440   4096    1.24
[...]</code>

1.6 cachestat

<code>cachestat</code> prints a summary line every second (or at a custom interval) showing file-system cache statistics. It helps identify low cache-hit rates and provides clues for performance tuning.

<code># ./cachestat
    HITS  MISSES  DIRTIES  READ_HIT%  WRITE_HIT%  BUFFERS_MB  CACHED_MB
    1074      44       13      94.9%       2.9%            1        223
[...]</code>
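A simplified reading of how READ_HIT% relates to the counters in the same row (the real tool's accounting is more involved, but this is the gist): cache hits, minus pages that were dirtied, over all read attempts. Values below are illustrative:

```python
# Simplified READ_HIT% arithmetic; hits/misses/dirties are
# illustrative per-interval counters, not live measurements.
hits, misses, dirties = 1074, 44, 13
read_hit_pct = 100.0 * (hits - dirties) / (hits + misses)
print(f"{read_hit_pct:.1f}%")  # 94.9%
```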

1.7 tcpconnect

<code>tcpconnect</code> prints a line for each active TCP connection (i.e., one initiated locally via <code>connect()</code>), showing source and destination addresses. It helps locate unexpected connections that may indicate misconfiguration or intrusion.

<code># ./tcpconnect
PID   COMM    IP SADDR           DADDR          DPORT
1479  telnet   4 127.0.0.1       127.0.0.1      23
1469  curl     4 10.201.219.236  54.245.105.25  80
[...]</code>
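To generate events you can watch with these tools, any TCP <code>connect()</code>/<code>accept()</code> pair will do. A self-contained loopback sketch:

```python
import socket

# connect() on the client socket is what tcpconnect traces;
# accept() on the server side is what tcpaccept traces.
server = socket.socket()
server.bind(("127.0.0.1", 0))   # bind to any free port
server.listen(1)

client = socket.socket()
client.connect(server.getsockname())
conn, peer = server.accept()
print(peer[0])  # 127.0.0.1

client.close(); conn.close(); server.close()
```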

1.8 tcpaccept

<code>tcpaccept</code> prints a line for each passive TCP connection (i.e., one accepted locally via <code>accept()</code>), also showing source and destination addresses.

<code># ./tcpaccept
PID   COMM  IP RADDR           LADDR          LPORT
907   sshd   4 192.168.56.119  192.168.56.10  22
[...]</code>

1.9 tcpretrans

<code>tcpretrans</code> prints a line for each TCP retransmission, including source/destination addresses and kernel connection state. Retransmissions cause latency and throughput issues; analyzing their patterns can reveal network problems or kernel overload.

<code># ./tcpretrans
TIME     PID  IP LADDR:LPORT        T> RADDR:RPORT         STATE
01:55:05 0     4 10.153.223.157:22  R> 69.53.245.40:34619  ESTABLISHED
[...]</code>

1.10 runqlat

<code>runqlat</code> measures how long threads spend waiting on the CPU run queue and prints a histogram, quantifying the time lost to CPU saturation.

<code># ./runqlat
Tracing run queue latency... Hit Ctrl-C to end.
^C
     usecs       : count distribution
       0 -> 1    : 233   |***********                             |
       2 -> 3    : 742   |************************************    |
       4 -> 7    : 203   |**********                              |
[...]</code>

1.11 profile

<code>profile</code> is a sampling CPU profiler that periodically captures stack traces and reports a summary of unique stacks with occurrence counts, helping identify the code paths that consume CPU.

<code># ./profile
Sampling at 49 Hertz of all threads by user + kernel stack... Hit Ctrl-C to end.
^C
    00007f31d76c3251 [unknown]
    -                sign-file (8877)
        1

    ffffffff813d0af8 __clear_user
    ffffffff813d5277 iov_iter_zero
    ...
    00007f12a133e830 __libc_start_main
    083e258d4c544155 [unknown]
    -                func_ab (13549)
        5
[...]</code>
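What <code>profile</code> reports is a frequency count over unique stacks; the aggregation itself is conceptually simple, illustrated here with made-up frames:

```python
from collections import Counter

# Each sample is a stack (a tuple of frames); profile's summary is
# just the count of identical stacks, most frequent first.
samples = [
    ("main", "func_ab", "func_a"),
    ("main", "func_ab", "func_a"),
    ("main", "func_c"),
]
for count, stack in ((n, s) for s, n in Counter(samples).most_common()):
    print(count, " <- ".join(reversed(stack)))
```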

2. Observability with Generic Tools

In addition to the performance‑focused tools above, the following generic BCC utilities provide broader observability capabilities.

<code>trace
argdist
funccount</code>

2.1 trace

Example: tracing file ownership changes by monitoring the <code>chown</code>, <code>fchown</code>, and <code>lchown</code> system calls (entry points <code>SyS_[f|l]chown</code>). The command prints the call parameters and the invoking process's UID.

<code>$ trace.py \
  'p::SyS_chown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
  'p::SyS_fchown "fd = %d, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid' \
  'p::SyS_lchown "file = %s, to_uid = %d, to_gid = %d, from_uid = %d", arg1, arg2, arg3, $uid'
PID      TID   COMM       FUNC
1269255  1269  python3.6  SyS_lchown  file = /tmp/dotsync-usis ...
1269441  1269  zstd       SyS_chown   file = /tmp/dotsync-vic7 ...
[...]</code>

2.2 argdist

<code>argdist</code> probes a specified function and aggregates its argument values into a histogram or frequency count, revealing the distribution of a parameter without attaching a debugger.

Example: measuring typical memory allocation sizes in an application.

<code># ./argdist -p 2420 -c -C 'p:c:malloc(size_t size):size_t:size'
[01:42:29]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        1          size = 16
[01:42:30]
p:c:malloc(size_t size):size_t:size
        COUNT      EVENT
        2          size = 16
[...]
^C</code>

Another example: building a system-wide histogram of buffer sizes passed to <code>write()</code>.

<code># ./argdist -c -H 'p:c:write(int fd, void *buf, size_t len):size_t:len'
[01:45:22]
p:c:write(int fd, void *buf, size_t len):size_t:len
     len         : count distribution
       2 -> 3    : 2     |*************                           |
       8 -> 15   : 2     |*************                           |
      32 -> 63   : 28    |****************************************|
      64 -> 127  : 12    |*****************                       |
[...]</code>
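<code>-C</code> is, conceptually, a frequency count over a single probe argument (and <code>-H</code> the histogram version of the same idea). A sketch with hypothetical allocation sizes standing in for live <code>malloc()</code> arguments:

```python
from collections import Counter

# argdist -C aggregates one argument's value across calls; the sizes
# below are hypothetical stand-ins for live malloc() arguments.
sizes = [16, 16, 32, 16, 64, 32]
print(Counter(sizes).most_common())  # [(16, 3), (32, 2), (64, 1)]
```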

2.3 funccount

<code>funccount</code> traces functions, tracepoints, or USDT probes matching a pattern and, on exit, prints a summary of call counts. Example: counting all kernel functions that start with <code>vfs_</code>.

<code># ./funccount 'vfs_*'
Tracing... Ctrl-C to end.
^C
FUNC                          COUNT
vfs_create                        1
vfs_rename                        1
vfs_fsync_range                   2
vfs_lock_file                    30
vfs_fstatat                     152
vfs_fstat                       154
vfs_write                       166
vfs_getattr_nosec               262
vfs_getattr                     262
vfs_open                        264
vfs_read                        470
Detaching...</code>
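The pattern is a shell-style wildcard; the name matching behaves like Python's <code>fnmatch</code> (the actual probe attachment happens via kprobes). A sketch:

```python
from fnmatch import fnmatch

# funccount attaches a counting probe to every function whose name
# matches the wildcard; the names below are used only to show the match.
names = ["vfs_read", "vfs_write", "tcp_sendmsg"]
print([n for n in names if fnmatch(n, "vfs_*")])  # ['vfs_read', 'vfs_write']
```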
Tags: eBPF, Performance Debugging, System Tracing, BCC, Linux Observability
Written by

Big Data Technology Tribe

Focused on computer science and cutting‑edge tech, we distill complex knowledge into clear, actionable insights. We track tech evolution, share industry trends and deep analysis, helping you keep learning, boost your technical edge, and ride the digital wave forward.
