
Essential Linux Monitoring Metrics for Open‑Falcon: A Complete Guide

This article enumerates the core Linux system metrics collected by the Open‑Falcon agent—including CPU, disk, memory, network, kernel, and process statistics—explaining how each metric is derived from /proc or other system tools and why it matters for reliable operations monitoring.

ITPUB

Sep 30, 2020


1. Basic Linux Monitoring Items

Effective operations rely on a robust monitoring system that captures as many relevant metrics as possible; the following list reflects practical experience from seasoned engineers.

2. CPU Metrics

cpu.idle – Percentage of time the CPU(s) were idle without outstanding disk I/O.

cpu.busy – 100 minus cpu.idle.

cpu.guest – Percentage of time spent running a virtual processor.

cpu.iowait – Percentage of idle time while the system had outstanding disk I/O.

cpu.irq – Percentage of time servicing hardware interrupts.

cpu.softirq – Percentage of time servicing software interrupts.

cpu.nice – Percentage of CPU utilization at user level with nice priority.

cpu.steal – Percentage of involuntary wait time for virtual CPUs.

cpu.system – Percentage of CPU utilization at the kernel level.

cpu.user – Percentage of CPU utilization at the application level.

cpu.cnt – Number of CPU cores.

cpu.switches – Number of context switches (counter).
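The percentages above are usually computed from the difference between two reads of the aggregate `cpu` line in /proc/stat. The following Python sketch illustrates that arithmetic; it is not the agent's own code, and the function names `parse_cpu_line` and `cpu_percentages` are hypothetical. It leaves guest time out of the total because the kernel already counts it within user time; the agent's exact arithmetic may differ.

```python
# Column order of the aggregate "cpu" line in /proc/stat (values are jiffies).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest"]


def parse_cpu_line(stat_text):
    """Return {field: jiffies} for the aggregate 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Older kernels expose fewer columns; pad the missing ones with 0.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(prev, curr):
    """Convert two jiffy snapshots into percentages over the interval."""
    deltas = {name: curr[name] - prev[name] for name in FIELDS}
    # Assumption: guest is excluded from the total (already part of user time).
    total = sum(v for name, v in deltas.items() if name != "guest")
    if total <= 0:
        return {}
    result = {"cpu." + name: 100.0 * d / total for name, d in deltas.items()}
    result["cpu.busy"] = 100.0 - result["cpu.idle"]
    return result
```

In practice the two snapshots would come from reading /proc/stat at the start and end of a collection interval.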

3. Disk Metrics

Metrics are derived by reading /proc/mounts for mount points and using syscall.Statfs_t to obtain block and inode usage; each metric includes tags such as mount=$mount and fstype=$fstype.

df.bytes.free – Free disk space (int64).

df.bytes.free.percent – Free space as a percentage (float64).

df.bytes.total – Total disk size (int64).

df.bytes.used – Used disk space (int64).

df.bytes.used.percent – Used space as a percentage (float64).

df.inodes.total – Total inode count (int64).

df.inodes.free – Free inode count (int64).

df.inodes.free.percent – Free inode percentage (float64).

df.inodes.used – Used inode count (int64).

df.inodes.used.percent – Used inode percentage (float64).
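As a rough illustration of the collection path described at the top of this section, the sketch below parses /proc/mounts-style text and uses Python's `os.statvfs`, which returns the same kind of block and inode counts that `syscall.Statfs_t` carries in Go. The helper names are hypothetical. Treating "free" as `f_bfree` (which includes root-reserved blocks) rather than `f_bavail` is an assumption.

```python
import os


def parse_mounts(mounts_text):
    """Yield (mount_point, fstype) pairs from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Columns: device, mount point, filesystem type, options, ...
            yield fields[1], fields[2]


def usage_metrics(total, free, used, prefix):
    """Build total/free/used values plus percentages for one resource."""
    metrics = {
        prefix + ".total": total,
        prefix + ".free": free,
        prefix + ".used": used,
    }
    if total > 0:
        metrics[prefix + ".free.percent"] = 100.0 * free / total
        metrics[prefix + ".used.percent"] = 100.0 * used / total
    return metrics


def df_metrics(mount_point):
    """Collect byte and inode usage for one mount point via statvfs."""
    st = os.statvfs(mount_point)
    total_bytes = st.f_blocks * st.f_frsize
    free_bytes = st.f_bfree * st.f_frsize  # assumption: includes reserved blocks
    metrics = usage_metrics(total_bytes, free_bytes,
                            total_bytes - free_bytes, "df.bytes")
    metrics.update(usage_metrics(st.f_files, st.f_ffree,
                                 st.f_files - st.f_ffree, "df.inodes"))
    return metrics
```

Calling `df_metrics` for each pair returned by `parse_mounts` would give one metric set per mount, to be tagged with the mount point and filesystem type.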

4. megacli RAID Metrics

Metrics obtained via the megacli tool include tags like PD=Enclosure_ID:SLOT_ID or VD=0 to identify physical or virtual disks.

sys.disk.lsiraid.pd.Media_Error_Count – Indicates increased risk of disk failure.

sys.disk.lsiraid.pd.Other_Error_Count

sys.disk.lsiraid.pd.Predictive_Failure_Count

sys.disk.lsiraid.pd.Drive_Temperature

sys.disk.lsiraid.pd.Firmware_state – Non‑zero value signals a problem.

sys.disk.lsiraid.vd.cache_policy – Non‑zero value indicates cache policy mismatch.

sys.disk.lsiraid.vd.state – Non‑zero value signals a problem with the logical disk.

5. SMART Disk Metrics

Collected with smartctl; each metric is tagged with the device name (e.g., device=/dev/sda).

sys.disk.smart.Reallocated_Sector_Ct

sys.disk.smart.Spin_Retry_Count

sys.disk.smart.Reallocated_Event_Count

sys.disk.smart.Current_Pending_Sector

sys.disk.smart.Offline_Uncorrectable

sys.disk.smart.Temperature_Celsius

6. Partition Read/Write Monitoring

sys.disk.rw – Non‑zero value indicates read/write issues on the partition (tagged with mount=$mount).

7. IO Metrics

Collected every second from /proc/diskstats and calculated as counters.

disk.io.ios_in_progress – Number of I/O requests currently in flight.

disk.io.msec_read – Total milliseconds spent on reads.

disk.io.msec_total – Total milliseconds during which ios_in_progress >= 1.

disk.io.msec_weighted_total – Weighted I/O time.

disk.io.msec_write – Total milliseconds spent on writes.

disk.io.read_merged – Number of merged read requests.

disk.io.read_requests – Total successful reads.

disk.io.read_sectors – Total sectors read.

disk.io.write_merged – Number of merged write requests.

disk.io.write_requests – Total successful writes.

disk.io.write_sectors – Total sectors written.

disk.io.read_bytes – Bytes read.

disk.io.write_bytes – Bytes written.

disk.io.avgrq_sz – Average request size (as shown by iostat -x 1).

disk.io.avgqu-sz – Average queue length.

disk.io.await – Average wait time.

disk.io.svctm – Service time.

disk.io.util – Utilization percentage (e.g., 56.43%).
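The derived values at the end of this list (await, svctm, util and so on) can be computed from two snapshots of the raw /proc/diskstats counters. The sketch below follows the classic iostat formulas; the function names and the `interval_ms` parameter are hypothetical, and the agent's own formulas may differ in detail.

```python
# The first eleven counters after the device name in /proc/diskstats.
COUNTER_NAMES = [
    "read_requests", "read_merged", "read_sectors", "msec_read",
    "write_requests", "write_merged", "write_sectors", "msec_write",
    "ios_in_progress", "msec_total", "msec_weighted_total",
]


def parse_diskstats(text):
    """Map device name -> {counter: value} from /proc/diskstats-style text."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 + len(COUNTER_NAMES):
            continue
        # fields[0] and fields[1] are major/minor numbers, fields[2] the name.
        values = [int(v) for v in fields[3:3 + len(COUNTER_NAMES)]]
        devices[fields[2]] = dict(zip(COUNTER_NAMES, values))
    return devices


def derived_io(prev, curr, interval_ms):
    """Derive iostat-style values from two snapshots of one device."""
    delta = {name: curr[name] - prev[name] for name in COUNTER_NAMES}
    ios = delta["read_requests"] + delta["write_requests"]
    sectors = delta["read_sectors"] + delta["write_sectors"]
    wait_ms = delta["msec_read"] + delta["msec_write"]
    return {
        "disk.io.await": wait_ms / ios if ios else 0.0,
        "disk.io.avgrq_sz": sectors / ios if ios else 0.0,
        "disk.io.svctm": delta["msec_total"] / ios if ios else 0.0,
        "disk.io.avgqu-sz": delta["msec_weighted_total"] / interval_ms,
        "disk.io.util": 100.0 * delta["msec_total"] / interval_ms,
    }
```

Note that ios_in_progress is a point-in-time gauge rather than a cumulative counter, so only its current value is meaningful; the sketch does not use its delta.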

8. Load Average Metrics

load.1min – 1‑minute load average.

load.5min – 5‑minute load average.

load.15min – 15‑minute load average.

9. Memory Metrics

Derived from /proc/meminfo; mem.memfree is the sum of the MemFree, Buffers, and Cached fields, so it counts memory the kernel can reclaim as free.

mem.memtotal – Total memory.

mem.memused – Used memory.

mem.memused.percent – Used memory percentage.

mem.memfree – Free memory.

mem.memfree.percent – Free memory percentage.

mem.swaptotal – Total swap.

mem.swapused – Used swap.

mem.swapused.percent – Used swap percentage.

mem.swapfree – Free swap.

mem.swapfree.percent – Free swap percentage.
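To make the memory arithmetic concrete, here is a minimal Python sketch that parses /proc/meminfo-style text and applies the free + buffers + cached rule stated at the top of this section. The function names are hypothetical, and reporting values in bytes is an assumption.

```python
def parse_meminfo(text):
    """Return {key: value} from /proc/meminfo-style text, converting kB to bytes."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        value = int(fields[0])
        # Most lines carry a "kB" unit; a few (e.g. HugePages_Total) are counts.
        info[key.strip()] = value * 1024 if fields[1:] == ["kB"] else value
    return info


def percent(part, whole):
    """Percentage helper that avoids dividing by zero (e.g. no swap configured)."""
    return 100.0 * part / whole if whole else 0.0


def mem_metrics(info):
    """Compute the mem.* metrics, with memfree = MemFree + Buffers + Cached."""
    total = info["MemTotal"]
    free = info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0)
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)
    return {
        "mem.memtotal": total,
        "mem.memfree": free,
        "mem.memused": total - free,
        "mem.memfree.percent": percent(free, total),
        "mem.memused.percent": percent(total - free, total),
        "mem.swaptotal": swap_total,
        "mem.swapfree": swap_free,
        "mem.swapused": swap_total - swap_free,
        "mem.swapfree.percent": percent(swap_free, swap_total),
        "mem.swapused.percent": percent(swap_total - swap_free, swap_total),
    }
```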

10. Network Metrics

Collected from /proc/net/dev; each metric is tagged with iface=$iface (e.g., eth0). Metrics with “in” refer to inbound traffic, “out” to outbound, and “total” to the sum.

net.if.in.bytes, net.if.in.compressed, net.if.in.dropped, net.if.in.errors, net.if.in.fifo.errs, net.if.in.frame.errs, net.if.in.multicast, net.if.in.packets

net.if.out.bytes, net.if.out.carrier.errs, net.if.out.collisions, net.if.out.compressed, net.if.out.dropped, net.if.out.errors, net.if.out.fifo.errs, net.if.out.packets

net.if.total.bytes, net.if.total.dropped, net.if.total.errors, net.if.total.packets
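The mapping from /proc/net/dev columns to these metric names can be sketched as follows. Each interface line holds eight receive counters followed by eight transmit counters; the `parse_net_dev` name is hypothetical, and the column-to-metric mapping reflects the standard /proc/net/dev layout rather than the agent's source.

```python
# Column order in /proc/net/dev: 8 receive counters, then 8 transmit counters.
RX = ["bytes", "packets", "errors", "dropped",
      "fifo.errs", "frame.errs", "compressed", "multicast"]
TX = ["bytes", "packets", "errors", "dropped",
      "fifo.errs", "collisions", "carrier.errs", "compressed"]
TOTALS = ["bytes", "packets", "errors", "dropped"]


def parse_net_dev(text):
    """Map interface name -> {metric: counter} from /proc/net/dev-style text."""
    result = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        values = rest.split()
        if not sep or len(values) < 16:
            continue  # the two header lines contain no "iface:" prefix
        counters = [int(v) for v in values[:16]]
        metrics = {}
        for field, value in zip(RX, counters[:8]):
            metrics["net.if.in." + field] = value
        for field, value in zip(TX, counters[8:]):
            metrics["net.if.out." + field] = value
        for field in TOTALS:
            metrics["net.if.total." + field] = (
                metrics["net.if.in." + field] + metrics["net.if.out." + field]
            )
        result[name.strip()] = metrics
    return result
```

These are cumulative counters, so a rate such as bytes per second comes from the difference between two samples divided by the interval.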

11. Port Monitoring

Uses ss -ln to determine if a port is listening (1) or not (0); tagged with port=$port.

net.port.listen
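A check of this kind can be sketched by extracting local ports from captured `ss -ln` output. The example assumes the column layout `Netid State Recv-Q Send-Q Local-Address:Port Peer-Address:Port` and only considers TCP and UDP rows; both function names are hypothetical, and the agent may invoke `ss` with different flags.

```python
def listening_ports(ss_output):
    """Return the set of local TCP/UDP ports found in `ss -ln` style output."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed layout: Netid State Recv-Q Send-Q Local:Port Peer:Port
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue  # skips the header row and unix/netlink sockets
        # rsplit handles IPv6 forms such as [::]:80 as well as 0.0.0.0:22
        port = fields[4].rsplit(":", 1)[-1]
        if port.isdigit():
            ports.add(int(port))
    return ports


def port_listen_metric(ports, port):
    """Return 1 if the port is listening, else 0 (the net.port.listen value)."""
    return 1 if port in ports else 0
```

The output text could be captured with `subprocess.run(["ss", "-ln"], capture_output=True, text=True).stdout` on a host where `ss` is installed.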

12. Kernel Configuration

kernel.maxfiles – Value from /proc/sys/fs/file-max.

kernel.files.allocated – First field of /proc/sys/fs/file-nr.

kernel.files.left – Calculated as kernel.maxfiles - kernel.files.allocated.

kernel.maxproc – Value from /proc/sys/kernel/pid_max.
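The kernel.files.left calculation is simple enough to show directly. In this sketch the three arguments are the raw contents of /proc/sys/fs/file-max, /proc/sys/fs/file-nr, and /proc/sys/kernel/pid_max; the function name is hypothetical. file-nr holds three numbers (allocated handles, allocated-but-unused handles, and the maximum), of which only the first is used here.

```python
def kernel_metrics(file_max_text, file_nr_text, pid_max_text):
    """Compute the kernel.* metrics from the raw text of three /proc files."""
    max_files = int(file_max_text.split()[0])
    # file-nr: "<allocated> <allocated but unused> <maximum>"
    allocated = int(file_nr_text.split()[0])
    return {
        "kernel.maxfiles": max_files,
        "kernel.files.allocated": allocated,
        "kernel.files.left": max_files - allocated,
        "kernel.maxproc": int(pid_max_text.split()[0]),
    }
```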

13. NTP Offset

Obtained with ntpq -pn.

sys.ntp.offset – Machine offset in milliseconds; large or zero values indicate anomalies.
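One plausible way to extract the offset from `ntpq -pn` output is to find the peer row marked with `*` (the currently selected time source) and read its offset column, which ntpq reports in milliseconds. This is an illustrative sketch with a hypothetical function name, not the collector's actual parsing logic.

```python
def ntp_offset_ms(ntpq_output):
    """Return the offset (ms) of the selected peer, or None if none is selected."""
    for line in ntpq_output.splitlines():
        # The peer currently used for synchronization is prefixed with '*'.
        if line.startswith("*"):
            fields = line.split()
            # Columns: remote refid st t when poll reach delay offset jitter
            if len(fields) >= 10:
                return float(fields[8])
    return None
```

A `None` result means no peer is selected, which is worth alerting on separately from a large offset.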

14. Process Count Monitoring

proc.num – Counts processes either by name (e.g., name=sshd) or by full command line (e.g., cmdline=./falcon_agent-c./cfg.ini).

15. Process Resource Metrics

process.cpu.all – CPU time (sys + user) for a process and its children, in jiffies.

process.cpu.sys – System CPU time for a process and its children, in jiffies.

process.cpu.user – User CPU time for a process and its children, in jiffies.

process.swap – Swap usage for a process and its children, in pages.

process.fd – Number of file descriptors used.

process.mem – Memory usage of the process, in bytes.
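The three process.cpu.* values above map naturally onto the utime, stime, cutime, and cstime fields (fields 14 to 17) of /proc/[pid]/stat, all of which are reported in jiffies. The sketch below shows one way to read them; the function name is hypothetical, and treating "children" as the waited-for children counted in cutime and cstime is an assumption about how the collector aggregates.

```python
def process_cpu_jiffies(stat_text):
    """Compute process.cpu.* values from the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces, so
    # split only the part after the last ')'. That part starts at field 3.
    rest = stat_text[stat_text.rfind(")") + 1:].split()
    # Fields 14-17 are utime, stime, cutime, cstime -> indexes 11-14 in `rest`.
    utime, stime, cutime, cstime = (int(v) for v in rest[11:15])
    user = utime + cutime
    system = stime + cstime
    return {
        "process.cpu.user": user,
        "process.cpu.sys": system,
        "process.cpu.all": user + system,
    }
```

Because these are cumulative totals, a usage rate requires two samples and the system clock tick rate (available in Python via `os.sysconf("SC_CLK_TCK")`).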

16. ss Command Output Metrics

ss.orphaned

ss.closed

ss.timewait

ss.slabinfo.timewait

ss.synrecv

ss.estab
