Essential Linux Operations Metrics for Effective Monitoring

This guide lists the key Linux system metrics that the open-falcon agent collects every minute: CPU, memory, disk, I/O, network, kernel parameters, RAID, SMART, NTP, and process information. Together they support comprehensive operations monitoring and timely issue detection.

MaGe Linux Operations

1. Linux Operations Basic Collection Items

Effective monitoring requires collecting as many relevant metrics as possible; the following categories are commonly used by engineers.

CPU

Load

Memory

Disk

IO

Network

Kernel parameters

ss statistics

Port collection

Core service process health

Critical business process resource consumption

NTP offset

DNS resolution

All these metrics are directly supported by the open-falcon agent, which collects them every 60 seconds.

2. CPU Metrics

cpu.idle – percentage of idle time without outstanding disk I/O.

cpu.busy – 100 minus cpu.idle.

cpu.guest – time spent running virtual processors.

cpu.iowait – idle time while disk I/O is pending.

cpu.irq – time servicing hardware interrupts.

cpu.softirq – time servicing software interrupts.

cpu.nice – user-mode CPU time spent on processes running at a lowered (nice) priority.

cpu.steal – time a virtual CPU waited involuntarily while the hypervisor serviced another virtual processor.

cpu.system – CPU usage in kernel mode.

cpu.user – CPU usage in user mode.

cpu.cnt – number of logical CPUs (cores) visible to the system.

cpu.switches – number of context switches.
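The percentage metrics above are not stored anywhere as percentages; they have to be derived from cumulative jiffy counters. The following is a minimal sketch, assuming the counters come from the aggregate "cpu" line of /proc/stat and are sampled twice. The function names and sample values are hypothetical, and leaving guest time out of the denominator is a choice of this sketch, not necessarily what the open-falcon agent does.

```python
# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(stat_text):
    """Return {field: jiffies} for the aggregate "cpu" line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return dict(zip(FIELDS, map(int, parts[1:])))
    raise ValueError("no aggregate cpu line found")


def cpu_percentages(prev, curr):
    """Convert two counter samples into percentages for the interval."""
    delta = {k: curr.get(k, 0) - prev.get(k, 0) for k in FIELDS}
    # Assumption of this sketch: the kernel already counts guest time
    # inside user time, so guest is left out of the denominator.
    total = sum(v for k, v in delta.items() if k != "guest")
    if total <= 0:
        return {}
    pct = {"cpu." + k: 100.0 * v / total for k, v in delta.items()}
    pct["cpu.busy"] = 100.0 - pct["cpu.idle"]
    return pct


# Illustrative samples, not real measurements.
prev = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0\n")
curr = parse_cpu_line("cpu  150 0 80 1700 70 0 0 0 0 0\n")
usage = cpu_percentages(prev, curr)
print(usage["cpu.idle"], usage["cpu.busy"], usage["cpu.iowait"])
```

On a live host the two samples would be read from /proc/stat one collection interval apart.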

3. Disk Metrics

Metrics are derived from /proc/mounts and syscall.Statfs_t, with tags such as mount and fstype.

df.bytes.free – free bytes (int64).

df.bytes.free.percent – free space percentage (float64).

df.bytes.total – total bytes (int64).

df.bytes.used – used bytes (int64).

df.bytes.used.percent – used space percentage (float64).

df.inodes.total – total inodes (int64).

df.inodes.free – free inodes (int64).

df.inodes.free.percent – free inode percentage (float64).

df.inodes.used – used inodes (int64).

df.inodes.used.percent – used inode percentage (float64).
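To show how the df.* values relate to the statfs fields, here is a small sketch using Python's os.statvfs, which exposes the same information as syscall.Statfs_t. The function names are hypothetical, and the percentage formula (used divided by used plus space available to unprivileged users, as df(1) reports it) is an assumption rather than confirmed agent behavior.

```python
import os
from types import SimpleNamespace


def df_metrics(st):
    """Map a statvfs-like result onto the df.* metric names."""
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    free = st.f_bavail * st.f_frsize  # space usable without root
    inodes_used = st.f_files - st.f_ffree

    def percent(part, whole):
        return 100.0 * part / whole if whole else 0.0

    used_pct = percent(used, used + free)
    inodes_used_pct = percent(inodes_used, st.f_files)
    return {
        "df.bytes.total": st.f_blocks * st.f_frsize,
        "df.bytes.used": used,
        "df.bytes.free": free,
        "df.bytes.used.percent": used_pct,
        "df.bytes.free.percent": 100.0 - used_pct,
        "df.inodes.total": st.f_files,
        "df.inodes.used": inodes_used,
        "df.inodes.free": st.f_ffree,
        "df.inodes.used.percent": inodes_used_pct,
        "df.inodes.free.percent": 100.0 - inodes_used_pct,
    }


def collect(mount):
    """Live variant: query the filesystem mounted at `mount`."""
    return df_metrics(os.statvfs(mount))


# Illustrative values standing in for a real statvfs result.
sample = SimpleNamespace(f_frsize=4096, f_blocks=1000, f_bfree=300,
                         f_bavail=100, f_files=1000, f_ffree=400)
metrics = df_metrics(sample)
print(metrics["df.bytes.used.percent"], metrics["df.inodes.used.percent"])
```

The gap between f_bfree and f_bavail is the space reserved for root, which is why used plus free can be smaller than total.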

4. megacli RAID Metrics

sys.disk.lsiraid.pd.Media_Error_Count

sys.disk.lsiraid.pd.Other_Error_Count

sys.disk.lsiraid.pd.Predictive_Failure_Count

sys.disk.lsiraid.pd.Drive_Temperature

sys.disk.lsiraid.pd.Firmware_state – non-zero indicates a problem.

sys.disk.lsiraid.vd.cache_policy – non-zero indicates cache mismatch.

sys.disk.lsiraid.vd.state – non-zero indicates logical disk issue.

5. SMART Metrics

sys.disk.smart.Reallocated_Sector_Ct

sys.disk.smart.Spin_Retry_Count

sys.disk.smart.Reallocated_Event_Count

sys.disk.smart.Current_Pending_Sector

sys.disk.smart.Offline_Uncorrectable

sys.disk.smart.Temperature_Celsius

6. Partition Read/Write Monitoring

sys.disk.rw – non-zero indicates read/write issue on the partition.

7. IO Metrics

disk.io.ios_in_progress – number of I/O requests currently in flight.

disk.io.msec_read – total milliseconds spent reading.

disk.io.msec_write – total milliseconds spent writing.

disk.io.msec_total – time with at least one I/O in flight.

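disk.io.msec_weighted_total – weighted I/O time in milliseconds; each in-flight request contributes, so the value grows faster when the queue is deep.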

disk.io.read_merged, disk.io.write_merged – merged requests.

disk.io.read_requests, disk.io.write_requests – number of reads/writes.

disk.io.read_sectors, disk.io.write_sectors – sectors transferred.

disk.io.read_bytes, disk.io.write_bytes – bytes transferred.

disk.io.avgrq_sz, disk.io.avgqu-sz, disk.io.await, disk.io.svctm, disk.io.util – standard iostat metrics.
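The iostat-style values in the last line are derived from the raw counters rather than read directly. The sketch below assumes the counters come from /proc/diskstats and uses the usual iostat definitions; the function names, the sample lines, and the 60-second interval are illustrative assumptions.

```python
# Names for the 11 counters that follow "major minor name" in
# /proc/diskstats, matching the metric suffixes listed above.
COLUMNS = ("read_requests", "read_merged", "read_sectors", "msec_read",
           "write_requests", "write_merged", "write_sectors", "msec_write",
           "ios_in_progress", "msec_total", "msec_weighted_total")


def parse_diskstats(text):
    """Return {device: {column: value}} from /proc/diskstats content."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 + len(COLUMNS):
            continue
        stats[parts[2]] = dict(zip(COLUMNS, map(int, parts[3:])))
    return stats


def derived(prev, curr, interval_ms):
    """Compute iostat-style values from two samples of one device."""
    d = {k: curr[k] - prev[k] for k in COLUMNS}
    ios = d["read_requests"] + d["write_requests"]
    sectors = d["read_sectors"] + d["write_sectors"]
    wait_ms = d["msec_read"] + d["msec_write"]
    return {
        "disk.io.avgrq_sz": sectors / ios if ios else 0.0,
        "disk.io.avgqu-sz": d["msec_weighted_total"] / interval_ms,
        "disk.io.await": wait_ms / ios if ios else 0.0,
        "disk.io.svctm": d["msec_total"] / ios if ios else 0.0,
        "disk.io.util": 100.0 * d["msec_total"] / interval_ms,
    }


# Illustrative samples taken 60 seconds apart, not real measurements.
before = parse_diskstats("8 0 sda 1000 10 80000 500 2000 20 160000 1500 0 1200 2000\n")
after = parse_diskstats("8 0 sda 1100 10 88000 600 2300 20 184000 2100 0 1800 2800\n")
io = derived(before["sda"], after["sda"], interval_ms=60000)
print(io["disk.io.await"], io["disk.io.util"])
```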

8. Load Metrics

load.1min

load.5min

load.15min
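On Linux the three load averages are exposed as the first three fields of /proc/loadavg. A minimal parsing sketch, with a hypothetical function name and an illustrative sample line:

```python
def parse_loadavg(text):
    """Map the first three fields of /proc/loadavg onto load.* metrics."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return {"load.1min": one, "load.5min": five, "load.15min": fifteen}


# Illustrative sample in the /proc/loadavg layout.
load = parse_loadavg("0.42 0.35 0.30 2/345 12345\n")
print(load)
```

Load values are usually judged relative to the CPU count, so comparing load.1min against cpu.cnt is a common alerting approach.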

9. Memory Metrics

mem.memtotal – total memory.

mem.memused – used memory.

mem.memused.percent – used memory percentage.

mem.memfree – free memory.

mem.memfree.percent – free memory percentage.

mem.swaptotal, mem.swapused, mem.swapused.percent, mem.swapfree, mem.swapfree.percent – swap statistics.
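How "used" and "free" are defined matters more than the raw numbers. The sketch below assumes the values come from /proc/meminfo and treats MemAvailable (or MemFree + Buffers + Cached on kernels without it) as free memory. Whether the agent uses exactly this definition is an assumption, and the function names and sample figures are hypothetical.

```python
def parse_meminfo(text):
    """Return {key: bytes} from /proc/meminfo content."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        value = int(fields[0])
        if len(fields) > 1 and fields[1] == "kB":
            value *= 1024  # meminfo reports sizes in kB
        info[key.strip()] = value
    return info


def mem_metrics(info):
    total = info["MemTotal"]
    # Assumption: reclaimable buffers and page cache count as free.
    free = info.get("MemAvailable",
                    info["MemFree"] + info.get("Buffers", 0) + info.get("Cached", 0))
    swap_total = info.get("SwapTotal", 0)
    swap_free = info.get("SwapFree", 0)

    def percent(part, whole):
        return 100.0 * part / whole if whole else 0.0

    return {
        "mem.memtotal": total,
        "mem.memused": total - free,
        "mem.memfree": free,
        "mem.memused.percent": percent(total - free, total),
        "mem.memfree.percent": percent(free, total),
        "mem.swaptotal": swap_total,
        "mem.swapused": swap_total - swap_free,
        "mem.swapfree": swap_free,
        "mem.swapused.percent": percent(swap_total - swap_free, swap_total),
        "mem.swapfree.percent": percent(swap_free, swap_total),
    }


# Illustrative sample, not a real host.
SAMPLE = """MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          500000 kB
Cached:          2500000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""
mem = mem_metrics(parse_meminfo(SAMPLE))
print(mem["mem.memused.percent"], mem["mem.swapused.percent"])
```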

10. Network Metrics

Collected from /proc/net/dev; each metric carries an iface tag identifying the interface.

net.if.in.bytes, net.if.out.bytes, net.if.total.bytes, etc.

Various error, drop, packet, and compressed counters for inbound and outbound traffic.
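As an illustration of where these counters come from, the sketch below parses /proc/net/dev content and builds the three byte metrics named above. The rx_/tx_ column prefixes and the function names are conventions of this sketch only; the remaining counters are left under their kernel column names because the exact metric names are not given here.

```python
def parse_net_dev(text):
    """Return {iface: {column: counter}} from /proc/net/dev content."""
    lines = text.splitlines()
    # The second header line names the columns: face | receive | transmit.
    _, rx_header, tx_header = lines[1].split("|")
    columns = (["rx_" + c for c in rx_header.split()]
               + ["tx_" + c for c in tx_header.split()])
    stats = {}
    for line in lines[2:]:
        iface, sep, data = line.partition(":")
        if not sep:
            continue
        stats[iface.strip()] = dict(zip(columns, map(int, data.split())))
    return stats


def byte_metrics(counters):
    """Build the three byte metrics for one interface."""
    return {
        "net.if.in.bytes": counters["rx_bytes"],
        "net.if.out.bytes": counters["tx_bytes"],
        "net.if.total.bytes": counters["rx_bytes"] + counters["tx_bytes"],
    }


# Illustrative sample in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo: 1000 10 0 0 0 0 0 0 1000 10 0 0 0 0 0 0
  eth0: 5000 50 1 2 0 0 0 0 3000 30 0 0 0 0 0 0
"""
interfaces = parse_net_dev(SAMPLE)
eth0 = byte_metrics(interfaces["eth0"])
print(eth0)
```

These are cumulative counters, so a rate per second requires the difference between two samples divided by the interval.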

11. Port Monitoring

net.port.listen – 1 if the port is listening, 0 otherwise.
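One way to compute a listen flag like this is to look for sockets in the LISTEN state in /proc/net/tcp. This is a sketch of that idea only; the agent may use a different mechanism (for example the ss command), the function names are hypothetical, and IPv6 listeners would additionally require /proc/net/tcp6.

```python
LISTEN = "0A"  # TCP state code for LISTEN in /proc/net/tcp


def listening_ports(proc_net_tcp_text):
    """Return the set of local ports that have a listening socket."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header
        fields = line.split()
        if len(fields) < 4 or fields[3] != LISTEN:
            continue
        # local_address has the form HEXIP:HEXPORT
        ports.add(int(fields[1].rsplit(":", 1)[1], 16))
    return ports


def port_listen(port, ports):
    """1 if the port is listening, 0 otherwise."""
    return 1 if port in ports else 0


# Illustrative sample: port 22 listening, port 8080 only established.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345
   1: 0100007F:1F90 0100007F:C350 01 00000000:00000000 00:00000000 00000000  1000        0 23456
"""
ports = listening_ports(SAMPLE)
print(port_listen(22, ports), port_listen(8080, ports))
```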

12. Kernel Configuration

kernel.maxfiles – /proc/sys/fs/file-max.

kernel.files.allocated – first field of /proc/sys/fs/file-nr.

kernel.files.left – maxfiles minus allocated.

kernel.maxproc – /proc/sys/kernel/pid_max.
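The four kernel values can be read with a few lines of parsing. The sketch below takes the contents of the three /proc files as strings so it can be tried without a live host; the function name and sample values are hypothetical.

```python
def kernel_metrics(file_max_text, file_nr_text, pid_max_text):
    """Build the kernel.* metrics from the contents of three /proc files."""
    maxfiles = int(file_max_text.split()[0])
    allocated = int(file_nr_text.split()[0])  # first field of file-nr
    return {
        "kernel.maxfiles": maxfiles,
        "kernel.files.allocated": allocated,
        "kernel.files.left": maxfiles - allocated,
        "kernel.maxproc": int(pid_max_text.split()[0]),
    }


# Illustrative contents of /proc/sys/fs/file-max, /proc/sys/fs/file-nr
# and /proc/sys/kernel/pid_max.
kernel = kernel_metrics("100000\n", "1234\t0\t100000\n", "32768\n")
print(kernel)
```

A shrinking kernel.files.left is typically the value to alert on, since hitting file-max causes open() calls to fail system-wide.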

13. NTP Offset

sys.ntp.offset – clock offset in ms; an unusually large value, or a value of exactly zero, indicates an anomaly.

14. Process Monitoring

proc.num – number of processes matching name or cmdline.
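A count like this can be sketched by walking /proc and comparing each process against the configured filter. The matching rules below (exact match on the process name, substring match on the command line, both required when both are given) are assumptions of this sketch, and the function names are hypothetical.

```python
import os


def count_matching(processes, name=None, cmdline=None):
    """Count (name, cmdline) pairs that satisfy the given filters."""
    count = 0
    for proc_name, proc_cmdline in processes:
        if name is not None and proc_name != name:
            continue
        if cmdline is not None and cmdline not in proc_cmdline:
            continue
        count += 1
    return count


def read_processes(proc_root="/proc"):
    """Yield (name, cmdline) for every process visible under /proc."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited while we were reading
        yield name, raw.replace(b"\0", b" ").decode(errors="replace").strip()


# Illustrative process table, not taken from a real host.
table = [
    ("nginx", "nginx: master process /usr/sbin/nginx"),
    ("nginx", "nginx: worker process"),
    ("java", "java -jar app.jar --port 8080"),
]
print(count_matching(table, name="nginx"), count_matching(table, cmdline="app.jar"))
```

On a live host, count_matching(read_processes(), name="nginx") would give the current value; alerting when it drops to zero covers the "core service process health" item.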

15. Process Resource Metrics

process.cpu.all, process.cpu.sys, process.cpu.user – CPU usage in jiffies.

process.swap – swap usage in pages.

process.fd – file descriptor count.

process.mem – memory usage in bytes.
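For the CPU counters, a minimal sketch follows, assuming they are taken from the utime and stime fields (fields 14 and 15) of /proc/[pid]/stat, which the kernel reports in jiffies. The function names are hypothetical, and counting entries in /proc/[pid]/fd for process.fd is likewise an assumption about one possible approach.

```python
import os


def parse_proc_stat(text):
    """Extract user/system CPU jiffies from /proc/[pid]/stat content."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split after the last closing parenthesis.
    rest = text[text.rindex(")") + 1:].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return {
        "process.cpu.user": utime,
        "process.cpu.sys": stime,
        "process.cpu.all": utime + stime,
    }


def fd_count(pid, proc_root="/proc"):
    """Live helper: number of open file descriptors of a process."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


# Illustrative stat line (truncated after the fields used here).
SAMPLE = ("1234 (my proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 "
          "250 50 0 0 20 0 1 0 100 1000000 200")
cpu = parse_proc_stat(SAMPLE)
print(cpu)
```

Because the values are cumulative jiffies, usage over an interval is the difference between two samples.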

16. ss Command Output

ss.orphaned, ss.closed, ss.timewait, ss.slabinfo.timewait, ss.synrecv, ss.estab.
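These names line up with the TCP line of the summary printed by `ss -s`. The sketch below parses such a line; the sample follows the layout printed by older iproute2 releases, the exact format varies between versions, and the function name is hypothetical.

```python
import re


def parse_ss_summary(text):
    """Extract ss.* counters from the "TCP:" line of `ss -s` output."""
    metrics = {}
    for line in text.splitlines():
        if not line.startswith("TCP:"):
            continue
        inside = re.search(r"\((.*)\)", line)
        if not inside:
            continue
        for item in inside.group(1).split(","):
            name, value = item.split()
            if name == "timewait" and "/" in value:
                # "a/b": sockets in TIME-WAIT / slab-accounted TIME-WAIT
                tw, slab_tw = value.split("/")
                metrics["ss.timewait"] = int(tw)
                metrics["ss.slabinfo.timewait"] = int(slab_tw)
            else:
                metrics["ss." + name] = int(value)
    return metrics


# Illustrative output, not captured from a real host.
SAMPLE = """\
Total: 189 (kernel 0)
TCP:   12 (estab 3, closed 2, orphaned 0, synrecv 0, timewait 1/0), ports 0
"""
ss = parse_ss_summary(SAMPLE)
print(ss)
```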
