Unlocking New Insights: How Big Data Transforms IT Monitoring
This article explores how viewing monitoring through a big‑data lens reveals diverse data sources—machine, log, network, probe, agent, and user behavior—and shows how combining them can build more effective AIOps‑driven operations solutions.
Overview
Operations professionals know monitoring well. This article re-examines it from a big‑data perspective to see what that view reveals.
Viewing monitoring as a data pipeline—collection, analysis, and visualization—highlights its similarity to typical big‑data workflows.
Monitoring data covers more than server metrics: it also includes logs, network traffic, probe results, agent data, and user behavior. Taken together, these sources are large in volume, varied in form, and time‑sensitive, which are the core traits of big data.
When Gartner introduced the term AIOps in 2016, its emphasis on feeding large volumes of data to algorithms underscored how central data is to AI‑driven operations.
Data Source Classification
Common monitoring data sources can be grouped as follows:
Machine data – high usefulness, low collection difficulty.
Log data – high usefulness, moderate collection difficulty.
Network communication data – low usefulness, high collection difficulty.
Probe data – high usefulness, low collection difficulty.
Agent data – moderate usefulness, high collection difficulty.
User behavior data – moderate usefulness, moderate collection difficulty.
These ratings are rough, subjective guides and are for reference only.
Machine Data
Machine data originates from hardware or virtual devices (servers, network equipment) via protocols such as SNMP, IPMI, or WMI, providing status metrics like CPU, memory, disk, and network traffic. It is the most common monitoring source and is supported by tools like Zabbix and Nagios.
Even in cloud environments, virtual resources generate machine data that must be monitored (e.g., cloud VM resource usage, bandwidth, gateway load).
While essential, machine data alone cannot reflect business health; additional data sources are needed.
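As a rough illustration of what a single machine-data sample looks like, the sketch below reads a few host metrics with Python's standard library only. This is an assumption made for the sake of a self-contained example: it does not use SNMP, IPMI, or WMI, and it is not how Zabbix or Nagios collect data. The function name collect_machine_snapshot and the field names are hypothetical.

```python
import os
import shutil
import time


def collect_machine_snapshot(path=os.sep):
    """Return one sample of basic host metrics as a flat dict."""
    usage = shutil.disk_usage(path)
    used_pct = 100.0 * usage.used / usage.total if usage.total else 0.0
    snapshot = {
        "timestamp": time.time(),        # when the sample was taken
        "cpu_count": os.cpu_count(),     # logical CPUs (may be None)
        "disk_total_bytes": usage.total,
        "disk_used_pct": round(used_pct, 2),
    }
    # Load average is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            snapshot["load_1m"] = os.getloadavg()[0]
        except OSError:
            pass  # the platform could not report a load average
    return snapshot


if __name__ == "__main__":
    print(collect_machine_snapshot())
```

A sample like this says whether the host is healthy, but nothing about whether orders are being placed or pages are loading, which is why the other sources matter.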
Log Data
Log data consists of text records generated by applications, middleware, and systems during operation. It is versatile and supports use cases from business metric analysis to bug tracing.
Effective log monitoring requires clear objectives and standardized log formats; the ELK Stack is a widely adopted open‑source solution.
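To show why a standardized format matters, here is a minimal sketch that turns logs into a business-facing metric. It assumes, purely for illustration, that every service writes one JSON object per line with "service" and "level" fields; that schema, the sample lines, and the function name error_rate_by_service are hypothetical and not part of the ELK Stack.

```python
import json
from collections import Counter


def error_rate_by_service(lines):
    """Return {service: fraction of its log lines whose level is ERROR}."""
    total = Counter()
    errors = Counter()
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that do not follow the agreed format
        if not isinstance(record, dict):
            continue
        service = record.get("service", "unknown")
        total[service] += 1
        if record.get("level") == "ERROR":
            errors[service] += 1
    return {service: errors[service] / total[service] for service in total}


sample_lines = [
    '{"service": "checkout", "level": "INFO", "msg": "order placed"}',
    '{"service": "checkout", "level": "ERROR", "msg": "payment timeout"}',
    '{"service": "search", "level": "INFO", "msg": "query ok"}',
    'free-form line that ignores the format',
]

rates = error_rate_by_service(sample_lines)
print(rates)
```

The free-form line is silently dropped, which is the practical cost of unstandardized logs: whatever does not match the format never reaches the analysis.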
Network Communication Data
Captured via packet sniffing, network communication data reveals detailed information about inter‑server traffic, including ports, protocols, and payloads, without needing prior logging.
Although rich, this data is underused. Network and application teams often work in separate silos, encrypted traffic hides payloads, and interpreting captures requires deep protocol knowledge.
Open‑source options are limited, so teams commonly turn to commercial products in the category Gartner calls Network Performance Monitoring and Diagnostics (NPMD).
Probe Data
Probe data is generated by active checks (HTTP, Ping, TCP) issued from monitoring points. The technique grew out of telephone dial testing; today it is used to monitor website health, network egress quality, CDN performance, and internal service availability.
Internal probes are easy to deploy with a few scripts. Public probes need many geographically distributed monitoring points, which takes substantial infrastructure and cost, so they are often bought as a commercial service such as Baidu Cloud Monitor.
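An internal probe of the "few scripts" kind can be sketched as a single TCP connection attempt that records success and latency. The function name tcp_probe, the default timeout, and the host and port in the usage line are assumptions for illustration, not a reference to any particular product.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Try to open a TCP connection; return (ok, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        ok = False
    return ok, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical internal target; replace with a real service address.
    print(tcp_probe("127.0.0.1", 8080))
```

Run on a schedule from several internal hosts, a check like this already yields an availability and latency series per target.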
Agent Data
Agent data is collected via bytecode instrumentation or runtime hooks, capturing application‑level metrics without modifying source code. It powers Application Performance Management (APM) solutions, providing transaction traces, database calls, and execution timings. Open‑source examples include Pinpoint for Java.
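The idea of a runtime hook can be illustrated in Python by replacing a method with a timing wrapper while the application is running. This is only an analogy: it is not bytecode instrumentation and not how Pinpoint works. The OrderService class, the instrument helper, and the TIMINGS list are hypothetical names.

```python
import functools
import time

TIMINGS = []  # (function name, elapsed seconds) records collected by the hook


def instrument(owner, name):
    """Replace owner.name with a wrapper that records each call's duration."""
    original = getattr(owner, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Recorded even if the original call raises.
            TIMINGS.append((name, time.perf_counter() - start))

    setattr(owner, name, wrapper)
    return original  # returned so the caller can restore it later


class OrderService:
    """Stand-in for application code; its source is left unmodified."""

    def place_order(self, item):
        return f"ordered {item}"


instrument(OrderService, "place_order")
result = OrderService().place_order("book")
print(result, TIMINGS)
```

The application code never mentions timing; the hook is attached from outside, which is the property that lets APM agents be added without source changes.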
User Behavior Data
User behavior data is gathered through front‑end instrumentation (JS on web pages, SDKs in apps) to track page visits, interactions, and component usage.
Beyond product analytics, this data helps operations teams pinpoint performance bottlenecks affecting end users, enabling rapid, targeted interventions.
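As a sketch of how operations might use this data, the example below groups front-end timing events by page and flags pages whose median load time exceeds a threshold. The event schema ("page", "load_ms"), the 2000 ms threshold, and the sample values are assumptions made up for illustration.

```python
from collections import defaultdict
from statistics import median


def slow_pages(events, threshold_ms=2000):
    """Return {page: median load time} for pages slower than the threshold."""
    by_page = defaultdict(list)
    for event in events:
        by_page[event["page"]].append(event["load_ms"])
    medians = {page: median(values) for page, values in by_page.items()}
    return {page: value for page, value in medians.items() if value > threshold_ms}


sample_events = [
    {"page": "/home", "load_ms": 300},
    {"page": "/home", "load_ms": 400},
    {"page": "/home", "load_ms": 500},
    {"page": "/checkout", "load_ms": 2500},
    {"page": "/checkout", "load_ms": 3100},
    {"page": "/checkout", "load_ms": 1900},
]

flagged = slow_pages(sample_events)
print(flagged)
```

The median is used here instead of the mean so that a single very slow page view does not flag an otherwise healthy page; a high percentile would be a reasonable alternative when tail latency is the concern.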
Conclusion
By integrating diverse data sources—machine, log, network, probe, agent, and user behavior—organizations can design monitoring solutions tailored to their specific business needs and uncover previously overlooked insights.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.