How Real‑Time Log Analysis Is Revolutionizing IT Operations
This article summarizes a 2016 Global Operations conference talk that explains the concept of IT Operations Analytics (ITOA), its four data sources, the evolution of log management from databases to real‑time search engines, and real‑world case studies demonstrating how fast, large‑scale log analysis improves monitoring, security, and business insight.
Introduction
Chen Jun thanks the audience and introduces the topic of IT Operations Analytics and massive log search.
IT Operations Analytics (ITOA)
ITOA applies big‑data techniques to the massive data generated by IT operations, improving availability monitoring, application performance monitoring, root‑cause analysis, and security auditing. Gartner predicts that by 2017, 15% of large enterprises will actively use ITOA, up from 5% in 2014.
Four ITOA Data Sources
Machine data: logs from servers, network devices, and applications.
Communication data: network packet captures.
Agent data: instrumentation inserted into .NET/Java bytecode to collect function-call and stack-usage statistics.
Probe data: synthetic user requests (ICMP ping, HTTP GET) issued from distributed probes.
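A synthetic probe of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's probe implementation; the function name and result fields are assumptions:

```python
import time
import urllib.request

def http_probe(url, timeout=5.0):
    """Issue a synthetic HTTP GET and report reachability plus latency in ms."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
        ok = 200 <= status < 400
    except OSError as exc:  # URLError/HTTPError both subclass OSError
        return {"url": url, "ok": False, "error": str(exc),
                "latency_ms": round((time.monotonic() - start) * 1000, 1)}
    return {"url": url, "ok": ok, "status": status,
            "latency_ms": round((time.monotonic() - start) * 1000, 1)}
```

A real probe network would run this from many geographic locations on a schedule and feed the results into the same analytics pipeline as the logs.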
Comparison of Data Sources
Machine data (logs): ubiquitous, but completeness varies with what each system chooses to record.
Communication data: full network visibility, but misses non-network events and can be blinded by encryption.
Agent data: fine-grained code-level metrics, but invasive, and can affect application stability and performance.
Probe data: simulates the end-to-end user experience, but does not reflect real user behavior.
Log as Time‑Series Machine Data
Logs are time‑stamped records from servers, network devices, applications, and increasingly IoT sensors. They contain system information, user behavior, and business data, making them a factual view of IT systems.
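To make "time-stamped records" concrete, the sketch below parses one common format, an nginx/Apache-style access-log line (the format choice is an assumption for illustration), into a structured, time-indexed record:

```python
import re
from datetime import datetime

# Pattern for a common nginx/Apache-style access-log line (illustrative).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+)'
)

def parse_access_line(line):
    """Turn one raw log line into a structured record, or None if it doesn't match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["bytes"] = int(rec["bytes"])
    return rec
```

Once every line becomes a record like this, the timestamp makes the log a time series, and fields like `status` and `path` carry the user-behavior and business signals the talk describes.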
The talk cites LinkedIn's real-time log platform, and Kafka's role in transporting log streams, as examples.
Log Management Evolution
Log management 1.0 stored logs in relational databases, which struggled with unstructured text at scale; 2.0 adopted Hadoop/NoSQL, which scaled out but answered queries in batch; 3.0 embraces real-time search engines (the "log 3.0" era), combining scale with interactive query.
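What makes the search-engine era fast is the inverted index: instead of scanning every line per query, each token maps to the lines containing it. A toy version (whitespace tokenization, purely illustrative) looks like this:

```python
from collections import defaultdict

def build_index(lines):
    """Map each lowercased token to the set of line numbers it appears on."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines):
        for token in line.lower().split():
            index[token].add(lineno)
    return index

def search(index, *terms):
    """Return line numbers containing ALL terms (an AND query)."""
    postings = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []
```

A query touches only the (usually small) posting sets for its terms, which is why search engines answer in seconds where grep or SQL full scans take minutes.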
LogEasy Platform Highlights
Programmable real‑time search with SPL (Search Processing Language).
Supports multiple data sources: log files, databases, binary logs via APIs.
Offers SaaS and on‑premise deployments; free tier processes 500 MB/day.
Features include search, alerting, statistical analysis, transaction correlation, and customizable parsing rules.
The platform can normalize any log format into structured data for analysis.
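The talk does not show SPL syntax, but the flavor of a search-then-aggregate pipeline (something like `search status=500 | stats count by host`, to borrow Splunk-style phrasing) can be approximated over normalized records. The function name and fields below are illustrative:

```python
from collections import Counter

def spl_like(records, where, count_by):
    """Filter records with a predicate, then count occurrences of one field.

    A rough stand-in for a pipeline such as `search ... | stats count by <field>`.
    """
    return Counter(r[count_by] for r in records if where(r))
```

For example, counting HTTP 500 errors per host:

```python
records = [
    {"host": "web1", "status": 500},
    {"host": "web2", "status": 200},
    {"host": "web1", "status": 500},
]
spl_like(records, lambda r: r["status"] == 500, "host")
```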
Case Studies
Case 1: Large Financial Institution
Before LogEasy, engineers logged into each server individually. After deployment, a private log cloud aggregates over 100 applications and 8 TB of daily logs, enabling multi‑dimensional queries, rapid incident response, and daily health reports.
Case 2: Provincial Branch of China Mobile
Using SPL, logs from multiple subsystems of a transaction are correlated to reconstruct the full transaction flow, measure latency at each step, and monitor operator efficiency.
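The cross-subsystem correlation in this case can be sketched as follows, assuming each subsystem's logs carry a shared transaction ID and a timestamp (the field names here are hypothetical, not China Mobile's actual schema):

```python
from collections import defaultdict

def step_latencies(events):
    """Group events by txn_id, order by timestamp, and compute per-step latency.

    events: dicts with "txn_id", "step", and "ts" (seconds since epoch).
    Returns {txn_id: [(from_step, to_step, seconds), ...]}.
    """
    by_txn = defaultdict(list)
    for e in events:
        by_txn[e["txn_id"]].append(e)
    result = {}
    for txn_id, evs in by_txn.items():
        evs.sort(key=lambda e: e["ts"])
        result[txn_id] = [
            (a["step"], b["step"], round(b["ts"] - a["ts"], 3))
            for a, b in zip(evs, evs[1:])
        ]
    return result
```

With per-step latencies reconstructed, slow hops stand out immediately, and aggregating over operators yields the efficiency monitoring the case describes.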
Case 3: State Grid
LogEasy is used for security information and event management (SIEM), enabling fast forensic analysis across provincial networks.
Q&A Highlights
LogEasy’s API is open, allowing engineers to share SPL scripts.
Indexing latency for a million records is typically a few to a dozen seconds.
The platform supports both online and offline modes.
LogEasy follows a "Schema on Read" approach (like Splunk) but is adding "Schema on Write" capabilities.
Agent architecture can handle tens of thousands of agents, providing throttling, compression, encryption, and masking.
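The schema-on-read vs. schema-on-write distinction above can be illustrated with a minimal contrast: schema on read keeps raw lines and extracts fields only at query time, while schema on write parses once at ingest so queries read structured data. The `level` field and regex are assumptions for illustration:

```python
import re

LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARN|ERROR)\b")

def query_on_read(raw_lines, level):
    """Schema on read: store raw text, extract the field at query time."""
    return [line for line in raw_lines
            if (m := LEVEL_RE.search(line)) and m.group(1) == level]

def ingest_on_write(raw_lines):
    """Schema on write: parse fields once at ingest; queries hit structured docs."""
    return [{"level": m.group(1) if (m := LEVEL_RE.search(line)) else None,
             "raw": line}
            for line in raw_lines]
```

Schema on read keeps ingest cheap and formats flexible; schema on write pays the parsing cost up front to make repeated queries faster, which is why a platform might want both.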
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. We focus on operations transformation and aim to accompany readers throughout their operations careers.