How ByteBrain-LogParser Achieves 1‑2 Orders Faster Log Parsing in Cloud Services
ByteBrain-LogParser is a cloud‑native log‑parsing framework that transforms unstructured logs into dynamic templates with real‑time precision control, delivering parsing speeds up to two orders of magnitude faster than state‑of‑the‑art methods while maintaining near‑SOTA accuracy and low storage overhead.
Introduction
ByteBrain-LogParser is a log‑parsing framework designed for cloud‑service environments, capable of converting unstructured log text into dynamic templates and variables while addressing four major challenges: adaptability, computational efficiency, storage efficiency, and parsing accuracy.
Key Innovations
The system uses a position‑similarity distance metric, a saturation‑score mechanism, and deterministic 64‑bit hash encoding to achieve parsing speeds 10‑100× faster than state‑of‑the‑art methods with comparable accuracy. Hierarchical clustering builds a template tree that supports real‑time precision adjustment without re‑processing raw logs.
Offline training samples logs, performs tokenization, optional variable regex replacement, deduplication, and hash encoding, then applies bottom‑up hierarchical clustering using length and prefix initial grouping. Online matching preprocesses incoming logs identically and selects the most appropriate template based on a user‑specified saturation threshold.
Optimizations
ByteBrain‑LogParser employs balanced grouping, early stopping, parallel processing, and hash‑based storage to minimise memory usage and computational overhead, keeping the model size in a few megabytes even for hundreds of megabytes per second of log streams.
Experimental Evaluation
Evaluated on LogHub and LogHub‑2.0 datasets, ByteBrain‑LogParser achieves an average grouping accuracy of 0.98 on LogHub and 0.90 on LogHub‑2.0, surpassing most baselines. Throughput reaches up to 229 000 logs / s (≈ 840 % faster than the fastest baseline), with linear scaling to larger log volumes.
Conclusion
The framework delivers high‑accuracy, high‑throughput log parsing with real‑time precision control, making it a practical solution for monitoring, anomaly detection, and root‑cause analysis in large‑scale cloud environments.
Volcano Engine Developer Services
The Volcano Engine Developer Community, Volcano Engine's TOD community, connects the platform with developers, offering cutting-edge tech content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
