How Milano Transforms Large-Scale Cluster Log Analysis with ELK and Kafka
Milano is a distributed log collection and analysis platform built on the ELK stack. It combines Filebeat, Kafka, Logstash, Elasticsearch, and Kibana to provide high-throughput, low-latency, secure log management with visual analysis for massive clusters, replacing manual, node-by-node log inspection.
Current State of Large-Scale Cluster Log Analysis
Log analysis is essential for understanding system hardware, load, and security status, and for troubleshooting. Traditional methods require logging into each node and manually inspecting logs, which becomes cumbersome and inefficient when dealing with dozens to thousands of machines. Centralized log management platforms are therefore critical for improving efficiency and reducing operational complexity.
Solution
To meet this need, Xinghuan developed Milano, a large-scale log analysis platform that offers centralized cluster log analysis for both physical clusters and cloud-deployed tenants. Its pipeline (Filebeat + Kafka + Logstash + Elasticsearch + Kibana) provides convenient query and analysis interfaces, statistical analysis, and alerting, overcoming the limits of analyzing logs with Linux commands on each node of a large cluster.
Milano Architecture and Features
Milano is a distributed log collection and analysis system based on ELK, delivering a unified interface for log retrieval, monitoring, and error diagnosis. It provides five core functions:
Distributed log collection.
Log distribution.
Log extraction and processing.
Log query and retrieval.
Log reporting.
Log Collection
Milano runs Filebeat as an agent on each service machine to collect logs and route them to tenant-specific Kafka topics. Filebeat is lightweight and resource-efficient and handles only log collection, so it costs less on service machines than running Logstash as the collector.
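The routing step can be pictured with a small Python sketch. The article does not state Milano's actual topic naming scheme, so the `milano-logs-` prefix, the `tenant` field, and both function names are hypothetical. The character and length checks reflect Kafka's topic-name rules.

```python
import re

# Hypothetical naming scheme: one topic per tenant with a fixed prefix.
TOPIC_PREFIX = "milano-logs-"

# Kafka topic names may contain only ASCII letters, digits, '.', '_' and '-',
# and may be at most 249 characters long.
_INVALID = re.compile(r"[^a-zA-Z0-9._-]")


def topic_for_tenant(tenant_id):
    """Map a tenant identifier to the name of its dedicated topic."""
    cleaned = _INVALID.sub("_", tenant_id.strip())
    if not cleaned:
        raise ValueError("tenant_id must contain at least one character")
    topic = TOPIC_PREFIX + cleaned
    if len(topic) > 249:
        raise ValueError("topic name exceeds Kafka's 249-character limit")
    return topic


def route(event):
    """Return (topic, event) for a log event carrying a 'tenant' field."""
    return topic_for_tenant(event["tenant"]), event
```

In a real deployment this mapping would more likely live in the Filebeat output configuration than in application code. The sketch only shows the one-tenant-to-one-topic idea.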
Log Distribution
Collected logs are buffered and distributed via Kafka, which smooths traffic spikes. Each tenant receives a dedicated Kafka topic for isolation, with authentication and authorization ensuring data privacy.
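To see why a buffer smooths spikes, consider a toy simulation. It is not Kafka; it is an in-memory queue with made-up arrival counts and a made-up drain rate. It shows that a burst raises the queue depth temporarily while the consumer keeps working at its own steady pace.

```python
from collections import deque


def simulate_buffer(arrivals, drain_rate):
    """Simulate a queue that absorbs bursty arrivals and drains at a fixed rate.

    arrivals   -- number of log events produced in each time step
    drain_rate -- maximum events the consumer can process per time step
    Returns the queue depth and the consumed count after every step.
    """
    queue = deque()
    depth, consumed = [], []
    for step, n in enumerate(arrivals):
        queue.extend((step, i) for i in range(n))  # producer side
        taken = 0
        while queue and taken < drain_rate:        # consumer side
            queue.popleft()
            taken += 1
        consumed.append(taken)
        depth.append(len(queue))
    return depth, consumed


# A burst of 10 events arrives in step 1; the consumer never exceeds 4 per step.
depth, consumed = simulate_buffer([2, 10, 1, 0, 0, 3], drain_rate=4)
print(depth)     # [0, 6, 3, 0, 0, 0]
print(consumed)  # [2, 4, 4, 3, 0, 3]
```

No event is dropped: the backlog from the burst is worked off over the following steps.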
Log Processing
Before indexing, logs are processed by Logstash, which normalizes diverse log formats and extracts valuable information.
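As an illustration of what normalization means here, the following Python sketch maps two different input formats onto one common record. It stands in for the kind of work a Logstash filter does and is not Milano's actual pipeline. Both input formats and all field names (`time`, `severity`, `logger`, `msg`) are assumptions chosen for the example.

```python
import json
import re
from datetime import datetime

# Illustrative Log4j-style line: "2024-05-01 12:00:03,123 ERROR some.Class: text"
LOG4J = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(?P<ms>\d{3}) "
    r"(?P<level>[A-Z]+) (?P<source>\S+): (?P<message>.*)$"
)


def normalize(line):
    """Convert one raw log line into a dict with a fixed set of keys."""
    line = line.strip()
    if line.startswith("{"):
        # Assumed JSON format with 'time', 'severity', 'logger' and 'msg' keys.
        raw = json.loads(line)
        return {
            "timestamp": raw["time"],
            "level": raw["severity"].upper(),
            "source": raw["logger"],
            "message": raw["msg"],
        }
    m = LOG4J.match(line)
    if m:
        # strptime validates the date; the time zone is unknown and left out.
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        return {
            "timestamp": ts.isoformat() + "." + m["ms"],
            "level": m["level"],
            "source": m["source"],
            "message": m["message"],
        }
    # Unrecognized lines are kept rather than dropped.
    return {"timestamp": None, "level": "UNKNOWN", "source": None, "message": line}
```

Keeping unparsed lines with an `UNKNOWN` level is a design choice made for this sketch, so that nothing disappears silently before indexing.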
Log Retrieval
Milano employs an enhanced Elasticsearch‑based Search engine for distributed full‑text search and analysis, featuring hierarchical storage and off‑heap memory management to improve availability and avoid GC pauses.
Log Presentation
Kibana serves as the user interface, offering visual analytics, customizable dashboards, and easy query capabilities for daily log analysis.
Milano Feature Advantages
Advanced Log Management
Users can view logs from all cluster components via Kibana, perform text searches, and use a unified query language to quickly locate key information.
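A query of this kind could be expressed in Elasticsearch's query DSL as sketched below. This assumes Milano's search engine accepts standard Elasticsearch query bodies, which the article implies but does not confirm. The field names `message`, `component`, and `@timestamp` are assumptions about the index mapping, and `build_error_query` is a hypothetical helper.

```python
import json


def build_error_query(keyword, component, since="now-1h", size=50):
    """Build a query body: full-text match on the message, filtered by
    component and time range, newest results first."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {
            "bool": {
                "must": [{"match": {"message": keyword}}],
                "filter": [
                    {"term": {"component": component}},
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        },
    }


body = build_error_query("timeout", "hdfs")
print(json.dumps(body, indent=2))
```

The `filter` clauses narrow the result set without affecting relevance scoring, which suits exact conditions such as a component name or a time window.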
Full‑Chain High Throughput
Kafka and Elasticsearch provide high‑throughput scalability; a single Milano node can ingest up to 15,000 logs per second, and a three‑node cluster can handle up to 1 billion logs daily, with ingestion latency under 10 seconds.
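The two throughput figures can be related with a quick calculation. The interpretation that 15,000 logs per second is a per-node peak and 1 billion per day is a sustained cluster total is an assumption; the article does not say how either number was measured.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400

per_node_peak = 15_000           # logs per second, figure from the text
cluster_daily = 1_000_000_000    # logs per day for three nodes, from the text
nodes = 3

# What one node would ingest in a day if it ran at peak the whole time.
per_node_daily_at_peak = per_node_peak * SECONDS_PER_DAY   # 1,296,000,000

# Average rate implied by the daily figure.
cluster_avg_rate = cluster_daily / SECONDS_PER_DAY         # about 11,574 logs/s
per_node_avg_rate = cluster_avg_rate / nodes               # about 3,858 logs/s

# Fraction of per-node peak capacity used on average.
utilization = per_node_avg_rate / per_node_peak            # about 0.26
```

Under that reading, 1 billion logs per day corresponds to each node running at roughly a quarter of its stated peak rate, which leaves headroom for bursts.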
Full‑Chain Security Control
Data transmission is encrypted with Kerberos, and each tenant’s logs are isolated in dedicated Kafka topics and Elasticsearch indices. Kibana access is secured via LDAP authentication.
Full‑Chain High Availability
Kafka, Elasticsearch, and Filebeat include health monitoring, and all modules support multi‑active replicas to ensure data redundancy and rapid recovery without loss or duplication.
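The article does not explain how Milano avoids duplicates after a failure. One common technique in pipelines of this shape, shown here only as an assumption, is to derive each document's ID from its Kafka coordinates so that a redelivered record overwrites itself instead of creating a second copy. The `Index` class is an in-memory stand-in, not a real client.

```python
import hashlib


def doc_id(topic, partition, offset):
    """Derive a stable document ID from a record's Kafka coordinates."""
    key = f"{topic}/{partition}/{offset}".encode("utf-8")
    return hashlib.sha1(key).hexdigest()


class Index:
    """In-memory stand-in for a search index keyed by document ID."""

    def __init__(self):
        self.docs = {}

    def upsert(self, _id, body):
        # Writing the same ID twice replaces the document, so retries are harmless.
        self.docs[_id] = body


index = Index()
records = [
    ("milano-logs-acme", 0, 41, "first"),
    ("milano-logs-acme", 0, 42, "second"),
]

# Deliver both records, then redeliver the last one as a retry after a crash would.
for topic, partition, offset, message in records + records[-1:]:
    index.upsert(doc_id(topic, partition, offset), {"message": message})

print(len(index.docs))  # 2
```

Three deliveries produce two stored documents, which is the property "without loss or duplication" asks for at the indexing step.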
Conclusion and Outlook
Milano addresses the demand for efficient, scalable log analysis in large‑scale clusters, offering high throughput, low latency, extensible architecture, and robust security. Future work includes integrating artificial intelligence for advanced text processing and anomaly detection, further enhancing automated, intelligent log analytics.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era
