Building a Three-Dimensional, Automated, Visual Monitoring System for High-Availability Systems
The article describes a three-dimensional, automated, visual monitoring approach for high-availability systems, detailing a five-layer monitoring model, automated log collection using Logstash-Redis-Elasticsearch, and visualization techniques that together reduce fault-locating time and improve operational efficiency.
Adapted from "Business-oriented 3D high-availability architecture design" by Li Yunhua, senior engineer at Alibaba.
Scenario
A customer reports a problem, and R&D, testing, and operations all have to locate and analyze it. Developer A manually searches massive logs on many machines; Developer B suspects the database but has no access to it; Operations engineer C has to check CPU, memory, I/O, network, and program status across many servers.
Solution
A “three-dimensional, automated, visual monitoring” system is proposed, consisting of three pillars:
1. Three-dimensional monitoring
All information needed for fault analysis and location is monitored across five layers:
Business layer: collects and analyses metrics such as traffic volume and success rate, revealing spikes during attacks.
Application service layer: monitors per-URI traffic, HTTP status distribution, response times, etc.
Interface call layer: tracks external system calls, including latency, error codes, and call counts, enabling quick identification of upstream failures.
Underlying component layer: monitors containers, databases, caches, message queues; each component provides specific metrics (e.g., MySQL connections, query counts, cache hit rate).
Infrastructure layer: monitors OS and network status, such as CPU usage, memory usage, network traffic, and connection counts.
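As an illustration of the application service layer, the sketch below aggregates per-URI request counts, HTTP status distribution, and average response time from access-log lines. The log format, the sample lines, and the function name summarize are assumptions made for this example; the original article does not specify them.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical access-log format (an assumption for this sketch):
#   "<METHOD> <URI> <STATUS> <RESPONSE_MS>"
SAMPLE_LINES = [
    "GET /login 200 35",
    "GET /login 500 120",
    "POST /order 200 80",
    "GET /login 200 45",
]


def summarize(lines):
    """Aggregate request count, status distribution, and mean latency per URI."""
    statuses = defaultdict(Counter)
    latencies = defaultdict(list)
    for line in lines:
        _method, uri, status, ms = line.split()
        statuses[uri][int(status)] += 1
        latencies[uri].append(float(ms))
    return {
        uri: {
            "requests": len(latencies[uri]),
            "status_counts": dict(statuses[uri]),
            "avg_ms": mean(latencies[uri]),
        }
        for uri in latencies
    }


report = summarize(SAMPLE_LINES)
for uri, stats in report.items():
    print(uri, stats)
```

The same pattern of grouping by a key and then counting and averaging would apply to the interface call layer, with the external system's name as the key instead of the URI.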
2. Automation
Automation replaces manual log inspection and command execution with automated data collection and analysis. When a fault occurs, the information needed is already collected and available, which shortens diagnosis time. The system uses Logstash to collect logs, Redis to buffer them, and Elasticsearch to store and analyze them.
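The collect, buffer, and store flow can be sketched with in-memory stand-ins. This is not the Logstash, Redis, or Elasticsearch API: a deque plays the role of the Redis buffer, a plain list plays the role of the Elasticsearch index, and the log format, host names, and function names (ship, drain, search) are all assumptions made for illustration.

```python
import json
from collections import deque

# In-memory stand-ins (assumptions): the deque plays the buffering role of
# Redis, and the list plays the storage role of Elasticsearch.
buffer_queue = deque()
index = []


def ship(host, raw_line):
    """Collector role: parse a raw log line and enqueue it as a JSON event."""
    timestamp, level, message = raw_line.split(" ", 2)
    event = {"host": host, "timestamp": timestamp, "level": level, "message": message}
    buffer_queue.append(json.dumps(event))


def drain():
    """Indexer role: move every buffered event into the searchable store."""
    moved = 0
    while buffer_queue:
        index.append(json.loads(buffer_queue.popleft()))
        moved += 1
    return moved


def search(level):
    """Query role: return stored events with the given level, across all hosts."""
    return [event for event in index if event["level"] == level]


# Made-up sample log lines from two hypothetical hosts.
ship("web-01", "2024-01-01T10:00:00 ERROR db connection timeout")
ship("web-02", "2024-01-01T10:00:01 INFO request served")
ship("web-02", "2024-01-01T10:00:02 ERROR upstream returned 502")

drained = drain()
errors = search("ERROR")
for event in errors:
    print(event["host"], event["message"])
```

The buffer in the middle decouples collection from indexing, so a burst of logs or a slow indexer does not block the machines producing the logs. Once events from all hosts are in one store, a single query replaces logging in to each machine.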
3. Visualization
Visualization presents fault-related data as charts and numbers so that it can be understood at a glance. Because collection is already automated, this step only needs to render the collected data as graphs. Showing comparative metrics alongside, such as month-over-month or year-over-year trends, helps pinpoint issues quickly.
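The comparative metrics mentioned here reduce to a relative change between two periods. A minimal sketch, assuming made-up counts for one metric and a hypothetical helper named percent_change:

```python
def percent_change(current, previous):
    """Relative change of current versus previous, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Made-up values of the same metric in three periods (assumptions).
this_month = 9_000
last_month = 10_000
same_month_last_year = 7_500

mom = percent_change(this_month, last_month)            # month-over-month
yoy = percent_change(this_month, same_month_last_year)  # year-over-year

print(f"month-over-month: {mom:+.1f}%")
print(f"year-over-year: {yoy:+.1f}%")
```

Plotting both figures next to the raw value lets a reader tell a normal seasonal pattern from a real regression: a drop against last month that is still a rise against the same month last year reads very differently from a drop on both.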
