
Building a 3-Dimensional Automated Visual Monitoring System for High-Availability

The article describes a three-dimensional, automated, visual monitoring approach for high-availability systems, detailing a five-layer monitoring model, automated log collection using Logstash-Redis-Elasticsearch, and visualization techniques that together reduce fault-locating time and improve operational efficiency.

Java High-Performance Architecture

Content organized from "Business-oriented 3D high-availability architecture design" by Li Yunhua, senior engineer at Alibaba.

Scenario

A customer reports a problem, and R&D, testing, and operations all have to locate and analyze it. Developer A manually searches massive logs spread across many machines; Developer B suspects the database but has no access to it; Operations engineer C has to check CPU, memory, I/O, network, and process status across many servers.

Solution

A “three-dimensional, automated, visual monitoring” system is proposed, consisting of three pillars:

1. Three-dimensional monitoring

All information needed for fault analysis and location is monitored across five layers:

Business layer: collects and analyzes business metrics such as traffic volume and success rate, so that anomalies such as a traffic spike during an attack become visible.

Application service layer: monitors per-URI traffic, HTTP status code distribution, response times, etc.

Interface call layer: tracks calls to external systems, including latency, error codes, and call counts, so that failures in a dependency can be identified quickly.

Underlying component layer: monitors containers, databases, caches, and message queues; each component exposes its own metrics (e.g., MySQL connections, query counts, cache hit rate).

Infrastructure layer: monitors operating system and network status, such as CPU usage, memory usage, network traffic, and connection counts.
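To make the layering concrete, the following is a minimal Python sketch of how metrics could be tagged with one of the five layers and then scanned from the top layer down during fault location. The metric names, values, and thresholds are hypothetical and chosen only for illustration; the source does not prescribe a data model.

```python
from dataclasses import dataclass
from enum import IntEnum


class Layer(IntEnum):
    """The five monitoring layers, ordered from user-facing to lowest level."""
    BUSINESS = 1
    APPLICATION_SERVICE = 2
    INTERFACE_CALL = 3
    UNDERLYING_COMPONENT = 4
    INFRASTRUCTURE = 5


@dataclass(frozen=True)
class Metric:
    layer: Layer
    name: str                     # hypothetical metric name
    value: float
    threshold: float              # hypothetical alert threshold
    higher_is_worse: bool = True  # False for metrics such as success rate

    def is_abnormal(self) -> bool:
        if self.higher_is_worse:
            return self.value > self.threshold
        return self.value < self.threshold


def abnormal_by_layer(metrics):
    """Group the names of abnormal metrics by layer, top layer first."""
    report = {}
    for m in sorted(metrics, key=lambda m: m.layer):
        if m.is_abnormal():
            report.setdefault(m.layer.name, []).append(m.name)
    return report


# Illustrative sample data, not taken from the source.
SAMPLE = [
    Metric(Layer.BUSINESS, "login_success_rate", 0.82, 0.99, higher_is_worse=False),
    Metric(Layer.APPLICATION_SERVICE, "uri_/login_p99_ms", 2400, 500),
    Metric(Layer.INTERFACE_CALL, "auth_service_error_count", 3, 50),
    Metric(Layer.UNDERLYING_COMPONENT, "mysql_connections", 980, 800),
    Metric(Layer.INFRASTRUCTURE, "cpu_usage_percent", 35, 90),
]

if __name__ == "__main__":
    for layer, names in abnormal_by_layer(SAMPLE).items():
        print(layer, names)
```

In this sample the business, application service, and underlying component layers are flagged while the interface call and infrastructure layers are not, which would point the investigation toward the database rather than an external dependency or the host.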

2. Automation

Eliminate manual log inspection and command execution by automating data collection and analysis. When a fault occurs, required information is instantly available, reducing diagnosis time. The system uses Logstash to collect logs, Redis to cache them, and Elasticsearch to store and analyze them.
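The collect, buffer, and store stages can be sketched with the Python standard library alone. This is a simulation under stated assumptions, not the real tooling: `collect` stands in for the role Logstash plays, `Buffer` for Redis, and `Store` for Elasticsearch; none of their client APIs are used here, and the log format and host names are hypothetical.

```python
import json
from collections import Counter, deque


def collect(raw_lines, host):
    """Collector stage (the role Logstash plays): parse raw lines into events.

    Assumes a hypothetical "<timestamp> <level> <message>" line format.
    """
    for line in raw_lines:
        timestamp, level, message = line.split(" ", 2)
        yield {"host": host, "timestamp": timestamp, "level": level, "message": message}


class Buffer:
    """Buffer stage (the role Redis plays): a FIFO queue of serialized events."""

    def __init__(self):
        self._queue = deque()

    def push(self, event):
        self._queue.append(json.dumps(event))

    def pop(self):
        return json.loads(self._queue.popleft()) if self._queue else None


class Store:
    """Store stage (the role Elasticsearch plays): keeps events and answers queries."""

    def __init__(self):
        self._events = []

    def index(self, event):
        self._events.append(event)

    def search(self, **filters):
        """Return events whose fields match every given filter exactly."""
        return [e for e in self._events
                if all(e.get(k) == v for k, v in filters.items())]

    def count_by(self, field):
        return Counter(e[field] for e in self._events)


def run_pipeline(logs_by_host):
    """Push every host's log lines through collect -> buffer -> store."""
    buffer, store = Buffer(), Store()
    for host, lines in logs_by_host.items():
        for event in collect(lines, host):
            buffer.push(event)
    while (event := buffer.pop()) is not None:
        store.index(event)
    return store


# Illustrative log lines, not taken from the source.
LOGS = {
    "web-01": ["2024-01-01T10:00:00 INFO request served",
               "2024-01-01T10:00:01 ERROR db timeout"],
    "web-02": ["2024-01-01T10:00:02 ERROR db timeout"],
}

if __name__ == "__main__":
    store = run_pipeline(LOGS)
    print(store.search(level="ERROR"))
    print(store.count_by("level"))
```

Once the logs from every machine sit in one queryable store, a single `search(level="ERROR")` call replaces logging in to each server and searching its files by hand.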

3. Visualization

Present fault-related data as charts and numbers so that it can be understood at a glance. With automated collection as the foundation, visualization renders the collected data as graphs and also shows comparative metrics, such as month-over-month or year-over-year trends, so that abnormal values stand out quickly.
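As a rough illustration of the comparative view, the Python sketch below compares the latest data point with the previous one and with the same point one period earlier, then renders the series as a plain-text bar chart. The function names, the seven-day period, and the sample request counts are assumptions made for this example; a real dashboard would use a charting tool.

```python
def percent_change(current, baseline):
    """Relative change of current against baseline, in percent."""
    if baseline == 0:
        return None  # change is undefined without a non-zero baseline
    return (current - baseline) / baseline * 100.0


def compare(series, period):
    """Compare the latest point with the previous point and with the
    point exactly one period earlier (e.g. the same weekday last week)."""
    latest = series[-1]
    return {
        "latest": latest,
        "vs_previous_pct": percent_change(latest, series[-2]),
        "vs_same_point_last_period_pct": percent_change(latest, series[-1 - period]),
    }


def bar_chart(series, width=20):
    """Render one text bar per value, scaled so the peak fills `width` characters."""
    peak = max(series) or 1
    return "\n".join(f"{v:>6} " + "#" * round(v / peak * width) for v in series)


# Hypothetical daily request counts for eight consecutive days.
DAILY_REQUESTS = [1000, 1100, 1050, 1200, 1150, 900, 950, 500]

if __name__ == "__main__":
    print(bar_chart(DAILY_REQUESTS))
    print(compare(DAILY_REQUESTS, period=7))
```

Showing both comparisons side by side helps separate a genuine drop from a normal periodic dip: a value that is low against the previous point and also against the same point in the previous period is more likely to indicate a fault.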

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
