Topic

Monitoring

Collection size
1711 articles
Efficient Ops
Aug 19, 2020 · Operations

How End-State‑Oriented Monitoring Transforms Operations and AIOps

This article explains the concept of end‑state‑oriented monitoring, its significance for modern operations, the shortcomings of existing solutions, and a layered design approach that leverages real‑time data, service catalogs, and AI to achieve secure, stable, efficient, and low‑cost operations.

AIOps, DevOps, Monitoring
0 likes · 13 min read
Efficient Ops
Jun 3, 2020 · Operations

Understanding Kubernetes vs VM Monitoring: CPU, Memory, Disk & Network

This article compares monitoring metrics for CPU, memory, disk, and network between traditional KVM-based servers and Kubernetes pods, explaining why their indicators differ, how resource isolation works, and what key metrics users should watch to diagnose performance bottlenecks.

CPU, Kubernetes, Memory
0 likes · 11 min read
Efficient Ops
May 19, 2020 · Cloud Native

Mastering Prometheus on Kubernetes: Practical Tips, Exporter Guide, and Capacity Planning

This article explores the history and principles of Prometheus monitoring, offers guidance on version selection, highlights its limitations, details common Kubernetes exporters, shows Grafana dashboard setups, and provides in‑depth strategies for exporter aggregation, golden metrics, multi‑cluster scraping, GPU monitoring, timezone handling, memory optimization, capacity planning, and rate calculations.

Capacity Planning, Exporter, Kubernetes
0 likes · 19 min read
Efficient Ops
Apr 6, 2020 · Databases

How to Build a MySQL Monitoring Platform with Prometheus and Grafana

This article walks through setting up a production‑grade MySQL monitoring solution using Prometheus and Grafana, covering exporter installation, MySQL user configuration, systemd service setup, Prometheus job definition, key MySQL performance metrics, and basic alerting rules.

Exporter, Monitoring, MySQL
0 likes · 15 min read
Efficient Ops
Mar 16, 2020 · Cloud Native

Designing a Scalable, High‑Availability Kubernetes Monitoring Solution at Xiaomi

This article details Xiaomi's implementation of a highly available, persistent, and dynamically scalable Kubernetes monitoring system, covering challenges, architecture choices, Prometheus federation, performance testing, and future enhancements for cloud‑native observability.

Cloud Native, Kubernetes, Monitoring
0 likes · 18 min read
Efficient Ops
Mar 11, 2020 · Operations

How to Elevate Your Monitoring System: Proven Practices from Top DevOps Models

This article explains why modern services depend on highly available, scalable monitoring, outlines a systematic way to assess and improve monitoring capabilities using open‑source tools and the DevOps Capability Maturity Model, and details concrete improvement points across data collection, management, and application.

DevOps, Monitoring, Operations
0 likes · 9 min read
Efficient Ops
Mar 8, 2020 · Operations

Prometheus vs Zabbix: Install, Configure & Visualize with Grafana

This article compares Prometheus with Zabbix, walks through downloading and installing Prometheus, explains the key sections of prometheus.yml, shows how to add a node_exporter for machine metrics, and demonstrates integrating Grafana to create rich monitoring dashboards.

Exporter, Linux, Monitoring
0 likes · 11 min read
Efficient Ops
Mar 4, 2020 · Operations

Master Zabbix: From Installation to Advanced Custom Monitoring

This guide explains why monitoring is essential and describes availability in terms of "X nines." It then walks through Zabbix installation, web interface setup, host and template configuration, custom monitoring, alerting with OneAlert, visualization, distributed monitoring, and SNMP integration, with practical command examples for managing large server fleets.

Automation, Linux, Monitoring
0 likes · 20 min read
Efficient Ops
Feb 24, 2020 · Operations

How to Build an Effective Operations Monitoring Platform: Tools, Design, and Best Practices

This article explains why monitoring is essential for operations, reviews popular monitoring tools such as Cacti, Nagios, Zabbix, Ganglia, Centreon, Prometheus and Grafana, outlines a six‑layer unified monitoring platform architecture, offers selection guidance for different enterprise sizes, and shares evolution lessons from small to large scale deployments.

DevOps, Monitoring, Operations
0 likes · 20 min read
Efficient Ops
Feb 17, 2020 · Operations

How Top IT Ops Teams Ensure Seamless Large-Scale Business Events

This article outlines how Ping An’s IT operations team systematically prepares for high‑traffic business events—detailing service assessment, architecture mapping, configuration audits, monitoring design, capacity planning, stress testing, and coordinated incident response—to guarantee reliability and performance under massive concurrent loads.

Capacity Planning, IT operations, Incident Response
0 likes · 15 min read
Efficient Ops
Jan 7, 2020 · Operations

How 5G Is Driving the Next Generation of Operations Platforms

The article explores how the advent of 5G reshapes operational demands, introduces a technical operations platform that unifies business and network domains, details practical implementations such as CMDB, monitoring, automation, and AIOps, and outlines future directions for intelligent, agile operations.

5G, AIOps, CMDB
0 likes · 18 min read
Efficient Ops
Dec 22, 2019 · Operations

How Baidu’s Noah Monitoring System Tackles AIOps Challenges at Scale

This article examines Baidu’s Noah monitoring and alarm platform, detailing its end‑to‑end fault‑handling workflow, the three‑component architecture, and the practical challenges of deploying AIOps—such as long algorithm iteration cycles, complex alarm management, and alarm storms—while highlighting scalability and commercial considerations.

AIOps, Alarm Management, Monitoring
0 likes · 15 min read
Efficient Ops
Nov 28, 2019 · Operations

Master Modern IT Operations: Skill Maps, ELK Architectures & Big Data Monitoring

This article explores the evolving landscape of IT operations, detailing role specializations, comprehensive skill maps for system, web, big data, and container ops, and compares three ELK logging architectures while emphasizing a data‑driven approach to monitoring and incident response.

Big Data, ELK, IT operations
0 likes · 11 min read
Efficient Ops
Nov 7, 2019 · Operations

How BigBrother Revolutionizes Large‑Scale Virtual Network Connectivity Checks

BigBrother is a TCP‑based, full‑link, large‑scale network connectivity detection system that uses packet coloring and GRE mirroring to automatically locate virtual network faults across public, hybrid, and physical clouds, dramatically reducing troubleshooting time and supporting high‑concurrency tasks.

BigBrother, Monitoring, Troubleshooting
0 likes · 16 min read
Efficient Ops
Oct 14, 2019 · Operations

How AIOps Transforms IT Operations: Real-World Architecture and Lessons

This article shares a practical case study of implementing AIOps in an online‑education company, covering the background pain points of massive monitoring data, the designed architecture with real‑time processing and machine‑learning pipelines, and the challenges and opportunities of intelligent operations.

AIOps, Big Data, IT operations
0 likes · 14 min read
Efficient Ops
Aug 21, 2019 · Operations

How Meituan‑Dianping Scales Real‑Time Monitoring for Trillions of Events with CAT

This article explains how Meituan‑Dianping built the CAT platform to provide both user‑side and server‑side real‑time monitoring at trillion‑event scale, detailing its metrics, architecture evolution, storage strategies, and open‑source contributions.

Monitoring, Operations, architecture
0 likes · 10 min read
Efficient Ops
Jul 28, 2019 · Operations

How 58’s Intelligent Monitoring System Guarantees 24/7 Service Stability

This article details the design, architecture, and AI‑driven features of 58’s intelligent monitoring platform, explaining how multi‑dimensional data collection, predictive analytics, and smart alarm merging ensure continuous, automated observability across network, server, application, and business layers.

Monitoring, anomaly detection, cloud infrastructure
0 likes · 20 min read
Efficient Ops
Jul 9, 2019 · Operations

How SF Express Scaled Operations: Lessons from Digital Transformation

In this talk, SF Express’s tech leader shares how the company digitized its logistics, unified goals across teams, streamlined processes, built resilient infrastructure, and leveraged monitoring and gray‑release strategies to sustain explosive growth while reducing costs and improving service quality.

Digital Transformation, Monitoring, Operations
0 likes · 14 min read
Efficient Ops
Jun 27, 2019 · Operations

Scaling Monitoring to Millions of Metrics with Open‑Source and AIOps

This talk shares China Mobile Online Service’s journey of building a nationwide, software‑defined monitoring platform, detailing the shift from legacy PBX systems to open‑source tools, the challenges of scaling to millions of metrics, and how AI‑driven AIOps is used to automate, compress, and intelligently alert on massive operational data.

AIOps, Big Data, Monitoring
0 likes · 15 min read
Efficient Ops
Jun 13, 2019 · Operations

How to Build a Future‑Proof Operations Platform with End‑State Architecture

This article explains the challenges of modern large‑scale operations, introduces the end‑state architectural principle, details the system components and safety model, discusses real‑world deployment issues, and looks ahead to future AIOps possibilities, offering practical guidance for building resilient operation platforms.

Monitoring, Operations, deployment
0 likes · 24 min read