Monitoring | BestHub

Collection size: 1711 articles

Efficient Ops

Aug 18, 2024 · Operations

Essential Bash Scripts for Linux Ops: Monitoring, Deployment & Automation

A curated collection of ready‑to‑use Bash scripts that help you monitor MySQL replication, track directory changes, batch‑create users, detect website issues, execute remote commands, deploy LNMP stacks, check server resource usage, identify high‑load processes, and automate Java or PHP project releases.

DevOps · Linux · Monitoring

0 likes · 12 min read

Efficient Ops

Jul 28, 2024 · Operations

Building a Resilient, High‑Performance Website: Domains, CDN, Security & Ops

This guide outlines a step‑by‑step strategy for creating a highly available, secure, and scalable website, covering everything from buying and protecting multiple domains, configuring DNS and CDN, and setting up image and database servers to implementing monitoring, redundancy, high‑concurrency testing, and disaster‑recovery plans.

CDN · Monitoring · Operations

0 likes · 13 min read

Efficient Ops

Mar 18, 2024 · Operations

How to Implement Fault Self‑Healing for Scalable Operations

This article explains why low‑disk alerts demand automation, outlines the concept of fault self‑healing versus manual response, and provides practical guidelines—including standards, monitoring dimensions, CMDB integration, script execution tools, and notification channels—to build a reliable self‑healing system for large‑scale environments.

CMDB · DevOps · Monitoring

0 likes · 10 min read

Efficient Ops

Mar 13, 2024 · Operations

What Does an Operations Engineer Do? Skills, Tools, and Career Path

This article explains the role of an operations engineer, covering daily responsibilities, essential knowledge such as Linux and networking, common monitoring tools, and emerging career paths like DevOps, AIOps, and SRE, helping newcomers understand how to start and grow in the field.

DevOps · Linux · Monitoring

0 likes · 6 min read

Efficient Ops

Sep 26, 2023 · Operations

Mastering Zabbix: From Installation to Advanced Monitoring and Automation

This comprehensive guide walks you through Zabbix monitoring concepts, reliability calculations, installation methods, web UI configuration, host and template management, custom monitoring, alert integration with OneAlert, Grafana visualization, distributed monitoring, SNMP support, and practical scripts for large‑scale server environments.

Alerting · Grafana · Monitoring

0 likes · 28 min read

Efficient Ops

Aug 31, 2022 · Operations

How to Build Scalable Fault Self‑Healing for Modern Operations

This article explains why traditional manual responses to alerts are insufficient, outlines the concept of fault self‑healing, and provides a step‑by‑step guide on establishing standards, monitoring dimensions, a unified CMDB, automation tools, and notification channels to achieve automated recovery at scale.

CMDB · Monitoring · Operations

0 likes · 9 min read

Efficient Ops

Aug 17, 2022 · Operations

Master System Monitoring with the USE Method and Prometheus

This article explains how to build a comprehensive monitoring system using the concise USE (Utilization, Saturation, Errors) method, outlines key system and application metrics, and demonstrates practical implementation with Prometheus, Grafana, full‑link tracing, and ELK for observability and performance troubleshooting.

Monitoring · Prometheus · USE method

0 likes · 13 min read

Efficient Ops

Nov 24, 2021 · Operations

Why Switch to Loki? Step‑by‑Step Installation and Grafana Visualization

This guide explains why Loki is a lightweight alternative to EFK/ELK, walks through installing Loki and Promtail binaries, configuring them with YAML files, and visualizing logs in Grafana using LogQL, providing a complete end‑to‑end log management solution.

Grafana · Loki · Monitoring

0 likes · 6 min read

Efficient Ops

Apr 20, 2021 · Operations

How Dada’s Intelligent Elastic Scaling Cuts Costs and Boosts Delivery Performance

This article details Dada Group’s implementation of an intelligent elastic scaling architecture that automatically adjusts capacity during peak promotions and low‑traffic periods, improving delivery reliability, reducing costs, and supporting multi‑cloud and multi‑runtime environments through sophisticated monitoring and auto‑scaler mechanisms.

Cloud Native · Monitoring · Operations

0 likes · 17 min read

Efficient Ops

Oct 16, 2018 · Operations

How Tencent Built an AI‑Powered Network Fault Detection System in Minutes

In this talk, Tencent’s infrastructure lead explains how their team created an AI‑driven, three‑minute fault detection and recovery pipeline—combining high‑precision Meshping monitoring, multi‑KPI analytics, and automated Moveout isolation—to dramatically shorten network outage resolution from hours to minutes.

AIOps · Monitoring · Network Operations

0 likes · 18 min read

Efficient Ops

May 21, 2018 · Databases

Designing Scalable MySQL Cloud DBaaS: Architecture, Availability, and Future Plans

This article summarizes the design and evolution of a MySQL cloud DBaaS platform, covering MySQL 8.0 features, the need for DBaaS, the multi‑generation architecture, service and data availability strategies, monitoring, DTS design, and the upcoming roadmap for broader database support and hybrid cloud deployment.

DBaaS · DTS · Database Design

0 likes · 13 min read

Efficient Ops

Dec 5, 2017 · Operations

How Alibaba’s Sunfire Achieves Second‑Level Monitoring at Trillion‑Transaction Scale

This article explains how Alibaba’s Sunfire monitoring platform processes terabytes of logs per minute, uses a pull‑based architecture with Brain‑Reduce‑Map roles, tackles scalability and reliability challenges, and outlines future directions such as MQL standardization and intelligent baselines.

Monitoring · Operations · Real-time

0 likes · 17 min read

Linux Ops Smart Journey

Jun 6, 2025 · Operations

How to Build a Complete Longhorn Monitoring System with Prometheus & Grafana

This guide explains how to monitor Longhorn storage in Kubernetes by collecting metrics with Prometheus, configuring scraping, verifying data collection, and visualizing everything in Grafana, enabling proactive performance tuning and reliable operations.

Cloud Native · Grafana · Kubernetes

0 likes · 6 min read

Linux Ops Smart Journey

Apr 20, 2025 · Operations

Visualize Kubernetes Events: Store in Elasticsearch and Dashboard with Grafana

This guide explains how to store Kubernetes event data in Elasticsearch, configure Logstash and Ruby filters for timestamp correction, and create a Grafana dashboard to visualize and analyze cluster events for improved monitoring and troubleshooting.

Elasticsearch · Grafana · K8s Events

0 likes · 4 min read

Linux Ops Smart Journey

Apr 16, 2025 · Operations

How to Build a Robust Elasticsearch Monitoring System with Prometheus & Grafana

Learn step‑by‑step how to deploy the Elasticsearch‑exporter via Helm, configure Prometheus to scrape its metrics, and visualize them in Grafana, enabling comprehensive monitoring of Elasticsearch clusters for performance, health, and early issue detection in Kubernetes environments.

Elasticsearch · Exporter · Grafana

0 likes · 7 min read

Linux Ops Smart Journey

Jan 7, 2025 · Operations

Enable Nacos Metrics in Prometheus and Visualize with Grafana

This guide shows how to enable Nacos metrics, configure Prometheus to scrape them, and visualize the data with a Grafana dashboard, providing a centralized view across different departments for enterprise monitoring and decision‑making.

Grafana · Kubernetes · Monitoring

0 likes · 4 min read

macrozheng

Jun 9, 2025 · Backend Development

Mastering Redis Hotspot Keys: Detection, Risks, and Solutions

This article explains what Redis hotspot keys are, the performance and stability issues they cause, common causes, how to monitor and identify them, and practical mitigation strategies such as cluster scaling, key sharding, and multi‑level caching.

Caching · Monitoring · Redis

0 likes · 10 min read

macrozheng

Nov 8, 2022 · Operations

Choosing the Right Open‑Source Monitoring System: Zabbix, Open‑Falcon, Prometheus

This article provides a systematic overview of monitoring fundamentals, compares three popular open‑source monitoring solutions—Zabbix, Open‑Falcon, and Prometheus—and offers practical guidance for selecting the most suitable system based on scale, features, and operational needs.

Monitoring · Open Source · Open-Falcon

0 likes · 21 min read

WeiLi Technology Team

Jun 28, 2024 · Big Data

How to Build a Robust Big Data Monitoring and Alerting System

This article explains why high‑availability design and comprehensive monitoring are essential for modern big‑data platforms, outlines a layered architecture, and provides practical guidance on health checks, alerting, and data‑quality monitoring across storage, compute, scheduling, and service layers.

Alerting · Big Data · Flink

0 likes · 14 min read

Xianyu Technology

Jun 17, 2020 · Backend Development

Lottery System Risk Management and SDK Integration

Xianyu mitigated lottery‑related financial loss by centralizing rights management, decoupling UI from business logic, and providing a unified SDK with simple draw APIs. It also added real‑time log backflow and comprehensive accounting and monitoring, cutting configuration time by over 50% and eliminating UI‑only risk.

Monitoring · SDK · backend

0 likes · 10 min read
