monitoring | BestHub

DevOps Operations Practice

Apr 11, 2025 · Operations

Promtool: A Complete Guide to Configuration Validation, Rule Checking, TSDB Management, and Debugging for Prometheus

This article introduces Promtool, the multifunctional command‑line utility bundled with Prometheus, and explains how to validate configurations, check and test rules, query metrics, manage the TSDB, run unit tests, use debugging helpers, install the tool, and apply best‑practice recommendations.

Debugging · Prometheus · Promtool

0 likes · 5 min read

DevOps Operations Practice

Aug 11, 2024 · Operations

Monitoring Multi-Region HTTP Requests with Prometheus and Blackbox Exporter

This article explains how to deploy Blackbox Exporter in multiple data centers, configure Prometheus to scrape region‑specific HTTP metrics for a target website, validate the setup via queries, and add alerting rules to detect latency or downtime, providing a self‑hosted monitoring solution.

Alerting · Blackbox Exporter · Docker

0 likes · 5 min read

DevOps Operations Practice

Mar 14, 2024 · Operations

Resolving Frequent Crashes of a Single-Node Prometheus Deployment: Analysis and Solutions

This article analyzes why a single Prometheus instance repeatedly runs out of memory and crashes, explains the underlying storage mechanisms, and presents practical solutions such as metric reduction, retention tuning, federation architecture, and remote storage integration to improve stability and scalability.

Federation · Prometheus · monitoring

0 likes · 6 min read

Practical DevOps Architecture

May 9, 2024 · Operations

Monitoring SSL Certificate Expiration with Zabbix Using a Shell Script

This guide explains how to create a shell script that checks SSL certificate expiration dates and integrates it with Zabbix by configuring a user parameter, testing the script, and setting up monitoring items, triggers, graphs, and alerts to ensure services remain available.

SSL · Zabbix · automation

0 likes · 3 min read

DevOps Cloud Academy

Sep 27, 2019 · Cloud Native

Configuring Prometheus Operator ServiceMonitor on OpenShift after Migrating from Mesos+Marathon

This article explains how to migrate a Mesos+Marathon environment to OpenShift and configure Prometheus Operator ServiceMonitor resources, including service creation, ServiceMonitor definition, and verification steps, with full YAML examples and screenshots of the monitoring UI.

Cloud Native · Kubernetes · OpenShift

0 likes · 6 min read

FunTester

Jun 5, 2025 · Cloud Native

Automating Thread Dump Generation and Retrieval in Kubernetes for Efficient Fault Diagnosis

The article explains how automating thread dump creation and download in Kubernetes using tools like Fabric8, Prometheus, and CI/CD pipelines dramatically improves fault‑diagnosis speed, data centralization, real‑time capture, and integration with testing frameworks, transforming manual, error‑prone processes into streamlined, intelligent operations.

CI/CD · Kubernetes · Thread Dump

0 likes · 6 min read

Efficient Ops

Dec 11, 2024 · Operations

Thanos vs VictoriaMetrics: Which Prometheus Storage Solution Wins for Scale and Cost?

This article compares Thanos and VictoriaMetrics as long‑term storage solutions for Prometheus, evaluating their architecture, write and read paths, reliability, consistency, performance, scalability, high‑availability, and hosting costs to help you choose the most suitable option for your monitoring stack.

Cost comparison · Prometheus · Thanos

0 likes · 18 min read

Efficient Ops

Aug 17, 2022 · Operations

Master System Monitoring with the USE Method and Prometheus

This article explains how to build a comprehensive monitoring system using the concise USE (Utilization, Saturation, Errors) method, outlines key system and application metrics, and demonstrates practical implementation with Prometheus, Grafana, full‑link tracing, and ELK for observability and performance troubleshooting.

Prometheus · USE method · full-link tracing

0 likes · 13 min read

macrozheng

Jun 9, 2025 · Backend Development

Mastering Redis Hotspot Keys: Detection, Risks, and Solutions

This article explains what Redis hotspot keys are, the performance and stability issues they cause, common causes, how to monitor and identify them, and practical mitigation strategies such as cluster scaling, key sharding, and multi‑level caching.

Backend · Redis · caching

0 likes · 10 min read

Linux Ops Smart Journey

Jun 13, 2025 · Operations

Master ServiceMonitor: Build Reliable Prometheus Monitoring for Kubernetes

This article dives deep into ServiceMonitor, comparing it with traditional Prometheus configurations, detailing its core fields, and providing hands‑on examples for Harbor and GitLab metrics, enabling you to create stable, flexible, and maintainable monitoring setups for Kubernetes services.

Cloud Native · Kubernetes · Prometheus

0 likes · 5 min read

DevOps Operations Practice

Jun 16, 2025 · Cloud Native

Mastering Kubernetes: 6 Essential Tools for Cluster Management

This article introduces six indispensable tools—kubectl, Helm, Prometheus + Grafana, Istio, Velero, and K9s—that simplify Kubernetes cluster management by covering resource handling, monitoring, networking, security, backup, and interactive UI, helping readers efficiently operate production‑grade clusters.

Cloud Native · DevOps · Kubernetes

0 likes · 7 min read

DeWu Technology

Apr 21, 2025 · Backend Development

Design and Evolution of a Unified Exchange Mall Middleware Platform

The unified exchange mall middleware platform consolidates disparate points‑redemption and lottery flows into a four‑layer architecture—business, gameplay templates, domain models, and downstream services—offering standardized APIs, dynamic RPC routing, Redis‑based inventory control, anti‑fraud safeguards, and built‑in monitoring, thereby cutting development costs, enhancing maintainability, and ensuring system stability.

Backend · Microservices · anti-fraud

0 likes · 18 min read

DeWu Technology

Nov 25, 2024 · Databases

Redis Hot Key Detection and Kernel-Based Real-Time Statistics

The article describes a kernel‑level hot‑key detection module for Redis that tracks per‑second access counts via an O(1) LRU queue, flags keys exceeding configurable thresholds, and provides real‑time subscription alerts and queryable logs, overcoming the latency and overhead limitations of existing detection methods.

Backend · HotKey · Kernel

0 likes · 11 min read

DeWu Technology

Oct 23, 2024 · Backend Development

Automated Traffic Rule Inspection with Flow Replay Platform

The Flow Replay Platform automates traffic‑rule inspection by recording traffic from all environments and letting engineers define jsonPath‑based interface rules that continuously validate pre‑release and production traffic. It alerts on anomalies immediately, reduces false positives, speeds up release verification, and cuts manual testing effort, as demonstrated by the coupon‑related bugs it uncovered.

Backend · automated testing · monitoring

0 likes · 9 min read

DeWu Technology

Feb 27, 2023 · Operations

Message Push Monitoring and SLA Practices

The team implemented SLA‑based, node‑level monitoring for mobile push messages—splitting the workflow, measuring latency, blocking volume, and success rates, isolating metrics with Spring AOP, and tracking third‑party vendors—resulting in clear latency standards, doubled peak throughput, faster issue resolution, and improved overall reliability.

Backend · SLA · message-push

0 likes · 11 min read

DeWu Technology

May 16, 2022 · Operations

NOC SLA Implementation for Consumer Trading Platform

To tackle growing production complexity and past incident delays, the consumer trading platform introduced a three‑tier NOC‑SLA with intelligent baselines powered by Facebook Prophet, streamlined alert rules, and an SOS‑linked workflow, boosting detection frequency, cutting critical response times to under five minutes, and improving overall system reliability while emphasizing ongoing baseline and rule maintenance.

NOC · SLA · alert management

0 likes · 13 min read

DeWu Technology

Jan 10, 2022 · Mobile Development

APK Package Size Optimization and Monitoring Platform for Android Apps

The platform optimizes Android APK size by shrinking resources, compressing images, filtering native libraries, and applying selective byte‑code reductions, then continuously monitors package composition, trends, and business‑line usage, alerting teams to abnormal growth while preserving functionality for large‑scale apps.

APK · Android · Dex

0 likes · 13 min read

Java Tech Enthusiast

Jul 21, 2024 · Backend Development

Interface Performance Optimization Techniques for Backend Development

The article outlines practical backend interface performance optimizations—including proper indexing, SQL tuning, parallel remote calls, batch queries, asynchronous processing, scoped transactions, fine-grained locking, pagination batching, multi-level caching, sharding, and monitoring tools—to dramatically reduce latency and improve throughput.

Backend · Indexing · SQL optimization

0 likes · 25 min read

Java Tech Enthusiast

May 5, 2024 · Information Security

Preventing Malicious API Abuse: Security Measures and Best Practices

To prevent malicious API abuse, implement layered defenses such as firewalls to block unwanted traffic, robust captchas and SMS verification, mandatory authentication with permission controls, IP whitelisting for critical endpoints, HTTPS encryption, strict rate‑limiting via Redis, continuous monitoring with alerts, and an API gateway that centralizes filtering, authentication and throttling.

API security · IP whitelist · captcha

0 likes · 9 min read

DaTaobao Tech

Sep 18, 2023 · Databases

Comprehensive Approach to Slow SQL Detection and Governance

The Taobao platform's slow‑SQL governance team built a detection and governance pipeline that combines internal slow‑log tools, database slow‑query logs, and JVM‑Sandbox instrumentation to capture full SQL details. It scores high‑risk queries by execution time, scans, and standards violations, then prioritizes remediation through health scores, branch‑diff checks, and issue tracking, which significantly cut DB‑related incidents and improved system stability.

JVM sandbox · SQL · database

0 likes · 12 min read
