Topic: monitoring

Collection size: 1642 articles
DevOps Operations Practice
Apr 11, 2025 · Operations

Promtool: A Complete Guide to Configuration Validation, Rule Checking, TSDB Management, and Debugging for Prometheus

This article introduces Promtool, the multifunctional command‑line utility bundled with Prometheus, and explains how to validate configurations, check and test rules, query metrics, manage the TSDB, run unit tests, use debugging helpers, install the tool, and apply best‑practice recommendations.

Debugging · Prometheus · Promtool
5 min read
DevOps Operations Practice
Aug 11, 2024 · Operations

Monitoring Multi-Region HTTP Requests with Prometheus and Blackbox Exporter

This article explains how to deploy Blackbox Exporter in multiple data centers, configure Prometheus to scrape region‑specific HTTP metrics for a target website, validate the setup via queries, and add alerting rules to detect latency or downtime, providing a self‑hosted monitoring solution.

Alerting · Blackbox Exporter · Docker
5 min read
DevOps Operations Practice
Mar 14, 2024 · Operations

Resolving Frequent Crashes of a Single-Node Prometheus Deployment: Analysis and Solutions

This article analyzes why a single Prometheus instance repeatedly runs out of memory and crashes, explains the underlying storage mechanisms, and presents practical solutions such as metric reduction, retention tuning, federation architecture, and remote storage integration to improve stability and scalability.

Federation · Prometheus · monitoring
6 min read
Practical DevOps Architecture
May 9, 2024 · Operations

Monitoring SSL Certificate Expiration with Zabbix Using a Shell Script

This guide explains how to create a shell script that checks SSL certificate expiration dates and integrates it with Zabbix by configuring a user parameter, testing the script, and setting up monitoring items, triggers, graphs, and alerts to ensure services remain available.

SSL · Zabbix · automation
3 min read
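The core of such a check is converting a certificate's `notAfter` field into the number of days left, which a Zabbix user parameter can return as an item value for triggers to compare against a threshold. The article does this with a shell script; the Python below is a swapped-in sketch, not the article's script, and the helper names `fetch_not_after` and `days_until_expiry` and the default port 443 are assumptions.

```python
import socket
import ssl
import sys
import time

# Hypothetical helper names; the article itself uses a shell script.
SECONDS_PER_DAY = 86400


def fetch_not_after(host, port=443, timeout=5.0):
    """Open a verified TLS connection and return the certificate's 'notAfter' string."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after, now=None):
    """Whole days left before expiry; negative once the certificate has expired."""
    # cert_time_to_seconds parses strings like "Jan  5 09:34:43 2018 GMT" as UTC.
    expiry = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return int((expiry - now) // SECONDS_PER_DAY)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Prints a single integer, suitable as a Zabbix item value.
    print(days_until_expiry(fetch_not_after(sys.argv[1])))
```

A `UserParameter` entry in the Zabbix agent configuration could then call the script with the host name as its argument. One caveat of this sketch: `ssl.create_default_context()` verifies the certificate chain, so a certificate that has already expired fails the handshake with `ssl.SSLCertVerificationError` instead of yielding a negative number, and a wrapper would need to report that case separately.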
FunTester
Jun 5, 2025 · Cloud Native

Automating Thread Dump Generation and Retrieval in Kubernetes for Efficient Fault Diagnosis

The article explains how automating thread dump creation and download in Kubernetes, using tools such as Fabric8, Prometheus, and CI/CD pipelines, speeds up fault diagnosis, centralizes dump data, enables real‑time capture, and integrates with testing frameworks, replacing a manual, error‑prone process with a streamlined, automated one.

CI/CD · Kubernetes · Thread Dump
6 min read
Efficient Ops
Dec 11, 2024 · Operations

Thanos vs VictoriaMetrics: Which Prometheus Storage Solution Wins for Scale and Cost?

This article compares Thanos and VictoriaMetrics as long‑term storage solutions for Prometheus, evaluating their architecture, write and read paths, reliability, consistency, performance, scalability, high‑availability, and hosting costs to help you choose the most suitable option for your monitoring stack.

Cost comparison · Prometheus · Thanos
18 min read
Efficient Ops
Aug 17, 2022 · Operations

Master System Monitoring with the USE Method and Prometheus

This article explains how to build a comprehensive monitoring system using the concise USE (Utilization, Saturation, Errors) method, outlines key system and application metrics, and demonstrates practical implementation with Prometheus, Grafana, full‑link tracing, and ELK for observability and performance troubleshooting.

Prometheus · USE method · full-link tracing
13 min read
macrozheng
Jun 9, 2025 · Backend Development

Mastering Redis Hotspot Keys: Detection, Risks, and Solutions

This article explains what Redis hotspot keys are, the performance and stability issues they cause, common causes, how to monitor and identify them, and practical mitigation strategies such as cluster scaling, key sharding, and multi‑level caching.

Backend · Redis · caching
10 min read
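Key sharding, one of the mitigations listed, spreads reads of a single hot key across several copies so that, in a cluster, they can land in different hash slots and therefore on different nodes. The following is a minimal sketch assuming a write-all/read-any scheme; a plain dict stands in for Redis, and the `key#i` suffix format and the replica count are hypothetical choices, not taken from the article.

```python
import random


class ShardedHotKey:
    """Write-all / read-any replication of one hot key under N suffixed names.

    A dict stands in for Redis; with a real client each assignment or lookup
    would be a SET or GET on the suffixed key.
    """

    def __init__(self, store, key, replicas=4):
        self.store = store
        # Hypothetical naming scheme: "<key>#0" ... "<key>#N-1".
        self.names = [f"{key}#{i}" for i in range(replicas)]

    def set(self, value):
        # Every copy is updated so that any one of them can serve a read.
        for name in self.names:
            self.store[name] = value

    def get(self, rng=random):
        # Reading a randomly chosen copy spreads load across the copies.
        return self.store.get(rng.choice(self.names))


store = {}
hot = ShardedHotKey(store, "product:42:stock", replicas=4)
hot.set("100")
print(hot.get())  # prints 100, served by one of the four copies
```

The trade-off is that each write costs N operations and the copies can briefly disagree while a write is in progress, so the pattern suits read-heavy keys that change rarely.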
Linux Ops Smart Journey
Jun 13, 2025 · Operations

Master ServiceMonitor: Build Reliable Prometheus Monitoring for Kubernetes

This article dives deep into ServiceMonitor, comparing it with traditional Prometheus configurations, detailing its core fields, and providing hands‑on examples for Harbor and GitLab metrics, enabling you to create stable, flexible, and maintainable monitoring setups for Kubernetes services.

Cloud Native · Kubernetes · Prometheus
5 min read
DevOps Operations Practice
Jun 16, 2025 · Cloud Native

Mastering Kubernetes: 6 Essential Tools for Cluster Management

This article introduces six indispensable tools—kubectl, Helm, Prometheus + Grafana, Istio, Velero, and K9s—that simplify Kubernetes cluster management by covering resource handling, monitoring, networking, security, backup, and interactive UI, helping readers efficiently operate production‑grade clusters.

Cloud Native · DevOps · Kubernetes
7 min read
DeWu Technology
Apr 21, 2025 · Backend Development

Design and Evolution of a Unified Exchange Mall Middleware Platform

The unified exchange mall middleware platform consolidates disparate points‑redemption and lottery flows into a four‑layer architecture—business, gameplay templates, domain models, and downstream services—offering standardized APIs, dynamic RPC routing, Redis‑based inventory control, anti‑fraud safeguards, and built‑in monitoring, thereby cutting development costs, enhancing maintainability, and ensuring system stability.

Backend · Microservices · anti-fraud
18 min read
DeWu Technology
Nov 25, 2024 · Databases

Redis Hot Key Detection and Kernel-Based Real-Time Statistics

The article describes a kernel‑level hot‑key detection module for Redis that tracks per‑second access counts via an O(1) LRU queue, flags keys exceeding configurable thresholds, and provides real‑time subscription alerts and queryable logs, overcoming the latency and overhead limitations of existing detection methods.

Backend · HotKey · Kernel
11 min read
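The mechanism summarized here, per-second access counters kept in a bounded LRU structure and checked against a threshold, can be approximated in a few lines. This is an illustrative Python sketch using `collections.OrderedDict` for O(1) move-to-end and eviction, not the Redis kernel module the article describes; `HotKeyDetector`, `capacity`, and `threshold` are hypothetical names.

```python
from collections import OrderedDict


class HotKeyDetector:
    """Tracks per-second access counts for recently used keys and flags hot ones.

    Each access is O(1): one OrderedDict lookup, a move_to_end, and at most one
    eviction of the least recently used key when capacity is exceeded.
    """

    def __init__(self, capacity, threshold):
        self.capacity = capacity      # maximum number of keys tracked at once
        self.threshold = threshold    # accesses per second above which a key is hot
        self.entries = OrderedDict()  # key -> [second, count within that second]

    def record(self, key, now_second):
        """Record one access; return True if the key is hot in this second."""
        entry = self.entries.get(key)
        if entry is None or entry[0] != now_second:
            entry = [now_second, 0]  # new key, or a new second: restart the count
            self.entries[key] = entry
        self.entries.move_to_end(key)  # mark as most recently used
        entry[1] += 1
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used key
        return entry[1] > self.threshold


detector = HotKeyDetector(capacity=1000, threshold=3)
print([detector.record("user:1", now_second=100) for _ in range(5)])
# prints [False, False, False, True, True]
```

The sketch covers only counting and flagging; the real-time subscription alerts and queryable logs mentioned in the summary are outside its scope.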
DeWu Technology
Oct 23, 2024 · Backend Development

Automated Traffic Rule Inspection with Flow Replay Platform

The Flow Replay Platform automates traffic‑rule inspection: it records traffic from all environments and lets engineers define jsonPath‑based interface rules that continuously validate pre‑release and production traffic. Anomalies trigger immediate alerts, false positives are reduced, release verification is faster, and manual testing effort drops, as shown by the coupon‑related bugs the platform uncovered.

Backend · automated testing · monitoring
9 min read
DeWu Technology
Feb 27, 2023 · Operations

Message Push Monitoring and SLA Practices

The team implemented SLA‑based, node‑level monitoring for mobile push messages—splitting the workflow, measuring latency, blocking volume, and success rates, isolating metrics with Spring AOP, and tracking third‑party vendors—resulting in clear latency standards, doubled peak throughput, faster issue resolution, and improved overall reliability.

Backend · SLA · message-push
11 min read
DeWu Technology
May 16, 2022 · Operations

NOC SLA Implementation for Consumer Trading Platform

To tackle growing production complexity and past incident delays, the consumer trading platform introduced a three‑tier NOC‑SLA with intelligent baselines powered by Facebook Prophet, streamlined alert rules, and an SOS‑linked workflow, boosting detection frequency, cutting critical response times to under five minutes, and improving overall system reliability while emphasizing ongoing baseline and rule maintenance.

NOC · SLA · alert management
13 min read
DeWu Technology
Jan 10, 2022 · Mobile Development

APK Package Size Optimization and Monitoring Platform for Android Apps

The platform optimizes Android APK size by shrinking resources, compressing images, filtering native libraries, and applying selective byte‑code reductions, then continuously monitors package composition, trends, and business‑line usage, alerting teams to abnormal growth while preserving functionality for large‑scale apps.

APK · Android · Dex
13 min read
Java Tech Enthusiast
Jul 21, 2024 · Backend Development

Interface Performance Optimization Techniques for Backend Development

The article outlines practical backend interface performance optimizations—including proper indexing, SQL tuning, parallel remote calls, batch queries, asynchronous processing, scoped transactions, fine-grained locking, pagination batching, multi-level caching, sharding, and monitoring tools—to dramatically reduce latency and improve throughput.

Backend · Indexing · SQL optimization
25 min read
Java Tech Enthusiast
May 5, 2024 · Information Security

Preventing Malicious API Abuse: Security Measures and Best Practices

To prevent malicious API abuse, implement layered defenses such as firewalls to block unwanted traffic, robust captchas and SMS verification, mandatory authentication with permission controls, IP whitelisting for critical endpoints, HTTPS encryption, strict rate‑limiting via Redis, continuous monitoring with alerts, and an API gateway that centralizes filtering, authentication and throttling.

API security · IP whitelist · captcha
9 min read
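Strict rate limiting via Redis is commonly built as a fixed-window counter: `INCR` a per-client key for the current window and give it an expiry on the first hit. The sketch below mimics that pattern in process memory so it runs without a Redis server; the class name, the counter key layout, and the limits are assumptions, and the article's actual scheme may differ.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window`-second window.

    Mirrors the Redis INCR + EXPIRE pattern with one counter per
    (client, window index). A dict stands in for Redis here.
    """

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counters = {}  # (client_id, window index) -> request count

    def allow(self, client_id, now=None):
        if now is None:
            now = time.time()
        window_index = int(now // self.window)
        # Drop counters from earlier windows (what EXPIRE does in Redis).
        # This is an O(n) scan, acceptable for a sketch.
        for stale in [k for k in self.counters if k[1] < window_index]:
            del self.counters[stale]
        key = (client_id, window_index)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.limit


limiter = FixedWindowLimiter(limit=3, window=60)
print([limiter.allow("203.0.113.7", now=120.0) for _ in range(4)])
# prints [True, True, True, False]
print(limiter.allow("203.0.113.7", now=180.0))  # prints True: a new window has started
```

A fixed window is the simplest option but can admit up to twice the limit in a burst that straddles a window boundary; a sliding-window log or counter avoids this at the cost of keeping more state.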
DaTaobao Tech
Sep 18, 2023 · Databases

Comprehensive Approach to Slow SQL Detection and Governance

The Taobao platform’s slow‑SQL governance team built a detection and governance pipeline that combines internal slow‑log tools, database slow‑query logs, and JVM‑Sandbox instrumentation to capture full SQL details. High‑risk queries are scored by execution time, scan volume, and standards violations, then prioritized for remediation through health scores, branch‑diff checks, and issue tracking, which significantly reduced DB‑related incidents and improved system stability.

JVM sandbox · SQL · database
12 min read