Monitoring | BestHub

Collection size: 1711 articles

Efficient Ops

Oct 12, 2022 · Backend Development

From Monolith to Microservices: A Real‑World Journey and Lessons Learned

This article walks through the evolution of a simple online supermarket from a monolithic website to a fully split microservice architecture, highlighting the challenges encountered—such as code duplication, database bottlenecks, and operational complexity—and presenting practical solutions like service decomposition, monitoring, tracing, gateway control, service discovery, circuit breaking, rate limiting, testing strategies, and the use of service meshes.

Architecture · Microservices · Monitoring

0 likes · 23 min read
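
Circuit breaking, one of the remedies listed in the entry above, can be illustrated with a minimal sketch. It is not the article's implementation: the `CircuitBreaker` class, the consecutive-failure policy, and the threshold and timeout values are hypothetical choices made for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Hypothetical consecutive-failure breaker: closed -> open -> half-open."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        # After the timeout, calls are let through to probe whether the dependency recovered.
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast instead of piling more requests onto an unhealthy service.
            raise CircuitOpenError("circuit open; call rejected")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed probe, or too many failures in a row, (re)opens the circuit.
            if state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The clock is injectable so the closed, open, and half-open transitions can be exercised without waiting in real time.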

Efficient Ops

Jun 22, 2022 · Operations

Top 13 Essential Linux Ops Tools Every Sysadmin Should Master

This guide introduces thirteen practical Linux operations tools—from network bandwidth monitors like Nethogs to security scanners such as NMap—providing concise descriptions, installation commands, and usage tips to help system administrators efficiently manage and secure their servers.

Linux · Monitoring · Security

0 likes · 12 min read

Efficient Ops

Mar 1, 2022 · Operations

Master Linux Performance: Key Metrics, Tools, and Optimization Techniques

This guide explains Linux performance optimization by defining core metrics such as throughput, latency, and load average, describing how to select and benchmark metrics, outlining essential analysis tools like vmstat, pidstat, and perf, and providing practical CPU and memory tuning strategies to eliminate bottlenecks.

CPU · Linux · Memory

0 likes · 47 min read
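
The load average in the entry above is an exponentially damped moving average of the number of runnable plus uninterruptible tasks, recomputed roughly every 5 seconds. The sketch below uses floating point, whereas the kernel uses fixed-point arithmetic, so treat it as an approximation of the idea rather than the kernel's code; the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple

SAMPLE_INTERVAL_S = 5.0            # Linux recalculates load averages about every 5 s
WINDOWS_S = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute averages


def load_averages(active_counts: Iterable[int]) -> Tuple[float, float, float]:
    """Floating-point approximation of the Linux load-average recurrence.

    Each sample is the number of runnable plus uninterruptible (D-state) tasks,
    taken every SAMPLE_INTERVAL_S seconds. Every average decays toward the
    current sample with factor exp(-interval / window).
    """
    factors = [math.exp(-SAMPLE_INTERVAL_S / w) for w in WINDOWS_S]
    loads = [0.0, 0.0, 0.0]
    for n in active_counts:
        loads = [load * f + n * (1.0 - f) for load, f in zip(loads, factors)]
    return loads[0], loads[1], loads[2]
```

For example, a steady four active tasks for one minute (12 samples) gives a 1-minute value of 4(1 − e⁻¹) ≈ 2.53, while the 5- and 15-minute values are still much lower. That lag is why the three numbers are read together to tell whether load is rising or falling.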

Efficient Ops

Feb 22, 2022 · Operations

Tackling Cloud‑Native Ops Challenges: Real‑World Practices from NetEase

NetEase’s cloud‑native operations team shares how they confront new challenges of Kubernetes adoption—ranging from technical stack shifts and knowledge‑base gaps to capacity planning, automated diagnostics, monitoring, alerting, and cost‑saving strategies—offering practical insights for building efficient, stable, and scalable ops systems.

Automation · Kubernetes · Monitoring

0 likes · 22 min read

Efficient Ops

Jan 25, 2022 · Operations

From Zero to Scalable Monitoring: Lessons from Building a 200‑Service Platform

Over two years, we built a monitoring system covering 200+ services and 700+ instances, evolving from ad‑hoc Nginx logs to a Prometheus‑based observability platform with unified dashboards and automated alerts, and we share lessons on metric selection, alert fatigue, and fault isolation.

Alerting · Grafana · Monitoring

0 likes · 9 min read

Efficient Ops

Feb 7, 2022 · Operations

Mastering Application Monitoring with Prometheus: Practical Metrics and Grafana Tips

This article explains how to design effective Prometheus metrics for various application types, choose appropriate vectors, labels, and buckets, and offers Grafana tricks for visualizing dimensions and linking tooltips, providing a comprehensive guide for robust observability.

Best Practices · Grafana · Monitoring

0 likes · 10 min read
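
Bucket choice matters because Prometheus estimates quantiles from a histogram by linear interpolation inside the bucket that contains the target rank. The function below is a simplified sketch of that behaviour, not Prometheus's code: it assumes cumulative `(upper_bound, count)` pairs sorted by bound, non-negative bounds, and a final `+Inf` bucket.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q (0 < q <= 1) from cumulative (upper_bound, count) buckets.

    Simplified sketch: buckets are sorted by upper bound, bounds are
    non-negative, the last bucket is +Inf, and the first bucket starts at 0.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Cannot interpolate towards +Inf: report the last finite bound.
                return lower_bound
            # Assume observations are spread evenly across the bucket.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan
```

With bucket bounds of 0.1, 0.5, and 1.0 seconds, any quantile that lands between 0.5 and 1.0 is a linear guess across that whole range, which is why bounds should sit close to the latencies you actually care about, such as an SLO threshold.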

Efficient Ops

Jan 23, 2022 · Operations

How to Monitor Nginx Logs with ELK: From Logstash Setup to Kibana Dashboard

This step‑by‑step guide shows how to collect, parse, and visualize Nginx access logs using the ELK stack, configure Logstash pipelines, set up Elasticsearch indices, proxy Kibana through Nginx, and secure access with HTTP basic authentication.

ELK · Kibana · Log Analysis

0 likes · 13 min read
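
In an ELK pipeline, Logstash turns each raw access-log line into structured fields, commonly with a grok filter. Outside Logstash, the same parsing step can be sketched in Python for Nginx's predefined `combined` log format. This assumes the default format is in use; a custom `log_format` would need a different pattern, and the `parse_combined` helper is hypothetical.

```python
import re
from datetime import datetime
from typing import Any, Dict, Optional

# Nginx's predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, Any]]:
    """Turn one access-log line into typed fields, or None if it doesn't match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    record: Dict[str, Any] = m.groupdict()
    record["status"] = int(record["status"])
    record["body_bytes_sent"] = int(record["body_bytes_sent"])
    # Example: 10/Oct/2021:13:55:36 +0000 (English month abbreviations assumed)
    record["time_local"] = datetime.strptime(record["time_local"], "%d/%b/%Y:%H:%M:%S %z")
    parts = record["request"].split(" ")
    if len(parts) == 3:  # e.g. "GET /index.html HTTP/1.1"
        record["method"], record["path"], record["protocol"] = parts
    return record
```

Converting `status`, `body_bytes_sent`, and `time_local` to typed values at parse time is what later makes range filters and time-based dashboards possible in Elasticsearch and Kibana.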

Efficient Ops

Jan 20, 2022 · Operations

Mastering Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical guidelines for designing Prometheus metrics, covering how to define monitoring targets, choose appropriate vectors and labels, name metrics and labels correctly, select histogram buckets, and leverage Grafana features to visualize and troubleshoot data effectively.

Grafana · Monitoring · Observability

0 likes · 11 min read
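
Part of the naming guidance can be checked mechanically. The sketch below is a hypothetical `lint_metric` helper, not an official tool. It applies the classic character rules from the Prometheus data model (metric names match `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names match `[a-zA-Z_][a-zA-Z0-9_]*`, and label names starting with `__` are reserved) plus two naming conventions: colons are left to recording rules, and counters end in `_total`. Recent Prometheus versions can also accept UTF-8 names, so treat these rules as a conservative subset.

```python
import re
from typing import Dict, List

# Classic character rules from the Prometheus data model.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")
LABEL_NAME_RE = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")


def lint_metric(name: str, metric_type: str, labels: Dict[str, str]) -> List[str]:
    """Hypothetical checker for a directly instrumented metric; returns problems."""
    problems: List[str] = []
    if not METRIC_NAME_RE.fullmatch(name):
        problems.append(f"invalid metric name: {name!r}")
    elif ":" in name:
        problems.append("colons are reserved for recording rules")
    if metric_type == "counter" and not name.endswith("_total"):
        problems.append("counter names should end in _total")
    for label in labels:
        if not LABEL_NAME_RE.fullmatch(label):
            problems.append(f"invalid label name: {label!r}")
        elif label.startswith("__"):
            problems.append(f"label {label!r} uses the reserved '__' prefix")
    return problems
```

For metrics already exposed by a running service, `promtool check metrics` performs a more complete lint on exposition-format text read from standard input.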

Efficient Ops

Jan 12, 2022 · Cloud Native

Why Kubernetes Monitoring Differs from VM Metrics: CPU, Memory, Disk, Network

This article compares Kubernetes pod monitoring metrics with traditional KVM/VM metrics across CPU, memory, disk, and network, explaining the underlying reasons for the differences and offering guidance on interpreting the data for effective performance troubleshooting.

CPU · Kubernetes · Memory

0 likes · 11 min read

Efficient Ops

Dec 6, 2021 · Operations

How Scenario‑Based AIOps Transforms IT Operations: Insights from GOPS

The article summarizes a GOPS conference presentation by Dingmao Technology on AIOps scenario‑driven construction, detailing challenges, definition of scenarios, technical methods, roadmap planning, and future prospects, while showcasing practical examples and supporting technologies for intelligent IT operations.

AIOps · IT Operations · Monitoring

0 likes · 8 min read

Efficient Ops

Oct 18, 2021 · Operations

Prometheus vs Zabbix: Which Monitoring Tool Wins for Modern Cloud Environments?

This article compares Prometheus and Zabbix, detailing their histories, architectures, data storage models, deployment complexity, community activity, and suitability for containerized versus traditional environments, helping readers decide which monitoring solution best fits their infrastructure needs.

Monitoring · Observability · Prometheus

0 likes · 8 min read

Efficient Ops

Sep 26, 2021 · Cloud Native

How to Stabilize Your Kubernetes Clusters: CI/CD, Monitoring, Logging, and Docs

This article analyzes why our Kubernetes clusters were constantly unstable—citing an erratic release process, missing monitoring, logging, documentation, and unclear request routing—and presents a comprehensive solution that includes a Kubernetes‑centric CI/CD pipeline, federated monitoring, centralized logging, a documentation hub, and integrated traffic management.

CI/CD · DevOps · Kubernetes

0 likes · 8 min read

Efficient Ops

Sep 14, 2021 · Cloud Native

Master Kubernetes: A Step‑by‑Step Learning Roadmap for Beginners

This comprehensive guide walks beginners through Kubernetes fundamentals, core components, key objects, storage, networking, resource management, security, cluster operations, backup, logging, monitoring, DevOps practices, and deep‑dive techniques, providing a clear learning path and practical tips for effective use.

Backup · DevOps · Kubernetes

0 likes · 16 min read

Efficient Ops

Aug 17, 2021 · Operations

How to Build an Effective Monitoring System for Reliable Operations

This article outlines the goals, methods, core steps, tools, metrics, and alert handling strategies essential for designing a comprehensive monitoring system that ensures system reliability and continuous business operation.

Alerting · Monitoring · Observability

0 likes · 8 min read

Efficient Ops

Aug 10, 2021 · Operations

From Zero to Scalable Monitoring: Lessons from Building a 200‑Service Platform

Alerting · Monitoring · Observability

0 likes · 9 min read

Efficient Ops

Jul 5, 2021 · Operations

10 Essential Practices to Prevent DBA and Ops Disasters

Learn ten practical strategies—from safe change rollbacks and cautious destructive commands to robust backups, clear prompts, vigilant monitoring, and disciplined handovers—that help DBAs and operations engineers avoid costly system failures and maintain reliable production environments.

Backup · Linux · Monitoring

0 likes · 6 min read

Efficient Ops

May 12, 2021 · Operations

7 Ready‑to‑Use Python & Shell Scripts to Supercharge Your Ops

This article shares a curated collection of ready‑to‑run Python and Shell scripts—including Enterprise WeChat alerts, FTP and SSH clients, SaltStack and vCenter utilities, SSL certificate checks, weather notifications, SVN backups, Zabbix password monitoring, local YUM mirroring, and high‑load detection—complete with full source code and usage notes to help engineers automate routine tasks and boost operational efficiency.

Automation · Monitoring · Python

0 likes · 30 min read

Efficient Ops

Mar 1, 2021 · Operations

Mastering Monitoring: From Fundamentals to Prometheus in Cloud‑Native Environments

This comprehensive guide explains the purpose, models, and methods of monitoring across the entire software lifecycle, compares health checks, logging, tracing, and metric collection, and details practical implementations using tools like ELK, SkyWalking, and Prometheus for cloud‑native operations.

Monitoring · Prometheus · Cloud Native

0 likes · 24 min read

Efficient Ops

Feb 22, 2021 · Operations

Why Does Prometheus Sometimes Fail to Trigger Alerts? Explained

Prometheus alerts may fail to fire even when a metric appears to exceed its threshold, because of the ‘for’ pending duration, sparse sampling, and the way Grafana’s range queries present data. This article explains the underlying mechanisms, illustrates common pitfalls with diagrams, and offers practical strategies for diagnosing and resolving missing or unexpected alerts.

Alerting · Grafana · Monitoring

0 likes · 6 min read
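
The ‘for’ behaviour is easy to misread, so here is a simplified Python model of it (not Prometheus code): an alert becomes pending the first time its condition is true at an evaluation, fires only once the condition has stayed true at every evaluation for the ‘for’ duration, and drops back to inactive as soon as one evaluation finds it false. The function name, the 15-second evaluation interval, and the 60-second ‘for’ in the tests are assumptions.

```python
from typing import List, Optional, Tuple


def simulate_alert(evaluations: List[Tuple[float, float]], threshold: float,
                   for_s: float) -> List[Tuple[float, str]]:
    """Replay (eval_time_s, value) pairs for a rule `value > threshold` with `for: for_s`.

    Simplified model: one series, no keep_firing_for, no missing data.
    """
    states: List[Tuple[float, str]] = []
    active_since: Optional[float] = None
    for t, value in evaluations:
        if value > threshold:
            if active_since is None:
                active_since = t  # condition first seen: alert becomes pending
            state = "firing" if t - active_since >= for_s else "pending"
        else:
            active_since = None  # condition cleared: the pending timer resets
            state = "inactive"
        states.append((t, state))
    return states
```

With a 60-second ‘for’, a value that stays above the threshold for four evaluations (0 s to 45 s) and then drops never fires. Separately, a spike that falls entirely between two scrapes is never sampled at all, so no rule can see it.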

Efficient Ops

Dec 7, 2020 · Operations

How to Diagnose and Resolve Common Java Server Performance Issues

This guide walks through systematic troubleshooting of Java server problems—including CPU spikes, memory leaks, disk bottlenecks, GC pauses, and network anomalies—by using tools such as jstack, jmap, jstat, vmstat, iostat, netstat, and ss to pinpoint root causes and apply targeted fixes.

Java · Monitoring · Troubleshooting

0 likes · 22 min read
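
For the CPU-spike case, a common workflow (not necessarily the one the article uses) is to find the busiest thread with `top -H -p <pid>`, then look up that thread ID in the `jstack <pid>` output, where it appears as `nid`. Older JDKs print `nid` in hex, and the dump format has varied between releases, so this hypothetical `find_thread_stack` helper accepts either a hex or a decimal `nid`.

```python
import re
from typing import Optional


def find_thread_stack(jstack_output: str, tid: int) -> Optional[str]:
    """Return the jstack entry for native thread id `tid` (as listed by `top -H`)."""
    # Match the nid in hex (nid=0x1a2b) or decimal (nid=6699); \b stops 0x1a2
    # from matching inside 0x1a2b.
    pattern = re.compile(rf"nid=(?:0x{tid:x}|{tid})\b")
    # Entries in a thread dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", jstack_output):
        if pattern.search(block):
            return block.strip()
    return None
```

The returned block shows the thread's state and stack frames, which usually points straight at the hot method.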
