Topic: Monitoring

Collection size: 1711 articles
Efficient Ops
Oct 12, 2022 · Backend Development

From Monolith to Microservices: A Real‑World Journey and Lessons Learned

This article walks through the evolution of a simple online supermarket from a monolithic website to a fully split microservice architecture, highlighting the challenges encountered—such as code duplication, database bottlenecks, and operational complexity—and presenting practical solutions like service decomposition, monitoring, tracing, gateway control, service discovery, circuit breaking, rate limiting, testing strategies, and the use of service meshes.

Architecture · Microservices · Monitoring
0 likes · 23 min read
Efficient Ops
Jun 22, 2022 · Operations

Top 13 Essential Linux Ops Tools Every Sysadmin Should Master

This guide introduces thirteen practical Linux operations tools—from network bandwidth monitors like Nethogs to security scanners such as NMap—providing concise descriptions, installation commands, and usage tips to help system administrators efficiently manage and secure their servers.

Linux · Monitoring · Security
0 likes · 12 min read
Efficient Ops
Mar 1, 2022 · Operations

Master Linux Performance: Key Metrics, Tools, and Optimization Techniques

This guide explains Linux performance optimization by defining core metrics such as throughput, latency, and load average, describing how to select and benchmark indicators, outlining essential analysis tools like vmstat, pidstat, and perf, and providing practical CPU and memory tuning strategies to eliminate bottlenecks.

CPU · Linux · Memory
0 likes · 47 min read
Efficient Ops
Feb 22, 2022 · Operations

Tackling Cloud‑Native Ops Challenges: Real‑World Practices from NetEase

NetEase’s cloud‑native operations team shares how they confront new challenges of Kubernetes adoption—ranging from technical stack shifts and knowledge‑base gaps to capacity planning, automated diagnostics, monitoring, alerting, and cost‑saving strategies—offering practical insights for building efficient, stable, and scalable ops systems.

Automation · Kubernetes · Monitoring
0 likes · 22 min read
Efficient Ops
Jan 25, 2022 · Operations

From Zero to Scalable Monitoring: Lessons from Building a 200‑Service Platform

Over two years, we built a monitoring system covering 200+ services and 700+ instances, evolving from ad‑hoc Nginx logs to a Prometheus‑based observability platform with unified dashboards, automated alerts, and lessons on metric selection, alert fatigue, and fault isolation.

Alerting · Grafana · Monitoring
0 likes · 9 min read
Efficient Ops
Feb 7, 2022 · Operations

Mastering Application Monitoring with Prometheus: Practical Metrics and Grafana Tips

This article explains how to design effective Prometheus metrics for various application types, choose appropriate vectors, labels, and buckets, and offers Grafana tricks for visualizing dimensions and linking tooltips, providing a comprehensive guide for robust observability.

Best Practices · Grafana · Monitoring
0 likes · 10 min read
Efficient Ops
Jan 23, 2022 · Operations

How to Monitor Nginx Logs with ELK: From Logstash Setup to Kibana Dashboard

This step‑by‑step guide shows how to collect, parse, and visualize Nginx access logs using the ELK stack, configure Logstash pipelines, set up Elasticsearch indices, proxy Kibana through Nginx, and secure access with HTTP basic authentication.

ELK · Kibana · Log Analysis
0 likes · 13 min read
Efficient Ops
Jan 20, 2022 · Operations

Mastering Prometheus Metrics: Best Practices for Effective Monitoring

This article outlines practical guidelines for designing Prometheus metrics, covering how to define monitoring targets, choose appropriate vectors and labels, name metrics and labels correctly, select histogram buckets, and leverage Grafana features to visualize and troubleshoot data effectively.

Grafana · Monitoring · Observability
0 likes · 11 min read
Efficient Ops
Jan 12, 2022 · Cloud Native

Why Kubernetes Monitoring Differs from VM Metrics: CPU, Memory, Disk, Network

This article compares Kubernetes pod monitoring metrics with traditional KVM/VM metrics across CPU, memory, disk, and network, explaining the underlying reasons for the differences and offering guidance on interpreting the data for effective performance troubleshooting.

CPU · Kubernetes · Memory
0 likes · 11 min read
Efficient Ops
Dec 6, 2021 · Operations

How Scenario‑Based AIOps Transforms IT Operations: Insights from GOPS 2023

The article summarizes a GOPS conference presentation by Dingmao Technology on AIOps scenario‑driven construction, detailing challenges, definition of scenarios, technical methods, roadmap planning, and future prospects, while showcasing practical examples and supporting technologies for intelligent IT operations.

AIOps · IT Operations · Monitoring
0 likes · 8 min read
Efficient Ops
Oct 18, 2021 · Operations

Prometheus vs Zabbix: Which Monitoring Tool Wins for Modern Cloud Environments?

This article compares Prometheus and Zabbix, detailing their histories, architectures, data storage models, deployment complexity, community activity, and suitability for containerized versus traditional environments, helping readers decide which monitoring solution best fits their infrastructure needs.

Monitoring · Observability · Prometheus
0 likes · 8 min read
Efficient Ops
Sep 26, 2021 · Cloud Native

How to Stabilize Your Kubernetes Clusters: CI/CD, Monitoring, Logging, and Docs

This article analyzes why our Kubernetes clusters were constantly unstable—citing an erratic release process, missing monitoring, logging, documentation, and unclear request routing—and presents a comprehensive solution that includes a Kubernetes‑centric CI/CD pipeline, federated monitoring, centralized logging, a documentation hub, and integrated traffic management.

CI/CD · DevOps · Kubernetes
0 likes · 8 min read
Efficient Ops
Sep 14, 2021 · Cloud Native

Master Kubernetes: A Step‑by‑Step Learning Roadmap for Beginners

This comprehensive guide walks beginners through Kubernetes fundamentals, core components, key objects, storage, networking, resource management, security, cluster operations, backup, logging, monitoring, DevOps practices, and deep‑dive techniques, providing a clear learning path and practical tips for effective use.

Backup · DevOps · Kubernetes
0 likes · 16 min read
Efficient Ops
Aug 17, 2021 · Operations

How to Build an Effective Monitoring System for Reliable Operations

This article outlines the goals, methods, core steps, tools, metrics, and alert handling strategies essential for designing a comprehensive monitoring system that ensures system reliability and continuous business operation.

Alerting · Monitoring · Observability
0 likes · 8 min read
Efficient Ops
Aug 10, 2021 · Operations

From Zero to Scalable Monitoring: Lessons from Building a 200‑Service Platform

Over two years, we built a monitoring system covering 200+ services and 700+ instances, evolving from ad‑hoc Nginx logs to a Prometheus‑based observability platform with unified dashboards, automated alerts, and lessons on metric selection, alert fatigue, and root‑cause analysis.

Alerting · Monitoring · Observability
0 likes · 9 min read
Efficient Ops
Jul 5, 2021 · Operations

10 Essential Practices to Prevent DBA and Ops Disasters

Learn ten practical strategies—from safe change rollbacks and cautious destructive commands to robust backups, clear prompts, vigilant monitoring, and disciplined handovers—that help DBAs and operations engineers avoid costly system failures and maintain reliable production environments.

Backup · Linux · Monitoring
0 likes · 6 min read
Efficient Ops
May 12, 2021 · Operations

7 Ready‑to‑Use Python & Shell Scripts to Supercharge Your Ops

This article shares a curated collection of ready‑to‑run Python and Shell scripts—including Enterprise WeChat alerts, FTP and SSH clients, SaltStack and vCenter utilities, SSL certificate checks, weather notifications, SVN backups, Zabbix password monitoring, local YUM mirroring, and high‑load detection—complete with full source code and usage notes to help engineers automate routine tasks and boost operational efficiency.

Automation · Monitoring · Python
0 likes · 30 min read
Efficient Ops
Mar 1, 2021 · Operations

Mastering Monitoring: From Fundamentals to Prometheus in Cloud‑Native Environments

This comprehensive guide explains the purpose, models, and methods of monitoring across the entire software lifecycle, compares health checks, logging, tracing, and metric collection, and details practical implementations using tools like ELK, SkyWalking, and Prometheus for cloud‑native operations.

Monitoring · Prometheus · Cloud Native
0 likes · 24 min read
Efficient Ops
Feb 22, 2021 · Operations

Why Does Prometheus Sometimes Fail to Trigger Alerts? Explained

Prometheus alerts may not fire even when metrics exceed thresholds because of the ‘for’ pending duration, sparse sampling, and Grafana’s range queries. This article explains the underlying mechanisms, illustrates common pitfalls with diagrams, and offers practical strategies to diagnose and resolve missing or unexpected alerts.

Alerting · Grafana · Monitoring
0 likes · 6 min read
Efficient Ops
Dec 7, 2020 · Operations

How to Diagnose and Resolve Common Java Server Performance Issues

This guide walks through systematic troubleshooting of Java server problems—including CPU spikes, memory leaks, disk bottlenecks, GC pauses, and network anomalies—by using tools such as jstack, jmap, jstat, vmstat, iostat, netstat, and ss to pinpoint root causes and apply targeted fixes.

Java · Monitoring · Troubleshooting
0 likes · 22 min read