Topic: monitoring

Collection size: 1767 articles
Spring Full-Stack Practical Cases
Dec 23, 2024 · Operations

Master Spring Boot 3 Monitoring: Actuator, Prometheus & Grafana in Practice

This article demonstrates how to use Spring Boot 3 Actuator together with Prometheus and Grafana to monitor the JVM, Tomcat, the database, Redis, and remote HTTP calls. The resulting real‑time metrics help detect bottlenecks, optimize resource usage, and keep performance stable under high load.

Actuator · Grafana · Prometheus
10 min read
Old Zhao – Management Systems Only
Apr 21, 2025 · R&D Management

Mastering Project Management: The 5 Essential Phases Every Team Needs

This guide explains why projects often fail, defines project management, and walks through the five essential phases—initiation, planning, execution, monitoring, and closure—providing practical steps, key metrics, and visual tools to help teams deliver results on time, within budget, and with quality.

Project Management · closing · execution
7 min read
Efficient Ops
May 11, 2025 · Operations

Essential Ops Engineer Toolkit: Must‑Have Tools for Monitoring, Automation, and Troubleshooting

This article presents a comprehensive, scenario‑driven toolbox for operations engineers, covering core SSH utilities, monitoring stacks, automation platforms, log management, network diagnostics, and emerging AI‑augmented practices to help teams select the right tools for modern infrastructure.

Automation · DevOps · Infrastructure
9 min read
Efficient Ops
Apr 20, 2025 · Operations

How to Instantly Monitor Socket Health with the Lightweight 'dish' CLI Tool

This article introduces dish, a lightweight command‑line tool for efficient operations monitoring. It covers the tool's core features (one‑time socket health checks, remote configuration, concurrent testing, zero dependencies, multiple notification methods, and caching) and provides installation steps, usage examples, and a comprehensive flag reference.

CLI · Go · monitoring
7 min read
Efficient Ops
Apr 21, 2025 · Operations

10 Must‑Know Shell Scripts to Boost Your Ops Efficiency

This guide presents ten practical shell script examples for operations engineers, covering file consistency checks, colored output functions, FTP downloads, package verification, service status monitoring, host reachability, resource utilization alerts, batch disk usage monitoring, website availability testing, and MySQL master‑slave synchronization, all with full code snippets.

Automation · Linux · monitoring
13 min read
Efficient Ops
Apr 8, 2025 · Operations

Mastering Modern Ops: 100 Essential Knowledge Points for 2025

This comprehensive guide presents 100 essential operations engineering topics—from OS fundamentals and networking to automation, cloud‑native architectures, monitoring, security, databases, virtualization, and incident response—helping professionals stay current and boost system reliability in a rapidly evolving IT landscape.

Automation · cloud computing · monitoring
12 min read
Efficient Ops
Mar 23, 2025 · Operations

Essential Linux Log Files Every SRE Should Monitor

This article outlines the most important Linux log files under /var/log, explains what each records—from system and kernel messages to authentication, web server, database, and firewall events—and shows practical commands for inspecting them, helping SREs improve fault detection and system observability.

Linux · SRE · monitoring
9 min read
Efficient Ops
Mar 9, 2025 · Artificial Intelligence

Essential LLMOps Tools: Build, Deploy, Monitor, and Manage Large Language Models

LLMOps is the end-to-end practice of managing large language models. This article surveys a curated set of development, deployment, monitoring, and local management tools, such as LangChain, vLLM, LangSmith, and Ollama, that help practitioners efficiently build, scale, and maintain AI applications.

AI Development · LLMOps · Model Deployment
6 min read
Efficient Ops
Nov 12, 2024 · Operations

How to Build Robust Online Stability: Practices, Metrics, and Team Strategies

This article outlines a comprehensive approach to online stability, covering preventive measures, service governance, capacity planning, incident detection, multi‑dimensional monitoring, alerting, R&D efficiency improvements, team building, and practical guidelines for simplifying, standardizing, automating, and scaling stability initiatives across an organization.

Automation · Stability · incident response
15 min read
Efficient Ops
Nov 3, 2024 · Operations

Top 10 Essential Ops Tools Every Engineer Should Master

This article introduces ten indispensable tools for operations engineers, detailing each tool's functionality, ideal use cases, key advantages, and real‑world examples, plus code snippets and visual illustrations to help you choose the right solution for automation, monitoring, configuration, and container management.

Automation · DevOps · containerization
9 min read
Efficient Ops
Oct 29, 2024 · Operations

Master the Four Golden Signals: A Practical Guide to System Monitoring

Understanding system health is essential for reliable services. This guide explains how to use monitoring tools to collect, visualize, and alert on the four golden signals (latency, traffic, errors, and saturation) across servers, applications, and external dependencies, helping teams detect and resolve issues efficiently.

SRE · metrics · monitoring
17 min read
Efficient Ops
Sep 4, 2024 · Operations

Essential Bash Scripts for Linux Operations: Sync, Monitoring, and Automation

A comprehensive collection of Bash scripts demonstrates how to verify file consistency across servers, automate log rotation, monitor network traffic, manage users and passwords, detect service failures, and enforce security policies, providing practical solutions for everyday Linux system administration tasks.

Automation · Linux · Scripts
25 min read
Efficient Ops
Aug 18, 2024 · Operations

Essential Bash Scripts for Linux Ops: Monitoring, Deployment & Automation

This curated collection of ready‑to‑use Bash scripts helps you monitor MySQL replication, track directory changes, batch‑create users, detect website issues, execute remote commands, deploy LNMP stacks, check server resource usage, identify high‑load processes, and automate Java or PHP project releases.

Automation · DevOps · Linux
12 min read
Efficient Ops
Aug 5, 2024 · Operations

Thanos vs VictoriaMetrics: Which Prometheus Long‑Term Storage Wins?

This article compares Thanos and VictoriaMetrics as Prometheus long‑term storage solutions, evaluating their architectures, write and read paths, reliability, data consistency, performance, scalability, high availability, and cost to help you choose the best fit for your monitoring stack.

Cloud · Prometheus · Thanos
17 min read
Efficient Ops
Jul 28, 2024 · Operations

Building a Resilient, High‑Performance Website: Domains, CDN, Security & Ops

This guide outlines a comprehensive, step‑by‑step strategy for creating a highly available, secure, and scalable website—from buying and protecting multiple domains, configuring DNS and CDN, setting up image and database servers, to implementing monitoring, redundancy, high‑concurrency testing, and disaster‑recovery plans.

CDN · High Availability · monitoring
13 min read
Efficient Ops
Mar 18, 2024 · Operations

How to Implement Fault Self‑Healing for Scalable Operations

This article explains why low‑disk alerts demand automation, outlines the concept of fault self‑healing versus manual response, and provides practical guidelines—including standards, monitoring dimensions, CMDB integration, script execution tools, and notification channels—to build a reliable self‑healing system for large‑scale environments.

Automation · CMDB · DevOps
10 min read
Efficient Ops
Mar 13, 2024 · Operations

What Does an Operations Engineer Do? Skills, Tools, and Career Path

This article explains the role of an operations (运维) engineer, covering daily responsibilities, essential knowledge such as Linux and networking, common monitoring tools, and emerging career paths like DevOps, AIOps, and SRE, helping newcomers understand how to start and grow in the field.

DevOps · Linux · SRE
6 min read
Efficient Ops
Dec 3, 2023 · Artificial Intelligence

How to Build a Zabbix Expert Advisor with GPT‑4 in Minutes

This guide explains why GPT‑4 outperforms GPT‑3.5, shows step by step how to create a Zabbix expert consultant with the new GPTs feature, and covers advanced configuration, knowledge‑base feeding, testing, and future possibilities for AI‑enhanced monitoring.

AI assistant · Automation · GPT-4
7 min read
Efficient Ops
Sep 26, 2023 · Operations

Mastering Zabbix: From Installation to Advanced Monitoring and Automation

This comprehensive guide walks you through Zabbix monitoring concepts, reliability calculations, installation methods, web UI configuration, host and template management, custom monitoring, alert integration with OneAlert, Grafana visualization, distributed monitoring, SNMP support, and practical scripts for large‑scale server environments.

Alerting · Automation · Grafana
28 min read