Ops Community
Author

A leading IT operations community where professionals share and grow together.

196 Articles · 0 Likes · 862 Views · 0 Comments
Recent Articles

Latest from Ops Community

100 recent articles max
Ops Community
May 13, 2026 · Operations

Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues

This comprehensive guide walks Kubernetes operators through a step‑by‑step process for diagnosing node health problems—such as NotReady, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable—by examining node conditions, reviewing events, checking system resources, inspecting component logs, applying targeted fixes, and verifying recovery, all illustrated with real‑world commands and examples.

CNI · DiskPressure · Kubernetes
0 likes · 44 min read
Ops Community
May 11, 2026 · Operations

Production‑Grade Linux Disk I/O Tuning: From Theory to Hands‑On Practice

This comprehensive guide walks you through the fundamentals of Linux disk I/O performance, explains how to interpret key metrics such as IOPS, throughput and latency, and provides step‑by‑step instructions, scripts and configuration examples for diagnosing bottlenecks, optimizing filesystems, kernel parameters, application settings and storage layouts in production environments.

Filesystem · Linux · Disk I/O
0 likes · 60 min read
Ops Community
May 10, 2026 · Operations

Stop Manually SSHing Servers: Practical Ansible Playbook Examples

This article explains how Ansible automates repetitive operations such as bulk software installation, configuration changes, service restarts, application deployment, and log collection. It guides readers through installation, core concepts, inventory setup, common modules, multiple real‑world playbooks, role organization, Vault security, troubleshooting, best practices, and risk warnings.

Ansible · Automation · Configuration Management
0 likes · 31 min read
Ops Community
May 9, 2026 · Operations

Achieve Seamless Nginx High Availability with Keepalived: A Practical Guide

This article walks through building a simple, cost‑effective high‑availability solution for Nginx using Keepalived’s VRRP‑based VIP failover, covering environment setup, configuration of master and backup nodes, health‑check scripts, testing procedures, troubleshooting tips, and rollback steps.

High Availability · Linux · Nginx
0 likes · 29 min read
Ops Community
May 7, 2026 · Databases

How to Prevent Redis Data Loss: In‑Depth RDB and AOF Backup Strategies

This article walks operations engineers through the root causes of Redis data loss, explains the inner workings of RDB snapshots and AOF append‑only files, compares their trade‑offs, and provides concrete configuration, backup scripts, recovery procedures, and scenario‑based recommendations to keep data safe while maintaining performance.

AOF · Backup · Persistence
0 likes · 34 min read
Ops Community
May 6, 2026 · Operations

Step‑by‑Step Debugging of a Slow Website: From Nginx to the Database

Using a case in which a website’s response time jumped from 200 ms to over 10 seconds, this guide walks through a layered investigation: confirming the scope, checking Nginx and upstream health, analyzing application logs, inspecting MySQL processes, slow queries, and locks, and examining server CPU, memory, disk I/O, and network. It provides concrete commands, expected outputs, and root‑cause patterns for effective troubleshooting and preventive monitoring.

Linux · MySQL · Nginx
0 likes · 34 min read
Ops Community
May 4, 2026 · Information Security

Investigating and Securing a Server After a Suspicious Login

When a production server shows unexpectedly high CPU usage and unknown login activity, this guide walks Linux ops engineers through confirming the intrusion, stopping the attacker, tracing the attack path, removing backdoors, restoring system integrity, and applying hardening measures to prevent future breaches.

Forensics · Hardening · Linux
0 likes · 27 min read
Ops Community
May 2, 2026 · Databases

How to Completely Resolve MySQL CPU Spikes: Real‑World Fault Replay and Optimization Guide

This article walks you through a systematic, step‑by‑step process for diagnosing and fixing MySQL CPU usage spikes—from identifying the symptoms and gathering system metrics, to pinpointing problematic queries, analyzing locks and buffers, applying index and configuration tweaks, and validating the performance gains with real‑world examples and command‑line tools.

CPU · Database · Index Optimization
0 likes · 44 min read