Ops Community
Author

A leading IT operations community where professionals share and grow together.

197 articles · 0 likes · 888 views · 0 comments
Recent Articles

Latest from Ops Community

Ops Community
May 17, 2026 · Cloud Native

Istio Service Mesh Basics: What Is the Sidecar Pattern and Why Microservices Need It?

The article explains how traditional microservice architectures embed network concerns such as time‑outs, retries, circuit breaking, traffic monitoring and mTLS in application code, why this leads to code coupling, upgrade difficulty and duplicated effort, and how Istio’s sidecar‑based service mesh cleanly separates those concerns while providing traffic management, observability and security features.

Envoy · Istio · Kubernetes
0 likes · 30 min read
Ops Community
May 14, 2026 · Operations

Why Ping Failure Doesn’t Mean Network Failure – Essential Traceroute & MTR Tools Explained

The article explains why a failed ping does not always indicate a network outage, outlines common failure scenarios, and provides a step‑by‑step guide to using traceroute and mtr—including parameters, output interpretation, real‑world case studies, and a complete troubleshooting workflow for network engineers.

ICMP · MTU · Network Troubleshooting
0 likes · 59 min read
Ops Community
May 13, 2026 · Operations

Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues

This comprehensive guide walks Kubernetes operators through a step‑by‑step process for diagnosing node health problems—such as NotReady, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable—by examining node conditions, reviewing events, checking system resources, inspecting component logs, applying targeted fixes, and verifying recovery, all illustrated with real‑world commands and examples.

CNI · DiskPressure · Kubernetes
0 likes · 44 min read
Ops Community
May 11, 2026 · Operations

Production‑Grade Linux Disk I/O Tuning: From Theory to Hands‑On Practice

This comprehensive guide walks you through the fundamentals of Linux disk I/O performance, explains how to interpret key metrics such as IOPS, throughput and latency, and provides step‑by‑step instructions, scripts and configuration examples for diagnosing bottlenecks, optimizing filesystems, kernel parameters, application settings and storage layouts in production environments.

Disk I/O · Filesystem · Linux
0 likes · 60 min read
Ops Community
May 10, 2026 · Operations

Stop Manually SSHing Servers: Practical Ansible Playbook Examples

This article explains how Ansible automates repetitive operations such as bulk software installation, configuration changes, service restarts, application deployment, and log collection, guiding readers through installation, core concepts, inventory setup, common modules, multiple real‑world Playbooks, role organization, Vault security, troubleshooting, and best‑practice risk warnings.

DevOps · Infrastructure as Code · ansible
0 likes · 31 min read
Ops Community
May 9, 2026 · Operations

Achieve Seamless Nginx High Availability with Keepalived: A Practical Guide

This article walks through building a simple, cost‑effective high‑availability solution for Nginx using Keepalived’s VRRP‑based VIP failover, covering environment setup, configuration of master and backup nodes, health‑check scripts, testing procedures, troubleshooting tips, and rollback steps.

Linux · Nginx · failover
0 likes · 29 min read
Ops Community
May 7, 2026 · Databases

How to Prevent Redis Data Loss: In‑Depth RDB and AOF Backup Strategies

This article walks operations engineers through the root causes of Redis data loss, explains the inner workings of RDB snapshots and AOF append‑only files, compares their trade‑offs, and provides concrete configuration, backup scripts, recovery procedures, and scenario‑based recommendations to keep data safe while maintaining performance.

AOF · Configuration · Persistence
0 likes · 34 min read
Ops Community
May 6, 2026 · Operations

Step‑by‑Step Debugging of a Slow Website: From Nginx to the Database

When a website’s response time jumped from 200 ms to over 10 seconds, this guide walks through a layered investigation—from confirming the scope, checking Nginx and upstream health, analyzing application logs, inspecting MySQL processes, slow queries, and locks, to examining server CPU, memory, disk I/O, and network—providing concrete commands, expected outputs, and root‑cause patterns for effective troubleshooting and preventive monitoring.

Linux · MySQL · Nginx
0 likes · 34 min read
Ops Community
May 4, 2026 · Information Security

Investigating and Securing a Server After a Suspicious Login

When a production server shows unexpected high CPU usage and unknown login activity, this guide walks Linux ops engineers through confirming intrusion, stopping the attacker, tracing the attack path, removing backdoors, restoring system integrity, and applying hardening measures to prevent future breaches.

Forensics · Hardening · Linux
0 likes · 27 min read