Author

Ops Community

A leading IT operations community where professionals share and grow together.

Latest from Ops Community

Ops Community

May 17, 2026 · Cloud Native

Istio Service Mesh Basics: What Is the Sidecar Pattern and Why Microservices Need It?

The article explains how traditional microservice architectures embed network concerns such as time‑outs, retries, circuit breaking, traffic monitoring and mTLS in application code, why this leads to code coupling, upgrade difficulty and duplicated effort, and how Istio’s sidecar‑based service mesh cleanly separates those concerns while providing traffic management, observability and security features.

Envoy · Istio · Kubernetes

0 likes · 30 min read

Ops Community

May 14, 2026 · Operations

Why Ping Failure Doesn’t Mean Network Failure – Essential Traceroute & MTR Tools Explained

The article explains why a failed ping does not always indicate a network outage, outlines common failure scenarios, and provides a step‑by‑step guide to using traceroute and mtr—including parameters, output interpretation, real‑world case studies, and a complete troubleshooting workflow for network engineers.

ICMP · MTU · Network Troubleshooting

0 likes · 59 min read

Ops Community

May 13, 2026 · Operations

Kubernetes Node Failures: One‑Stop Guide to Diagnose and Fix Common Issues

This comprehensive guide walks Kubernetes operators through a step‑by‑step process for diagnosing node health problems—such as NotReady, MemoryPressure, DiskPressure, PIDPressure, and NetworkUnavailable—by examining node conditions, reviewing events, checking system resources, inspecting component logs, applying targeted fixes, and verifying recovery, all illustrated with real‑world commands and examples.

CNI · DiskPressure · Kubernetes

0 likes · 44 min read

Ops Community

May 11, 2026 · Operations

Production‑Grade Linux Disk I/O Tuning: From Theory to Hands‑On Practice

This comprehensive guide walks you through the fundamentals of Linux disk I/O performance, explains how to interpret key metrics such as IOPS, throughput and latency, and provides step‑by‑step instructions, scripts and configuration examples for diagnosing bottlenecks, optimizing filesystems, kernel parameters, application settings and storage layouts in production environments.

Disk I/O · Filesystem · Linux

0 likes · 60 min read

Ops Community

May 10, 2026 · Operations

Stop Manually SSHing Servers: Practical Ansible Playbook Examples

This article explains how Ansible automates repetitive operations such as bulk software installation, configuration changes, service restarts, application deployment, and log collection. It guides readers through installation, core concepts, inventory setup, common modules, multiple real‑world playbooks, role organization, Vault security, troubleshooting, and best practices with risk warnings.

DevOps · Infrastructure as Code · Ansible

0 likes · 31 min read

Ops Community

May 9, 2026 · Operations

Achieve Seamless Nginx High Availability with Keepalived: A Practical Guide

This article walks through building a simple, cost‑effective high‑availability solution for Nginx using Keepalived’s VRRP‑based VIP failover, covering environment setup, configuration of master and backup nodes, health‑check scripts, testing procedures, troubleshooting tips, and rollback steps.

Linux · Nginx · failover

0 likes · 29 min read

Ops Community

May 7, 2026 · Databases

How to Prevent Redis Data Loss: In‑Depth RDB and AOF Backup Strategies

This article walks operations engineers through the root causes of Redis data loss, explains the inner workings of RDB snapshots and AOF append‑only files, compares their trade‑offs, and provides concrete configuration, backup scripts, recovery procedures, and scenario‑based recommendations to keep data safe while maintaining performance.

AOF · Configuration · Persistence

0 likes · 34 min read

Ops Community

May 6, 2026 · Operations

Step‑by‑Step Debugging of a Slow Website: From Nginx to the Database

Using the case of a website whose response time jumped from 200 ms to over 10 seconds, this guide walks through a layered investigation: confirming the scope, checking Nginx and upstream health, analyzing application logs, inspecting MySQL processes, slow queries, and locks, and examining server CPU, memory, disk I/O, and network. It provides concrete commands, expected outputs, and root‑cause patterns for effective troubleshooting and preventive monitoring.

Linux · MySQL · Nginx

0 likes · 34 min read

Ops Community

May 4, 2026 · Information Security

Investigating and Securing a Server After a Suspicious Login

When a production server shows unexpected high CPU usage and unknown login activity, this guide walks Linux ops engineers through confirming intrusion, stopping the attacker, tracing the attack path, removing backdoors, restoring system integrity, and applying hardening measures to prevent future breaches.

Forensics · Hardening · Linux

0 likes · 27 min read

Ops Community

May 3, 2026 · Operations

How to Diagnose Slow Server Responses: Full‑Scope CPU, Memory, Disk & Network Analysis

This guide walks Linux operators through a systematic, four‑dimensional investigation of server slowdown—covering CPU, memory, disk I/O, and network—using concrete commands, diagnostic scripts, real‑world scenarios, and step‑by‑step remediation strategies to pinpoint and resolve performance bottlenecks.

CPU · Disk I/O · Linux

0 likes · 32 min read
