Raymond Ops
Author

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.

607 articles · 0 likes · 2.1k views · 0 comments
Recent Articles

Latest from Raymond Ops

100 recent articles max
Apr 25, 2026 · Databases

How to Reduce MySQL Master‑Slave Replication Lag from 30 seconds to Milliseconds

This article walks through the root causes of MySQL master‑slave replication delay, demonstrates step‑by‑step diagnostics using SHOW SLAVE STATUS, pt‑heartbeat, and binlog comparisons, and provides concrete configuration changes, query rewrites, hardware upgrades, and monitoring scripts that can shrink lag from tens of seconds to milliseconds.

Latency · Monitoring · MySQL
0 likes · 23 min read
Apr 23, 2026 · Operations

Advanced Nginx Load Balancing: How to Choose and Tune Layer 4 vs Layer 7

This guide walks through the differences between Layer 4 (TCP) and Layer 7 (HTTP) load balancing in Nginx, explains when to use each, and provides step‑by‑step configuration examples, health‑check setups, performance tuning, SSL handling, WebSocket support, and common pitfalls.

Configuration · Health Check · Layer 4
0 likes · 25 min read
Apr 22, 2026 · Operations

How Prometheus Recording Rules Can Reduce Alert Noise by 70%

This guide explains how to use Prometheus Recording Rules to pre‑compute, aggregate, and smooth metrics in large‑scale microservice environments, cutting daily alert noise by up to 70% through hierarchical alert design, practical examples, and best‑practice recommendations.

Alert Noise Reduction · DevOps · Kubernetes
0 likes · 22 min read
Apr 20, 2026 · Operations

How to Build a Standardized SRE On‑Call Process: From Alert Grading to Handoff Templates

This article presents a complete SRE on‑call handbook that defines alert severity levels, provides concrete Prometheus Alertmanager configurations, outlines a step‑by‑step response flow, details war‑room roles, escalation paths, handoff checklists, post‑mortem procedures, and dozens of ready‑to‑use templates to reduce MTTR and improve reliability.

Alert Management · On-Call · Runbook
0 likes · 27 min read
Apr 19, 2026 · Cloud Native

How to Double K8s Ingress Performance: Nginx vs Envoy Gateway Tuning Guide

This article walks through a real‑world performance bottleneck on a high‑traffic e‑commerce platform, explains step‑by‑step deep tuning of Nginx Ingress Controller, compares it with Envoy Gateway, and provides concrete configurations, benchmark results, monitoring rules, and best‑practice recommendations for Kubernetes Ingress optimization.

Envoy · Ingress · Kubernetes
0 likes · 27 min read
Apr 18, 2026 · Operations

How to Build a Lightweight Log Platform with Grafana and Loki in 3 Simple Steps

This guide walks you through replacing a heavyweight ELK stack with a minimal Grafana‑Loki logging solution, covering environment requirements, installation of Loki and Promtail, configuration details, best‑practice tips, troubleshooting, and backup strategies for reliable log aggregation.

Grafana · Loki · Promtail
0 likes · 25 min read
Apr 18, 2026 · Operations

Rapid CPU Spike Diagnosis: Resolve High CPU Usage in Under 5 Minutes

This guide presents a step‑by‑step, standardized process for detecting, analyzing, and fixing sudden CPU usage spikes on Linux servers, covering preparation, quick identification, deep thread‑level investigation, stack and system‑call analysis, flame‑graph generation, emergency mitigation, and best‑practice recommendations.

CPU · Linux · Monitoring
0 likes · 21 min read