Tagged articles
577 articles
MaGe Linux Operations
Oct 16, 2025 · Operations

Essential Linux Performance Troubleshooting Cheat Sheet: From CPU to Network

This guide provides a systematic Linux performance troubleshooting cheat sheet covering CPU, memory, disk I/O, network, processes, system calls, logs, and kernel parameters, complete with over 20 practical commands, real‑world case studies, best‑practice checklists, and an FAQ to help ops engineers quickly pinpoint and resolve performance bottlenecks.

Linux · troubleshooting
0 likes · 22 min read
dbaplus Community
Oct 13, 2025 · Cloud Native

10 Common Kubernetes Deployment Errors and How to Fix Them

When Kubernetes deployments fail, most issues stem from misconfigurations, image problems, or resource constraints, and this guide explains the ten most frequent errors, detailed troubleshooting commands, a generic debugging framework, and proactive practices to prevent future failures.

Cloud Native · Containers · Deployment
0 likes · 14 min read
DataFunSummit
Oct 7, 2025 · Artificial Intelligence

Bilibili’s AI‑Powered Assistant: Solving Big Data Task Failures with LLMs

This article details Bilibili's implementation of a large‑language‑model‑driven intelligent assistant that helps engineers diagnose and resolve massive offline and real‑time data‑processing failures, describing the platform’s five‑layer architecture, common failure and slowdown causes, and the need for AI‑powered troubleshooting support.

Bilibili · Intelligent Assistant · big data platform
0 likes · 4 min read
Ops Community
Oct 2, 2025 · Operations

How to Fix Nginx 502 Bad Gateway Errors: A 90% Success Checklist

This article provides a comprehensive, step‑by‑step checklist for diagnosing and resolving Nginx 502 Bad Gateway errors, covering backend service verification, configuration checks, log analysis, resource monitoring, network troubleshooting, special scenarios, and long‑term preventive measures.

502 · Backend · Bad Gateway
0 likes · 25 min read
Ops Community
Oct 1, 2025 · Databases

Why Did Redis Memory Spike 10×? Uncover the Hidden Config Mistake

A sudden Redis memory surge from 2 GB to 20 GB was traced to a misconfigured list-compress-depth parameter, revealing how uncompressed lists and queue backlogs can cause ten‑fold memory growth, and outlining step‑by‑step diagnostics, compression fixes, and long‑term optimization strategies.

Configuration · List Compression · Memory Management
0 likes · 24 min read
MaGe Linux Operations
Sep 30, 2025 · Cloud Native

How I Cut Kubernetes Troubleshooting Time from 30 Minutes to 3 Minutes

This article presents a complete, step‑by‑step method for reducing average Kubernetes fault‑diagnosis time from half an hour to under three minutes, covering the root causes of slow manual debugging, a one‑click diagnostic script, efficient kubectl shortcuts, visual tools, log aggregation, automated response workflows, and real‑world case studies.

Automation · DevOps · cloud‑native
0 likes · 50 min read
Selected Java Interview Questions
Sep 22, 2025 · Backend Development

Quickly Diagnose Spring Boot + Nacos + MySQL Microservice Failures

This guide provides a step‑by‑step troubleshooting workflow for Spring Boot microservices that use Nacos as the configuration center and service registry and MySQL as the database, covering log inspection, process checks, port listening, network connectivity, configuration validation, database connectivity, system resources, startup commands, and an optional one‑click diagnostic script.

Linux · Nacos · Spring Boot
0 likes · 9 min read
MaGe Linux Operations
Sep 15, 2025 · Operations

Master Nginx Troubleshooting: From 502 Errors to Performance Optimization

This article walks you through ten real-world Nginx failure cases—covering 502 errors, SSL expiration, high concurrency bottlenecks, cache misconfigurations, log rotation issues, load‑balancing mistakes, security gaps, reverse‑proxy quirks, URL rewrite conflicts, and monitoring—while teaching a systematic diagnostic methodology for ops engineers.

502 error · DevOps · Operations
0 likes · 27 min read
MaGe Linux Operations
Sep 11, 2025 · Operations

Mastering Kubernetes Pod Lifecycle: Real‑World Troubleshooting Techniques

This comprehensive guide dissects every stage of the Kubernetes Pod lifecycle, explains underlying mechanisms, and equips operators with practical debugging commands, scripts, and best‑practice configurations to swiftly resolve common production issues such as pending pods, crash loops, slow startups, and network failures.

Cloud Native · Kubernetes · Pod Lifecycle
0 likes · 21 min read
Ops Community
Sep 10, 2025 · Operations

Master Linux Network Routing & Forwarding: From Theory to Real-World Practice

This comprehensive guide walks you through Linux routing fundamentals, static and dynamic route configuration, policy routing, IP forwarding, NAT, troubleshooting, performance tuning, security hardening, and container networking, equipping operations engineers with the skills to design, optimize, and secure complex network infrastructures.

IP forwarding · Linux · NAT
0 likes · 23 min read
ITPUB
Sep 8, 2025 · Operations

12 Essential grep Command Combinations to Supercharge Log Analysis

This guide presents twelve practical grep command-line patterns—including case‑insensitive search, line‑number highlighting, keyword counting, multi‑keyword regex, context display, real‑time filtering, and integration with find—each illustrated with exact syntax and brief explanations to help Linux administrators and developers troubleshoot logs more efficiently.

Grep · Linux · command-line
0 likes · 5 min read
Architect's Must-Have
Sep 3, 2025 · Operations

How to Resolve Common Jenkins Compatibility and Configuration Issues

This guide walks through fixing Performance plugin incompatibility, adjusting Jenkins CSP security, customizing access paths, handling git clone timeouts, fixing batch command failures, updating vulnerable jars, running JNLP files on Windows nodes, disabling CSRF, tuning JVM memory, and optimizing disk usage to keep Jenkins stable and efficient.

Configuration · DevOps · Jenkins
0 likes · 12 min read
Ops Community
Sep 2, 2025 · Information Security

Mastering SELinux in Production: A Complete Security Configuration Guide

This comprehensive guide walks you through SELinux fundamentals, core concepts, mode differences, security contexts, real‑world configuration examples for web and database services, boolean management, troubleshooting techniques, performance tuning, and enterprise‑grade best practices to turn SELinux into a reliable production‑level security guardian.

Linux security · SELinux · System Hardening
0 likes · 16 min read
Raymond Ops
Aug 25, 2025 · Operations

How to Resolve Kubernetes Certificate Expiration Errors with kubeadm

When a Kubernetes cluster suddenly stops responding and reports an x509 certificate expiration error, this guide walks you through using kubeadm commands to renew all certificates, update kubeconfig files, restart kubelet, and verify the new expiration dates, ensuring the cluster returns to normal operation.

Certificate · Ops · kubeadm
0 likes · 8 min read
Efficient Ops
Aug 24, 2025 · Operations

Master tcpdump: Essential Commands for Network Packet Capture

This guide introduces tcpdump, a powerful network packet capture tool, explains its filtering capabilities with logical operators, and provides numerous practical examples—from capturing traffic on specific interfaces and hosts to filtering by ports, protocols, and saving captures—helping users troubleshoot network issues efficiently.

Linux · Network Monitoring · Packet Capture
0 likes · 6 min read
MaGe Linux Operations
Aug 24, 2025 · Operations

Master Production Incident Troubleshooting: SEAL Methodology & Essential Ops Toolbox

This comprehensive guide shares a veteran ops engineer's real‑world troubleshooting mindset, the SEAL framework, a curated toolbox of monitoring, logging, performance, and network utilities, detailed case studies, incident‑response grading, automation scripts, and future‑ready AIOps practices for keeping production systems stable.

Automation · SRE · incident response
0 likes · 19 min read
Tech Freedom Circle
Aug 5, 2025 · Backend Development

How to Diagnose and Fix Sudden Redis Slowdowns: A Complete Five‑Step Guide

This article provides a systematic, step‑by‑step methodology for identifying the root causes of Redis performance degradation—including big keys, slow queries, expiration spikes, memory limits, fork latency, AOF flushing, memory fragmentation, swap usage, huge pages, and CPU binding—and offers immediate mitigation tactics as well as long‑term architectural solutions to restore and maintain high throughput.

Backend · Cache · Memory
0 likes · 50 min read
Raymond Ops
Jul 22, 2025 · Operations

Master tcpdump: Essential Commands for Precise Network Packet Capture

This guide introduces tcpdump, a powerful network packet capture tool, explaining its basic usage, filtering options, interface selection, logical expressions, and advanced examples such as capturing specific hosts, ports, protocols, limiting packet counts, and saving captures to files for detailed analysis.

Linux · network capture · tcpdump
0 likes · 8 min read
Mingyi World Elasticsearch
Jul 11, 2025 · Operations

Logstash 9.x vs Earlier Versions: Key Differences, Common Errors, and Fixes

This article compares Logstash 9.x with previous releases, shows a working 9.x configuration, explains why root execution is blocked, details the deprecation of the cacert setting in favor of ssl_certificate_authorities, and provides step‑by‑step troubleshooting tips—including permission checks and the --config.test_and_exit flag—to resolve typical startup and data‑ingestion issues.

Configuration · Elasticsearch · Logstash
0 likes · 8 min read
Ops Community
Jul 8, 2025 · Operations

Boost Your Ops Efficiency 10× with Essential Linux Network Tools

This article introduces the most important Linux network testing utilities—covering basic connectivity, routing analysis, DNS resolution, port monitoring, bandwidth measurement, and packet capture—providing a comprehensive guide that helps operations engineers diagnose and resolve network issues ten times faster.

Bash · Linux · iperf3
0 likes · 19 min read
Ops Community
Jun 21, 2025 · Operations

Master Ceph: The Ultimate Distributed Storage Operations Handbook

This guide introduces Ceph as a leading open‑source distributed storage solution, explains why enterprises choose it for scalable data platforms, and provides a comprehensive operations manual covering common tasks, troubleshooting, and advanced management to help storage engineers efficiently run Ceph clusters.

Ceph · Storage Management · distributed storage
0 likes · 3 min read
Raymond Ops
Jun 17, 2025 · Operations

Diagnosing Disk Space Issues on Linux with df and du Commands

This article walks through troubleshooting a failed deployment caused by a full disk, showing how to use df -h to check overall disk usage and various du options (including --max-depth and -sh) to pinpoint large directories and resolve the issue.

Linux · Operations · df
0 likes · 4 min read
MaGe Linux Operations
Jun 13, 2025 · Cloud Native

Mastering Nginx Troubleshooting in Cloud‑Native Environments: A Step‑by‑Step Guide

Learn how to systematically diagnose and resolve Nginx failures in cloud‑native deployments by understanding core concepts, applying a step‑by‑step algorithm, analyzing logs, configurations, and system metrics, and using practical Kubernetes examples, code snippets, and performance models to ensure reliable service operation.

Cloud Native · DevOps · Kubernetes
0 likes · 31 min read
Liangxu Linux
Jun 11, 2025 · Operations

Why Is Your Linux Server Dropping Packets? A Step‑by‑Step Diagnosis

This article walks through a systematic Linux network packet‑loss investigation, covering every protocol layer from the NIC to the application, analyzing ethtool, netstat, tc, iptables rules, MTU settings, and finally applying fixes to restore reliable connectivity.

MTU · Packet Loss · iptables
0 likes · 12 min read
Lin is Dream
Jun 5, 2025 · Fundamentals

Master IntelliJ IDEA Debugging: Advanced Tips Every Java Developer Needs

Learn how to leverage IntelliJ IDEA's powerful debugging features—including step commands, conditional breakpoints, thread inspection, and expression evaluation—plus troubleshoot common startup errors and automatically generate serialVersionUID, providing essential techniques for Java developers to debug efficiently and resolve IDE issues.

Debugging · IDE · IntelliJ IDEA
0 likes · 7 min read
Practical DevOps Architecture
May 29, 2025 · Databases

Quick Solutions for MySQL Table Locks

This guide outlines a step‑by‑step method to diagnose and release MySQL table locks by checking open tables, inspecting running processes, querying InnoDB transaction and lock tables, and generating KILL statements to terminate blocking sessions.

Database Administration · SQL · mysql
0 likes · 3 min read
ITPUB
May 12, 2025 · Operations

What Hidden Challenges Do Desktop Support Heroes Face?

A seasoned desktop support veteran shares the untold struggles, quirky philosophies, time‑saving calculations, and memorable incidents that reveal how sysadmins silently keep an organization running while juggling endless reboot debates, hardware mysteries, and unexpected human drama.

IT support · Sysadmin · hardware maintenance
0 likes · 7 min read
Liangxu Linux
May 7, 2025 · Fundamentals

Why Embedded Development Feels Hard and How to Fix Common Bugs

This article explains why many consider embedded development difficult, then walks through systematic steps for reproducing, locating, analyzing, and resolving typical embedded bugs—including logging, online debugging, version rollback, commenting out code by bisection, register snapshots, and regression testing—to help engineers troubleshoot effectively.

Cortex-M · Debugging · firmware
0 likes · 12 min read
Aikesheng Open Source Community
May 6, 2025 · Databases

Using GDB to Adjust MySQL max_connections Without Restart

This article explains how to troubleshoot and resolve the MySQL "Too many connections" error by using GDB to modify the max_connections parameter on a running MySQL 5.7 instance without restarting, including step‑by‑step commands, sysbench load testing, and two practical methods.

Database Tuning · Sysbench · gdb
0 likes · 9 min read
dbaplus Community
Apr 30, 2025 · Databases

Top 10 MySQL Errors and How to Fix Them: Practical Solutions for DBAs

This article compiles the ten most common MySQL error scenarios—from connection limits and replication conflicts to installation failures, password resets, truncate side‑effects, configuration pitfalls, charset issues, binlog formats, timeout problems, and file‑handle limits—offering clear diagnostic steps and concrete commands to resolve each case.

Database Errors · Replication · mysql
0 likes · 16 min read
dbaplus Community
Apr 28, 2025 · Operations

20 Common Ops Failures and How to Diagnose & Fix Them

This article compiles twenty frequent operational incidents—from server inaccessibility and database connection errors to disk‑space exhaustion, high CPU usage, memory leaks, network latency, DNS failures, service crashes, file‑system corruption, update problems, permission misconfigurations, web‑server and email issues, backup failures, load‑balancing anomalies, firewall rule mistakes, SSH connection problems, database performance degradation, dependency gaps, and virtual‑machine faults—detailing their symptoms, step‑by‑step troubleshooting procedures, and concrete remediation actions.

Fixes · Operations · Server
0 likes · 15 min read
Zhuanzhuan Tech
Apr 23, 2025 · Databases

Quick 3‑Step Guide to Locate and Analyze MySQL InnoDB Deadlocks

This article explains how to find the MySQL deadlock log, parse its contents to determine the time, order, and affected rows, identify the lock types and root cause, and provides extended examples of special locking scenarios, all illustrated with real‑world SQL and code snippets.

InnoDB · database · deadlock
0 likes · 15 min read
Sohu Tech Products
Apr 9, 2025 · Databases

Six Critical MySQL Index Pitfalls and How to Fix Them

This article analyzes six common MySQL query performance traps—type conversion, function usage, left‑most prefix, implicit charset conversion, left‑most match, and optimizer mis‑selection—illustrates each with real‑world SQL examples, explains why they degrade performance, and provides concrete remediation steps and verification tools.

SQL · database · indexing
0 likes · 5 min read
Raymond Ops
Apr 7, 2025 · Operations

How to Deploy Prometheus on Kubernetes and Resolve Alertmanager Port Issues

This guide explains what Prometheus monitoring is, walks through downloading the correct version for a Kubernetes cluster, customizing alert rules, deploying and cleaning up Prometheus, and troubleshooting common Alertmanager connection problems by checking DNS and network configurations.

Alertmanager · Prometheus · monitoring
0 likes · 9 min read
Raymond Ops
Mar 27, 2025 · Operations

How to Install and Configure RabbitMQ on Linux: Step‑by‑Step Guide

This guide explains how to install Erlang, download and compile RabbitMQ 3.0.4, start the server in detached mode, verify its status, and troubleshoot common issues such as port conflicts on CentOS 6, providing complete command‑line instructions and configuration tips.

Erlang · Installation · Linux
0 likes · 6 min read
Efficient Ops
Mar 23, 2025 · Operations

Essential Linux Log Files Every SRE Should Monitor

This article outlines the most important Linux log files under /var/log, explains what each records—from system and kernel messages to authentication, web server, database, and firewall events—and shows practical commands for inspecting them, helping SREs improve fault detection and system observability.

system logs · troubleshooting
0 likes · 9 min read
Practical DevOps Architecture
Mar 7, 2025 · Cloud Native

Kubernetes DNS Resolution Issues and Troubleshooting Guide

This article explains common Kubernetes DNS resolution failures, both for external domains and internal service discovery addresses, and provides a step‑by‑step troubleshooting workflow that includes checking CoreDNS, examining resolv.conf, adjusting DNS settings, and recreating CoreDNS when necessary.

Cluster · CoreDNS · DNS
0 likes · 6 min read
Practical DevOps Architecture
Mar 5, 2025 · Cloud Native

Kubernetes DNS Resolution Issues and Troubleshooting Guide

This guide explains common Kubernetes DNS problems—including failure to resolve external domains, inter‑pod service discovery addresses, and related impacts on applications like Nginx reverse proxies—and provides step‑by‑step troubleshooting procedures such as checking CoreDNS, inspecting resolv.conf, and customizing dnsPolicy and dnsConfig in pod specifications.

Cloud Native · CoreDNS · DNS
0 likes · 6 min read
Liangxu Linux
Mar 2, 2025 · Operations

99 Essential Kubectl Commands for Mastering Kubernetes Diagnostics

This guide compiles 99 practical kubectl commands that cover every aspect of Kubernetes troubleshooting—from cluster and node information to pod health checks, service and deployment diagnostics, networking, storage, RBAC, scaling, and advanced debugging tools—helping operators quickly identify and resolve issues in their clusters.

CLI · diagnostics · kubectl
0 likes · 18 min read
Aikesheng Open Source Community
Feb 13, 2025 · Databases

Troubleshooting OceanBase Single‑Node Replica Expansion and Log Disk Size Issues

This article details a step‑by‑step investigation of OceanBase single‑node replica expansion failures, highlighting missing sys‑tenant expansion, deprecated table replica commands, log_disk_size misconfiguration, log‑stream mechanics, and provides concrete SQL and ALTER statements to reproduce and resolve the issue.

Log Management · OceanBase · log_disk_size
0 likes · 18 min read
Deepin Linux
Feb 12, 2025 · Operations

Comprehensive Guide to Linux Server Fault Diagnosis and Troubleshooting

This article provides a detailed overview of common Linux server failures, a step‑by‑step methodology for fault isolation, practical monitoring tools and commands, and a real‑world case study illustrating diagnosis and remediation techniques for production environments.

Linux · Sysadmin · monitoring
0 likes · 26 min read
Open Source Linux
Feb 6, 2025 · Operations

How to Quickly Diagnose and Fix 100% CPU Usage on Linux Servers

When a Linux server's CPU spikes to 100%, this guide walks you through a systematic investigation—from identifying the high‑load process and pinpointing the offending Java thread to applying a streamlined shell script—so you can resolve the issue and restore normal performance.

CPU · Java · performance
0 likes · 11 min read
Architect's Guide
Jan 9, 2025 · Backend Development

Investigation and Resolution of Random Nacos Service Deregistration in a Spring Cloud Alibaba Microservice Cluster

This article details a week‑long investigation of intermittent Nacos service deregistration in a Spring Cloud Alibaba microservice environment, describing the background architecture, multiple hypothesis tests, diagnostic commands, kernel version mismatch, and the final fix by upgrading the Linux kernel.

Backend Development · Linux kernel · Microservices
0 likes · 7 min read
DevOps Cloud Academy
Dec 28, 2024 · Operations

Common Jenkins Issues and Their Solutions

This guide outlines frequent Jenkins problems—including master startup failures, out‑of‑memory errors, plugin incompatibilities, disk space exhaustion, configuration corruption, and Java compatibility issues—and provides step‑by‑step troubleshooting procedures to keep CI/CD pipelines running smoothly.

Automation · DevOps · Jenkins
0 likes · 5 min read
Raymond Ops
Dec 21, 2024 · Databases

Why Does PostgreSQL Show “FATAL: password authentication failed for user ‘postgres’” and How to Fix It?

This guide explains why a PostgreSQL connection attempt fails with “FATAL: password authentication failed for user ‘postgres’”, outlines common causes such as wrong passwords and misconfigured postgresql.conf or pg_hba.conf, and provides step‑by‑step solutions including password reset, config correction, trust authentication and environment rebuild.

Configuration · PostgreSQL · password-authentication
0 likes · 6 min read
MaGe Linux Operations
Nov 30, 2024 · Operations

Essential Linux System Monitoring and Troubleshooting Commands

This guide compiles crucial Linux commands for viewing logs, inspecting CPU, memory, disk I/O, network, system load, and performing common administrative tasks such as IP configuration, file system cleanup, and service health checks, helping sysadmins quickly diagnose and resolve issues.

Ops · Sysadmin · journalctl
0 likes · 10 min read
Full-Stack DevOps & Kubernetes
Nov 28, 2024 · Operations

20 Essential Linux & Kubernetes Troubleshooting Commands Every DevOps Engineer Should Know

This guide compiles the 20 most common Linux and Kubernetes troubleshooting commands, illustrating typical outputs and step‑by‑step diagnostic reasoning for high CPU load, disk pressure, network failures, pod crashes, node issues, service outages, database errors, and application performance problems.

Kubernetes · Linux · System Administration
0 likes · 15 min read
Liangxu Linux
Nov 16, 2024 · Backend Development

Step‑by‑Step Guide to Install and Run RabbitMQ on Linux

This tutorial walks you through installing Erlang, downloading the RabbitMQ server, starting it in detached mode, verifying its status, and troubleshooting common TCP‑listener errors on CentOS systems, providing all necessary commands and configuration details.

Erlang · Installation · Linux
0 likes · 6 min read
Efficient Ops
Nov 10, 2024 · Operations

How to Diagnose and Fix Common Linux System Failures

This guide walks through typical Linux operational problems—including boot failures, network issues, MBR and GRUB errors, forgotten root passwords, and read‑only file‑system symptoms—explaining their causes, step‑by‑step diagnostic methods, and practical recovery commands to restore a healthy system.

Boot Issues · GRUB · Linux
0 likes · 18 min read
vivo Internet Technology
Oct 30, 2024 · Operations

Troubleshooting TiKV Disk Space Issues: Causes, Diagnosis, and Solutions

This guide explains how to diagnose and fix TiKV disk‑space problems by identifying oversized log files, redundant space‑placeholder files, and excessive RocksDB/Titan data, offering command‑line checks, configuration tweaks such as enabling log rotation, disabling reserve space, and tuning GC and Titan discardable‑ratio to restore balanced storage.

Configuration · TiKV · disk space
0 likes · 16 min read
Sanyou's Java Diary
Oct 28, 2024 · Fundamentals

Master JVM Memory Troubleshooting: A Step‑by‑Step Guide

This comprehensive guide walks you through systematic JVM memory issue diagnosis, covering initial data collection, analysis of heap, metaspace, direct memory, stack problems, and practical command‑line tools, while offering actionable tips and real‑world examples for effective troubleshooting.

DirectMemory · Heap · JVM
0 likes · 56 min read
Efficient Ops
Oct 15, 2024 · Operations

Master 9 Essential kubectl Commands for Efficient Kubernetes Management

This guide introduces nine commonly used kubectl commands—get, create, edit, delete, apply, describe, logs, exec, and cp—explaining their purposes, providing practical examples, and offering tips to help system administrators streamline Kubernetes resource management and troubleshooting.

DevOps · Kubernetes · cluster-management
0 likes · 10 min read
Open Source Linux
Oct 14, 2024 · Fundamentals

Master DNS Lookups: 10 Essential nslookup Commands Explained

This guide walks you through ten practical nslookup commands for retrieving A, NS, SOA, MX, any, PTR records and more, showing how to adjust timeouts, enable debugging, and query specific DNS servers to troubleshoot and verify domain configurations.

DNS · command-line · network
0 likes · 5 min read
Liangxu Linux
Oct 10, 2024 · Fundamentals

Why Does a TCP Connection Reset? Understanding RST Packets Across All Stages

This article explains what TCP RST packets are, why they appear during connection establishment, data transfer, and termination, and how to analyze their causes—including server refusals, client errors, firewall policies, retransmission limits, idle timeouts, and bypass blocking—using sequence diagrams and practical diagnostics.

RST · TCP · protocol
0 likes · 11 min read
Liangxu Linux
Sep 25, 2024 · Operations

How to Install and Troubleshoot RabbitMQ on Linux (Erlang Required)

This guide walks you through installing Erlang, downloading and extracting RabbitMQ 3.0.4, starting the server in detached mode, verifying its status, and troubleshooting common port‑conflict errors on CentOS 6, with complete command‑line examples.

Erlang · Installation · Linux
0 likes · 6 min read
Rare Earth Juejin Tech Community
Sep 21, 2024 · Databases

Root Cause Analysis of a Redis Cluster Slot‑Migration Failure and Gossip‑Protocol Inconsistencies

This article analyzes a Redis cluster outage caused by a slot‑migration bug where a node simultaneously migrated slots in and out, leading to conflicting config epochs, gossip‑protocol mismatches, and MOVED errors, and provides detailed troubleshooting steps and preventive measures.

Cluster · Config Epoch · Gossip Protocol
0 likes · 15 min read
Deepin Linux
Sep 18, 2024 · Operations

Understanding and Troubleshooting Linux Kernel Network Packet Loss

This article explains why Linux kernel network packet loss occurs, covering causes such as UDP checksum errors, firewall misconfigurations, rp_filter settings, buffer overflows, and hardware faults, and provides detailed diagnostic steps and practical solutions to identify and resolve each issue in Linux environments.

Kernel · Linux · Packet Loss
0 likes · 77 min read
Liangxu Linux
Sep 8, 2024 · Operations

Diagnosing and Resolving Extreme CPU Usage in a Java Data Platform

When a data platform server suddenly shows CPU utilization near 99% despite modest traffic, this guide walks through pinpointing the offending Java process, tracing the high‑load thread, uncovering a time‑conversion routine that over‑calculates seconds, and applying a lightweight fix that cuts CPU load by a factor of several dozen.

Java · Linux · troubleshooting
0 likes · 11 min read
Linux Ops Smart Journey
Aug 29, 2024 · Operations

How to Diagnose and Fix CoreDNS Timeout Issues in Kubernetes

This article explains why CoreDNS may experience DNS resolution timeouts in a Kubernetes cluster, how to analyze logs and timeout settings, locate upstream DNS problems, and apply practical solutions such as adjusting timeout values, switching upstream DNS servers, and deploying a local DNS service.

Cloud Native · CoreDNS · DNS
0 likes · 4 min read
Zhuanzhuan Tech
Aug 27, 2024 · Databases

MySQL InnoDB Deadlock Analysis and Resolution Guide

This article presents a detailed walkthrough of a MySQL InnoDB deadlock case, covering background, log inspection, data preparation, reproduction steps, lock analysis, root‑cause explanation, and practical solutions to prevent and resolve similar deadlock issues.

InnoDB · database · deadlock
0 likes · 14 min read