Tagged articles
577 articles
Open Source Linux
Nov 3, 2022 · Fundamentals

How to Set Up Port Forwarding for Remote Access: A Step-by-Step Guide

This guide explains what port forwarding (port mapping) is and why it is useful for remote access, gives detailed instructions for configuring it on a router with a public IP or dynamic DNS, and offers common troubleshooting tips so that external users can reach internal services.

NAT · Remote access · Router configuration
0 likes · 7 min read
ShiZhen AI
Oct 25, 2022 · Operations

How to Diagnose Unexpected Errors When Adding a New Kafka Consumer Group

Starting a new Kafka consumer group triggers an unexpected SyncGroup error caused by a RecordTooLargeException; the article walks through log inspection, identifies the oversized __consumer_offsets record, and resolves the issue by increasing the message.max.bytes configuration.

Kafka · RecordTooLargeException · SyncGroup
0 likes · 5 min read
NetEase Yanxuan Technology Product Team
Oct 24, 2022 · Operations

PAM Authentication Troubleshooting: Real-World Linux Server Failure Cases and Solutions

Real‑world Linux server failures show that missing PAM support in SSH prevents ulimit changes, misordered pam_faillock entries break cron authentication, and custom pam_script setups for Squid require careful configuration, highlighting that module order, thorough testing, and proper hardening are essential for reliable PAM authentication.

Linux authentication · linux · pam
0 likes · 11 min read
Aikesheng Open Source Community
Oct 19, 2022 · Operations

Analyzing OceanBase Error Logs to Locate Error Causes

This article explains the types and formats of OceanBase logs, how to identify relevant log files, interpret log fields such as trace_id, lt, and dc, and provides step-by-step methods for using error codes and trace IDs to pinpoint the root cause of errors.

DatabaseOperations · ErrorLog · OceanBase
0 likes · 16 min read
Cloud Native Technology Community
Oct 17, 2022 · Cloud Native

A Three‑Step Approach to Understanding, Managing, and Preventing Kubernetes Failures

This article presents a practical three‑step methodology—understanding, managing, and preventing—to troubleshoot Kubernetes deployments, explains how to leverage monitoring, observability, and incident‑response tools, and offers guidance on fostering team collaboration and building resilient, self‑healing cloud‑native systems.

Cloud Native · Kubernetes · Observability
0 likes · 7 min read
Open Source Linux
Oct 14, 2022 · Cloud Native

Why Did Our kube-apiserver OOM? A Deep Dive into Kubernetes Control‑Plane Failures

On September 10, 2021, a Kubernetes cluster experienced intermittent kubectl hangs caused by kube-apiserver OOM kills, leading to cascading control-plane failures; this article details the environment, observed metrics, log analysis, and code inspection of DeleteCollection, and provides troubleshooting steps to prevent similar incidents.

OOM · cloud-native · etcd
0 likes · 21 min read
转转QA
Sep 28, 2022 · Backend Development

Remote Debugging Guide for Sandbox and Test Environments

This article explains how to set up remote debugging for services in sandbox or test environments, covering remote creation, locating debug ports, constructing JVM arguments, checking service status, stopping services, and useful alias commands to streamline the debugging process.

Alias · Backend · Service Management
0 likes · 5 min read
Efficient Ops
Sep 15, 2022 · Operations

How to Diagnose and Fix Linux Network Latency Issues

This article explains how to identify the root causes of increased network latency on Linux servers, covering tools such as ping, traceroute, hping3, and wrk, demonstrating packet analysis with tcpdump and Wireshark, and showing how TCP delayed ACK and socket options affect response times.

Network Latency · TCP · Wireshark
0 likes · 17 min read
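
The summary above mentions that socket options influence response times when TCP delayed ACK is in play. As a minimal sketch of what setting such an option looks like, the snippet below enables TCP_NODELAY on a client socket. Which option the article actually recommends is not stated in the summary, so the choice of TCP_NODELAY and the helper name are assumptions made for illustration.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY sends small writes immediately instead of coalescing them,
    # which avoids waiting on the peer's delayed ACK for each small segment.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Disabling Nagle's algorithm trades some bandwidth efficiency for lower latency on small writes, so it is worth measuring before and after rather than enabling it everywhere.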
Top Architect
Sep 3, 2022 · Cloud Native

Docker Troubleshooting Guide: Common Issues and Solutions

This article provides a comprehensive guide to diagnosing and fixing a wide range of Docker problems, including storage migration, disk space shortages, missing libraries, container corruption, network configuration, permission errors, image management, and timeout issues, with detailed command-line solutions and configuration examples.

Container · DevOps · Networking
0 likes · 34 min read
Liangxu Linux
Aug 6, 2022 · Operations

When Core Switches Suddenly Die: The Hidden SSD Time‑Bomb in Network Gear

A network engineer recounts a terrifying outage caused by a firmware‑related SSD bug that locks core switches after 28,224 hours of use, explains the emergency troubleshooting steps taken, and highlights the need for better vendor recall mechanisms to protect critical infrastructure.

Hardware Reliability · Operations · SSD bug
0 likes · 8 min read
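
To put the 28,224-hour figure from the summary above into perspective, the short calculation below converts it to days and years of continuous operation. It assumes the device is powered on around the clock and uses a 365-day year; only the hour count comes from the summary.

```python
HOURS_TO_FAILURE = 28_224  # power-on hours quoted in the summary

days = HOURS_TO_FAILURE / 24   # continuous operation, 24 hours per day
years = days / 365             # assumes a 365-day year

print(f"{days:.0f} days, about {years:.2f} years of continuous operation")
# prints "1176 days, about 3.22 years of continuous operation"
```

Devices installed in the same batch would therefore reach the threshold at roughly the same time, a little over three years after deployment.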
Aikesheng Open Source Community
Jul 28, 2022 · Databases

Understanding Dble Startup Configuration Validation and Common Failure Cases (Version 3.22.01.0)

This article explains how Dble validates its configuration files during startup, lists the main configuration file types, and walks through typical startup failure examples—such as port conflicts, syntax errors, and backend MySQL connectivity issues—providing step‑by‑step troubleshooting guidance for new users.

Configuration · Database Middleware · mysql
0 likes · 9 min read
ByteDance Web Infra
Jul 8, 2022 · Fundamentals

Git Troubleshooting, Root‑Cause Analysis, and Community Contribution Guide

This article walks through a real Git‑induced failure, explains how to reproduce and diagnose the nested process loop, analyzes the underlying commit‑graph and alternates interactions, and then details step‑by‑step how to seek help and contribute a fix to the Git open‑source community.

Version Control · community contribution · open source
0 likes · 14 min read
Aikesheng Open Source Community
Jun 30, 2022 · Databases

Handling Replication Anomalies in MySQL Slave IO Thread

This article analyzes MySQL replication anomalies caused by master failures or network interruptions that lead to incomplete transaction replay on slaves, demonstrates a reproducible experiment using network delay and iptables, and provides practical guidance for both recovering and permanently handling stalled slave IO threads.

GTID · Replication · database
0 likes · 6 min read
IT Services Circle
Jun 21, 2022 · Databases

MySQL High‑Availability Incident Review and Resolution in a Dual‑Master Setup with Keepalived

This article recounts a MySQL high‑availability incident in a dual‑master environment, explains how missing binary‑log index files caused replication failures, and details step‑by‑step troubleshooting, directory recreation, binlog position correction, and configuration improvements to restore reliable database operation.

Replication · databases · high availability
0 likes · 8 min read
dbaplus Community
Jun 5, 2022 · Databases

Detecting and Resolving Redis Performance Bottlenecks

This guide explains how to identify when Redis is slow, measure baseline latency, monitor slow commands and latency, troubleshoot network, fork, huge pages, swap, AOF, expiration, and big keys, and provides a practical checklist of solutions.

database · redis · troubleshooting
0 likes · 18 min read
Aikesheng Open Source Community
May 24, 2022 · Databases

Read SQL Routed to Master Despite rwSplitMode=3 Configuration

The article investigates why read queries are sent to the master node when Dble is configured with rwSplitMode=3, analyzes reproductions on standard replication and MySQL Group Replication, identifies heartbeat and replication status issues, and provides troubleshooting steps and recommendations.

Database Middleware · Group Replication · mysql
0 likes · 7 min read
vivo Internet Technology
May 18, 2022 · Backend Development

Kafka Cluster Fault Analysis: Root Cause and Cascading Failure Mechanism

A Kafka cluster at vivo suffered a total traffic drop across a resource group when a broker’s disk failed, because the default producer partitioner still hashed keys to the failed partition, exhausting client buffers and blocking all healthy partitions, prompting recommendations to avoid keys or use custom partitioners.

Distributed Systems · Kafka · Performance Optimization
0 likes · 9 min read
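
The failure mode described in the summary above follows from how key-based partitioning works: a given key always hashes to the same partition, so records for that key keep targeting a partition even after it becomes unavailable. The sketch below illustrates the idea only. It uses CRC32 as a stand-in hash (Kafka's Java client uses a different hash function), and the function names and partition numbers are hypothetical.

```python
import zlib


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition by hashing (CRC32 is a stand-in, not Kafka's hash)."""
    return zlib.crc32(key) % num_partitions


def route(keys, num_partitions, failed_partition):
    """Split keys into those bound for healthy partitions and those stuck on the failed one."""
    healthy, stuck = [], []
    for key in keys:
        target = pick_partition(key, num_partitions)
        # The mapping ignores partition health, so these keys never move elsewhere.
        (stuck if target == failed_partition else healthy).append(key)
    return healthy, stuck
```

Because the stuck records cannot be delivered, they accumulate in the producer's buffer, which is the mechanism the summary names for blocking traffic to healthy partitions. Sending without keys, or with a partitioner that skips unavailable partitions, avoids pinning records to the failed one.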
Open Source Linux
Apr 22, 2022 · Backend Development

Master Nginx Access & Error Log Configuration: A Complete Guide

This guide explains how to configure and read Nginx access and error logs, covering log formats, file locations, per‑server settings, log level options, and practical commands for parsing log entries to aid troubleshooting and performance monitoring.

Access Log · error log · log configuration
0 likes · 9 min read
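
As a small illustration of the log parsing the summary above refers to, the sketch below splits one access-log entry into named fields. It assumes the log uses Nginx's predefined "combined" format; the sample line is made up for this example, and a custom log_format would need a different pattern.

```python
import re

# Pattern for Nginx's predefined "combined" log format.
COMBINED = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Return a dict of fields, or None if the line does not match the pattern."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Hypothetical sample entry, invented for illustration.
sample = ('203.0.113.7 - - [22/Apr/2022:10:15:32 +0800] '
          '"GET /index.html HTTP/1.1" 200 612 "-" "curl/7.68.0"')
```

Calling `parse_line(sample)` yields a dictionary keyed by the variable names used in the format, which makes it straightforward to count status codes or group requests by client address.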
Aikesheng Open Source Community
Apr 21, 2022 · Databases

Investigation of MySQL 5.7 Opening‑Table State Caused by Federated Engine Queries

The article analyzes why a MySQL 5.7.32 instance’s monitoring thread remains in the Opening table state when querying information_schema.tables, discovers the involvement of the Federated storage engine, reproduces the issue with a non‑existent remote server, and confirms the root cause through controlled experiments.

Federated Engine · Opening Table · mysql
0 likes · 4 min read
Java Captain
Apr 19, 2022 · Databases

Eight Classic MySQL Errors and How to Fix Them

This article presents eight common MySQL error scenarios—including forgotten passwords, password policy violations, case‑sensitivity issues, service startup failures, export/import restrictions, connection limits, binary log overflow, and primary‑key replication errors—along with detailed troubleshooting steps and configuration commands to resolve each problem.

Configuration · Database Errors · SQL
0 likes · 13 min read
IT Services Circle
Apr 17, 2022 · Operations

Troubleshooting DNS Latency After Machine Replacement in a Go Service

The article details a step‑by‑step investigation of why HTTP request latency increased after moving a Go‑based service to new hardware, focusing on DNS resolution delays, the role of DNSmasq, Go's resolver implementation, and the experiments that led to fixing the issue.

Go · Machine Replacement · linux
0 likes · 14 min read
Xiaolei Talks DB
Apr 16, 2022 · Databases

Why Is TiKV Scale‑In Stuck After Expansion? Diagnosis and Fix

This guide explains why a TiKV node remains pending offline after a scale‑out and scale‑in operation, walks through detailed log inspection, region checks, and command‑line troubleshooting, and provides a step‑by‑step solution to forcefully remove the problematic region and clean up the store.

Cluster · TiDB · TiKV
0 likes · 13 min read
Liangxu Linux
Apr 13, 2022 · Operations

24 Docker Troubleshooting Hacks: Fix Storage, Network, and Startup Issues

This guide compiles twenty‑four common Docker problems—from oversized storage directories and disk‑space shortages to network misconfigurations, NFS lock errors, and container startup failures—providing clear explanations, step‑by‑step commands, and configuration tweaks to resolve each issue efficiently.

Containers · DevOps · Docker
0 likes · 38 min read
IT Services Circle
Apr 12, 2022 · Backend Development

Resolving .NET Runtime and SDK Installation Issues on Windows

This guide explains why enabling .NET Framework may still leave applications unable to run, details the need to install both .NET Runtime and SDK for versions 5.0 and 6.0 in x86 and x64, and provides step‑by‑step troubleshooting with a real‑world example.

Installation · Runtime · SDK
0 likes · 6 min read
IT Services Circle
Apr 11, 2022 · Fundamentals

Troubleshooting and Repairing a Faulty USB Flash Drive

This article narrates a step‑by‑step troubleshooting process for a malfunctioning 128 GB USB flash drive, covering initial failure, diagnostics with Windows Disk Management, chip detection tools, firmware re‑flashing using mptools, speed testing with DiskMark, and final verification, offering practical tips for similar hardware repairs.

Flash Drive · Hardware Repair · Performance Test
0 likes · 5 min read
Open Source Linux
Mar 31, 2022 · Operations

Mastering tcpdump: Practical Commands for Network Packet Capture

This guide explains how to use tcpdump for network packet capture, covering basic usage, interface selection, host and port filtering, logical operators, saving captures to files, and real‑world troubleshooting scenarios with clear command examples.

Network Monitoring · Packet Capture · command-line
0 likes · 7 min read
FunTester
Mar 22, 2022 · Backend Development

Navigating Jira API Inconsistencies: Six Ways to Pass Issue Status Parameters

The article examines the chaotic changes in Jira's newer API version, detailing six incompatible methods of passing issue status parameters, explains the mismatch between name and id fields, and provides Java helper functions to handle these inconsistencies effectively.

API · Backend · Integration
0 likes · 5 min read
dbaplus Community
Mar 14, 2022 · Cloud Native

19 Common Kubernetes Failures and How to Fix Them

This guide walks through nineteen typical Kubernetes problems—from service access and port‑mapping errors to pod init failures, PVC issues, and helm installation glitches—explaining each root cause, providing concise troubleshooting steps, and showing the exact kubectl commands and code snippets needed to resolve them.

Cloud Native · Containers · Kubernetes
0 likes · 10 min read
Open Source Linux
Mar 8, 2022 · Operations

Master Kubernetes Troubleshooting: The Three Pillars Every Engineer Needs

This article breaks down Kubernetes troubleshooting into three essential steps—understanding the failure, managing the response, and preventing recurrence—while mapping key monitoring, observability, and incident‑response tools to each phase for reliable cloud‑native operations.

Kubernetes · Observability · Operations
0 likes · 8 min read
Code Ape Tech Column
Mar 7, 2022 · Operations

Using JDK Built‑in Tools to Monitor and Diagnose the JVM

This article demonstrates how to use the JDK’s native command‑line and graphical utilities—such as jps, jinfo, jvisualvm, jconsole, jstat, jstack and jcmd—to observe JVM metrics, troubleshoot memory and thread issues, and verify JVM parameters in Java applications.

JDK · JVM · Java
0 likes · 15 min read
Alibaba Cloud Native
Feb 28, 2022 · Cloud Native

How to Observe and Diagnose DNS Failures in Kubernetes Clusters

This article explains how DNS operates inside Kubernetes, enumerates common failure causes, describes CoreDNS's built‑in observability plugins, introduces BPF‑based client‑side diagnostics, and provides a step‑by‑step troubleshooting workflow to identify and resolve DNS issues in cloud‑native environments.

BPF · CoreDNS · DNS
0 likes · 18 min read
Su San Talks Tech
Feb 28, 2022 · Databases

How to Detect and Fix Redis Performance Issues: A Complete Guide

This article explains why Redis latency spikes can cause system-wide outages, shows how to measure baseline performance, monitor slow commands, and address network, RDB, swap, AOF, expiration, and big‑key problems with practical solutions and a diagnostic checklist.

Slowlog · Swap · redis
0 likes · 18 min read
Refining Core Development Skills
Feb 25, 2022 · Databases

Comprehensive Guide to Diagnosing and Optimizing Redis Performance Issues

This article provides a step‑by‑step methodology for identifying why a Redis instance becomes slow, covering baseline latency testing, slow‑log analysis, big‑key detection, expiration patterns, memory limits, fork overhead, huge‑page effects, AOF configuration, CPU binding, swap usage, memory fragmentation, network saturation, and practical remediation techniques.

Backend · optimization · performance
0 likes · 40 min read
IT Services Circle
Feb 24, 2022 · Databases

Diagnosing and Solving Redis Performance Issues

This article explains how to detect Redis latency problems, measure baseline performance, monitor slow commands, and address common causes such as network round‑trip delays, fork‑generated RDB snapshots, transparent huge pages, swap usage, AOF settings, key expiration, and big‑key handling, providing practical troubleshooting steps and solutions.

Latency · database · monitoring
0 likes · 20 min read
Java High-Performance Architecture
Jan 25, 2022 · Cloud Native

Why Is Debugging Microservices on Kubernetes So Hard? Proven Strategies to Overcome It

Debugging microservices in a Kubernetes environment is challenging due to the abstraction of pods, network complexities, infrastructure issues, and application-level faults, but by monitoring at the service layer, aggregating data, and applying machine‑learning‑based anomaly detection, teams can effectively identify and resolve problems.

Kubernetes · Microservices · machine learning
0 likes · 6 min read
Open Source Linux
Jan 23, 2022 · Operations

Mastering IT Trouble‑Shooting: Proven Strategies to Diagnose and Resolve Complex System Failures

This article shares practical methods and real‑world case studies for IT professionals to analyze, locate, and fix system runtime issues, service timeouts, file‑handle leaks, JVM memory overflows, and performance bottlenecks, emphasizing hypothesis testing, boundary narrowing, and systematic post‑mortems.

IT Operations · JVM Memory · Performance debugging
0 likes · 31 min read
Alibaba Cloud Developer
Dec 15, 2021 · Operations

Why Does Linux Load Spike? Deep Dive into Load Average Calculation & Troubleshooting

During high‑traffic events like Double‑11, Linux systems often see load averages surge, affecting response times and command execution; this article explains what load averages represent, how the kernel computes them using exponential weighted moving averages, and outlines common causes and systematic methods for root‑cause analysis.

Load Average · kernel · performance
0 likes · 12 min read
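
As a rough illustration of the exponentially weighted moving average mentioned in the summary above, the sketch below updates 1-, 5-, and 15-minute averages from a series of active-task counts. It uses floating point and assumes a 5-second sampling interval; the kernel itself works in fixed-point arithmetic, and the function and constant names here are illustrative only.

```python
import math

SAMPLE_INTERVAL_S = 5.0            # assumed period between samples
WINDOWS_S = (60.0, 300.0, 900.0)   # 1-, 5- and 15-minute windows


def update_load(loads, active_tasks):
    """Return the new (1m, 5m, 15m) averages after one sample."""
    updated = []
    for load, window in zip(loads, WINDOWS_S):
        # Older history decays by this factor on every sample.
        decay = math.exp(-SAMPLE_INTERVAL_S / window)
        updated.append(load * decay + active_tasks * (1.0 - decay))
    return tuple(updated)


def simulate(samples, start=(0.0, 0.0, 0.0)):
    """Feed a sequence of active-task counts through the averages."""
    loads = start
    for active in samples:
        loads = update_load(loads, active)
    return loads
```

Feeding a constant count, for example `simulate([4] * 12)` for one minute of samples, shows the 1-minute average rising fastest while the 5- and 15-minute averages lag behind, which is why a short burst is visible in the first number long before it moves the third.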
Cloud Native Technology Community
Nov 25, 2021 · Databases

Why Is My Redis Slowing Down? A Complete Troubleshooting Guide

This article provides a systematic, step‑by‑step methodology for diagnosing Redis latency spikes, covering baseline performance testing, slow‑log analysis, high‑complexity commands, big‑key handling, expiration patterns, memory limits, fork overhead, huge‑page settings, AOF configurations, CPU binding, swap usage, memory fragmentation, network saturation, and practical monitoring tips.

Latency · database · monitoring
0 likes · 42 min read
ByteDance Web Infra
Oct 13, 2021 · Operations

DNS Resolution Failure for goofy.app in Singapore Office Caused by DNSSEC Misconfiguration

An internal investigation revealed that the goofy.app domain could not be resolved from Singapore offices because a misconfigured DNSSEC DS record caused validation failures, while Chinese DNS resolvers ignored DNSSEC, leading to successful resolution there; removing the erroneous DS key restored global accessibility.

DNS · DNSSEC · Domain Resolution
0 likes · 10 min read
Ops Development Stories
Sep 28, 2021 · Cloud Native

Why Does PLEG ‘Not Healthy’ Make a Kubernetes Node NotReady?

This article explains the role of the Pod Lifecycle Event Generator (PLEG) in Kubelet, why the “PLEG is not healthy” error causes nodes to become NotReady, common failure scenarios, and a step‑by‑step troubleshooting method that ultimately resolves the issue by upgrading systemd.

Kubernetes · NodeNotReady · PLEG
0 likes · 11 min read
Efficient Ops
Sep 13, 2021 · Operations

Mastering tcpdump: Essential Commands for Network Traffic Analysis

This guide explains how to use tcpdump for capturing and filtering network packets, covering basic and advanced command options, logical filters, saving captures, and a real‑world troubleshooting scenario involving a Node.js server behind Nginx.

Packet Capture · network analysis · tcpdump
0 likes · 7 min read
Liangxu Linux
Sep 7, 2021 · Cloud Native

Top 19 Kubernetes Service Issues and How to Fix Them

This guide compiles nineteen common Kubernetes problems—from certificate errors and service exposure failures to pod initialization issues and Helm installation errors—providing concise root‑cause analyses and step‑by‑step command solutions to help you quickly troubleshoot and resolve cluster disruptions.

Cluster · Kubernetes · Pod
0 likes · 11 min read
DeWu Technology
Sep 3, 2021 · Operations

Live Streaming Service Monitoring and Alert Attribution Practice

The article outlines a systematic approach for quickly attributing live‑streaming service alerts—combining consolidated knowledge, log and trace analysis, and a decision‑tree workflow—to pinpoint root causes such as resource limits or mesh overload, illustrated by a real RT‑jitter case and emphasizing deep architectural understanding.

alert attribution · monitoring · troubleshooting
0 likes · 8 min read
dbaplus Community
Aug 30, 2021 · Operations

How to Systematically Diagnose High RSS Memory Usage in Java Services

This article presents a step‑by‑step methodology for troubleshooting high RSS memory consumption in Java applications, covering heap size assessment, ARENA region analysis, native memory tracking, off‑heap memory checks, and automation tools to streamline the entire diagnostic process.

JVM · Java · Memory
0 likes · 15 min read
Aikesheng Open Source Community
Aug 24, 2021 · Databases

Troubleshooting MySQL Group Replication Transaction Certification Errors and Recovery

An in‑depth analysis of MySQL Group Replication (MGR) transaction certification failures, covering error symptoms, root‑cause investigation, replication‑group transaction set mismatches, and step‑by‑step recovery procedures with code examples and best‑practice recommendations to keep MGR clusters clean.

Group Replication · Transaction Certification · database
0 likes · 19 min read
Big Data Technology & Architecture
Aug 5, 2021 · Big Data

Common Kafka Errors and Their Solutions

This article compiles a comprehensive list of frequent Kafka errors—including UnknownTopicOrPartitionException, LEADER_NOT_AVAILABLE, TimeoutException, and configuration issues—explaining their causes, providing detailed analysis, and offering step‑by‑step troubleshooting commands and configuration adjustments to resolve each problem.

Configuration · Error Handling · troubleshooting
0 likes · 33 min read
Efficient Ops
Aug 2, 2021 · Cloud Native

19 Common Kubernetes Failures and How to Fix Them

This guide walks through nineteen typical Kubernetes problems—from service access failures and pod initialization errors to Helm installation issues—explaining root causes, providing concise solutions, and including command‑line examples and screenshots to help operators quickly resolve cluster disruptions.

cloud-native · troubleshooting
0 likes · 10 min read
Efficient Ops
Jul 28, 2021 · Operations

Master Network Troubleshooting with MTR: Install, Use, and Analyze Results

Learn how to install and use the powerful MTR network diagnostic tool across Windows, Linux, macOS, and Android, understand its combined ping/traceroute output, master key command-line options, and interpret loss, latency, and stability metrics to effectively troubleshoot connectivity issues.

Network Diagnostics · Windows · linux
0 likes · 10 min read
Java Interview Crash Guide
Jun 26, 2021 · Backend Development

Essential Linux and Java Tools for Fast Troubleshooting and Performance Tuning

This guide compiles a comprehensive set of Linux commands and Java diagnostic utilities—including tail, grep, awk, find, tsar, btrace, Greys, Arthas, and JProfiler—offering practical examples and code snippets to help developers quickly identify and resolve performance and stability issues in production environments.

Java · monitoring · tools
0 likes · 16 min read
vivo Internet Technology
Jun 16, 2021 · Backend Development

Troubleshooting Dubbo Thread Pool Exhaustion: A Redis Performance Optimization Case Study

The case study details how a high‑traffic Dubbo service handling 1.8 billion daily requests suffered periodic circuit‑breaks due to thread‑pool exhaustion, traced to a cache‑bypass bug, Redis setex spikes, and an improperly warmed commons‑pool2 connection pool, and resolved by fixing the bug, scaling Redis, and tuning or downgrading the pool configuration to enable pre‑warming via minEvictableIdleTimeMillis.

Circuit Breaking · Connection Pool · Dubbo
0 likes · 13 min read
Liangxu Linux
Apr 29, 2021 · Fundamentals

10 Proven Debugging Techniques Every Programmer Should Master

Debugging is inevitable for programmers, and this guide presents ten practical strategies—from mindset adjustments and bug reproduction to log analysis, online research, code commenting, breakpoint usage, and even the quirky rubber duck method—to help developers efficiently locate and resolve bugs across any codebase.

Debugging · best practices · programming tips
0 likes · 9 min read
Efficient Ops
Apr 27, 2021 · Operations

Diagnosing Common Java Server Issues: CPU, Memory, Disk & Network

This guide walks through systematic troubleshooting of Java server problems—including CPU spikes, memory leaks, disk I/O bottlenecks, and network timeouts—by using native Linux tools and JVM utilities such as ps, top, jstack, jstat, iostat, vmstat, and netstat to pinpoint root causes and apply targeted fixes.

CPU · Java · Memory
0 likes · 22 min read
Open Source Linux
Apr 26, 2021 · Databases

How to Resolve MySQL Too Many Connections Errors by Raising max_connections

This guide explains why MySQL reports "too many connections", shows how to check the current max_connections setting, compares default limits across MySQL versions, and provides four practical methods—including editing my.cnf, using SQL commands, modifying source code, and tweaking mysqld_safe—to increase the connection limit safely.

Configuration · database · max_connections
0 likes · 4 min read
Efficient Ops
Apr 12, 2021 · Operations

Mastering Network Packet Loss: Diagnosis and Solutions for Linux Servers

This guide explains the fundamentals of network packet loss, illustrates how packets are sent and received, and provides step‑by‑step troubleshooting methods for hardware NIC, driver, kernel stack, TCP/UDP, and application‑level issues on Linux systems, complete with command examples and visual diagrams.

Packet Loss · ethtool · network
0 likes · 34 min read
ITPUB
Apr 7, 2021 · Operations

8 Real-World Production Failures and How to Diagnose Them Quickly

The article shares eight authentic production incident cases—from frequent JVM Full GC and memory leaks to cache avalanches, DNS hijacking, and database deadlocks—detailing their root causes, diagnostic steps, code snippets, and practical remediation strategies for engineers facing similar challenges.

Cache · JVM · Operations
0 likes · 17 min read
Wukong Talks Architecture
Apr 3, 2021 · Backend Development

Two Years of Kafka in a Restaurant Order System: Problems, Solutions, and Lessons Learned

This article recounts the author's two‑year experience with Kafka in a high‑traffic restaurant ordering system, detailing why message ordering matters, the pitfalls of synchronous retries, message backlog, partition routing, primary‑key conflicts, database replication lag, and practical mitigation strategies for reliable backend processing.

Kafka · distributed-systems · message-queue
0 likes · 17 min read
Liangxu Linux
Apr 3, 2021 · Cloud Native

Top 17 Docker Troubleshooting Tips: From Storage Migration to Network Errors

This guide compiles seventeen common Docker problems—including storage directory migration, disk space shortages, missing libraries, container corruption, network misconfigurations, and command‑line quirks—along with step‑by‑step solutions, configuration tweaks, and command examples to help engineers quickly diagnose and resolve container issues.

Configuration · Container · Docker
0 likes · 26 min read