Tagged articles
59 articles
Java Architect Handbook
Mar 22, 2026 · Backend Development

Hidden 70 ms Delay in Spring Boot: Tomcat Embed Bug Triggered by Swagger UI

After optimizing a Spring Boot channel service, a mysterious extra ~100 ms appeared per request; detailed tracing with Arthas revealed that embedded Tomcat repeatedly loads Swagger UI META‑INF resources, causing a 70 ms overhead, which can be eliminated by upgrading Tomcat or removing the Swagger dependencies.

Arthas · Embedded Tomcat Bug · Performance debugging
0 likes · 30 min read
MaGe Linux Operations
Mar 10, 2026 · Artificial Intelligence

Why Your LLM Service Hits CUDA OOM and How to Diagnose GPU Memory Issues

This guide explains the five common sources of GPU memory consumption in large‑model inference services and provides a step‑by‑step diagnosis workflow, from static usage and KV‑cache analysis to concurrency and K8s scheduling, with concrete command‑line checks, scripts, configuration examples, and actionable remediation and monitoring recommendations.

GPU Memory · KV cache · LLM OOM
0 likes · 28 min read
Code Ape Tech Column
Feb 25, 2026 · Backend Development

Why a Spring Boot API Took 100ms Extra: Tracing Tomcat’s Hidden Jar Loading Bug

A Spring Boot channel service showed an unexpected 100 ms latency; by systematically checking network, using curl, and employing Arthas to trace Spring MVC and Tomcat internals, the author discovered a Tomcat‑embed bug that repeatedly loads Swagger‑UI JAR resources, which is resolved by upgrading Tomcat.

Arthas · Embedded Tomcat · Performance debugging
0 likes · 15 min read
dbaplus Community
Jan 4, 2026 · Cloud Native

Why One in a Million Searches Slowed 100× After Moving to Kubernetes

During Pinterest’s migration of its custom search platform Manas to the PinCompute Kubernetes environment, a rare latency spike—one request per million taking 100 times longer—was traced to cAdvisor’s memory‑intensive smaps scans, revealing hidden resource contention and prompting a targeted fix.

Kubernetes · Memory Management · Performance debugging
0 likes · 13 min read
Java Companion
Nov 30, 2025 · Backend Development

Unlock Powerful Java Performance Analysis with IntelliJ IDEA and JProfiler

This guide explains why Java developers need profiling, introduces IntelliJ IDEA’s built‑in Profiler (powered by Async Profiler and JFR), and provides step‑by‑step instructions with screenshots for CPU, memory, and thread analysis to diagnose slow endpoints, high CPU usage, memory leaks, and concurrency bottlenecks.

CPU analysis · IntelliJ IDEA · JProfiler
0 likes · 12 min read
Java Tech Enthusiast
Nov 29, 2025 · Operations

Why Did One Pod Trigger 61 Young GCs and a Full GC? A Step‑by‑Step Diagnosis

A developer encountered a sudden CPU spike caused by excessive JVM garbage collection in a single Kubernetes pod, and by using Linux monitoring tools, thread‑ID conversion, jstack analysis, and file transfer techniques pinpointed a flawed Excel export implementation that created massive in‑memory lists, ultimately fixing the issue.

JVM · Kubernetes · Linux
0 likes · 6 min read
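
The thread‑ID conversion step mentioned in the summary above can be illustrated with a minimal sketch. It assumes the usual workflow in which a per‑thread view such as `top -H` reports a decimal thread ID, while a jstack dump labels each thread with a hexadecimal `nid=0x...` field. The function names and the sample dump lines are hypothetical and are not taken from the article.

```python
def tid_to_nid(tid: int) -> str:
    """Convert a decimal OS thread ID into the hex form used in jstack's nid field."""
    return f"{tid:#x}"


def find_thread_lines(jstack_text: str, tid: int) -> list[str]:
    """Return the thread header lines of a jstack dump whose nid matches the given TID."""
    token = f"nid={tid_to_nid(tid)}"
    # Compare whole whitespace-separated tokens so that nid=0x1a2 does not match nid=0x1a2b.
    return [line for line in jstack_text.splitlines() if token in line.split()]


# Hypothetical two-thread excerpt, shaped like jstack thread header lines.
SAMPLE_DUMP = (
    '"http-nio-8080-exec-1" #31 daemon prio=5 os_prio=0 tid=0x00007f1c nid=0x1a2b runnable\n'
    '"GC task thread#0" os_prio=0 tid=0x00007f1d nid=0x1a2 runnable\n'
)

if __name__ == "__main__":
    print(tid_to_nid(6699))
    for line in find_thread_lines(SAMPLE_DUMP, 6699):
        print(line)
```

In practice the dump text would come from a saved jstack output file rather than an inline string.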
NiuNiu MaTe
Nov 26, 2025 · Fundamentals

How to Diagnose and Fix 100% CPU Overload with Smart Scheduling

This guide explains how CPU scheduling works, why 100% CPU usage occurs, and provides a step‑by‑step troubleshooting workflow—including monitoring with top/vmstat, identifying offending threads, analyzing stack traces, and applying both quick‑fix and long‑term remediation techniques—to keep systems stable.

CPU scheduling · Linux · Performance debugging
0 likes · 19 min read
DevOps Coach
Sep 20, 2025 · Cloud Native

Why a Tiny Memory‑Intensive Process Caused 100× Latency Spikes After Pinterest’s Search Migration to Kubernetes

During Pinterest’s migration of its high‑traffic Manas search platform to the PinCompute Kubernetes environment, engineers observed an extremely rare latency outlier—one in a million requests took 100 times longer—prompting a deep investigation that traced the root cause to cAdvisor’s memory‑intensive smaps scans interfering with leaf node processing.

Cloud Native · Kubernetes · Memory Management
0 likes · 14 min read
JD Tech Talk
Jun 10, 2025 · Backend Development

Instantly Spot Problematic SQL with MyBatis Interceptor Coloring

This article explains how to use SQL coloring in MyBatis by implementing a lightweight interceptor or an AspectJ weave to annotate each SELECT statement with its mapper ID and execution stack, enabling rapid identification of performance bottlenecks during high‑traffic events.

Database Monitoring · MyBatis · Performance debugging
0 likes · 29 min read
Thoughts on Knowledge and Action
Apr 13, 2025 · Operations

How to Diagnose Java OOM Crashes with Eclipse MAT: Step‑by‑Step Guide

When a production Java service repeatedly restarts and triggers full GC and OutOfMemoryError alerts, this guide shows how to capture heap dumps using JVM flags, install and configure Eclipse Memory Analyzer (MAT), and systematically analyze the dump to pinpoint memory leaks, high usage, and problematic code.

Heap Dump · Memory Analyzer · OutOfMemoryError
0 likes · 6 min read
Alibaba Cloud Developer
Apr 7, 2025 · Artificial Intelligence

Why Does GPU Memory Keep Growing in DeepSeek‑R1 Inference? Uncovering PyTorch’s Cache

After deploying the full‑precision DeepSeek‑R1 model on a 2×8‑GPU ACS cluster, repeated stress tests showed GPU memory usage continuously rising without release; this article details the investigation, reproduces the behavior, examines vLLM logs, Prometheus metrics, and reveals PyTorch’s caching allocator as the root cause, offering mitigation tips.

DeepSeek · GPU Memory · Memory Cache
0 likes · 21 min read
Rare Earth Juejin Tech Community
Oct 9, 2024 · Operations

Introducing Kyanos: A Lightweight eBPF‑Based Tool for Fast Network Issue Diagnosis

Kyanos is an open‑source command‑line utility that leverages eBPF to provide low‑overhead, kernel‑compatible network tracing and performance analysis for HTTP, MySQL, and Redis traffic, offering simple watch and stat commands that replace slow tcpdump workflows with seconds‑level diagnostics.

Performance debugging · command-line tool · eBPF
0 likes · 11 min read
Code Ape Tech Column
Jun 21, 2024 · Backend Development

Debugging a 100 ms Latency Bug in a Spring Boot Channel System Using Arthas

This article documents the step‑by‑step investigation of an unexpected ~100 ms response delay in a Spring Boot channel service, showing how network checks, curl tests, Arthas trace/stack commands, and source analysis revealed a Tomcat‑embed bug caused by Swagger‑UI resources and how upgrading Tomcat resolved the issue.

Arthas · Backend · Performance debugging
0 likes · 14 min read
vivo Internet Technology
Apr 24, 2024 · Big Data

Analysis and Resolution of a FileSystem‑Induced Memory Leak Causing OOM in Production

The article details how repeatedly calling FileSystem.get(uri, conf, user) created distinct UserGroupInformation objects, inflating the static FileSystem cache and causing a heap‑memory leak that triggered an Out‑Of‑Memory error, and explains that using the two‑argument get method or explicitly closing instances resolves the issue.

Hadoop · OutOfMemory · Performance debugging
0 likes · 13 min read
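
The leak mechanism described in the summary above can be modelled with a small toy sketch: a cache keyed on an identity object that is created fresh on every call never gets a hit, so it grows without bound. This is an illustration only, not Hadoop code; `Ugi`, `FsCache`, and their methods are hypothetical stand‑ins, and the assumption that the identity object compares by reference is part of the model.

```python
class Ugi:
    """Stand-in for a user identity object that compares by identity, not by user name."""

    def __init__(self, user: str):
        self.user = user
    # No __eq__/__hash__ override: two Ugi("alice") instances are distinct dictionary keys.


class FsCache:
    """Toy static cache keyed by (uri, identity object)."""

    def __init__(self):
        self._cache = {}

    def get_with_user(self, uri: str, user: str) -> object:
        # Models the three-argument call: a new identity object per call means a new key.
        key = (uri, Ugi(user))
        return self._cache.setdefault(key, object())

    def get(self, uri: str, current_ugi: Ugi) -> object:
        # Models the two-argument call: the caller's existing identity object is reused.
        key = (uri, current_ugi)
        return self._cache.setdefault(key, object())

    def size(self) -> int:
        return len(self._cache)


if __name__ == "__main__":
    leaky = FsCache()
    for _ in range(1000):
        leaky.get_with_user("hdfs://namenode", "alice")
    print("entries after 1000 calls with a fresh identity:", leaky.size())

    stable = FsCache()
    me = Ugi("alice")
    for _ in range(1000):
        stable.get("hdfs://namenode", me)
    print("entries after 1000 calls with a shared identity:", stable.size())
```

The same model suggests why explicitly closing instances also helps: removing an entry when it is no longer needed keeps the cache from accumulating unreachable keys.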
Architecture Digest
Dec 4, 2023 · Operations

Using Arthas to Diagnose High CPU Usage in a Java Application

This tutorial demonstrates how to employ the open‑source Java diagnostic tool Arthas to quickly locate and analyze a high‑CPU problem in a running JVM by leveraging commands such as dashboard, thread, jad, watch, and ognl, complete with code examples and step‑by‑step instructions.

Arthas · CPU profiling · Performance debugging
0 likes · 7 min read
Java Backend Technology
Oct 8, 2023 · Operations

How I Traced a Sudden CPU Spike to JVM GC Issues in a Container

After receiving an alarm that a production container’s CPU usage surged past 90%, I investigated the JVM metrics, discovered excessive young and full GCs in a single pod, and walked through the detailed troubleshooting steps—including top, thread analysis, jstack, and code fixes—that resolved the issue.

CPU Spike · JVM · Kubernetes
0 likes · 7 min read
Java Architecture Diary
Jun 20, 2023 · Backend Development

Unlock Java Performance: How to Use IntelliJ IDEA’s Built‑In Profiler

This guide walks you through using IntelliJ IDEA Ultimate’s built‑in Profiler to analyze Java CPU and memory performance, covering quick start steps, visualizations such as hotspot maps, call trees, method lists, timelines, real‑time charts, and exporting results as .jfr and .hprof files.

CPU analysis · IntelliJ IDEA · Java profiling
0 likes · 5 min read
Bilibili Tech
Jun 2, 2023 · Backend Development

Investigation and Resolution of Service Availability Fluctuations in a High‑QPS Go Backend Service

An investigation of a 100k‑QPS Go monolith revealed that intermittent availability drops were caused by a memory leak in the third‑party gcache LFU implementation, which inflated GC work and produced long mark phases; upgrading gcache eliminated the leak and restored 0.999+ availability, highlighting the need for thorough observability and dependency monitoring.

Garbage Collection · Go · Performance debugging
0 likes · 10 min read
WeiLi Technology Team
Mar 29, 2023 · Databases

Why Did MongoDB’s Query Planner Suddenly Slow Down? A Deep Dive into Index Cache Failures

The article explains how null or empty query values caused MongoDB to ignore the intended index, leading to massive slow queries and service timeouts, and details the step‑by‑step investigation, plan‑cache invalidation, and the corrective addition of a compound index that restored normal performance.

MongoDB · Performance debugging · Plan Cache
0 likes · 21 min read
Java Architect Essentials
Mar 27, 2023 · Backend Development

Diagnosing and Solving a 100 ms Latency Issue in Spring Boot's Embedded Tomcat Using Arthas

This article walks through the step‑by‑step investigation of an unexpected ~100 ms latency in a Spring Boot channel service, using network checks, curl timing, and the Arthas Java diagnostic tool to pinpoint a Tomcat‑embed bug caused by Swagger jars, and then shows how upgrading Tomcat or Spring Boot resolves the problem.

Arthas · Performance debugging · Spring Boot
0 likes · 14 min read
Tencent Cloud Developer
Dec 22, 2022 · Databases

Dynamic‑Tracing Based Memory‑Leak (Growth) Analysis for MySQL‑Proxy in TDSQL

Using lightweight dynamic‑tracing tools that record allocator calls and page‑fault events, the authors diagnose a production MySQL‑proxy memory leak in TDSQL, generate focused flame‑graphs with custom memstacks and pgfaultstacks, and demonstrate a fast, source‑independent alternative to gdb or Valgrind.

Linux · MySQL-Proxy · Performance debugging
0 likes · 13 min read
ELab Team
Aug 10, 2022 · Operations

How We Solved a Massive Memory Leak in a VSCode Extension Using llnode and heapdump

After releasing a new version of a VSCode extension, intermittent freezes were traced to a memory leak; the investigation used llnode and heapdump, tackled Electron version mismatches, extended DevTools parsing limits, and ultimately identified recursive socket callbacks as the root cause.

DevTools · Performance debugging · VSCode
0 likes · 15 min read
Sohu Tech Products
Jul 20, 2022 · Backend Development

Diagnosing Thread Blocking in a Spring Boot Service Caused by Logback Configuration Errors

This article details a step‑by‑step investigation of a Java Spring‑Boot service that suffered nightly response‑time alerts, revealing that misconfigured Logback file paths caused cross‑volume log rotation, thread blocking, and ultimately a production outage, and shows how gray‑deployment and environment fixes resolved the issue.

Kubernetes · Performance debugging · Spring Boot
0 likes · 13 min read
Code Ape Tech Column
Mar 23, 2022 · Operations

Using Arthas to Diagnose High CPU Usage in Java Applications

This tutorial demonstrates how to download, attach, and use the Arthas Java diagnostic tool—leveraging commands like dashboard, thread, jad, watch, and ognl—to quickly locate and fix high CPU problems caused by parallel stream code in a Java application.

Arthas · CPU profiling · Performance debugging
0 likes · 7 min read
Open Source Linux
Jan 23, 2022 · Operations

Mastering IT Trouble‑Shooting: Proven Strategies to Diagnose and Resolve Complex System Failures

This article shares practical methods and real‑world case studies for IT professionals to analyze, locate, and fix system runtime issues, service timeouts, file‑handle leaks, JVM memory overflows, and performance bottlenecks, emphasizing hypothesis testing, boundary narrowing, and systematic post‑mortems.

IT Operations · JVM Memory · Performance debugging
0 likes · 31 min read
Architect's Tech Stack
Feb 20, 2021 · Backend Development

Root Cause Analysis of High Native Memory Usage in a Spring Boot Application

After migrating a project to the MDP framework based on Spring Boot, the system repeatedly reported excessive swap usage; the investigation revealed that native memory allocated by the Spring Boot classloader’s Reflections scanning and InflaterInputStream caused 700 MB–800 MB of off‑heap memory to remain unreleased, which was eventually resolved by limiting the scan path and updating Spring Boot.

Native Memory · Performance debugging · Spring Boot
0 likes · 12 min read
Code Ape Tech Column
Feb 5, 2021 · Backend Development

Diagnosing and Solving a 100 ms Delay in Spring Boot Embedded Tomcat Using Arthas

This article walks through the step‑by‑step investigation of an unexpected ~100 ms latency in a Spring Boot channel service, using network checks, curl timing, Arthas trace and watch commands to pinpoint a Tomcat‑embed bug caused by Swagger‑UI JAR loading, and then shows how upgrading Tomcat resolves the issue.

Arthas · Performance debugging · Spring Boot
0 likes · 30 min read
dbaplus Community
Feb 1, 2021 · Operations

How to Build a Low‑Cost Distributed Tracing System for Microservices

This article explains the evolution from a monolithic architecture to microservices, outlines the new pain points such as fault isolation, performance bottlenecks and scaling inefficiencies, and presents a practical, low‑cost distributed tracing solution with unified frameworks, components, configuration management, data collection, and visualization.

Configuration Management · Distributed Tracing · Performance debugging
0 likes · 31 min read
Java Captain
Dec 12, 2020 · Backend Development

Diagnosing and Resolving a 100 ms Latency Issue in a Spring Boot Channel System Using Arthas

This article details the step‑by‑step investigation of an unexpected ~100 ms response delay in a Spring Boot‑based channel system, showing how network checks, curl measurements, and deep tracing with the Arthas Java diagnostic tool pinpointed a Tomcat‑embed bug caused by Swagger‑UI resources and how upgrading Tomcat resolved the problem.

Arthas · Performance debugging · Spring Boot
0 likes · 30 min read
Architect
Aug 30, 2020 · Backend Development

Root Cause Analysis of Excessive Swap Memory Usage in a Spring Boot Application

The article details a step‑by‑step investigation of abnormal swap memory consumption in a Spring Boot project, revealing that native memory allocated by the Inflater during JAR scanning was not released promptly, leading to apparent memory leaks that were ultimately resolved by configuring package scanning and updating Spring Boot.

Native Memory · Performance debugging · Spring Boot
0 likes · 12 min read
Architecture Digest
Jul 5, 2020 · Backend Development

Diagnosing Excessive Off‑Heap Memory Usage in a Spring Boot Application

The article details a step‑by‑step investigation of why a Spring Boot service migrated to the MDP framework consumed far more physical memory than its 4 GB heap, revealing native‑code allocations, memory‑pool behavior of glibc and tcmalloc, and how limiting MCC scan paths or upgrading Spring Boot resolves the off‑heap leak.

Native Memory · Off-Heap Memory · Performance debugging
0 likes · 11 min read
Youzan Coder
May 28, 2020 · Backend Development

Diagnosing High CPU Usage in Java Applications with Arthas

Using the open‑source Arthas tool, the author traced a Java server’s 99 % CPU usage to two runaway threads, inspected their stack traces, discovered a cyclic bucket in a HashBiMap caused by unsynchronized cache updates, and resolved the issue by adding a synchronized keyword to the cache‑sync method.

Arthas · CPU profiling · Performance debugging
0 likes · 10 min read
Architect's Tech Stack
Mar 31, 2020 · Backend Development

Investigation of Excessive Native Memory Usage in a Spring Boot Application

This article details a step‑by‑step investigation of unusually high native memory consumption in a Spring Boot service, covering JVM configuration, system‑level diagnostics with jcmd, pmap, gperftools, strace, GDB, and jstack, and explains how the MCC component’s default package scanning caused the leak and how configuring scan paths or upgrading Spring Boot resolved the issue.

JVM · Linux tools · Native Memory
0 likes · 11 min read
Meituan Technology Team
Jan 3, 2019 · Backend Development

Investigation of Excessive Native Memory Usage After Migrating to Spring Boot

After moving to Spring Boot, the application consumed up to 7 GB of native memory because Meituan’s MCC package scanner invoked Spring’s ZipInflaterInputStream, which allocated large off‑heap buffers during JAR decompression that were only freed by the JVM finalizer and retained by glibc’s 64 MB arenas; restricting the scan scope or upgrading to Spring Boot 2.0.5 eliminated the excess usage.

JVM · Native Memory · Performance debugging
0 likes · 13 min read
Java Backend Technology
Oct 24, 2018 · Backend Development

Why My Java App Hits 100% CPU: Live Infinite Loop Demo & Diagnosis

This article walks through setting up a Vagrant‑based experiment that injects an intentional infinite loop into a simple Spring MVC service, then demonstrates step‑by‑step how to identify the offending process using top, examine JVM heap with jstat, and trace the problematic thread with jstack to resolve a CPU‑100% issue.

CPU · JVM · Performance debugging
0 likes · 4 min read
Java Captain
May 24, 2018 · Big Data

Debugging a Kafka Data Drop: A Step‑by‑Step Troubleshooting Case Study

After a recent feature release caused a sharp decline in a key data metric, the team followed a systematic, fourteen‑step troubleshooting process—including verification, code review, DBA involvement, local debugging, environment comparison, logging, packet capture, service restarts, request mode changes, load testing, and partition resizing—to identify and resolve a Kafka‑related throughput bottleneck.

Kafka · Load Testing · Performance debugging
0 likes · 8 min read
Qunar Tech Salon
Oct 26, 2015 · Operations

Diagnosing High CPU Usage in PHP Processes with strace

This article demonstrates how to use strace, including its -c, -T, and -e options, to identify kernel‑level system calls such as clone that cause high CPU consumption in PHP processes on a Linux server, providing step‑by‑step commands and interpretation of the results.

Linux · PHP · Performance debugging
0 likes · 4 min read