Tagged articles
8 articles
MaGe Linux Operations
Sep 24, 2025 · Operations

How I Pinpointed the Real Culprit of a 100% CPU Spike in Production in Just 3 Minutes

When a production server hit 100% CPU at 3 AM, the author worked through a three-minute, step-by-step method: quickly identifying the offending process, drilling into its threads, and pinpointing the problematic code. The article also shares useful shell commands, common pitfalls, and advanced safeguards such as cgroup limits and eBPF tracing.

CPU troubleshooting · Linux performance · Operations
0 likes · 9 min read
Efficient Ops
Jul 14, 2025 · Operations

Rescuing a Critical CPU Outage: My Step-by-Step Troubleshooting Guide

After a midnight CPU alarm threatened service stability, I walked through rapid diagnosis with top and htop, identified JVM bottlenecks using jstat and async‑profiler, refactored a Java sorting algorithm, added caching, optimized database queries, containerized the service, and set up Prometheus‑Grafana alerts to prevent future incidents.

CPU troubleshooting · Docker · Java performance
0 likes · 7 min read
Senior Tony
May 29, 2025 · Operations

How to Diagnose and Fix 100% CPU on Database and Application Servers

This guide explains how to identify the root cause when a server's CPU hits 100%, whether on a database or an application server, using cloud monitoring, the Linux top command, and thread analysis with jstack. It also covers practical Java code fixes such as limiting loops, optimizing locks, and handling GC pressure.

CPU troubleshooting · Database Monitoring · Java
0 likes · 9 min read
Efficient Ops
Jan 19, 2025 · Operations

How I Rescued a Critical Service from 100% CPU: A Step‑by‑Step Ops Playbook

After a midnight CPU alarm, I walked through rapid diagnosis, JVM profiling, algorithm refactoring, database indexing, Docker isolation, and enhanced monitoring to bring a high‑load Java service back to stability, illustrating a comprehensive incident‑response workflow for modern operations teams.

CPU troubleshooting · Docker deployment · JVM profiling
0 likes · 7 min read
MaGe Linux Operations
Jan 16, 2020 · Operations

How to Quickly Diagnose and Fix High CPU Usage in a Data Platform

This guide walks through a real-world incident in which a data platform’s CPU spiked to 98.94%, showing step by step how to identify the high-load process, pinpoint the offending Java thread, trace the root cause to the time-utility code, and implement a performance-focused fix that cut the load thirtyfold.

CPU troubleshooting · Java profiling · Linux monitoring
0 likes · 7 min read
MaGe Linux Operations
Dec 24, 2018 · Operations

How to Quickly Diagnose and Fix High CPU Usage on a Data Platform Server

This guide walks through a step-by-step investigation of a sudden 98% CPU spike on a data-platform server, showing how to pinpoint the offending process, trace the problematic Java thread, find the root cause in a time-utility method, and apply an optimized fix that cuts CPU load thirtyfold.

Backend Development · CPU troubleshooting · Java
0 likes · 7 min read