How to Diagnose and Fix the 8 Most Common Production Issues
This article outlines practical troubleshooting steps for eight frequent production problems (OOM, CPU spikes, interface timeouts, index failures, deadlocks, disk issues, MQ backlogs, and API errors), with guidance and code snippets to help engineers identify root causes and resolve them quickly.
1 OOM Issues
OOM problems in production are serious and can cause services to crash. Different types have different causes.
1.1 Java Heap OOM
Typical log entry:
java.lang.OutOfMemoryError: Java heap space
Add JVM options to dump the heap on OOM:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=heapdump.hprof
Analyze the dump with MAT or VisualVM to locate the offending code.
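Before relying on a dump being written, it can help to confirm that the flags actually reached the running JVM. The following is a minimal sketch, assuming a HotSpot-based JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name HeapDumpFlagCheck is illustrative and not part of the original article.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class HeapDumpFlagCheck {
    /** Returns the current value of a HotSpot VM option as a string, e.g. "true" or "false". */
    static String vmOption(String name) {
        // Assumption: the JVM is HotSpot-based; other JVMs may not provide this bean.
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

If the first line prints false, the option was probably not passed to the process that is actually serving traffic.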
1.2 Native Thread OOM
Log entry:
java.lang.OutOfMemoryError: unable to create native thread
Usually caused by too many threads or oversized thread stacks. Reduce the thread count by using a thread pool.
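As a rough illustration of the thread pool advice, the sketch below runs many small tasks on a fixed-size pool instead of starting one thread per task. The class name, pool size, and task count are assumptions chosen for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    /** Runs taskCount small tasks on a fixed pool and returns how many distinct threads ran them. */
    static int distinctThreadsUsed(int poolSize, int taskCount) {
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        for (int i = 0; i < taskCount; i++) {
            // Each task only records which worker thread executed it.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return threadNames.size();
    }

    public static void main(String[] args) {
        // 1000 tasks, but never more than 8 worker threads.
        System.out.println("threads used: " + distinctThreadsUsed(8, 1000));
    }
}
```

The number of native threads stays bounded by the pool size no matter how many tasks are submitted; excess tasks wait in the pool's queue.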
1.3 StackOverflowError
Log entry:
java.lang.StackOverflowError
Often caused by deep or infinite recursion. Check that every recursive call has a termination condition it can actually reach.
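When the recursion is correct but simply too deep, rewriting it as a loop is one common remedy. The sketch below contrasts the two forms on a toy problem; the class and method names are made up for illustration.

```java
class RecursionDepthDemo {
    /** Recursive sum of 1..n (n >= 0): one stack frame per call, so a large n can overflow. */
    static long sumRecursive(long n) {
        return n == 0 ? 0 : n + sumRecursive(n - 1);
    }

    /** Iterative equivalent: constant stack depth regardless of n. */
    static long sumIterative(long n) {
        long total = 0;
        for (long i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    /** Returns true if the recursive version overflows the stack for this n. */
    static boolean overflows(long n) {
        try {
            sumRecursive(n);
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("iterative: " + sumIterative(1_000_000L));
        System.out.println("recursive overflows at depth 10,000,000: " + overflows(10_000_000L));
    }
}
```

Whether a given depth overflows depends on the thread stack size (-Xss) and the frame size, so the second line of output is not guaranteed.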
1.4 GC Overhead Limit Exceeded
Log entry:
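java.lang.OutOfMemoryError: GC overhead limit exceeded
Occurs when the JVM spends almost all of its time in garbage collection (by default more than 98%) while recovering very little heap (less than 2%). Look for a memory leak or an undersized heap first, then adjust the GC strategy and the survivor/new generation ratios.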
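To see how close a process is to that situation, the share of uptime spent in GC can be approximated from the standard management beans. This is a minimal sketch with an illustrative class name; the figure is cumulative since JVM start and only approximate, not the exact metric the JVM uses internally.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcTimeProbe {
    /** Accumulated GC time in milliseconds across all collectors that report it. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report time
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Approximate fraction of JVM uptime spent in GC since startup. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : (double) totalGcMillis() / uptime;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n",
                totalGcMillis(), 100.0 * gcTimeFraction());
    }
}
```

A fraction that keeps climbing under steady load is a hint to take a heap dump before the error appears.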
1.5 Metaspace OOM
Log entry:
java.lang.OutOfMemoryError: Metaspace
Since JDK 8, Metaspace replaces PermGen. It can overflow when many classes are loaded. Raise the Metaspace limits to suit the application (the 10m values below are only placeholders):
-XX:MetaspaceSize=10m -XX:MaxMetaspaceSize=10m
2 CPU 100% Issues
CPU saturation often results from long‑running tasks or inefficient code.
Common causes are illustrated below:
Use jstack or Alibaba's Arthas to inspect thread activity.
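jstack and Arthas inspect the process from outside. A similar view is available from inside the JVM through ThreadMXBean; the sketch below lists the threads with the most accumulated CPU time. The class name and output format are assumptions for illustration, and the result is only a rough substitute for a real thread dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class BusyThreadReport {
    /** Returns up to `limit` lines of "cpuMillis ms  threadName  state", busiest first. */
    static List<String> topThreads(int limit) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<long[]> usage = new ArrayList<>(); // each entry: {threadId, cpuNanos}
        if (mx.isThreadCpuTimeSupported() && mx.isThreadCpuTimeEnabled()) {
            for (long id : mx.getAllThreadIds()) {
                long nanos = mx.getThreadCpuTime(id); // -1 if the thread is no longer alive
                if (nanos >= 0) {
                    usage.add(new long[] {id, nanos});
                }
            }
        }
        usage.sort(Comparator.comparingLong((long[] u) -> u[1]).reversed());
        List<String> lines = new ArrayList<>();
        for (long[] u : usage) {
            if (lines.size() >= limit) {
                break;
            }
            ThreadInfo info = mx.getThreadInfo(u[0]);
            if (info == null) {
                continue; // thread exited between the two calls
            }
            lines.add((u[1] / 1_000_000) + " ms  " + info.getThreadName()
                    + "  " + info.getThreadState());
        }
        return lines;
    }

    public static void main(String[] args) {
        topThreads(5).forEach(System.out::println);
    }
}
```

CPU time here is cumulative since each thread started, so comparing two samples taken a few seconds apart gives a better picture of what is busy right now.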
3 Interface Timeout Issues
Sudden API timeouts can stem from many factors. Common reasons are shown in the diagram:
4 Index Invalid Issues
Indexes may become ineffective, causing slow queries. Use EXPLAIN to view execution plans.
Typical causes are illustrated below:
5 Deadlock Issues
MySQL deadlocks occur when two or more transactions each hold a lock that another one needs, so none of them can proceed. Common causes:
Resource contention
Circular wait
Poor transaction design
Concurrent operation conflicts
Improper index usage
Mitigation steps:
Set appropriate transaction isolation levels.
Avoid large transactions.
Optimize SQL performance.
Configure lock wait timeouts.
Increase monitoring and analysis.
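In addition to the steps above, applications are usually written to retry a transaction that MySQL has rolled back as the deadlock victim. The sketch below is one possible shape for that, not something the original article prescribes: the SqlWork interface and DeadlockRetry class are hypothetical, and the code assumes the driver reports the deadlock as error code 1213 with SQLSTATE 40001.

```java
import java.sql.SQLException;

class DeadlockRetry {
    /** Hypothetical unit of work; it must begin and commit or roll back its own transaction. */
    interface SqlWork<T> {
        T run() throws SQLException;
    }

    /** Assumption: a deadlock victim surfaces as MySQL error 1213 or SQLSTATE 40001. */
    static boolean isDeadlock(SQLException e) {
        return e.getErrorCode() == 1213 || "40001".equals(e.getSQLState());
    }

    /** Runs the work, retrying up to maxAttempts times when it fails with a deadlock. */
    static <T> T runWithRetry(SqlWork<T> work, int maxAttempts, long backoffMillis)
            throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.run();
            } catch (SQLException e) {
                if (!isDeadlock(e) || attempt >= maxAttempts) {
                    throw e; // not a deadlock, or retries exhausted
                }
                try {
                    Thread.sleep(backoffMillis * attempt); // simple linear backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
    }
}
```

Retrying only hides occasional deadlocks; frequent ones still call for the design changes listed above.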
6 Disk Issues
Disk failures or insufficient space are common. Check usage with:
df -Hl
Monitor total size, used space, and available space. Clean /tmp and old logs to free space.
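The same numbers can be read from inside a Java service, for example to raise an alert before the disk fills up. This is a minimal sketch with an illustrative class name; the percentage may differ slightly from the figure df reports because df accounts for reserved blocks.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Paths;

class DiskUsageCheck {
    /** Percentage (0 to 100) of the file store holding `path` that is currently used. */
    static double usedPercent(String path) {
        try {
            FileStore store = Files.getFileStore(Paths.get(path));
            long total = store.getTotalSpace();
            if (total <= 0) {
                return 0.0; // size unknown, e.g. some virtual file systems
            }
            long used = total - store.getUnallocatedSpace();
            return 100.0 * used / total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("used: %.1f%%%n", usedPercent("."));
    }
}
```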
7 MQ Message Backlog
Backlogs happen when consumers process messages slower than producers generate them.
Possible causes:
Producers batch‑send messages.
Consumers suffer from slow business logic (e.g., MySQL index issues).
Solutions include increasing consumer thread pool size or scaling out consumer instances, and optimizing indexing and SQL.
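The effect of adding consumer threads can be sketched with an in-memory queue standing in for the broker. Everything here (the class name, the LinkedBlockingQueue, the no-op business logic) is an assumption for illustration; a real consumer would use the client library of the MQ product in use.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BacklogDrainDemo {
    /** Drains `messages` queued items with `consumers` worker threads; returns the count processed. */
    static int drain(int messages, int consumers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < messages; i++) {
            queue.add(i); // simulated backlog
        }
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                // poll() returns null once the backlog is empty, which ends this consumer.
                while (queue.poll() != null) {
                    processed.incrementAndGet(); // stand-in for the business logic
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("backlog not drained in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        return processed.get();
    }

    public static void main(String[] args) {
        System.out.println("processed: " + drain(10_000, 8));
    }
}
```

More consumer threads only help while the downstream dependency (for example the database) can absorb the extra load, which is why the SQL and index fixes matter as much as the pool size.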
8 API Error Codes
Various HTTP status codes indicate different problems:
8.1 401 Unauthorized
Occurs when authentication credentials are missing or invalid.
8.2 403 Forbidden
Authentication succeeded but the user lacks permission.
8.3 404 Not Found
Requested endpoint does not exist, often due to version changes or gateway misconfiguration.
8.4 405 Method Not Allowed
Wrong HTTP method used (e.g., GET instead of POST).
8.5 500 Internal Server Error
Server-side exception; check error logs and print request parameters for debugging.
8.6 502 Bad Gateway
The gateway received an invalid response, or none at all, from the upstream service, often because that service is restarting or has crashed; restarting the service usually helps.
8.7 504 Gateway Timeout
Gateway timed out waiting for the upstream service; optimize the service code to reduce latency.
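The status codes above can be condensed into a small lookup that a client or monitoring script logs next to a failed call. The class name and hint wording are illustrative only and simply restate the guidance in this section.

```java
class StatusHint {
    /** Maps an HTTP status code to a first troubleshooting step. */
    static String hint(int status) {
        switch (status) {
            case 401: return "Check that credentials are sent and still valid";
            case 403: return "Authenticated, but the user lacks permission for this resource";
            case 404: return "Check the endpoint path, API version, and gateway routing";
            case 405: return "Check the HTTP method (e.g. GET used where POST is expected)";
            case 500: return "Check server error logs and the request parameters";
            case 502: return "Check whether the upstream service is restarting or has crashed";
            case 504: return "Gateway timed out waiting for upstream; look for slow service code";
            default:  return "No hint for this status";
        }
    }

    public static void main(String[] args) {
        System.out.println(hint(502));
    }
}
```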
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.