Why Does Linux Show “watchdog: BUG: soft lockup” and How to Fix It?
Linux’s “watchdog: BUG: soft lockup - CPU#1 stuck” warning means a CPU core has been stuck in kernel code without yielding to other tasks. Typical causes are driver bugs, hardware faults, and kernel bugs; typical remedies are updating the kernel and drivers, monitoring hardware, analyzing logs, and running regular performance and hardware checks.
When a Linux system logs a message such as "watchdog: BUG: soft lockup - CPU#1 stuck for 34s! [kworker/1:3:3315742]", the kernel has detected that a CPU core (here CPU 1) spent an extended period (here 34 seconds) running kernel code without letting the scheduler run anything else. The bracketed part names the task that was running at the time and its PID.
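The fields in such a message can be pulled apart with a short script. The following is a minimal sketch in Python, assuming the message format quoted above; the function name `parse_soft_lockup` is hypothetical, and other kernel versions may word the line differently.

```python
import re

# Matches the message format quoted in this article (an assumption:
# other kernel versions may phrase the line differently).
SOFT_LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<seconds>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def parse_soft_lockup(line):
    """Return the CPU, duration, task name, and PID from a soft lockup line.

    Returns None when the line is not a soft lockup message.
    """
    match = SOFT_LOCKUP_RE.search(line)
    if match is None:
        return None
    return {
        "cpu": int(match.group("cpu")),
        "seconds": int(match.group("seconds")),
        # Task names may themselves contain colons (e.g. "kworker/1:3"),
        # so the greedy match leaves only the final number as the PID.
        "task": match.group("task"),
        "pid": int(match.group("pid")),
    }


if __name__ == "__main__":
    sample = ("watchdog: BUG: soft lockup - CPU#1 stuck for 34s! "
              "[kworker/1:3:3315742]")
    print(parse_soft_lockup(sample))
```

Grouping the parsed results by CPU or task name across a whole log can show whether the lockups always involve the same worker or the same core.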
1. Introduction to CPU Soft Lockup
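In Linux, the kernel's watchdog (the lockup detector) monitors each CPU. By default it logs a warning when a CPU becomes unresponsive; it can also be configured to panic, and therefore reboot, the system, for example through the kernel.softlockup_panic and kernel.panic settings. A "soft lockup" occurs when a CPU core runs kernel code for longer than the detection threshold (20 seconds by default) without giving other tasks a chance to run, usually because that code is stuck in a very long or infinite loop.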
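The detection threshold is derived from the kernel.watchdog_thresh sysctl: the kernel documentation gives the soft lockup threshold as twice that value, and the default of 10 yields 20 seconds. The sketch below, in Python, reads the setting from /proc and computes the threshold; the function names are hypothetical, and the script assumes a Linux system where the file is readable.

```python
from pathlib import Path

WATCHDOG_THRESH_FILE = "/proc/sys/kernel/watchdog_thresh"


def read_watchdog_thresh(proc_file=WATCHDOG_THRESH_FILE):
    """Return kernel.watchdog_thresh in seconds, or None if unavailable."""
    try:
        return int(Path(proc_file).read_text().strip())
    except (OSError, ValueError):
        return None


def soft_lockup_threshold(watchdog_thresh):
    """Soft lockup threshold in seconds: twice watchdog_thresh."""
    return 2 * watchdog_thresh


if __name__ == "__main__":
    thresh = read_watchdog_thresh()
    if thresh is None:
        print("watchdog_thresh is not readable on this system")
    else:
        print(f"watchdog_thresh={thresh}s, "
              f"soft lockup threshold={soft_lockup_threshold(thresh)}s")
```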
2. Causes
CPU soft lockups can be triggered by several factors, including but not limited to:
Driver bugs : Faulty hardware drivers may cause the CPU to loop indefinitely during specific operations.
Hardware failures : Issues such as overheating or unstable power can slow the CPU or cause it to hang.
Kernel bugs : Bugs in the Linux kernel itself may provoke soft lockups under certain conditions.
3. Solutions
Resolving a CPU soft lockup depends on the underlying cause. Common strategies include:
Update the system and drivers : Ensure all kernel packages and device drivers are up‑to‑date to eliminate known bugs and security issues.
Monitor hardware status : Use tools such as lm-sensors to watch temperature and voltage, keeping hardware within safe operating ranges.
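Analyze log files : Examine the kernel log, using dmesg, journalctl -k, or /var/log/messages (/var/log/syslog on Debian-based systems), for errors and warnings around the time of the lockup. The call trace printed after the lockup message usually shows which kernel function or module was executing.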
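For the hardware monitoring step, the temperatures that lm-sensors reports are also exposed by the kernel under /sys/class/hwmon, in thousandths of a degree Celsius. The following Python sketch reads them directly; the function names and the 85 °C limit are illustrative assumptions, not recommended values, and safe limits depend on the specific hardware.

```python
from pathlib import Path


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/tempM": degrees_celsius} for readable sensors."""
    temps = {}
    for sensor in sorted(Path(root).glob("hwmon*/temp*_input")):
        try:
            # The kernel reports temperatures in millidegrees Celsius.
            millideg = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue
        try:
            chip = (sensor.parent / "name").read_text().strip()
        except OSError:
            chip = "unknown"
        label = sensor.name.replace("_input", "")
        temps[f"{sensor.parent.name}:{chip}/{label}"] = millideg / 1000.0
    return temps


def over_limit(temps, limit_c=85.0):
    """Return only the sensors at or above limit_c (an assumed limit)."""
    return {name: value for name, value in temps.items() if value >= limit_c}


if __name__ == "__main__":
    readings = read_hwmon_temps()
    for name, value in readings.items():
        print(f"{name}: {value:.1f} C")
    for name, value in over_limit(readings).items():
        print(f"WARNING: {name} is at {value:.1f} C")
```

Running a script like this periodically and recording the output makes it possible to check whether lockups coincide with temperature spikes.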
4. Preventive Measures
To avoid future soft lockups, consider the following practices:
Performance monitoring : Regularly check system performance and resource usage to detect anomalies early.
Hardware testing : Perform periodic self‑tests on memory, disks, and other components to uncover latent hardware problems.
System optimization : Disable unnecessary services and processes, and fine‑tune kernel parameters to reduce CPU load.
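As one possible starting point for the performance monitoring item, the load average can be compared with the number of CPUs. This is a minimal Python sketch; the function names and the 1.5 ratio are arbitrary assumptions chosen for illustration, and a high load average does not by itself indicate an impending soft lockup.

```python
import os


def load_per_cpu(load_1min, cpu_count):
    """Return the 1-minute load average divided by the number of CPUs."""
    return load_1min / max(cpu_count, 1)


def check_load(max_ratio=1.5):
    """Return (ratio, is_high) for the current system.

    max_ratio is an assumed alert level, not a kernel-defined value.
    """
    load_1min, _, _ = os.getloadavg()
    ratio = load_per_cpu(load_1min, os.cpu_count() or 1)
    return ratio, ratio > max_ratio


if __name__ == "__main__":
    ratio, is_high = check_load()
    status = "HIGH" if is_high else "ok"
    print(f"load per CPU: {ratio:.2f} ({status})")
```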
5. Conclusion
Although a CPU soft lockup is a serious issue, systematic maintenance, timely updates, and proactive monitoring can greatly reduce its likelihood. For system administrators, understanding the root causes and recognizing the warning signs are essential for keeping Linux servers stable and reliable.
Ops Development & AI Practice
DevSecOps engineer sharing experiences and insights on AI, Web3, and Claude code development. Aims to help solve technical challenges, improve development efficiency, and grow through community interaction. Feel free to comment and discuss.
