How to Diagnose and Report MySQL/Oracle Issues Within 10 Minutes
This guide walks through a rapid 10‑minute troubleshooting workflow for MySQL and Oracle databases, covering resource checks, log inspection, wait‑event analysis, SQL identification, and feedback best practices to quickly pinpoint performance problems without relying on external monitoring tools.
Our team follows a strict escalation policy: any database incident—whether a business‑impacting error, a hang, or a crash—must be reported to the direct supervisor immediately and followed up within ten minutes, making rapid diagnosis essential, especially when tools like OSWatcher are unavailable.
Step 1: Check System Resource Usage
Inspect CPU idle, I/O wait, and queue lengths, and compare them with their usual values. In the example incident, the waiting queue rose from the usual ~60 to nearly 200, CPU idle dropped by 10 percentage points, and I/O wait increased by a few points.
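The comparison against a baseline can be sketched in a few lines of Python. This is a minimal illustration, not a monitoring tool: the `Snapshot` type, the `flag_deviations` helper, the thresholds, and the absolute CPU idle and I/O wait figures are all assumptions made for the example. Only the queue lengths (about 60 and about 200) and the 10-point drop in CPU idle come from the incident described above.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One reading of the three metrics checked in Step 1."""
    queue_length: int     # processes waiting, e.g. as reported by vmstat
    cpu_idle_pct: float   # CPU idle percentage
    iowait_pct: float     # I/O wait percentage


def flag_deviations(baseline, current,
                    queue_ratio=2.0, idle_drop=5.0, iowait_rise=2.0):
    """Return human-readable findings where current departs from baseline.

    The three thresholds are hypothetical defaults; tune them per system.
    """
    findings = []
    if current.queue_length >= baseline.queue_length * queue_ratio:
        findings.append(
            f"queue length {baseline.queue_length} -> {current.queue_length}")
    if baseline.cpu_idle_pct - current.cpu_idle_pct >= idle_drop:
        findings.append(
            f"CPU idle {baseline.cpu_idle_pct}% -> {current.cpu_idle_pct}%")
    if current.iowait_pct - baseline.iowait_pct >= iowait_rise:
        findings.append(
            f"I/O wait {baseline.iowait_pct}% -> {current.iowait_pct}%")
    return findings


# Illustrative values: idle and iowait absolutes are invented for the sketch.
baseline = Snapshot(queue_length=60, cpu_idle_pct=40.0, iowait_pct=5.0)
current = Snapshot(queue_length=200, cpu_idle_pct=30.0, iowait_pct=8.0)

for finding in flag_deviations(baseline, current):
    print(finding)
```

Having the baseline written down in advance is what makes this check fast: without it, a queue length of 200 cannot be judged as normal or abnormal.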
Step 2: Examine Logs for Errors
On a single‑instance system, start with the alert log; if errors appear, review the corresponding trace files. In the example, the alert log contains no error messages but does show occasional "cannot allocate new log" entries, which suggests that the redo log groups may need to be enlarged or increased in number.
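A quick scan of the alert log can be scripted. The sketch below assumes the log is available as plain text lines; `scan_alert_log` is a hypothetical helper, and the sample lines are invented to resemble typical alert log entries rather than copied from the incident.

```python
import re
from collections import Counter

# Oracle error codes have the form ORA-NNNNN.
ORA_ERROR = re.compile(r"\bORA-\d{5}\b")
LOG_SWITCH_STALL = "cannot allocate new log"


def scan_alert_log(lines):
    """Count ORA- error codes and redo log switch stalls in alert log lines."""
    errors = Counter()
    stalls = 0
    for line in lines:
        for code in ORA_ERROR.findall(line):
            errors[code] += 1
        if LOG_SWITCH_STALL in line.lower():
            stalls += 1
    return errors, stalls


# Invented sample lines for illustration only.
sample_lines = [
    "Thread 1 cannot allocate new log, sequence 1234",
    "Checkpoint not complete",
    "Thread 1 advanced to log sequence 1234 (LGWR switch)",
]

errors, stalls = scan_alert_log(sample_lines)
print("ORA errors:", dict(errors))
print("log switch stalls:", stalls)
```

In practice the lines would come from reading the alert log file; the path is site-specific and is deliberately left out here.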
Step 3: Analyze Wait Events
The wait‑event view lists several items. Lock waits are within the normal range; there are four "db file scattered read" waits, about 100 "db file sequential read" waits (slightly high), and over 800 "read by other session" waits, which is unusually large.
Further investigation shows that the 800+ "read by other session" waits stem from two SQL statements, both querying the same table.
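One way to get from a list of waiting sessions to the responsible statements is to tally sessions by wait event and by SQL ID. The sketch below is an assumption about how this could be done, not the method used in the incident. The query reads `event` and `sql_id` from `v$session`; `summarize_waits` is a hypothetical helper, and the sample rows and SQL IDs are placeholders.

```python
from collections import Counter

# Query to run through any Oracle client; shown here only as a string.
ACTIVE_WAITS_SQL = """
SELECT event, sql_id
FROM   v$session
WHERE  status = 'ACTIVE'
AND    wait_class <> 'Idle'
"""


def summarize_waits(rows):
    """Tally (event, sql_id) rows by event and by (event, sql_id) pair."""
    by_event = Counter(event for event, _ in rows)
    by_event_and_sql = Counter((event, sql_id) for event, sql_id in rows)
    return by_event, by_event_and_sql


# Placeholder rows standing in for the query result.
sample_rows = (
    [("read by other session", "sqlid_a")] * 5
    + [("read by other session", "sqlid_b")] * 3
    + [("db file sequential read", "sqlid_c")] * 2
)

by_event, by_event_and_sql = summarize_waits(sample_rows)
for event, count in by_event.most_common():
    print(f"{count:5d}  {event}")
for (event, sql_id), count in by_event_and_sql.most_common(2):
    print(f"{count:5d}  {event}  {sql_id}")
```

If a handful of SQL IDs account for most of the waits on one event, as in the example above, those statements are the ones to examine next.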
SQL‑ID Deep Dive
Running awrsqrpt.sql on the identified SQL IDs shows a very high execution frequency (over a million executions per hour), while each execution consumes a normal amount of I/O for a table with nearly a billion rows. Statistics are fresh (collected a day ago) and differ from actual row counts by less than 1%.
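To put these figures in perspective, the hourly rate can be converted to a per-second rate and the statistics drift expressed as a percentage. The helper names and the two row counts below are hypothetical; only the "million executions per hour" and "less than 1%" figures come from the text.

```python
def executions_per_second(executions_per_hour):
    """Convert an hourly execution count to a per-second rate."""
    return executions_per_hour / 3600


def stats_drift_pct(stats_rows, actual_rows):
    """Percentage difference between optimizer statistics and the real row count."""
    return abs(stats_rows - actual_rows) / actual_rows * 100


rate = executions_per_second(1_000_000)
print(f"{rate:.1f} executions per second")

# Hypothetical row counts for a table with nearly a billion rows.
drift = stats_drift_pct(stats_rows=995_000_000, actual_rows=1_000_000_000)
print(f"statistics drift: {drift:.2f}%")
```

A million executions per hour works out to roughly 278 executions per second, which is why a statement with normal per-execution cost can still dominate the system.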
Thus the database itself is healthy; the slowdown originates from a business process that triggers the query excessively, raising CPU usage by about 10 percentage points and causing each query to run for over three minutes.
Key Takeaways and Extended Knowledge
"Read by other session" is derived from the "buffer busy waits" event. It occurs when one session is reading blocks from disk into the buffer cache while another session requests the same blocks before the read completes, and it is often accompanied by "db file sequential read" or "db file scattered read" waits.
Mitigation includes reducing abnormal connection bursts from the application side or, with proper authorization, killing offending sessions.
For single‑instance databases, also check disk I/O (e.g., iostat, sar -d), network health (netstat), overall system load (top or topas), and the OS logs.
In Oracle RAC environments, examine each instance, CRS logs, and ASM parameters, which are often set conservatively.
When SQL itself may be at fault, verify that execution plans have not changed and assess optimization opportunities; maintaining a solid baseline and regular statistics updates greatly speeds up root‑cause analysis.
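Checking whether a plan has changed can be reduced to asking whether a SQL ID has been seen with more than one plan hash value. The sketch below assumes the history is read from the AWR view `dba_hist_sqlstat`; the bind names, the `changed_plans` helper, and the sample rows are hypothetical.

```python
from collections import defaultdict

# Query to run through any Oracle client; shown here only as a string.
PLAN_HISTORY_SQL = """
SELECT DISTINCT sql_id, plan_hash_value
FROM   dba_hist_sqlstat
WHERE  sql_id IN (:id1, :id2)
"""


def changed_plans(rows):
    """Return SQL IDs that appear with more than one plan hash value."""
    plans = defaultdict(set)
    for sql_id, plan_hash in rows:
        plans[sql_id].add(plan_hash)
    return {sql_id: sorted(hashes)
            for sql_id, hashes in plans.items()
            if len(hashes) > 1}


# Placeholder rows standing in for the query result.
sample_rows = [
    ("sqlid_a", 111),
    ("sqlid_a", 111),
    ("sqlid_b", 222),
    ("sqlid_b", 333),
]

print(changed_plans(sample_rows))
```

An empty result suggests the plans have been stable over the retained history, which points the investigation back toward workload rather than the optimizer.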
Finally, the primary role of a frontline engineer is rapid feedback, not immediate resolution. Document findings, hand off to specialists if needed, and close the loop by adding the case to a knowledge base for future reference.
dbaplus Community
