How to Tackle Oracle Bad Blocks: Practical Strategies for DBAs
The article explains why Oracle bad‑block incidents demand scenario‑driven handling: it outlines common interview questions, describes the pitfalls of inspecting each database's alert log individually, and advocates switching promptly to the disaster‑recovery environment and restoring service from backups, illustrated with real‑world cases and practical DBA advice.
Recent discussions at the Gdevops Global Agile Operations Summit emphasized that technology must be driven by business scenarios; otherwise, platform construction is meaningless. For DBAs, this means aligning maintenance techniques with real‑world operational needs.
Oracle bad block problems are common for DBAs with two or three years of experience. Interviewers often ask: Do you understand Oracle bad blocks? Why do they occur? Describe a case you handled. If multiple databases suddenly show many bad blocks, what would you do?
Many candidates answer the first questions well but falter on the last one because they focus on checking each database’s alert log, which is inefficient when dozens or hundreds of instances are involved.
The correct approach is scenario‑driven: when the number of bad blocks is large, immediately stop the affected services, switch to the disaster‑recovery (DR) environment, and restore data later. As the saying goes, “You nurture a database for years; you use it in an emergency.”
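Rather than reading every alert log by hand, the first pass across a fleet can be scripted. Below is a minimal Python sketch, assuming a python-oracledb monitoring account and an illustrative instance list; it counts rows in V$DATABASE_BLOCK_CORRUPTION, which RMAN populates during BACKUP or VALIDATE runs, to flag which databases need attention (and possibly a DR switch) first.

```python
# Minimal sketch: fleet-wide triage of reported corrupt blocks, instead of
# logging in to each instance and reading its alert log by hand.
import oracledb  # python-oracledb; thin mode needs no Oracle client

# Illustrative fleet inventory: (name, Easy Connect DSN) pairs -- assumptions
FLEET = [
    ("orcl1", "dbhost1:1521/orcl1"),
    ("orcl2", "dbhost2:1521/orcl2"),
]

MONITOR_USER = "monitor"        # assumed read-only monitoring account
MONITOR_PASSWORD = "change_me"  # placeholder credential

def corrupt_block_count(dsn: str) -> int:
    """Rows in V$DATABASE_BLOCK_CORRUPTION (populated by RMAN BACKUP/VALIDATE)."""
    with oracledb.connect(user=MONITOR_USER, password=MONITOR_PASSWORD, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT COUNT(*) FROM v$database_block_corruption")
            return cur.fetchone()[0]

if __name__ == "__main__":
    for name, dsn in FLEET:
        try:
            count = corrupt_block_count(dsn)
            flag = "  <-- investigate, consider DR switch" if count else ""
            print(f"{name}: {count} corrupt blocks reported{flag}")
        except oracledb.Error as exc:
            print(f"{name}: check failed ({exc})")
```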
When should you trigger DR? Generally, if a fault is expected to keep the business down for more than two hours, you should declare a DR switch. In the financial sector the thresholds are much tighter: as little as one minute for securities, and half an hour for broader reporting requirements.
Effective DR requires prior planning: build a reliable DR environment, create a switch‑over plan, and test it regularly. Rapid detection and reporting are crucial; an automated operations platform can provide a button‑click view of which objects are affected by bad blocks, greatly speeding up response.
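The "which objects are affected" view such a platform exposes is, underneath, a join between V$DATABASE_BLOCK_CORRUPTION and DBA_EXTENTS. A hedged sketch of that query, with illustrative connection details, follows:

```python
# Sketch: map reported corrupt blocks to the owning segments -- the same view
# an automated operations platform would surface with one click.
import oracledb

AFFECTED_OBJECTS_SQL = """
SELECT e.owner, e.segment_name, e.segment_type,
       c.file#, c.block#, c.blocks
  FROM v$database_block_corruption c
  JOIN dba_extents e
    ON e.file_id = c.file#
   AND c.block# BETWEEN e.block_id AND e.block_id + e.blocks - 1
"""

def affected_objects(dsn: str, user: str, password: str):
    """Return (owner, segment, type, file#, block#, blocks) for each reported corruption."""
    with oracledb.connect(user=user, password=password, dsn=dsn) as conn:
        with conn.cursor() as cur:
            cur.execute(AFFECTED_OBJECTS_SQL)
            return cur.fetchall()

if __name__ == "__main__":
    # Connection details below are placeholders, not real endpoints.
    for row in affected_objects("dbhost1:1521/orcl1", "monitor", "change_me"):
        print(row)
```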
From an interview perspective, the key is to ask why multiple databases experience bad blocks simultaneously. The root cause is often external, such as a storage management software bug. In many cases, the issue lies in the storage layer rather than the database itself.
Example 1: A bug in Storage Foundation's volume replication caused widespread bad blocks across databases from Oracle, IBM, and other vendors; the culprit was identified only after a month of investigation.
Example 2: After a storage software state recovery, bad blocks persisted and had to be repaired manually with fsck on each affected system.
Another recent incident involved an unexpected termination of the LGWR process, which corrupted numerous datafiles and left the database unable to open; startup attempts raised internal errors. The fastest resolution was to restore from the latest backup and roll forward with archived logs, rather than attempting to repair each bad block individually.
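For the backup-plus-archived-logs path, the recovery follows the standard RMAN restore/recover sequence. The sketch below wraps that sequence in Python for illustration; the exact commands depend on the backup strategy and how far the archived logs allow you to roll forward, so treat it as an outline rather than a runbook.

```python
# Sketch: drive a generic RMAN restore/recover from Python. The command
# sequence is the standard "restore datafiles, roll forward with archived
# logs" outline; a real recovery depends on the backups and logs available.
import subprocess

RMAN_SCRIPT = """
startup mount;
restore database;
recover database;
alter database open;
"""

def run_rman(script: str) -> None:
    """Feed an RMAN command script to the local target database via stdin."""
    result = subprocess.run(
        ["rman", "target", "/"],
        input=script,
        text=True,
        capture_output=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        raise RuntimeError(f"RMAN failed:\n{result.stderr}")

if __name__ == "__main__":
    run_rman(RMAN_SCRIPT)
```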
The lesson is to prioritize the technique that restores business continuity fastest, not the most technically sophisticated method. In this case, a solid backup strategy saved the day, even without a DR environment.
This article has been distilled and summarized from source material and republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
