How to Rescue a Failed Exadata Database Using AMDU and DUL: Step‑by‑Step Guide
When an Exadata diskgroup fails to mount, this article walks through using AMDU to extract control, data, and log files, handling OMF and custom formats, advancing SCN, and finally opening the database, while also recommending preventive tools and regular maintenance.
Incident Overview
At 01:30, a critical failure on an Exadata X2 prevented the ASM diskgroup from mounting. The cell disks reported status “unknown”, the ASM disk headers were invalid, and several physical disks were damaged, affecting about 10 TB of data.
Pre‑Recovery Recommendations
Ensure recent backups and a Data Guard configuration before attempting recovery, especially on aging Exadata hardware.
Recovery Strategy
Three‑phase approach: (1) extract control, data and log files from the damaged diskgroup using AMDU, (2) attempt to mount and open the database, (3) if the open fails, use DUL or related utilities for deeper recovery.
Extracting Files with AMDU
AMDU (ASM Metadata Dump Utility) works on Oracle 11g and later without compilation.
Locate the startup parameters (including control_files) in alert.log and create a temporary pfile (e.g., /tmp/pfile).
Identify the control file path from the pfile and run:
amdu -diskstring '/o/*/*' -extract data.266
This creates DATA_266.f (the control file) and report.txt.
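The lookup of the control file number can be sketched in shell. This is only an illustration: the function names `control_files_from_pfile` and `asm_extract_arg` are hypothetical, the sample pfile and its incarnation number are made up, and the sketch assumes a single-line control_files entry with OMF-style names. It prints the AMDU command as a dry run and does not execute it.

```shell
# Hypothetical helper: print each control file path listed in a pfile, one per line.
# Assumes a single-line entry such as: *.control_files='+DATA/...','+RECO/...'
control_files_from_pfile() {
    grep -i 'control_files' "$1" |
        sed -e 's/^[^=]*=//' -e "s/[\"'()]//g" |
        tr ',' '\n' |
        sed -e 's/^ *//' -e 's/ *$//'
}

# Hypothetical helper: turn an OMF-style ASM path into the
# <diskgroup>.<file number> argument passed to amdu -extract.
asm_extract_arg() {
    path=$1
    dg=${path#+}; dg=${dg%%/*}   # diskgroup name between '+' and the first '/'
    base=${path##*/}             # last component: name.filenumber.incarnation
    rest=${base#*.}              # filenumber.incarnation
    fnum=${rest%%.*}             # filenumber
    printf '%s.%s\n' "$dg" "$fnum"
}

# Dry run against a made-up pfile (the incarnation number is illustrative).
pfile=$(mktemp)
printf '%s\n' "*.db_name='exdb'" \
    "*.control_files='+DATA/exdb/controlfile/current.266.1'" > "$pfile"
control_files_from_pfile "$pfile" | while read -r cf; do
    echo "amdu -diskstring '/o/*/*' -extract $(asm_extract_arg "$cf")"
done
rm -f "$pfile"
```

The helper keeps the diskgroup name in the case used in the path, so the printed argument reads DATA.266.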
Mount the database using the extracted control file.
Query the mounted instance to list data and redo log files:
select name from v$datafile;
select member from v$logfile;
Data files appear in two naming conventions:
OMF format – automatically generated numeric suffixes, e.g. +DATA/exdb/datafile/system.256.278946847955.
Custom format – user‑defined names, e.g. +DATA/exdb/datafile/users_2013084.dbf.
Extract OMF files with a command similar to:
amdu -diskstring '/o/*/*' -extract data.256
For custom‑named files the ASM file number is not part of the name, so dump ASM metadata file 6 of the diskgroup (the alias directory, e.g., DATA.6) and look up the file number there:
amdu -extract DATA.6 -diskstring 'o/*/DATA'
After obtaining DATA_6.f, read block information to map file numbers to names, for example:
for i in $(seq 1 14); do
kfed read DATA_6.f blknum=$i | egrep 'name|fnum' >> aa.out
done
Repeat the extraction for all discovered data files. A 3 TB bigfile required ~24 hours to extract over a slow NFS mount.
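Repeating the extraction for many files can be prepared as a dry run. The sketch below is an assumption-laden illustration, not the procedure used in the incident: `plan_extraction` is a hypothetical function that reads the names returned by v$datafile, treats a name ending in two numeric fields as OMF (a heuristic), prints the matching AMDU command, and flags every other name for a lookup in the alias directory.

```shell
# Hypothetical helper: read ASM file paths on stdin and print a plan.
# OMF names (name.filenumber.incarnation) yield an amdu command;
# custom names are flagged because their file number must come from DATA_6.f.
plan_extraction() {
    while read -r path; do
        [ -n "$path" ] || continue
        base=${path##*/}
        case $base in
            *.[0-9]*.[0-9]*)                    # heuristic test for an OMF name
                dg=${path#+}; dg=${dg%%/*}      # diskgroup name
                rest=${base#*.}                 # filenumber.incarnation
                fnum=${rest%%.*}                # filenumber
                echo "amdu -diskstring '/o/*/*' -extract $dg.$fnum"
                ;;
            *)
                echo "# alias lookup needed: $path"
                ;;
        esac
    done
}

# The two example names from this article.
plan_extraction <<'EOF'
+DATA/exdb/datafile/system.256.278946847955
+DATA/exdb/datafile/users_2013084.dbf
EOF
```

Printing the commands first makes it easier to review the list and to schedule long extractions, such as the 3 TB bigfile, separately.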
Validate the extracted files with dbv to detect physical bad blocks.
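A minimal sketch of the validation step, under stated assumptions: `dbv_commands` and `corrupt_pages` are hypothetical helper names, DATA_257.f is a made-up second file, and the block size of 8192 is an assumption that must be replaced with the real block size of each tablespace. The first helper only prints the dbv commands; the second reads the "Total Pages Marked Corrupt" line from a dbv log.

```shell
# Hypothetical helper: print one dbv command per extracted data file.
# Usage: dbv_commands <blocksize> <file>...
dbv_commands() {
    blocksize=$1; shift
    for f in "$@"; do
        echo "dbv file=$f blocksize=$blocksize logfile=${f%.f}.dbv.log"
    done
}

# Hypothetical helper: print the number of pages dbv marked corrupt,
# assuming the log contains a line like "Total Pages Marked Corrupt   : 0".
corrupt_pages() {
    awk -F: '/Total Pages Marked Corrupt/ { gsub(/[ \t]/, "", $2); print $2 }' "$1"
}

# Dry run: only data files are listed, not the control file or redo logs.
dbv_commands 8192 DATA_256.f DATA_257.f
```

After running the printed commands, something like `corrupt_pages DATA_256.dbv.log` gives a quick per-file figure to decide which files need block-level repair.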
Opening the Database
Attempt to open the mounted database. Common errors include ORA‑1555 and ORA‑704. Required actions:
Gather initialization parameters from the pfile or alert log.
Recover missing rollback segments. Options: (a) search the system file with strings, or (b) use DUL/AUL/ODU/GDUL to dump SYS_UNDO.dmp, import it into a temporary user, and remove entries with status 1 or 2.
If ORA‑1555 persists, advance the SCN. Locate the address of the in‑memory SCN with oradebug, add an offset to the current value (e.g., 1,000,000 decimal, converted to hex), write the result back with oradebug poke, then repeatedly execute:
alter database open upgrade;
Optional low‑level fixes: modify corrupted blocks with bbed and delete stale dictionary records.
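The decimal-to-hex arithmetic for the SCN offset can be checked in shell before anything is written to the instance. This is a sketch only: `bump_scn_hex` is a hypothetical name, the starting value 0x2faf080 is a made-up example, and the helper assumes the value fits in the shell's integer range. It does not touch the database.

```shell
# Hypothetical helper: add a decimal offset to a hexadecimal value
# and print the sum in hexadecimal.
bump_scn_hex() {
    current_hex=$1    # value as read from the instance, e.g. 0x2faf080
    offset_dec=$2     # offset in decimal, e.g. 1000000
    printf '0x%x\n' $(( $current_hex + $offset_dec ))
}

bump_scn_hex 0x0 1000000         # the offset alone: 0xf4240
bump_scn_hex 0x2faf080 1000000   # made-up current value plus the offset: 0x30a32c0
```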
After applying these steps the database opened and data was recovered.
Additional Recovery Utilities
Exadata maintenance tools that can be run periodically (e.g., bi‑weekly) to detect hardware or firmware issues include:
sundiag
ExaWatcher
ExaWatcher collectors: Diskinfo, IBCardInfo, Iostat, Netstat, Ps, Top, Vmstat
Exachk
CheckHWnFWProfile
