Recovering Oracle
Since an Oracle database consists of several interrelated parts, recovering such a database is done through a process of elimination: identify which pieces work, then recover the pieces that don't. The following recovery guide follows that logic and works regardless of the chosen backup method. It consists of a flowchart and a procedure whose numbered steps correspond to the elements in the flowchart.
Step 1: Try Startup Mount
The first step in verifying the condition of an Oracle database is to attempt to mount it. This works because mounting a database (without opening it) reads the control files but does not open the data files. If the control files are mirrored, Oracle attempts to open each of the control files that are listed in the initORACLE_SID.ora file. If any of them is damaged, the mount fails.
To mount a database, simply run svrmgrl, connect to the database, and enter startup mount.
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > startup mount;
Statement processed.
If it succeeds, the output looks something like this:
SVRMGR > startup mount;
ORACLE instance started.
Total System Global Area 5130648 bytes
Fixed Size 44924 bytes
Variable Size 4151836 bytes
Database Buffers 409600 bytes
Redo Buffers 524288 bytes
Database mounted.
If the attempt to mount the database succeeds, proceed to Step 10.
If the attempt to mount the database fails, the output looks something like this:
SVRMGR > startup mount;
ORACLE instance started.
Total System Global Area 5130648 bytes
Fixed Size 44924 bytes
Variable Size 4151836 bytes
Database Buffers 409600 bytes
Redo Buffers 524288 bytes
ORA-00205: error in identifying controlfile, check alert log for more info
If the attempt to mount the database fails, proceed to Step 2.
Step 2: Are All Control Files Missing?
Don't panic if the attempt to mount the database fails. Control files are easily restored if they were mirrored, and can even be rebuilt from scratch if necessary. The first important piece of information is that one or more control files are missing.
Unfortunately, since Oracle aborts the mount at the first failure it encounters, it could be missing one, two, or all of the control files, but so far you know only about the first missing file. So, before embarking on a course of action, determine the severity of the problem. In order to do that, do a little research.
First, determine the names of all of the control files. Do that by looking at the configORACLE_SID.ora file next to the parameter control_files. It looks something like this:
control_files = (/db/Oracle/a/oradata/crash/control01.ctl,
/db/Oracle/b/oradata/crash/control02.ctl,
/db/Oracle/c/oradata/crash/control03.ctl)
It's also important to get the name of the control file that Oracle is complaining about. Find this by looking for the phrase controlfile: in the alert log. (The alert log can be found in the location specified by the background_dump_dest value in the configinstance.ora file; typically, it is the ORACLE_BASE/ORACLE_SID/admin/bdump directory.) In that directory, there should be a file called alert_ORACLE_SID.log. In that file, there should be an error that looks something like this:
Sat Feb 21 13:46:19 1998
alter database mount exclusive
Sat Feb 21 13:46:20 1998
ORA-00202: controlfile: '/db/a/oradata/crash/control01.ctl'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
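If the alert log is long, a quick way to pull out the name of the offending control file and then compare all of the mirrored copies is something like the following sketch. (The bdump directory and the control file locations shown here are the ones used in this chapter's example; substitute the values for the instance at hand.)
$ cd /db/Oracle/admin/crash/bdump
$ grep 'ORA-00202' alert_crash.log | tail -1
$ ls -l /db/Oracle/a/oradata/crash/control01.ctl \
     /db/Oracle/b/oradata/crash/control02.ctl \
     /db/Oracle/c/oradata/crash/control03.ctl
A copy that is missing, zero length, or has a different size or modification time than the others stands out immediately in the ls -l output.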
Warning! Some of the following procedures may say to overwrite a potentially corrupted control file. Since one never knows which file may be needed, always make backup copies of all of the control files before doing any of this. That offers an "undo" option that isn't possible otherwise. (Make copies of the online redo logs as well.)
With the names of all of the control files and the name of the damaged file, it's easy to determine the severity of the problem. Do this by listing each of the control files and comparing their size and modification time. (Remember the game "Which one of these is not like the other," on Sesame Street?) The following scenarios assume that the control files were mirrored to three locations, which is a very common practice. The possible scenarios are:
The damaged file is missing, and at least one other file is present
If the file that Oracle is complaining about is just missing, that's an easy thing to fix. If this is the case, proceed to Step 3.
The damaged file is not missing. It is corrupted
This is probably the most confusing scenario, since it's hard to tell whether a file is corrupted. What to do in this situation is a personal choice. Before going any farther, make backup copies of all control files. Once you do that, try a "shell game" with the different control files: take one of the three control files and copy it to the other two files' locations, then attempt to mount the database again. The "shell game" is covered in Step 3. However, if all the online redo logs are present, it's probably easier at this point to just run the "create controlfile" script discussed in Steps 6 and 7, which rebuilds the control file in all locations automatically. (Before that, though, follow Steps 4 and 5 to verify that all the data files and log files are present.) To rebuild the control file using the "create controlfile" script, proceed to Steps 4 through 7.
All of the control files are missing, or they are all different sizes and/or times.
If all of the control files are corrupt or missing, they must be rebuilt or the entire database must be restored. Hopefully your backup system has been running the backup control file to trace command on a regular basis. (The output of this command is a SQL script that will rebuild the control files automatically.) If the backup control file to trace command has been running, proceed to Steps 4 through 7. If not, then proceed to Step 8.
Step 3: Replace Missing Control File
If the file that Oracle is complaining about is either missing or appears to have a different date and time than the other control files, this will be easy. Simply copy another one of the mirrored copies of the control file to the damaged control file's name and location. (The details of this procedure are below.) Once this is done, just attempt to mount the database again.
Warning! Make sure to make backup copies of all of the control files before overwriting them!
The first thing to do is to get the name of the damaged control file. Again, this is relatively easy. Look in the alert log for a section like the one below:
Sat Feb 21 13:46:19 1998
alter database mount exclusive
Sat Feb 21 13:46:20 1998
ORA-00202: controlfile: '/db/a/oradata/crash/control01.ctl'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Always make backups of all the control files before copying any of them on top of each other. The next step would be to copy a known good control file to the damaged control file's location.
Once that is done, return to Step 1 and try the startup mount again.
"But I don't have a good control file!"
It's possible that there may be no known good control file, which is what would happen if the remaining control files have different dates and/or sizes. If this is the case, it's probably best to use the "create controlfile" script.
To use the create controlfile script, proceed to Steps 4 through 7.
If that's not possible or probable, try the following procedure. First, make backups of all of the control files. Then, one at a time, try copying every version of each control file to all the other locations -- excluding the one that Oracle has already complained about, since it's obviously damaged.
Each time a new control file is copied to multiple locations, return to Step 1.
For example, assume there are three control files: /a/control1.ctl, /b/control2.ctl, and /c/control3.ctl. The alert log says that the /c/control3.ctl is damaged, and since /a/control1.ctl and /b/control2.ctl have different modification times, there's no way to know which one is good. Try the following steps:
First, make backup copies of all the files:
$ cp /a/control1.ctl /a/control1.ctl.sav
$ cp /b/control2.ctl /b/control2.ctl.sav
$ cp /c/control3.ctl /c/control3.ctl.sav
Second, try copying one file to all locations. Skip control3.ctl, since it's obviously damaged. Try starting with control1.ctl:
$ cp /a/control1.ctl /b/control2.ctl
$ cp /a/control1.ctl /c/control3.ctl
Now attempt a startup mount:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > startup mount
Sat Feb 21 15:43:21 1998
alter database mount exclusive
Sat Feb 21 15:43:22 1998
ORA-00202: controlfile: '/c/control3.ctl'
ORA-27037: unable to obtain file status
This error says that the file that was copied to all locations is also damaged. Now try the second file, control2.ctl:
$ cp /b/control2.ctl /a/control1.ctl
$ cp /b/control2.ctl /c/control3.ctl
Now attempt to do a startup mount:
SVRMGR > startup mount;
ORACLE instance started.
Total System Global Area 5130648 bytes
Fixed Size 44924 bytes
Variable Size 4151836 bytes
Database Buffers 409600 bytes
Redo Buffers 524288 bytes
Database mounted.
It appears that control2.ctl was a good copy of the control file.
Once the attempt to mount the database is successful, proceed to Step 10.
Step 4: Are All Data Files and Redo Logs OK?
Steps 4 and 5 are required only prior to performing Step 6.
The "create controlfile" script described in Step 7 works only if all the data files and online redo logs are in place. The data files can be older versions that were restored from backup, since they will be rolled forward by the media recovery. However, the online redo logs must be current and intact for the "create controlfile" script to work.
The reason that this is the case is that the rebuild process looks at each data file as it is rebuilding the control file. Each data file contains a System Change Number (SCN) that corresponds to a certain online redo log. If a data file shows that it has an SCN that is more recent than the online redo logs that are available, the control file rebuild process will abort.
If it's likely that one or more of the data files or online redo logs is damaged, go to Step 5. If it's more likely that they are all intact, go to Step 6.
Step 5: Recover Damaged Data Files or Redo Logs
If one or more of the data files or online redo logs are definitely damaged, follow all the instructions below to see if there are any other damaged files. (A little extra effort now will save a lot of frustration later.) If it's possible that all the data files and online redo logs are okay, another option would be to skip this step and try to recreate the control file now. (An unsuccessful attempt at this will not cause any harm.) If it fails, return to this step. If there is plenty of time, go ahead and perform this step first.
To try and recreate the control files now, proceed to Step 6.
The first thing to find out is where all of the data files and redo logs are. To determine this, run the following command on the mounted, closed database:
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile;
(Example output below)
SVRMGR > select group#, member from v$logfile;
(Example output below)
Figure B contains sample output from these commands:
SVRMGR > select name from v$datafile;
NAME
--------------------------------------------------------------------------------
/db/Oracle/a/oradata/crash/system01.dbf
/db/Oracle/a/oradata/crash/rbs01.dbf
/db/Oracle/a/oradata/crash/temp01.dbf
/db/Oracle/a/oradata/crash/tools01.dbf
/db/Oracle/a/oradata/crash/users01.dbf
/db/Oracle/a/oradata/crash/test01.dbf
6 rows selected.
SVRMGR > select group#, member from v$logfile;
GROUP#     MEMBER
--------------------------------------------------------------------------------
1 /db/Oracle/a/oradata/crash/redocrash01.log
3 /db/Oracle/c/oradata/crash/redocrash03.log
2 /db/Oracle/b/oradata/crash/redocrash02.log
1 /db/Oracle/b/oradata/crash/redocrash01.log
2 /db/Oracle/a/oradata/crash/redocrash03.log
3 /db/Oracle/c/oradata/crash/redocrash02.log
6 rows selected.
SVRMGR >
Figure B: Sample v$datafile and v$logfile output
Look at each of the files shown by the above command. First, look at the data files. Each of the data files probably has the same modification time, or there might be a group of them with one modification time and another group with a different modification time. The main thing to look for is a missing file or a zero length file. Something else to look for is one or more files that have a modification time that is newer than the newest online redo log file. If a data file meets any one of these conditions, it must be restored from backup.
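One quick way to make this comparison is to list everything reported by the two queries with ls -l and scan the sizes and modification times. A minimal sketch, using the file names shown in Figure B:
$ ls -l /db/Oracle/a/oradata/crash/*.dbf
$ ls -l /db/Oracle/?/oradata/crash/redocrash*.log
Anything missing, zero length, or newer than the newest online redo log shows up immediately.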
Redo log files, however, are a little different. Each redo log file within a log group should have the same modification time. For example, the output of the example command above shows that /db/Oracle/a/oradata/crash/redocrash01.log and /db/Oracle/b/oradata/crash/redocrash01.log are in log group one. They should have the same modification time and size. The same should be true for groups two and three. There are a couple of possible scenarios:
One or more log groups has at least one good and one damaged log
This is why redo logs are mirrored! Copy the good redo log to the damaged redo log's location. For example, if /db/Oracle/a/oradata/crash/redocrash01.log was missing, but /db/Oracle/b/oradata/crash/redocrash01.log was intact, issue the following command:
$ cp /db/Oracle/b/oradata/crash/redocrash01.log \
/db/Oracle/a/oradata/crash/redocrash01.log
All redo logs in at least one log group are damaged
This is a bad place to be. The "create controlfile" script in Step 6 requires that all online redo logs be present. If even one log group is completely damaged, it will not be able to rebuild the control file. This means that the only option available now is to proceed to Steps 23 and 24 -- a complete recovery of the entire database followed by an alter database open resetlogs. Warning! This is a drastic step! Before taking it, make sure that all members of at least one log group really are missing. (In the example above, if both /db/Oracle/a/oradata/crash/redocrash01.log and /db/Oracle/b/oradata/crash/redocrash01.log were damaged, this database would require a complete recovery.)
If all the redo logs in at least one group are damaged, and all the control files are damaged, proceed to Steps 23 and 24.
If the redo logs are all right, but all the control files are missing, proceed to Step 6.
If the database will not open for some other reason, proceed to Step 10.
Step 6: Is There a "create controlfile" Script?
Steps 4 and 5 must be taken prior to this step.
The svrmgrl command alter database backup controlfile to trace creates a trace file that contains a "create controlfile" script. This command should be run from cron on a regular basis. To find out if there is such a script available, follow the instructions below. The first thing to find out is the destination of the trace files. This is specified by the user_dump_dest value in the configinstance.ora file, usually located in $ORACLE_HOME/dbs. (Typically, it is $ORACLE_BASE/$ORACLE_SID/admin/udump.) First cd to that directory, then grep for the phrase CREATE CONTROLFILE. For example:
$ cd $ORACLE_HOME/dbs; grep user_dump_dest configcrash.ora
user_dump_dest = /db/Oracle/admin/crash/udump
$ cd /db/Oracle/admin/crash/udump ; grep 'CREATE CONTROLFILE' * \
|awk -F: '{print $1}'|xargs ls -ltr
-rw-r----- 1 Oracle dba 3399 Oct 26 11:25 crash_ora_617.trc
-rw-r----- 1 Oracle dba 3399 Oct 26 11:25 crash_ora_617.trc
-rw-r----- 1 Oracle dba 1179 Oct 26 11:29 crash_ora_661.trc
Figure C: Locating the most recent create controlfile script
In the example in Figure C, crash_ora_661.trc is the most recent file to contain the "create controlfile" script.
If there is a create controlfile script, proceed to Step 7. If there is not a create controlfile script, and all the control files are missing, proceed to Step 8.
Step 7: Run the 'create controlfile' Script
First, find the trace file that contains the script. The instructions on how to do that are in Step 6. Once you find it, copy it to another filename, such as rebuild.sql. Edit the file, deleting everything above the phrase # The following commands will create, and anything after the last SQL command. The file should then look something like the one in Figure D:
# The following commands will create a new controlfile and use it
# to open the database.
# Data used by the recovery manager will be lost. Additional logs may
# be required for media recovery of offline data files. Use this
# only if the current version of all online logs are available.
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "CRASH" NORESETLOGS ARCHIVELOG
MAXLOGFILES 32
MAXLOGMEMBERS 2
MAXDATAFILES 30
MAXINSTANCES 8
MAXLOGHISTORY 843
LOGFILE
GROUP 1 '/db/a/oradata/crash/redocrash01.log' SIZE 500K,
GROUP 2 '/db/b/oradata/crash/redocrash02.log' SIZE 500K,
GROUP 3 '/db/c/oradata/crash/redocrash03.log' SIZE 500K
DATAFILE
'/db/a/oradata/crash/system01.dbf',
'/db/a/oradata/crash/rbs01.dbf',
'/db/a/oradata/crash/temp01.dbf',
'/db/a/oradata/crash/tools01.dbf',
'/db/a/oradata/crash/users01.dbf'
;
# Recovery is required if any of the data files are restored backups,
# or if the last shutdown was not normal or immediate.
RECOVER DATABASE
# All logs need archiving and a log switch is needed.
ALTER SYSTEM ARCHIVE LOG ALL;
# Database can now be opened normally.
ALTER DATABASE OPEN;
# Files in read only tablespaces are now named.
ALTER DATABASE RENAME FILE 'MISSING00006'
TO '/db/a/oradata/crash/test01.dbf';
# Online the files in read only tablespaces.
ALTER TABLESPACE "TEST" ONLINE;
Figure D: Example create controlfile script
Once the file looks like the above example, add the following line just above the "STARTUP NOMOUNT" line:
connect internal;
After you add this line, make sure the instance is shut down, then run the following command, substituting rebuild.sql with the appropriate name:
$ svrmgrl < rebuild.sql
If all of the data files and online redo log files are in place, this will work without intervention and completely rebuild the control files.
If any of this instance's data files are missing, return to Step 4. However, if any of this instance's online redo logs are damaged or missing, this option will not work. Proceed to Step 8.
Step 8: Restore Control Files and Prepare the Database for Recovery
This Step is required only if Steps 2 through 7 have failed.
If the precautions mentioned elsewhere in this chapter were followed, there is really only one scenario that would result in this position -- loss of the entire system due to a cataclysmic event. Loss of a disk drive (or even multiple disk drives) is easily handled if the control files are mirrored. Even if all control files are lost, they can be rebuilt using the trace file created by running the backup control file to trace command. The only barrier to using that script is if all members of an online log group are missing. The only time that you could lose all mirrored control files and all members of a mirrored log group would be a complete system failure, such as a fire or other natural disaster. And if that is the case, then a complete database recovery would be more appropriate.
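For reference, the two forms of the control file backup mentioned throughout this chapter look like the following when run from svrmgrl against a running database. (The backup file name used here is only an example.)
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database backup controlfile to trace;
Statement processed.
SVRMGR > alter database backup controlfile to '/db/Oracle/backup/control.bak' reuse;
Statement processed.
The first form writes a "create controlfile" script to a trace file in the user_dump_dest directory (see Step 6); the second writes a binary copy of the control file to the named file. Running one or both from cron on a regular basis is what makes this step avoidable.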
But I didn't mirror my control files or my online redo logs
Follow the steps below, starting with restoring the control files from backup. Chances are that the database files will need to be restored as well. This is because one cannot use a control file that is older than the most recent database file. (Oracle will complain and abort if this happens.) To find out if the control file is newer than the data files, try the following steps without overwriting the database files and see what happens.
Restore control files from backup
The very first step in this process is to find and restore the most recent backup of the control file. This would be the results of a backup control file to filename command. This is the only supported method of backing up the control file. Some people (oraback.sh included) also copy the control file manually. If there is a manual copy of the control file that is more recent than an "official" copy, try to use it first. However, if it doesn't work, use a backup copy created by the backup control file to filename command. Whatever backup control file is used, copy it to all of the locations and filenames listed in the configORACLE_SID.ora file after the phrase control_files:
control_files = (/db/Oracle/a/oradata/crash/control01.ctl,
/db/Oracle/b/oradata/crash/control02.ctl,
/db/Oracle/c/oradata/crash/control03.ctl)
Again, this backup control file must be more recent than the most recent database file in the instance. If this isn't the case, Oracle will complain.
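For example, if the restored backup control file were sitting in /db/Oracle/backup/control.bak (an example path), copying it to the three locations above would look like this:
$ cp /db/Oracle/backup/control.bak /db/Oracle/a/oradata/crash/control01.ctl
$ cp /db/Oracle/backup/control.bak /db/Oracle/b/oradata/crash/control02.ctl
$ cp /db/Oracle/backup/control.bak /db/Oracle/c/oradata/crash/control03.ctl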
Startup mount
To find out if the control file is valid and has been copied to all of the correct locations, attempt to start up the database with the mount option. (This is the same command from Step 1.) To do this, run the following commands:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > startup mount;
Statement processed.
SVRMGR > quit
Take read-only tablespaces offline
Oracle does not allow read-only data files to be online during a recover database using backup controlfile action. Therefore, if there are any read-only data files, take them offline. To find out if there are any read-only data files, issue the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select enabled, name from v$datafile;
Statement processed.
SVRMGR > quit
For each read-only data file, issue the following command on a mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline;
Statement processed.
SVRMGR > quit
Step 9: Recover the Database
This step is required only if Steps 2 through 7 have failed.
Once the control file is restored with a backup copy, attempt to recover the database using the backup control file.
Attempt to recover database normally
Since recovering the database with a backup control file requires the alter database open resetlogs option, it never hurts to try recovering the database normally first:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover database;
If the backup control file option is required, Oracle will complain:
SVRMGR > recover database
ORA-00283: Recover session cancelled due to errors
...
ORA-01207: file is more recent than controlfile - old controlfile
If the recover database command works, then proceed to Step 10. If it doesn’t, proceed to the next heading, "Attempt to recover database using backup control file."
Attempt to recover database using backup control file
Attempt to recover the database using the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover database using backup controlfile;
If it works, the output will look something like Figure E:
ORA-00279: change 38666 generated at 03/14/98 21:19:05 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_494.dbf
ORA-00280: change 38666 for thread 1 is in sequence #494
Figure E: Sample output of recover database command
If Oracle complains, there are probably some missing or corrupted data files. If so, return to Steps 4 and 5. Once any missing or corrupted data files are restored, return to this step and attempt to recover the database again.
Sometimes one can get in a catch-22 when recovering databases, where Oracle is complaining about data files being newer than the control file. The only way to get around this is to use a backup version of the data files that is older than the backup version of the control file. Media recovery will roll forward any changes that this older file is missing.
Apply all archived redo logs
Oracle will request all archived redo logs since the time of the oldest restored data file. For example, if the backup that was used to restore the data files was from three days ago, Oracle will need all archived redo logs created since then. Also, the first log file that it asks for is the oldest log file that it wants.
The most efficient way to roll through the archived redo logs is to have all of them sitting uncompressed in the directory that it suggests as the location of the first file. If this is the case, simply enter auto at the prompt. Otherwise, specify alternate locations or hit enter as it asks for each one, giving time to compress or remove the files that it no longer needs.
Apply online redo logs if they are available
If it is able to do so, Oracle will automatically roll through all the archived redo logs and the online redo log. Then it says, "Media recovery complete."
However, once Oracle rolls through all the archived redo logs, it may prompt for the online redo log. It does this by prompting for an archived redo log with a number that is higher than the most recent archived redo log available. This means that it is looking for the online redo log. Try answering its prompt with the names of the online redo log files that you have. Unfortunately, as soon as you give it a name it doesn't like, it will make you start the recover database using backup controlfile command again.
For example, suppose that you have the following three online redo logs:
/oracle/data/redolog01.dbf
/oracle/data/redolog02.dbf
/oracle/data/redolog03.dbf
When you are prompted for an archived redo log that has a higher number than the highest numbered archived redo log that you have, answer the prompt with one of these files (e.g., /oracle/data/redolog01.dbf). If the file that you give it does not contain the recovery thread it is looking for, you will see a message like the following:
ORA-00310: archived log contains sequence 2; sequence 3 required
ORA-00334: archive log: '/oracle/data/redolog01.dbf'
Oracle will cancel the database recovery, requiring you to start it over. Once you get to the same prompt again, respond with a different filename, such as /oracle/data/redolog02.dbf. If it contains the recovery thread it is looking for, it will respond with a message like the following:
Log applied.
Media recovery complete.
If after trying all the online redo logs it is still asking for a log that you do not have, simply enter cancel.
Alter database open resetlogs
Once the media recovery is complete, the next step is to open the database. As mentioned earlier, when recovering the database using a backup control file, it must be opened with the resetlogs option. Do this by entering:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open resetlogs;
SVRMGR > quit
Take a backup immediately after recovering the database with the resetlogs option! It is best if it is a cold backup after shutting down the database. Perform a hot backup if absolutely necessary, but realize that there is a risk that:
- The entire recovery might need to be performed again
- All changes made after using the resetlogs option will be lost
If the database did not open successfully, return to Step 1 and start over.
If the database did open successfully, perform a backup of the entire database immediately -- preferably a cold one. Congratulations! You're done!
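A minimal sketch of such a cold backup, assuming the file layout used in this chapter's example and an example backup destination of /backup (shut the instance down cleanly first, then copy the data files, control files, and online redo logs):
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > shutdown;
SVRMGR > quit
$ tar cf /backup/crash_cold.tar /db/Oracle/a/oradata/crash \
     /db/Oracle/b/oradata/crash /db/Oracle/c/oradata/crash
Once the tar file is written and verified, the database can be started back up normally.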
Step 10: Does "alter database open" work?
If the startup mount worked, this is actually only the second step that you will perform. Mounting the database only checks the presence and consistency of the control files. If that works, opening the database is the next step. Doing so will check the presence and consistency of all data files, online redo log files, and any rollback segments. To open the database, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open;
SVRMGR > quit
If the attempt to open the database worked, Oracle will simply say, "Statement processed." If this is the first attempt to open the database, and no data files or rollback segments were taken offline, you're done!
If directed to this step by Steps 26 or 28 (damaged log groups), and the attempt at opening the database failed, return to Step 23 to recover the entire database.
If the database did open, proceed to Step 15.
If the attempt to open the database did not work, the output will vary depending on the condition. Here is a listing of what those conditions may be, accompanied by what the error might look like when that condition occurs.
Missing data file
ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
Corrupted data file
A corrupted data file can generate a number of different errors. For instance, it may mimic a missing data file:
ORA-01157: cannot identify data file 1 - file not found
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
It may also completely confuse Oracle:
ORA-00600: internal error code, arguments: [kfhcfh1_01], [0], [], [], [], [], [], []
A corrupted data file may also cause a "failed verification check" error:
ORA-01122: database file 1 failed verification check
ORA-01110: data file 1: '/db/Oracle/a/oradata/crash/system01.dbf'
ORA-01200: actual file size of 1279 is smaller than correct size of 40960 blocks
These are just a few examples of the types of errors that Oracle may give if a data file is corrupted.
Missing member of any online log group
If the redo logs are mirrored, one or more of the mirrored copies are lost, but at least one good copy of each online redo log remains, Oracle will open the database without any errors displayed to the terminal. The only error will be a message like the following one in the alert log:
Errors in file /db/Oracle/admin/crash/bdump/crash_lgwr_10302.trc:
ORA-00313: open failed for members of log group 2 of thread 1
All members of any online log group are corrupted
However, if all members of any online log group are corrupted, Oracle will complain and the database will not open. The error might look something like this:
ORA-00327: log 2 of thread 1, physical size less than needed
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
Missing all members of any online log group
A similar problem occurs if all members of an online log group are missing. Oracle will complain and the database will not open. The error looks something like this:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
Damaged rollback segment
If a rollback segment is damaged, the error will be like the following one:
ORA-01545: rollback segment 'USERS_RS' specified not available
Cannot open database if all rollback segments are not available.
Damaged datafile
A damaged data file is actually very easy to recover from. This is a good thing, because this will occur more often than any other problem. Remember that there is only one copy of each data file, unlike online redo logs and control files that can be mirrored. So, statistically speaking, it's easier to lose one data file than to lose all mirrored copies of a log group or all mirrored copies of the control file.
Oracle also has the ability to recover parts of the database while other parts of the database are brought online. Unfortunately, this helps only if a partially functioning database is of any use to the users in your environment. Therefore, a database that is completely worthless unless all tables are available will not benefit from the partial online restore feature. However, if the users can use one part of the database while the damaged files are being recovered, this feature may help to save face by allowing at least partial functionality during an outage.
There are three types of data files as far as recovery is concerned (a query sketch for telling them apart follows this list):
- The first is a data file that is not a part of the SYSTEM tablespace and does not contain any rollback segments. Recovering this file (with the database online or offline) is very easy.
- The second type of data file is also a non-system data file, but one that happens to contain a rollback segment. Since rollback segments are needed to open the database, recovering such a file with the database online is difficult.
- The final type of data file is a file contained within the SYSTEM tablespace. This data file cannot be recovered with the database online, because the database cannot be brought online without it.
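The DBA views needed to tell these apart are available only while the database is open, so this check is best run ahead of time (or against a surviving instance) and kept with the recovery documentation. A sketch using the standard dba_data_files and dba_rollback_segs views:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select file_name from dba_data_files where tablespace_name = 'SYSTEM';
SVRMGR > select distinct d.file_name from dba_data_files d, dba_rollback_segs r where d.tablespace_name = r.tablespace_name;
The first query lists the SYSTEM data files; the second lists the data files in tablespaces that contain rollback segments; anything that appears in neither list falls into the first (easy) category.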
Damaged log group
If all members of a log group are damaged, there is a great potential for data loss. The entire database may have to be restored, depending on the status of the log group that was damaged, and the results of some attempts at fixing it. This may seem like a broken record, but this is why mirroring the log groups is so important.
If the error refers to a damaged log group, one option is to proceed directly to Step 17. However, to verify that nothing else is wrong, read the following notes and proceed to the next step.
Damaged rollback segment
Since Oracle has to open the data files that contain this rollback segment before it can verify that the rollback segment is available, this error will not occur unless a data file has been taken offline. If Oracle encounters a damaged data file (whether or not it contains a rollback segment), it will complain about that data file and abort the attempt to open the database.
Remember that a rollback segment is a special part of a tablespace that stores rollback information. Rollback information is needed in order to undo (or rollback) an uncommitted transaction. Since a crashed database will almost always contain uncommitted transactions, recovering a database with a damaged rollback segment is a little tricky. As previously mentioned, a damaged data file may be taken offline, but Oracle will not open the database without the rollback segment.
The strategy for dealing with this is to make Oracle believe that the rollback segment doesn't exist. That will allow the database to be brought online. However, there will be transactions that need to be rolled back that require this rollback segment. Since Oracle believes this rollback segment is no longer available, these rollbacks cannot occur. This means that the database may be online, but portions of it will not be available.
For example, suppose that we created a table called data1 inside tablespace USERS. Tablespace USERS contains the data file /db/oracle/a/oradata/crash/users01.dbf. Unfortunately, the database crashed before this transaction was committed, and the data file that contains the rollback segment for this transaction was destroyed. In the process of recovering this database, we took that data file offline, convinced Oracle that the rollback segment it contained was not needed, and opened the database. If we run the command select * from data1, we will receive the error shown in Figure F:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select * from data1;
C1
------------
ORA-00376: file 7 cannot be read at this time
ORA-01110: datafile 7: '/db/oracle/a/oradata/crash/users01.dbf'
Figure F: Sample data file error
This is because Oracle does not know if the uncommitted transactions in /db/oracle/a/oradata/crash/users01.dbf have been rolled back or not. In order to make this database fully functional, the damaged data file must be recovered and the rollback segment brought online.
Be aware, therefore, that if you bring a database online without all of its rollback segments, the database may be online -- but it probably will not be fully functional.
If the error indicates that there is a damaged rollback segment, proceed to Step 18.
Before going any farther…
Remember that Oracle will stop attempting to open the database as soon as it encounters an error with one file. This means, of course, that there could be other files that are damaged. If there is at least one damaged data file, now is a good time to check and see if there are other files that are damaged.
Detailed instructions on how to do that are provided in Step 5.
Once you know the names of all the damaged files, proceed to the next section.
How media recovery works
If any data files are restored from backup, the svrmgrl recover command will be needed. This command uses the archived and online redo logs to "redo" any transactions that have occurred since the time that the backup of a data file was taken. You can recover a complete database, a tablespace, or a data file by issuing the commands recover database, recover tablespace tablespace_name, and recover datafile 'datafile_name', respectively. These commands are issued inside a svrmgrl shell. For example:
$ svrmgrl
SVRMGR > connect internal
SVRMGR > startup mount
SVRMGR > recover datafile '/db/Oracle/a/oradata/crash/datafile01.dbf'
These commands allow the restore of an older version of a data file, and use redo to roll it forward to the point of failure. For example, if we took a backup of a data file on Wednesday night, and that data file was damaged on Thursday evening, we would restore that data file from Wednesday night's backup. Of course many transactions would have occurred since Wednesday night, making changes to the data files that we restored. Running the command recover [database|tablespace|datafile] would reapply those transactions to the restored data file, rolling it forward to Thursday evening.
This recovery can work in a number of ways. After receiving the recover command, Oracle prompts for the name and location of the first archived redo log that it needs. If that log, and all logs that have been made since that log, are online, uncompressed, and in their original location, enter the word AUTO. This tells Oracle to assume that all files that it needs are online. It can therefore automatically roll through each log that it needs.
In order to do this, all files that Oracle will need must be online. First, get the name of the oldest file, since it is the first file it will need. That file name is displayed immediately after issuing the recover command:
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
In the example above, the first file that Oracle needs is /db/Oracle/admin/crash/arch/arch.log1_481.dbf. Make sure that this file is online and not compressed or deleted. If it is deleted, restore it from backup. If it is compressed, uncompress it and any archived redo log files in that directory that are newer than it. That is because Oracle may need all of them to complete the media recovery. It might be necessary to delete some of the older archived redo logs to make enough room for the files that need to be uncompressed. Once all archived redo logs that are newer than the one requested by Oracle have been restored and uncompressed, enter AUTO at the "Specify log" prompt.
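For example, if the archived redo logs were compressed with compress (the .Z suffix here is just an assumption; adjust for whatever the backup scripts actually use):
$ cd /db/Oracle/admin/crash/arch
$ uncompress arch.log1_481.dbf.Z
$ uncompress *.Z
The first uncompress restores the file Oracle asked for; the second takes care of everything else still compressed in that directory, space permitting.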
If there isn't enough space for all of the archived redo logs to be uncompressed, a little creativity may be required. Uncompress as many as possible, and then hit enter each time it suggests the next file. (Hitting enter tells Oracle that the file that it is suggesting is available. If it finds that it is not available, it prompts for the same file again.) Once it has finished with one archived redo log, compress that log, and uncompress a newer log, since it will be needed shortly. (Obviously, a second window is required, and a third window wouldn't hurt!)
At some point, it may ask for an archived redo log that is not available. This could mean some of the archived redo logs or online redo logs are damaged. If the file cannot be located or restored, enter CANCEL.
More detail on media recovery is available in Oracle's documentation.
If the database did not open, proceed to Step 11 after reading the preceding notes. If it did open, proceed to Step 15.
Step 11: Damaged System File?
If the damaged file is part of the SYSTEM tablespace, an offline recovery is required. All other missing data files can be recovered with the database online. Unfortunately, Oracle only complains that the data file is missing -- without saying what kind of data file it is. Fortunately, even if Oracle is down, there is an easy way to determine which files belong to the SYSTEM tablespace. (Finding out if the data file contains a rollback segment is a little more difficult, but it is still possible.) To find out which data files are in the SYSTEM tablespace, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile where status = 'SYSTEM' ;
NAME
--------------------------------------------------------------------------------
/db/oracle/a/oradata/crash/system01.dbf
1 row selected.
This example report shows that the only file that is a member of the SYSTEM tablespace is /db/Oracle/a/oradata/crash/system01.dbf. In your configuration, however, there may be multiple data files in the SYSTEM tablespace.
If any of the damaged data files is a member of the SYSTEM tablespace, proceed to Step 12. If none of them is a member of the SYSTEM tablespace, then proceed to Step 13.
Step 12: Restore All Data Files in the SYSTEM Tablespace
Unlike other tablespaces, the SYSTEM tablespace must be available in order to open the database. Therefore, if any members of the system tablespace are damaged, they must be restored now. Before doing this, make sure that the database is not open. (It is okay if it is mounted.) To make sure, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select status from v$instance;
STATUS
-------
MOUNTED
1 row selected.
(The example above shows that this instance is mounted, not open.)
If the database is not open, restore the damaged files from the most recent backup available. Once all damaged files in the system tablespace are restored, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover tablespace system;
Media recovery complete.
Once this command has completed, the system tablespace will be recovered to the time of failure.
If it does complete successfully, and no other data files are damaged, return to Step 10. For more information about the recover tablespace command, read the earlier section "How Media Recovery works" at the end of Step 10. If there are other data files to recover, proceed to Step 13.
Step 13: Damaged Non-System Data File?
So far, we have mounted the database, which proves that the control files are okay. It may have taken some effort if one or more of the control files were damaged, but it succeeded. We have also verified that the SYSTEM tablespace is intact, even if it required a restore and recovery. Most of the rest of this procedure concentrates on disabling damaged parts of the database so that it may be brought online as soon as possible. The process of elimination will identify all damaged data files once the database is opened successfully. They can then be easily restored.
If there are damaged data files that are not part of the SYSTEM tablespace, proceed to Step 14. If there are no more damaged data files, then proceed to Step 17.
Step 14: Take Damaged Data File Offline
To open a database with a damaged, non-system data file, take the data file offline. (If the file that is taken offline is part of a tablespace that contains rollback segments, there will be one other step, but we'll cross that bridge when we come to it.)
If this instance is operating in ARCHIVELOG mode, just take the data file offline. It can later be restored and recovered after the instance has been brought online. The command to do this is:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline;
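If it isn't clear which mode the instance is running in, the mounted database can be asked directly (the svrmgrl command archive log list shows the same information):
SVRMGR > select log_mode from v$database;
This returns either ARCHIVELOG or NOARCHIVELOG.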
If the instance is operating in NOARCHIVELOG mode, that's a different problem. Oracle does not allow the data file to be taken offline, because it knows it can't be brought back online without media recovery. Without ARCHIVELOG mode, there is no media recovery. The only thing Oracle does allow is to drop the data file entirely. This means, of course, that the tablespace that contains this file will have to be rebuilt from scratch. This is but one of the many reasons why a production instance should not be operating in NOARCHIVELOG mode. The command to do this is:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database datafile 'filename' offline drop;
Once any damaged files are taken offline, return to Step 10 and attempt to open the database again.
Step 15: Were Any Data Files Taken Offline?
Perform this step only if the database has been opened.
This step is really a very simple question!
If the database was opened without taking any data files offline, proceed to Step 29. If some data files were taken offline to open the database, proceed to Step 16. If unsure, proceed to Step 16.
Step 16: Bring Data File(s) Back Online
First find out which data files were taken offline. To do this, run the following command:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile where status = 'OFFLINE' ;
NAME
------------------
/db/oracle/a/oradata/crash/temp01.dbf
/db/oracle/a/oradata/crash/tools01.dbf
/db/oracle/a/oradata/crash/users01.dbf
Restore the damaged datafiles
Once the names of the data files that need to be restored are determined, restore them from the latest available backup. Once they are restored, recovery within Oracle can be accomplished in three different ways. These ways vary greatly in complexity and flexibility. Examine the following three media recovery methods and choose whichever one is best for you.
Datafile recovery
If there is a small number of data files to recover, this may be the easiest option. As each file is restored, issue the recover datafile command against it and then bring it online:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover datafile 'datafile_name' ;
Statement processed.
SVRMGR > alter database datafile 'datafile_name' online ;
Statement processed.
The downside to this method is that media recovery may take a while for each data file. If recovering multiple data files within a single tablespace, this method probably wastes time.
Tablespace recovery
This method is the hardest of all methods, but it may work faster than the previous method if there are several damaged data files within a tablespace. If forced to leave the partially functional database open while recovering the damaged data files, and there are several of them to recover, this is probably the best option.
First find out the names of all data files, and the tablespace to which they belong. Since the database is now open, this can be done in one step, demonstrated in Figure G:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select file_name, tablespace_name from dba_data_files;
Statement processed.
FILE_NAME
TABLESPACE_NAME
--------------------------------------------------------------------------------
------------------------------
/db/oracle/a/oradata/crash/users01.dbf
USERS
/db/oracle/a/oradata/crash/tools01.dbf
TOOLS
/db/oracle/a/oradata/crash/temp01.dbf
TEMP
/db/oracle/a/oradata/crash/rbs01.dbf
RBS
/db/oracle/a/oradata/crash/system01.dbf
SYSTEM
/db/oracle/a/oradata/crash/test01.dbf
TEST
Figure G: Listing of dba_data_files
The only problem with this output is that it's not very easy to read, and could be impossible to read if there are hundreds of data files. One way to make it easier to read is to modify the command, as shown in Figure H:
$ svrmgrl <<EOF |sed 's/ */ /' |sort >/tmp/files.txt
connect internal;
select file_name, tablespace_name from dba_data_files;
quit;
EOF
$ grep '^/' /tmp/files.txt
/db/oracle/a/oradata/crash/rbs01.dbf RBS
/db/oracle/a/oradata/crash/system01.dbf SYSTEM
/db/oracle/a/oradata/crash/temp01.dbf TEMP
/db/oracle/a/oradata/crash/test01.dbf TEST
/db/oracle/a/oradata/crash/tools01.dbf TOOLS
/db/oracle/a/oradata/crash/users01.dbf USERS
Figure H: Readable listing of data files
This way, the files are sorted in alphanumeric order, making it easy to find the necessary file(s).
Once all of the data files are restored, and the names of all the tablespaces that contain these data files have been determined, issue the recover tablespace command against each of those tablespaces. Before doing so, however, each of those tablespaces must be taken offline, as shown in Figure I.
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter tablespace tablespace_name1 offline;
Statement processed.
SVRMGR > recover tablespace tablespace_name1 ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter tablespace tablespace_name1 online;
Statement processed.
SVRMGR > alter tablespace tablespace_name2 offline;
Statement processed.
SVRMGR > recover tablespace tablespace_name2 ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter tablespace tablespace_name2 online;
Statement processed.
Figure I: Tablespace-based recovery
It's obvious that this method is quite involved! It's not pretty, it's not easy, but it allows recovery of multiple tablespaces while the instance continues to operate. If a partially functioning database is of any value to the users, this method may be their best friend.
Database recovery
This method is actually the easiest method, but it requires that the database be shut down to perform it. After restoring all the database files that were taken offline, close the database and issue the recover database command.
Once all the database files are restored, issue the commands shown in Figure J.
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database close ;
Statement processed.
SVRMGR > recover database ;
ORA-00279: change 18499 generated at 02/21/98 11:49:56 needed for thread 1
ORA-00289: suggestion : /db/Oracle/admin/crash/arch/arch.log1_481.dbf
ORA-00280: change 18499 for thread 1 is in sequence #481
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
Auto
Log applied
Media Recovery Complete
SVRMGR > alter database open;
Statement processed.
Figure J: Normal database recovery
To make sure that all tablespaces and data files have been returned to their proper status, run the commands shown in Figure K.
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name, status from v$datafile;
NAME
STATUS
--------------------------------------------------------------------------------
-------
/db/oracle/a/oradata/crash/system01.dbf
SYSTEM
/db/oracle/a/oradata/crash/rbs01.dbf
ONLINE
/db/oracle/a/oradata/crash/temp01.dbf
ONLINE
/db/oracle/a/oradata/crash/tools01.dbf
ONLINE
/db/oracle/a/oradata/crash/users01.dbf
ONLINE
/db/oracle/a/oradata/crash/test01.dbf
ONLINE
6 rows selected.
SVRMGR > select member, status from v$logfile;
MEMBER
STATUS
--------------------------------------------------------------------------------
-------
/db/oracle/a/oradata/crash/redocrash01.log
/db/oracle/b/oradata/crash/redocrash01.log
/db/oracle/b/oradata/crash/redocrash02.log
/db/oracle/a/oradata/crash/redocrash03.log
/db/oracle/c/oradata/crash/redocrash02.log
/db/oracle/c/oradata/crash/redocrash03.log
6 rows selected.
SVRMGR> select * from v$controlfile;
STATUS NAME
------- ------------------------------------------------------------------------
--------
/db/oracle/a/oradata/crash/control01.ctl
/db/oracle/b/oradata/crash/control02.ctl
/db/oracle/c/oradata/crash/control03.ctl
3 rows selected.
Figure K: Obtaining the names of all data files, control files, and log files
The example above shows that all data files, control files, and log files are in good condition. (In the case of the log files and control files, no status is good status.)
Once any data files that were taken offline have been restored and recovered, proceed to Step 29.
Step 17: Is There a Damaged Log Group?
When we refer to a damaged log group, we mean that all members of a log group are damaged. If at least one member of a mirrored log group is intact, Oracle opens the database and simply puts an error message in the alert log. However, if all members of a log group are damaged, the database will not open, and the error will look something like this:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
If there is no error like this, there is no damaged log group. Proceed to Step 18.
The first thing that must be determined is the status of the damaged log group. The three possibilities are current, active, and inactive. To determine the status of the damaged log group, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select group#, status from v$log;
The output looks something like this:
GROUP# STATUS
---------- ----------------
1 INACTIVE
2 CURRENT
3 ACTIVE
3 rows selected.
The example above shows that log group 1 is inactive, group 2 is current, and group 3 is active. What follows is an explanation of the different statuses, and how they affect the recovery.
Current
The current log group is the one to which Oracle was writing when the failure occurred. It will still be listed as current until the server is brought back online and a log switch occurs.
Active
An active log group is usually the log group that Oracle just finished writing to. However, until a checkpoint occurs, this group is still needed for media recovery. Since a log switch always forces a checkpoint, a status of active is actually very rare. In fact, the only way to see this status (before the system crashed) is to run the above command while a checkpoint is in progress. (In a properly tuned database, this is a very short period of time.)
Inactive
An inactive log group is one that is not being used by Oracle in any way.
To determine what action to take next, first get the number of the log group whose log files are damaged. In the example error above, it reads open failed for members of log group 2. Reference this number against the log groups listed by the select group#, status from v$log command. In the example above, log group 2 was current at the time the database crashed.
If the damaged log group was current, proceed to Step 22. If it was active, proceed to Step 25. If it was inactive, proceed to Step 27.
Step 18: Are Any Rollback Segments Unavailable?
If a rollback segment is damaged, Oracle will complain when attempting to open the database. The error looks like the following:
ORA-01545: rollback segment 'USERS_RS' specified not available
Cannot open database if all rollback segments are not available.
If you haven't already read the note about damaged rollback segments in Step 10, do so now.
If the preceding error is displayed when attempting to open the database, proceed to Step 19. If not, return to Step 10.
Step 19: Does the Database Need to be at Least Partially Up ASAP?
Because of the unique nature of damaged rollback segments, there are two choices for recovery. The first is to get the database open sooner, but that may leave it only partially functional for a longer period of time. The second choice takes a little longer to open the database, but once it is open there will be no offline data files or unavailable rollback segments. Which is more important: getting even a partially functional database open as soon as possible, or not opening the database until all rollback segments are available? The latter is more prudent, but the former may be more appropriate to the environment.
If the database needs to be partially open ASAP, proceed to Step 21. If it's more important to make sure all rollback segments are available prior to opening the database, proceed to Step 20.
Step 20: Recover Tablespace Containing Unavailable Rollback Segment
Perform this step only if directed to do so by Step 19.
The first thing that must be determined is which tablespace the damaged rollback segment is in. Unfortunately, there is no fixed view that contains this information. That means that it will have to be discovered through common sense and deduction. First, remember that this error is not displayed unless a data file has been taken offline. To get a complete list of files that were taken offline, run the following command on a mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select TS#, name from v$datafile where status = 'OFFLINE' ;
       TS# NAME
---------- --------------------------------------------------------------------------------
5 /db/oracle/a/oradata/crash/test01.dbf
1 row selected.
Then find out the name of the tablespace that contains this data file:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$tablespace where TS# = '5' ;
NAME
--------------------------------------------------------------------------------
TEST
1 row selected.
That was too easy!
Admittedly, the previous example was easy. There was only one data file that was offline, which made finding its tablespace pretty easy. What if there were multiple data files that were contained within multiple tablespaces? How do we know which one contains the rollback segment? Unfortunately, there is no way to be sure while the database is closed. That is why it is very helpful to put the rollback segments in dedicated tablespaces that have names that easily identify them as such. It's even more helpful if the data files are named something helpful as well. For example, create a separate tablespace called ROLLBACK_DATA, and call its data files rollback01.dbf, rollback02.dbf, etc. That way, anyone that finds himself in this scenario will know exactly which data files contain rollback data.
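For example, a dedicated rollback tablespace with self-describing file names might be created like this. (This is only a sketch; the tablespace name, file path, sizes, and segment name are illustrative and are not taken from the example database.)
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > create tablespace rollback_data datafile '/db/oracle/a/oradata/crash/rollback01.dbf' size 100M ;
Statement processed.
SVRMGR > create rollback segment r05 tablespace rollback_data ;
Statement processed.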
The rest of this step is simple. Restore any files that were taken offline, and use either the recover datafile or recover tablespace command to roll them forward in time. If there are only one or two damaged data files, it's probably quicker to use the recover datafile command. If there are several damaged data files, especially if they are all in one tablespace, the recover tablespace command is probably easiest. Either way will work.
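For instance, once the offline file from the earlier example has been restored from backup, rolling it forward and bringing it back online might look like this. (A sketch only; the file name comes from the earlier query, and the recovery may prompt for archived redo logs depending on how far behind the restored file is.)
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > recover datafile '/db/oracle/a/oradata/crash/test01.dbf' ;
Media recovery complete.
SVRMGR > alter database datafile '/db/oracle/a/oradata/crash/test01.dbf' online ;
Statement processed.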
Once any data files that contain rollback segments have been restored and recovered, return to Step 10 and attempt to open the database again.
Step 21: Comment Out Rollback Segment Line(s) in the init.ora File
Perform this step only if directed to do so by Step 19.
There is a quicker way to open the database with damaged rollback segments. In order for Oracle to know what rollback segments to look for, the following line is inserted into the initORACLE_SID.ora file:
rollback_segments = (r01,r02,r03,r04,users_rs)
(The initORACLE_SID.ora file is usually found in $ORACLE_HOME/dbs.) Since the example error above says that it is the USERS_RS rollback segment that is unavailable, simply delete that part of the line. It is wise, of course, to comment out the original line and modify a copy of it. First, shut down Oracle completely (this includes dismounting it as well). Then copy and comment the rollback segment line in the initORACLE_SID.ora file:
#rollback_segments = (r01,r02,r03,r04,users_rs)
rollback_segments = (r01,r02,r03,r04)
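The complete shutdown mentioned above, performed before editing the file, might look like the following sketch. (The use of shutdown immediate here is an assumption; any clean shutdown that dismounts the instance will do, and the exact messages vary depending on whether the database was open or only mounted.)
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > shutdown immediate;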
Once this change has been made in the initORACLE_SID.ora file, return to Step 1 to mount the database.
Step 22: Is the Current Online Log Damaged?
Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.
If the current online log group is damaged, there would be a message like the following when attempting to open the database:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
In the example above, a select group#, status from v$log command would have also shown that log group 2 was CURRENT at the time of failure.
This is the worst kind of failure to have because there will definitely be data loss. That is because the current online log is required to restart even a fully functioning database. The current control file knows about the current online log and will attempt to use it. The only way around that is to restore an older version of the control file. Unfortunately, you can't restore only the control file because the data files would then be more recent than the control file. The only remaining option is to restore the entire database.
For the procedure to restore the entire database, proceed to Step 23.
Step 23: Recover All Database Files from Backup
Warning! There are only two reasons to perform this step. The first is if instructed to do so by Step 22. The other is if there was an unsuccessful attempt to open the database after performing either Step 26 or Step 28. This step is the most drastic method of recovery and should not be performed unless absolutely necessary.
Perform this step only after verifying (or rebuilding or restoring) the control files, and verifying that all members of the current online log group are damaged. This step is relatively easy. Simply determine the names and locations of all of the data files and restore them from their latest backup.
Warning! Restore only the data files, not the control files. Do not restore or overwrite the control files unless instructed to do so by Step 9!
To determine the names of all the data files, run the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select name from v$datafile ;
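How the files are restored depends entirely on the backup method in use. If the latest backup is simply a cold copy on disk, the restore might look like the following sketch. (The backup directory /backup/crash and the file names are assumptions made for illustration only.)
$ cp /backup/crash/system01.dbf /db/oracle/a/oradata/crash/system01.dbf
$ cp /backup/crash/test01.dbf /db/oracle/a/oradata/crash/test01.dbf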
Once all data files are restored, proceed to Step 24.
Step 24: Alter Database Open Resetlogs
Warning! Perform this step only if instructed to do so by Step 23. This is another drastic step that should only be performed if necessary!
This command causes Oracle to open the database after clearing all contents of the online redo log files. Since there is no way to undo this step, it is a good idea to make copies of the online redo log files now. To find out all their names, run the following command on a mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select member from v$logfile ;
To create an "undo" option, copy each of these files to filename.bak.
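For example, if the query returns the members shown in the earlier error messages, the copies might be made like this. (A sketch; the file names are taken from the example errors above and will differ in practice.)
$ cp /db/Oracle/b/oradata/crash/redocrash02.log /db/Oracle/b/oradata/crash/redocrash02.log.bak
$ cp /db/Oracle/a/oradata/crash/redocrash03.log /db/Oracle/a/oradata/crash/redocrash03.log.bak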
After making a backup of the online redo log files, run the following command on a mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database open resetlogs ;
Statement processed.
If the database opens, congratulations!
Make a backup of this database immediately, preferably with the database shut down. That is because Oracle cannot roll through this point in time using the redo logs. Oracle must have a full backup taken after using the open resetlogs command in order to restore this database using any logs that are made after the open resetlogs was performed.
Once that backup is completed, you're done!
Step 25: Is an Active Online Redo Log Damaged?
Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.
If an ACTIVE online log group is damaged, there will be a message like the following when attempting to open the database:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
In the example above, a select group#, status from v$log command would have also shown that log group 2 was ACTIVE at the time of failure.
Remember that an ACTIVE log is one that is still needed for recovery. It is still needed because a checkpoint has not yet flushed all changes from shared memory to disk. Once that happens, this log will no longer be needed.
To perform a checkpoint, proceed to Step 26.
Step 26: Perform a Checkpoint
The way to attempt to recover from the scenario in Step 25 is to perform a checkpoint. If it is successful, the database should open successfully. To perform a checkpoint, issue the following command on the mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter system checkpoint local ;
Statement processed.
Be patient. The reason that there is an ACTIVE log group is that the checkpoint took a long time in the first place. Wait for Oracle to say that the checkpoint succeeded or failed. If it succeeded, Oracle will simply say, "Statement processed." If it fails, there could be any number of Oracle errors.
After issuing the checkpoint, even if it was unsuccessful, return to Step 10 and attempt to open the database. If this attempt fails, return to Step 23 and recover the entire database.
Step 27: Is an Inactive Online Redo Log Damaged?
Perform this step only if instructed to do so by Step 17. If not, return to Step 17 now.
If an INACTIVE online log group is damaged, there would be a message like the following when attempting to open the database:
ORA-00313: open failed for members of log group 2 of thread 1
ORA-00312: online log 2 thread 1: '/db/Oracle/b/oradata/crash/redocrash02.log'
ORA-00312: online log 2 thread 1: '/db/Oracle/a/oradata/crash/redocrash03.log'
In the example above, a select group#, status from v$log command would have also shown that log group 2 was INACTIVE at the time of failure.
In comparison, this one should be a breeze. An INACTIVE log is not needed by Oracle. If it is not needed, simply drop it and add another in its place.
To drop and add an INACTIVE log group, proceed to Step 28.
Step 28: Drop/Add a Damaged, INACTIVE Log Group
Perform this step only if instructed to do so by Step 27.
In all the above examples, the damaged log group was group 2. Before we drop that group, we should make sure that we can add it back easily. Ensure that all the original redo log locations are still valid. To do this, get the names of all of the members of that log group:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > select member from v$logfile where GROUP# = 2 ;
For this example, Oracle returned the values:
/logs1/redolog01.dbf
/logs2/redolog01.dbf
/logs3/redolog01.dbf
Verify that all these files' locations are still valid. For this example, assume /logs3 is completely destroyed, and we are relocating all its contents to /logs4. Therefore, the future members of log group 2 will be /logs1/redolog01.dbf, /logs2/redolog01.dbf, and /logs4/redolog01.dbf.
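A quick way to verify that the intended destination directories exist and are writable is a check like the following. (A sketch; the mount points are the ones used in this example.)
$ ls -ld /logs1 /logs2 /logs4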
To drop log group 2, issue the following command on a mounted, closed database:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database drop logfile group 2 ;
Statement processed.
Once that command completes successfully, add the log group back to the database. To do this, issue the following command. (Remember that we have replaced /logs3/redolog01.dbf with /logs4/redolog01.dbf.)
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter database add logfile group 2 ('/logs1/redolog01.dbf', '/logs2/redolog01.dbf', '/logs4/redolog01.dbf') size 500K ;
Statement processed.
Once this command completes successfully, return to Step 10 and attempt to open the database.
Step 29: Were Any Rollback Segment Lines Changed in init.ora?
There was an option in Step 19 to comment out rollback segments from the initORACLE_SID.ora file. If that option was taken, there should be a line in that file that looks like the following:
#rollback_segments = (r01,r02,r03,r04,users_rs)
rollback_segments = (r01,r02,r03,r04)
If any rollback segments were taken offline, proceed to Step 30. If none were, back up the database now. You're done!
Step 30: Return Offline Rollback Segments to Normal Condition
To check which rollback segments are offline, run the following command:
SVRMGR> select segment_name from dba_rollback_segs where status = 'OFFLINE' ;
SEGMENT_NAME
------------------------------
USERS_RS
1 row selected.
Since all data files and redo log files should be recovered by now, just return the offline rollback segments to an online status:
$ svrmgrl
SVRMGR > connect internal;
Connected.
SVRMGR > alter rollback segment users_rs online ;
Statement processed.
Once this has been completed, make sure that any commented lines in the initORACLE_SID.ora file are put back to their original condition. The example in Step 29 used the suggested method of commenting out the original line, and changing a copy of it. Return the line in initORACLE_SID.ora to its original condition:
rollback_segments = (r01,r02,r03,r04,users_rs)
This step ensures that the next time this database is opened, the USERS_RS rollback segment will be used.
You're done!
If you've made it this far, you're done! All data files, control files, and log files should be online. Take a backup of the entire database immediately, preferably a cold one with the database down. If that can't be done, then perform a hot backup.