April 14, 2010

"Bad Image Header" in Image DB

The All Log Entries or Problems reports show "Bad image header" entries, indicating a problem with the NetBackup images database.

Exact Error Message
Bad image header
Details:
The All Log Entries or Problems Log reports show entries such as the following:

06/09/98 16:41:40 sparky -  cleaning image DB
06/09/98 16:41:42 sparky -  Bad image header: Unix_0896295338_INCR
06/09/98 16:41:44 sparky -  Bad image header: Unix_0896295341_INCR
06/09/98 16:42:01 sparky -  Bad image header: Unix_0896704580_INCR

"Bad image header" messages result from backup image headers that are incomplete.
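
As an illustration, the affected client and image names can be pulled out of a saved copy of the report with a short shell function. This is a minimal sketch, not a NetBackup tool: the function name bad_image_headers and the report path are hypothetical, and it assumes the line layout shown in the sample above (client name in the third field, image name in the last field).

```shell
# Print "client image_name" once for each "Bad image header" report line.
# Reads report text on stdin. Assumes the sample layout:
#   date time client - Bad image header: image_name
bad_image_headers() {
    awk '/Bad image header:/ { print $3, $NF }' | sort -u
}
```

Example use, with a placeholder path for the saved report:
bad_image_headers < /path_to_report/problems.out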

Background Information

When VERITAS NetBackup (tm) executes backups, it writes information to the Images database on the NetBackup master server. The Images database resides in /usr/openv/netbackup/db/images. For every backup, NetBackup writes two files to the Images database: the backup image header file and the "files file." For a valid NetBackup image, both of these files must exist; otherwise the backup image is incomplete and a restore is impossible.

If the filesystem that the Images database resides on reaches 100% utilization, NetBackup can no longer record information about the backups it is attempting to process. The active backups can no longer write to the Images database, and their backup image headers become corrupt. Typically, the backup image headers are rewritten with a size of zero bytes. These zero-byte image headers cause the "Bad image header" messages in the reports.

An unexpected shutdown (such as a power outage or system crash) or a disk device failure can also corrupt image header files. In these cases the image header file may be zero bytes in length, contain only spaces, or contain garbled data. This type of corruption can occur after the backup is complete, for example during duplication or copy expiration, or even while no NetBackup activity is occurring on the image (in the case of device failure).
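
The first two forms of corruption described above (zero length, spaces only) can be detected mechanically. The following is a rough sketch under stated assumptions: classify_header is a hypothetical helper name, and a result of "has-data" only means the file contains something other than whitespace. Garbled content still has to be judged by inspecting the file.

```shell
# Classify a header file as "missing", "empty" (zero bytes),
# "blank" (whitespace only), or "has-data".
# "has-data" does NOT prove the header is valid.
classify_header() {
    if [ ! -f "$1" ]; then
        echo missing
        return 1
    elif [ ! -s "$1" ]; then
        echo empty
    elif ! LC_ALL=C grep -q '[^[:space:]]' "$1"; then
        echo blank
    else
        echo has-data
    fi
}
```

Example use, with the header from the sample report:
classify_header /usr/openv/netbackup/db/images/sparky/0896000000/Unix_0896295338_INCR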


Resolution:

There are several options available to resolve bad image headers:

Option 1:
Under most circumstances, running the catalog consistency checking tool will remove the suspect bad image headers and move them to the /usr/openv/netbackup/db.corrupt folder. The tool can be run by executing the following command and redirecting the output to a file:
Unix:  /usr/openv/netbackup/bin/bpdbm -consistency -move > /path_to_direct_output/consistency.out
Windows: \Veritas\netbackup\bin\bpdbm -consistency > \path_to_direct_output\consistency.out

Administrators should review the db.corrupt folder and the consistency.out file and take corrective action as needed to recreate the images database information, such as importing media, modifying the files, or restoring individual files.


Option 2:
Administrators may choose to manually remove the bad image headers to resolve the problem.  This can be done by completing the following steps:
1. Shut down NetBackup on the master server:
/usr/openv/netbackup/bin/goodies/bp.kill_all


2. Change directory to the affected host's subdirectory in the images database. Using the above example output from the reports, "sparky" is the affected host.
cd /usr/openv/netbackup/db/images/sparky


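3. Change directory to the directory containing the affected backup image header. The directory can be determined from the file name of the backup image header: in this example, the timestamp in Unix_0896295338_INCR is 0896295338, and the directory name is that timestamp with its last six digits set to zero.
cd 0896000000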


4. Confirm that the backup image header is, in fact, zero bytes in size, contains all spaces or is otherwise obviously corrupted:
ls -l Unix_0896295338_INCR


5. Check whether the backup image header's "files file" exists:
ls -l Unix_0896295338_INCR.f


6. Remove the backup image header file and its corresponding "files file", if it exists:
rm Unix_0896295338_INCR
rm Unix_0896295338_INCR.f


Repeat steps 2 - 6 for each reported "Bad image header" and then restart NetBackup on the master server.
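
When many headers are reported, the paths to inspect can be worked out from the client and image names before anything is removed. The sketch below is illustrative only. The function names image_dir_for and image_paths are hypothetical, and the directory rule is an assumption inferred from the single example in step 3 (the timestamp with its last six digits replaced by zeros). It also assumes the image name ends in _timestamp_type, as in Unix_0896295338_INCR. The functions only print paths and delete nothing.

```shell
# Print the timestamp subdirectory for an image header name.
# Assumption: directory = timestamp with its last six digits zeroed.
image_dir_for() {
    base=${1%_*}        # drop the trailing type, e.g. _INCR
    ts=${base##*_}      # keep the timestamp, e.g. 0896295338
    echo "${ts%??????}000000"
}

# Print the header path and "files file" path for a client and image name.
# Nothing is removed; the output is for review with ls -l.
image_paths() {
    dir=/usr/openv/netbackup/db/images/$1/$(image_dir_for "$2")
    echo "$dir/$2"
    echo "$dir/$2.f"
}
```

Example use:
image_paths sparky Unix_0896295338_INCR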

Option 3:
If corruption of the image files occurs after completion of the backup, such as due to a hardware issue or unexpected power outage, the following steps may be used to recover the image files on 6.x systems. These directions must be followed exactly, as failure to do so may cause other catalog corruption, and they should only be used after Options 1 and 2 are reviewed. Symantec NetBackup Support should be engaged if there are questions about this procedure.
  1. Determine a time frame when the corruption occurred. If the Problems database contains enough history, NetBackup Administrators may be able to use the Problems report to find the first recorded occurrence of the "Bad image header" error.
  2. Using the time of the first error reported, Administrators can use the Backup, Archive and Restore interface to query for catalog backups of the type "NBU-Catalog" to find the catalog backup that occurred before the first error. The suspect files can then be restored to an alternate location and manually examined for corruption. If the file in the most recent catalog backup is also corrupted, continue restoring from earlier catalog backups until a non-corrupted version of the image file is found.
    ****Note**** This is not a Hot Catalog recovery using the DR file. It is a standard restore to an alternate location.
  3. Examine the contents of the restored image file for invalid media references (any copies expired after the catalog backup was done would still exist in the restored image file, etc.).  Manually edit the image file to remove any invalid media references.  Once the file has been edited, it can be moved into the appropriate catalog directory.
  4. Check whether the image is due to expire soon by reviewing the EXPIRATION tag in the image header and running /usr/openv/netbackup/bin/bpdbm -ctime on Unix or \Veritas\netbackup\bin\bpdbm -ctime on Windows.
    If the image is due to expire soon, create the touch file NOexpire in the following location:
    Unix:  /usr/openv/netbackup/bin
    Windows:  \Veritas\netbackup\bin
  5. Verify the image to ensure that it is valid.
  6. Take any steps necessary to protect the backup image.  If images were duplicated after the catalog backup used in the restore, it may be necessary to extend the expiration date using the bpexpdate command or manually duplicate the backup.
  7. Remove the NOexpire file if it was created in step 4.
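
The "due to expire soon" check in step 4 comes down to simple arithmetic once the expiration time is known. The following is a minimal sketch, assuming the EXPIRATION value is a time in seconds since the epoch (the same form as the timestamp in the image name). The function name expires_within_days is hypothetical, and the sketch does not replace bpdbm -ctime.

```shell
# Succeed (exit 0) if the expiration time is within the next N days
# or has already passed.
# Arguments: expiration (epoch seconds), current time (epoch seconds), N.
expires_within_days() {
    # Strip leading zeros so the shell does not read the value as octal.
    exp=$(printf '%s' "$1" | sed 's/^0*//')
    [ $(( ${exp:-0} - $2 )) -le $(( $3 * 86400 )) ]
}
```

Example use, where $expiration holds the value read from the image header and date +%s is assumed to be supported on the platform:
expires_within_days "$expiration" "$(date +%s)" 7 && echo "expires within 7 days"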

Prevention
  • Make sure that the NetBackup master server is protected from power outages by a redundant power supply, such as a battery backup or other solution.
  • Keep the NetBackup master server current on critical OS and NetBackup patches.
  • Implement a maintenance routine for monitoring the status of disk devices used by the NetBackup master server.
  • Monitor disk usage, growing filesystems or adding disk space by other methods when needed.
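
Because a full filesystem is the typical cause of zero-byte headers, disk usage monitoring can be automated with a small check run from a scheduler. This is a sketch only: the function names usage_percent and check_usage and the 90 percent threshold are illustrative choices, and it relies on the portable output format of df -P, where the fifth column of the second line is the capacity in percent.

```shell
# Print the used-capacity percentage (digits only) from df -P output on stdin.
usage_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Warn when the filesystem holding a path is at or above a threshold percentage.
check_usage() {
    pct=$(df -P "$1" | usage_percent)
    if [ "$pct" -ge "$2" ]; then
        echo "WARNING: filesystem for $1 is ${pct}% full"
    fi
}
```

Example use:
check_usage /usr/openv/netbackup/db/images 90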
