Recovering a Large Corrupt BZip2

By Paulus, 10 August, 2013

Out of the blue last month, my AMCC (LSI) 3Ware 9650SE RAID controller failed. Before I continue, I want to assure you that I did have a backup, albeit a few months old, but a backup nonetheless. Luckily, I was able to purchase the same model RAID card to replace the faulty one. After installing it and rebuilding the RAID, I found that some files were corrupted. So I pulled out the backup and compared the backup files with the current versions:

diff -rq /mnt/backup/home /home | grep -v "Only in /mnt/backup/home"

I was only interested in the files that differed, since there were bound to be files that existed in one location and not the other after a round of spring cleaning. When I went to compare my Samba directory, which was in a different archive, I found that the archive was corrupt. Running bzip2recover hit another issue: the archive had more than 50,000 blocks, and bzip2recover refuses to process more than that. The only way around this limitation was to edit the bzip2recover source, raising the constant defined on line 294 of bzip2recover.c:

#define BZ_MAX_HANDLED_BLOCKS 50000
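The change itself is a one-line edit followed by a rebuild. Here is a minimal sketch of the edit, run against a stand-in copy of that single source line rather than the real bzip2 source tree; against the real tree you would run the same sed on bzip2recover.c and then rebuild:

```shell
# Work in a scratch directory on a stand-in copy of the line; in the
# real fix, run the same sed against bzip2recover.c in the unpacked
# bzip2 source tree.
cd "$(mktemp -d)"
printf '#define BZ_MAX_HANDLED_BLOCKS 50000\n' > bzip2recover.c

# Raise the limit comfortably above the number of blocks in the archive.
sed -i 's/BZ_MAX_HANDLED_BLOCKS 50000/BZ_MAX_HANDLED_BLOCKS 500000/' bzip2recover.c

grep BZ_MAX_HANDLED_BLOCKS bzip2recover.c
```

After the edit, rebuilding just the tool (`make bzip2recover` in the bzip2 source tree) and rerunning it against the damaged archive produces one rec*.bz2 file per recovered block.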

After compiling and running the patched bzip2recover, I ended up with more than 100,000 rec*.bz2 files, which made the following command unusable:

cat rec*.bz2 > recovered.bz2

The command fails with "Argument list too long" because the kernel caps the combined size of a process's command-line arguments. Historically that cap was 32 pages (128 KB on x86), hard-coded in include/linux/binfmts.h:

#define MAX_ARG_PAGES 32
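On kernels since 2.6.23 this hard-coded constant is gone and the limit instead scales with the process stack limit, but either way the effective ceiling can be queried directly on any POSIX system:

```shell
# Print the maximum combined size, in bytes, of the argument list and
# environment that a single exec call can receive.
getconf ARG_MAX
```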

Recompiling the kernel just to work around this doesn't make sense, so I ran the following shell script instead:

# Quote the pattern so the shell doesn't expand it before find runs,
# and sort so the zero-padded block files are concatenated in order.
for FILE in `find . -type f -iname 'rec*.bz2' | sort`
do
    cat "$FILE" >> recovered.bz2
done
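ARG_MAX only constrains the argument vector handed to execve(), not the shell's own expansion, so another way out is to loop over the glob directly. A self-contained demo with stand-in block files (in the real run, the rec*.bz2 files come from bzip2recover):

```shell
# Scratch directory with fake block files standing in for
# bzip2recover's output.
cd "$(mktemp -d)"
printf 'block1' > rec00001data.bz2
printf 'block2' > rec00002data.bz2

# The glob expands inside the shell (no exec, so no ARG_MAX), and it
# expands in sorted order, which matches bzip2recover's zero-padded
# numbering and keeps the blocks in sequence.
for FILE in rec*.bz2
do
    cat "$FILE" >> recovered.bz2
done
```

Once the pieces are reassembled, `bzip2 -t recovered.bz2` tests the stream's integrity without extracting it.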