This article discusses hard disk data recovery on Linux using dd and fdisk.

I recently left for a trip to South America and took my trusty Intenso 320GB external drive with me. Well aware that I’ve dropped it a couple too many times and that it was beginning to click more and more often during regular usage, I took a full backup before leaving. There’s nothing critical on the drive that I don’t have additional copies of elsewhere, but losing it would be a pain.

Having reached Madrid airport, I plugged the drive in and was about to pull some documents off it when disaster struck. The drive just clicked for about 30 seconds before Windows prompted me to format it. I tried removing it and reinserting it a couple of times but no luck: the drive had failed. I went to the duty free store in the airport and picked up a 1TB WD Elements drive for 99 euros, and planned to attempt data recovery when I arrived in South America.

I’m keen to get the data recovery started – it’s going to take a while on my USB 2.0 laptop and the more bad sectors, the longer it will take.

I plug both drives in and boot into Debian. The old and failing drive appears as /dev/sdb and the new drive as /dev/sdc. I begin with:

dd if=/dev/sdb of=/root/disk.img bs=512 conv=sync,noerror

I use 512 bytes as the number to read/write at a time as that is what fdisk reports to be optimal for both drives. noerror indicates that dd should not quit on unrecoverable read errors, which I expect there to be due to the state of the drive. Instead, it will report the error and continue. sync pads each failed or short read with zeros up to the full block size, so everything after a bad sector stays at the correct offset in the image.
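The sync flag that appears alongside noerror in the dd invocation is what keeps offsets intact, and its padding behaviour is easy to see on a scratch file. This is only a minimal sketch: the temporary file names are made up for illustration and no real disk is touched.

```shell
# Toy demonstration on scratch files; no real disk is involved.
tmp=$(mktemp -d)
head -c 700 /dev/zero > "$tmp/in.bin"   # 700 bytes: one full 512-byte block plus 188 bytes

# Without sync, the short final read is written as-is.
dd if="$tmp/in.bin" of="$tmp/plain.bin" bs=512 2>/dev/null

# With sync, every input block is padded with NUL bytes up to 512 bytes.
dd if="$tmp/in.bin" of="$tmp/synced.bin" bs=512 conv=sync,noerror 2>/dev/null

plain_size=$(( $(wc -c < "$tmp/plain.bin") ))
synced_size=$(( $(wc -c < "$tmp/synced.bin") ))
echo "plain: $plain_size bytes, synced: $synced_size bytes"
rm -rf "$tmp"
```

On a failing drive the same padding is applied to reads that return an error, which is why a bad sector ends up as a block of zeros in the image rather than shifting everything after it.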

dd will print its current progress on receiving SIGUSR1. From a separate terminal window:

kill -USR1 $(pidof dd)

Sure enough, at the 3.1GB mark, dd starts outputting read errors:

root@w:~# dd if=/dev/sdb of=/root/disk.img bs=512 conv=sync,noerror
dd: reading `/dev/sdb': Input/output error
6002656+0 records in
6002656+0 records out
3073359872 bytes (3.1 GB) copied

OK, this is to be expected. Let’s hope that we don’t get too many read errors and therefore too much corruption.

These errors continue every couple of seconds until the 142nd read error:

dd: reading `/dev/sdb': Input/output error
6002656+142 records in
6002798+0 records out
3073432576 bytes (3.1 GB) copied

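The statistics here show that 6002656 full records were read, plus 142 partial records for the reads that failed. Because of sync, each failed read was padded to a full block, so 6002798 records were written (6002656 + 142).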

If our read/write size is 512 bytes, then 142 failed reads is 72704 bytes of lost data. So far, this isn’t catastrophic. Ultimately, it depends on where this corruption exists – whether in a single file, across a whole range of small files or within critical structures of the filesystem. I’m confident at this stage that the data should be recoverable.
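The arithmetic can be reproduced straight from dd's "records in" line. This is just a sketch; the variable names are my own, and the values are copied from the transcript above.

```shell
# Values copied from dd's statistics in the transcript above.
records_in="6002656+142"
bs=512                               # block size passed to dd

full=${records_in%%+*}               # reads that returned a whole block
partial=${records_in##*+}            # reads that failed in this run

lost_bytes=$(( partial * bs ))       # upper bound on data replaced by zeros
records_out=$(( full + partial ))    # blocks written to the image
image_bytes=$(( records_out * bs ))  # should match dd's reported byte count

echo "lost: $lost_bytes bytes, written: $records_out records ($image_bytes bytes)"
```

The computed byte count agrees with the 3073432576 bytes that dd reported, which is a useful sanity check that no blocks were dropped from the image.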

A few minutes pass and no further errors are printed – hopefully this is the full extent of the damage.

In a separate window, I’m running the kill command above to output progress. I notice that despite starting off at over 25MB/sec, dd is now only running at 3.1MB/sec and doesn’t seem to be picking up speed again over time. I assume that dd drops to a lower rate after encountering errors, although the rate dd prints is an average since it started, so the time lost on the failed reads also drags that figure down. I kill the process with Ctrl+C, and then restart it from the block after the last errors:

dd if=/dev/sdb of=/root/disk.img bs=512 conv=sync,noerror skip=6002799 seek=6002799

The skip and seek parameters instruct dd to start from that block on the input and output side respectively. dd then continues at the mighty rate of over 25MB/sec. After about 4 hours, the process is complete with no further errors.
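Resuming with matching skip and seek values can be tried safely on scratch files before pointing dd at a real disk. This is a minimal sketch with made-up paths and sizes, where the first dd stands in for the interrupted copy.

```shell
# Toy demonstration on scratch files; paths and sizes are made up.
tmp=$(mktemp -d)
head -c $(( 10 * 512 )) /dev/urandom > "$tmp/src.bin"   # a 10-block "disk"

# First pass stops after 4 blocks, standing in for the interrupted copy.
dd if="$tmp/src.bin" of="$tmp/dst.bin" bs=512 count=4 2>/dev/null

# Resume: skip= moves the read position on the input, seek= moves the
# write position on the output. The same value keeps the two aligned.
dd if="$tmp/src.bin" of="$tmp/dst.bin" bs=512 skip=4 seek=4 2>/dev/null

if cmp -s "$tmp/src.bin" "$tmp/dst.bin"; then
    resumed=identical
else
    resumed=different
fi
echo "copy is $resumed"
rm -rf "$tmp"
```

Both values are counted in units of bs, so the block size on the resumed run has to match the one used for the original copy.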

Excellent. Now to write /root/disk.img to /dev/sdc:

dd if=/root/disk.img of=/dev/sdc bs=512

I return after a few minutes to find that dd has quit early!

dd: writing `/dev/sdc': Input/output error

Am I writing to the wrong disk? I double-check dmesg and verify that I am in fact writing to the correct disk. I restart dd from where it left off and it immediately quits with another I/O error. How can this be? On a brand new disk! At this stage, I have no way to return the disk to Madrid airport. I run a couple of further tests with dd and find that the disk has about 300 bad sectors, all within the first 3.8GiB (with bs=512, that’s block 7969177).

As the existing drive has a capacity of 320GB and the new drive 1TB, I decide to ignore the first 4GiB on the new drive. I could work around the bad sectors, but given that the disk is 1TB, that the data is non-critical, and that I don’t intend to use this disk in the long term, I’m happy with that decision. The existing drive has a single NTFS partition (sdb1). Now, I can’t just skip 4GiB with dd as the partition table will be written to the wrong place. I first need to check the exact size of the original sdb1 partition using fdisk:

# fdisk -l /dev/sdb

Disk /dev/sdb: 320.1 GB, 320072581120 bytes
255 heads, 63 sectors/track, 38913 cylinders, total 625141760 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0009fd5d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1             2048      625141759   316763136 7  HPFS/NTFS/exFAT

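Now, I’m going to create a partition of identical size on /dev/sdc but 4GiB in. To find out the location in units based on a 512 byte sector size, we calculate 4*1024*1024*1024/512 = 8388608. The original partition spans 625141759 - 2048 + 1 = 625139712 sectors, so the new one must end at sector 8388608 + 625139712 - 1 = 633528319.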
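The sector arithmetic is easy to get wrong by one, so it is worth letting the shell do it. This is a small sketch: the start and end sectors are taken from the fdisk -l output above, and the variable names are my own.

```shell
sector=512               # logical sector size reported by fdisk
orig_start=2048          # first sector of sdb1
orig_end=625141759       # last sector of sdb1

# First sector of the new partition, 4 GiB into the new disk.
new_start=$(( 4 * 1024 * 1024 * 1024 / sector ))

# Both start and end sectors are inclusive, hence the +1 and -1.
part_sectors=$(( orig_end - orig_start + 1 ))
new_end=$(( new_start + part_sectors - 1 ))

echo "first sector: $new_start, last sector: $new_end ($part_sectors sectors)"
```

These are the values to enter at fdisk's "First sector" and "Last sector" prompts.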

# fdisk /dev/sdc

Command (m for help): n
Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-625141759, default 2048): 8388608
Last sector, +sectors or +size{K,M,G} (8388608-2147483648, default 2147483648): 633528319

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
The kernel still uses the old table. The new table will be used at
the next reboot or after you run partprobe(8) or kpartx(8)
Syncing disks.

# partprobe /dev/sdc

Now, we’ll need to gain access to the first partition (sdb1) within the disk image /root/disk.img. From our earlier fdisk -l we know that sdb1 starts 2048 sectors in:

losetup /dev/loop0 /root/disk.img -o $((2048 * 512))
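The offset passed to losetup is simply the partition's start sector converted to bytes. To illustrate what that offset selects without needing root or the real image, here is a sketch on a toy file; the file name and the marker string are invented for the example.

```shell
# Toy image: 2048 empty sectors followed by a marker standing in for the
# start of the partition's contents.
tmp=$(mktemp -d)
sector=512
start_sector=2048
offset=$(( start_sector * sector ))     # byte offset of the partition

dd if=/dev/zero of="$tmp/disk.img" bs=$sector count=$start_sector 2>/dev/null
printf 'PARTITION-DATA' >> "$tmp/disk.img"

# Reading from byte offset+1 onwards (tail counts from 1) yields only the
# partition contents, which is the view the loop device presents.
payload=$(tail -c +$(( offset + 1 )) "$tmp/disk.img")
echo "offset: $offset bytes, payload: $payload"
rm -rf "$tmp"
```

Once the copy has finished, the loop device can be released with losetup -d /dev/loop0.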

And we can now use dd to read from /dev/loop0 and write to /dev/sdc1 directly:

dd if=/dev/loop0 of=/dev/sdc1 bs=512

Four hours later, the copy finishes with no errors. Data recovery is complete with hopefully no more than a little corruption. After rebooting into Windows, the disk and partition are immediately recognized. From Windows 7 Pro, I right click on the drive in ‘My Computer’ and navigate to Properties->Tools->Error Checking->Check Now. Errors are detected and the filesystem is repaired without issue.

Warning: This process was used on non-critical data that is already backed up. If a brand new disk has bad sectors, consider returning it. They won’t get better over time.