Like most of my adventures, this one began with me making a stupid mistake. Of course, I didn't think it was a mistake at the time - I thought I was being very smart. But that's not the way it worked out.
My first disk, /dev/sda, is a 320GB drive dedicated to MS Windows. The original partition table looked like this in fdisk:
bash# fdisk -l /dev/sda

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0002a4e7

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       25497   204800000    7  HPFS/NTFS
/dev/sda2   *       25497       38913   107767648+   7  HPFS/NTFS
Windows Server 2008 occupies the 200GB sda1 partition. sda2 is for Windows 7.
I needed about 10GB to do some Windows XP experiments. Not a lot of space, but I wanted a clean drive. So I cloned sda1 to a file on my Linux drive (sdb) using ntfsclone and deleted the partition. That freed 200GB for my experiments, which is way more than I needed. But I don't want Windows XP to see my Win7 installation on sda2, because Windows has a weird tendency to store its boot files on other partitions. So I deleted sda2. I don't need to back up sda2 because I'm not going to write any data there. All I need to do is record the original fdisk parameters so I can redefine the partition later. (This is the part where I thought I was being smart.)
Experiment completed, I redefined the partitions using fdisk by entering the values shown in the original fdisk output. I used ntfsclone to restore the Windows Server 2008 partition. No problem with sda1, but Linux couldn't mount the NTFS filesystem in sda2. I tried the same thing a second time - still couldn't mount the partition. Tried a third time - still couldn't mount sda2. This doesn't make sense - I know all the data is still there, all I need to do is redefine the partition and the filesystem should be there. But I remembered someone telling me that the definition of insanity is doing the same thing over and over again and expecting a different outcome. So maybe it is time to stop and think.
A few minutes of sober second thought revealed the problem: fdisk shows the start and end cylinder numbers, but each cylinder contains 16065 blocks (255 heads x 63 sectors per track, 512 bytes each). Did you notice that sda1 ends in the same cylinder that sda2 starts in? That means that within cylinder 25497 there is a block, call it LAST_BLOCK_IN_SDA1, which is the last block in sda1. And later in the same cylinder there is another block, which is FIRST_BLOCK_IN_SDA2. I don't know exactly where either of these blocks is located - they could be any two of the more than sixteen thousand blocks. And the two blocks might not be right next to each other; there could be unused blocks between them. In fact, the original sda1 might not have started at the first block in cylinder 1. In hindsight, fdisk might be a decent tool for quickly partitioning a disk, but it is a poor tool for partition backup and recovery. I need to find something better than fdisk ...
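To put a number on how wide that search window is, here is a minimal sketch of the arithmetic. It assumes that this fdisk counts cylinders from 1 and that absolute sectors are numbered from 0; the function name is mine, not anything fdisk provides.

```python
HEADS = 255
SECTORS_PER_TRACK = 63
# Matches "Units = cylinders of 16065 * 512" in the fdisk output.
SECTORS_PER_CYLINDER = HEADS * SECTORS_PER_TRACK


def cylinder_sector_range(cylinder, first_cylinder=1):
    """Return (first, last) absolute sector numbers covered by one cylinder.

    Assumption: fdisk's cylinder numbers start at `first_cylinder` (1 here)
    and absolute sector numbers start at 0.
    """
    first = (cylinder - first_cylinder) * SECTORS_PER_CYLINDER
    last = first + SECTORS_PER_CYLINDER - 1
    return first, last


first, last = cylinder_sector_range(25497)
print(f"cylinder 25497 spans sectors {first}..{last}")
# prints: cylinder 25497 spans sectors 409593240..409609304
```

So "cylinder 25497" only narrows the boundary between sda1 and sda2 down to a window of 16065 sectors, roughly 8MB of disk.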
fdisk doesn't provide fine enough resolution when defining partitions: imagine trying to paint an image with a brush 16,000 pixels wide. parted and gparted are a little better. Parted supports "chs" (cylinder, head, sector) definitions of location. But sfdisk is even better. It supports partition locations by block number, plus it can dump the partition table to a human-readable file and later accept that file as input to redefine the partitions. For example:
bash# sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size=40960000, Id= 7
/dev/sda2 : start=409601561, size=215521217, Id= 7, bootable
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
Redirect the output to a file like this:
bash# sfdisk -d /dev/sda > partition_table_sda.sfdisk
The file looks like this:
bash# cat partition_table_sda.sfdisk
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size=40960000, Id= 7
/dev/sda2 : start=409601561, size=215521217, Id= 7, bootable
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
Then you can use that file as input to write the partition table:
bash# sfdisk --force /dev/sda < partition_table_sda.sfdisk
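Because the dump is plain text, it is easy to sanity-check before feeding it back to sfdisk. The following is only a sketch: it assumes the older `start=, size=, Id=` dump format shown above, and `parse_dump` and `find_overlaps` are hypothetical helper names, not part of sfdisk. The overlap check is meant for primary partitions only, since an extended partition contains its logical partitions by design.

```python
import re

# Sample text copied from the dump shown above.
DUMP = """\
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size=40960000, Id= 7
/dev/sda2 : start=409601561, size=215521217, Id= 7, bootable
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
"""

ENTRY = re.compile(
    r"^(\S+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+),\s*Id=\s*([0-9a-fA-F]+)"
)


def parse_dump(text):
    """Return a list of dicts for every non-empty partition entry."""
    parts = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # comment, "unit:" line, or blank line
        start, size = int(m.group(2)), int(m.group(3))
        if size == 0:
            continue  # unused slot
        parts.append({
            "device": m.group(1),
            "start": start,
            "end": start + size - 1,  # last sector, inclusive
            "size": size,
            "id": m.group(4).lower(),
        })
    return parts


def find_overlaps(parts):
    """Return pairs of device names whose sector ranges overlap."""
    ordered = sorted(parts, key=lambda p: p["start"])
    return [
        (a["device"], b["device"])
        for a, b in zip(ordered, ordered[1:])
        if b["start"] <= a["end"]
    ]


partitions = parse_dump(DUMP)
for p in partitions:
    print(p["device"], p["start"], p["end"])
print("overlaps:", find_overlaps(partitions))
```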
sfdisk is what I should have used to back up my partition table. Good to know for next time. But how can I find the start sector for my lost partition? I don't have time to test 16065 blocks individually. There must be a way to do it automatically ...
gpart scans a block device (i.e. a hard drive, but it could be a hard drive image stored in a data file) for filesystems. When it finds a filesystem it detects the size, skips forward that amount and continues the search for more filesystems. If you think one filesystem might overwrite another, then you can disable the skip forward feature. Gpart will guess the properties for the partition table and output everything to stdout so you can save it or use it immediately to create a new partition table. For instance, I can scan my good drive, sdb, like this:
bash# gpart /dev/sdb

Begin scan...
Possible partition(Linux swap), size(4102mb), offset(0mb)
Possible partition(Linux ext2), size(949764mb), offset(4102mb)
End scan.

Checking partitions...
Partition(Linux swap or Solaris/x86): primary
Partition(Linux ext2 filesystem): primary
Ok.

Guessed primary partition table:
Primary partition(1)
   type: 130(0x82)(Linux swap or Solaris/x86)
   size: 4102mb #s(8401928) s(63-8401990)
   chs:  (0/1/1)-(522/254/59)d (0/1/1)-(522/254/59)r

Primary partition(2)
   type: 131(0x83)(Linux ext2 filesystem)
   size: 949764mb #s(1945118064) s(8401995-1953520058)
   chs:  (523/0/1)-(1023/254/63)d (523/0/1)-(121600/254/57)r

Primary partition(3)
   type: 000(0x00)(unused)
   size: 0mb #s(0) s(0-0)
   chs:  (0/0/0)-(0/0/0)d (0/0/0)-(0/0/0)r

Primary partition(4)
   type: 000(0x00)(unused)
   size: 0mb #s(0) s(0-0)
   chs:  (0/0/0)-(0/0/0)d (0/0/0)-(0/0/0)r
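The scanning idea itself is simple enough to sketch for the one filesystem I care about. This is not how gpart is implemented, just a brute-force illustration: it relies on an NTFS boot sector carrying the OEM ID "NTFS    " at byte offset 3, the 0x55AA signature in its last two bytes, and the volume's total sector count as a 64-bit little-endian value at offset 0x28. The function name and the in-memory test image are made up for the example.

```python
import io
import struct

SECTOR = 512


def find_ntfs_boot_sectors(image, first_sector, last_sector):
    """Yield (sector, total_sectors) for sectors that look like NTFS boot sectors.

    `image` is any seekable binary file object: a disk image file, or a raw
    device such as /dev/sda opened read-only.
    """
    for sector in range(first_sector, last_sector + 1):
        image.seek(sector * SECTOR)
        block = image.read(SECTOR)
        if len(block) < SECTOR:
            break  # ran off the end of the image
        if block[3:11] == b"NTFS    " and block[510:512] == b"\x55\xaa":
            (total_sectors,) = struct.unpack_from("<Q", block, 0x28)
            yield sector, total_sectors


# Demo on a tiny fake image: 16 empty sectors with a boot sector planted at sector 5.
fake = bytearray(SECTOR * 16)
boot = bytearray(SECTOR)
boot[3:11] = b"NTFS    "
struct.pack_into("<Q", boot, 0x28, 1000)  # arbitrary volume size for the demo
boot[510:512] = b"\x55\xaa"
fake[5 * SECTOR:6 * SECTOR] = boot

hits = list(find_ntfs_boot_sectors(io.BytesIO(bytes(fake)), 0, 15))
print(hits)
```

On a real disk the file object would be something like `open("/dev/sda", "rb")` (which needs root), and the range would be the 16065 sectors of the suspect cylinder rather than the whole drive.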
Here is the output of my search for the lost partition:
Begin scan...
Possible partition(Windows NT/W2K FS), size(239999mb), offset(131610mb)
   type: 007(0x07)(OS/2 HPFS, NTFS, QNX or Advanced UNIX)
   size: 239999mb #s(491519992) s(269537688-761057679)
   chs:  (1023/254/63)-(1023/254/63)d (16777/241/1)-(47373/165/40)r
   hex:  00 FE FF FF 07 FE FF FF 98 D1 10 10 F8 FF 4B 1D

Offset 131610mb works out to block 269,537,280. I am never sure whether they start numbering at 1 or 0, so I decided to try 269537279, 269537280 and 269537281. I tried all three but they didn't work. They don't look right anyway - cylinder 16777 is too different from the original fdisk output of 25497. So no luck with gpart. Let's try another tool ...
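For reference, the megabyte-to-block conversion and the `hex:` line can both be checked with a few lines of Python. This is a sketch that assumes gpart's "mb" means 1024 x 1024 bytes and that the hex line is a standard 16-byte MBR partition entry (type at byte 4, start sector and sector count as 32-bit little-endian values at bytes 8 and 12); the helper names are mine.

```python
import struct

SECTOR = 512


def mb_to_sector(offset_mb):
    """Convert an offset in MiB to a 512-byte sector number."""
    return offset_mb * 1024 * 1024 // SECTOR


def decode_mbr_entry(hex_string):
    """Decode a 16-byte MBR partition table entry given as hex text."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 16:
        raise ValueError("an MBR partition entry is exactly 16 bytes")
    start_lba, num_sectors = struct.unpack_from("<II", raw, 8)
    return {
        "bootable": raw[0] == 0x80,
        "type": raw[4],
        "start": start_lba,
        "sectors": num_sectors,
    }


entry = decode_mbr_entry("00 FE FF FF 07 FE FF FF 98 D1 10 10 F8 FF 4B 1D")
print(mb_to_sector(131610))             # prints: 269537280
print(entry["start"], entry["sectors"])  # prints: 269537688 491519992
```

The decoded start sector agrees with gpart's own `s(269537688-...)` field and sits 408 sectors past the figure computed from the megabyte offset, so the `offset(...mb)` value appears to be rounded down to a whole megabyte. Either way, cylinder 16777 is still nowhere near 25497, so this hit does not look like my lost partition.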
Testdisk is probably the best tool for retrieving lost partitions. It is an interactive text program. It will scan a drive for lost partitions and display the info. Here is an example after scanning my drive for the missing Win7 partition:
Hurray! Found it! The first item in the list is the Windows XP partition which I had used in my experiment and then deleted. The size of the Windows XP partition (in 512-byte sectors) corresponds to about 7GB, which is what I would expect. The second NTFS partition found has an L beside it, which means logical partition. That isn't what I would expect, but whatever, I can live with the change to logical if I get my data back. The start cylinder looks right: cylinder 25496, head 139, sector 52. The size of about 100GB looks about right too. So I select the missing partition, press Enter, and tell testdisk to write the new partition table.
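That cylinder/head/sector triple can be turned into an absolute sector number with the usual CHS-to-LBA formula. A small sketch, assuming testdisk counts cylinders and heads from 0 and sectors from 1, with the 255 heads and 63 sectors per track reported by fdisk:

```python
HEADS = 255
SECTORS_PER_TRACK = 63


def chs_to_lba(cylinder, head, sector, heads=HEADS, spt=SECTORS_PER_TRACK):
    """Convert a CHS address to an absolute sector number (LBA).

    Assumption: cylinders and heads are 0-based, sectors are 1-based.
    """
    if not 0 <= head < heads:
        raise ValueError("head out of range")
    if not 1 <= sector <= spt:
        raise ValueError("sector out of range")
    return (cylinder * heads + head) * spt + (sector - 1)


print(chs_to_lba(25496, 139, 52))  # prints: 409602048
print(chs_to_lba(0, 1, 1))         # prints: 63
```

409602048 is exactly the start sector that sfdisk reports for the recovered partition further down, and testdisk's 0-based cylinder 25496 is the same cylinder that fdisk called 25497. The second line reproduces the familiar start of sda1 at sector 63.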
testdisk says that you need to reboot to see the new partition table but that isn't really true. Just run fdisk in interactive mode and give it the write command. As long as no partitions are mounted then fdisk will synchronize the partition table with the OS. Then you can mount the partition. So fdisk redeems itself a little bit.
One more step. I turned it back into a primary partition using sfdisk. First, output the partition table to a file:
bash# sfdisk -d /dev/sda > logical.sfdisk
Which looks like this:
bash# cat logical.sfdisk
#partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size= 14169267, Id= 7, bootable
/dev/sda2 : start=409601997, size=215535348, Id= f
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
/dev/sda5 : start=409602048, size=215535297, Id= 7
So now I just edit this file down until it looks like this:
#partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size= 14169267, Id= 7, bootable
/dev/sda2 : start=409602048, size=215535297, Id= 7
Basically, I just deleted the sda2, sda3 and sda4 lines and changed /dev/sda5 to /dev/sda2. And this is easy to write back to the partition table using sfdisk:
bash# sfdisk /dev/sda < logical.sfdisk
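I made that edit by hand in a text editor, but the same transformation can be scripted. The sketch below is a hypothetical helper, not an sfdisk feature: it assumes the old `start=, size=, Id=` dump format, drops empty slots and extended containers (Id 5, f or 85), and renumbers what is left from 1. It refuses to produce more than four entries, since an MBR only has four primary slots.

```python
import re

LOGICAL_DUMP = """\
#partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size= 14169267, Id= 7, bootable
/dev/sda2 : start=409601997, size=215535348, Id= f
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
/dev/sda5 : start=409602048, size=215535297, Id= 7
"""

ENTRY = re.compile(
    r"^(/dev/\D+)(\d+)"
    r"(\s*:\s*start=\s*(\d+),\s*size=\s*(\d+),\s*Id=\s*([0-9a-fA-F]+).*)$"
)
EXTENDED_IDS = {"5", "f", "85"}  # extended-partition container types


def promote_logicals(text):
    """Rewrite a dump so every real partition becomes a primary one."""
    header, kept = [], []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            header.append(line)  # comments, "unit:" line, blank lines
            continue
        size, part_id = int(m.group(5)), m.group(6).lower()
        if size == 0 or part_id in EXTENDED_IDS:
            continue  # unused slot or extended container
        kept.append((m.group(1), m.group(3)))
    if len(kept) > 4:
        raise ValueError("an MBR holds at most four primary partitions")
    body = [f"{dev}{n}{rest}" for n, (dev, rest) in enumerate(kept, start=1)]
    return "\n".join(header + body) + "\n"


print(promote_logicals(LOGICAL_DUMP))
```

The output should always be read over before it goes anywhere near `sfdisk /dev/sda`, because a wrong start sector here recreates the exact problem this whole adventure started with.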
Woohoo! Now I can reboot my box and grub will find Win7 at /dev/sda2.
So what are my lessons learned?