How to find the NTFS filename associated with a bad block using Linux

Intro

I often see hard drives that have bad blocks. By this I mean that one or more blocks on the drive cannot be read. When the drive is in a Windows PC, the computer may appear to be very slow because the operating system repeatedly attempts to read the bad block. Sometimes the operating system cannot start because the bad block is part of a critical file. It is easy to identify the bad block using the badblocks command, but badblocks doesn't indicate which file is corrupt. This tutorial explains the three steps necessary to find the NTFS filename associated with the bad block.

Step 1: Clone the Drive

The first task with any failing drive is to clone it. I prefer to clone by partition, so I start by checking the partition table using fdisk.

255 heads, 63 sectors/track, 0 cylinders, total 2048 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xd192b742

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1            2048    31459327    15728640   27  Hidden NTFS/WinRE
/dev/sdd2   *    31459328    31664127      102400    7  HPFS/NTFS/exFAT
/dev/sdd3        31664128  1250260991   609298432    7  HPFS/NTFS/exFAT

Then I clone the individual partitions using ddrescue:

bash# ddrescue -d -n /dev/sdd3 sdd3.img logfile

In this case I'm just taking the image of the third partition because it contains the data (and the bad block). When I use ddrescue the source drive is usually failing, and I don't want to make a failing drive worse by repeatedly banging on the broken blocks. So I specify the -n switch, which means no retries and no block splitting (the -d switch asks for direct disc access). When ddrescue is finished I have an image of the retrievable data in the sdd3.img file and a logfile which tells me where the bad data is located. Here are the contents of logfile:

# Rescue Logfile. Created by GNU ddrescue version 1.16
# Command line: ddrescue -d /dev/sdd3 sdd3.img logfile
# current_pos  current_status
0x02499600     +
#      pos        size  status
0x00000000  0x02499800  +
0x02499800  0x00000200  -
0x02499A00  0x9142566600  +

Error lines end with the - character. This logfile shows one bad block: it is located at 0x02499800 relative to the start of the partition, and it is 0x200 (512) bytes long.
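
For a longer logfile it can help to pull out just the error lines. The following is a minimal sketch, assuming the three-column layout shown above; list_bad_regions is a hypothetical helper name, not part of ddrescue.

```bash
#!/bin/bash
# Hypothetical helper: print "position size" in decimal bytes for every
# bad region (status "-") in a ddrescue logfile.
list_bad_regions() {
    local logfile="$1" pos size status
    while read -r pos size status; do
        # Comment lines start with "#"; the current_pos line has only two
        # fields, so its status is empty. Both are skipped here.
        [[ "$pos" == 0x* && "$status" == "-" ]] || continue
        # Bash arithmetic understands the 0x prefix, so this prints decimal.
        echo "$((pos)) $((size))"
    done < "$logfile"
}

# Usage: list_bad_regions logfile
```

With the logfile above, this should print a single line giving the position 38377472 and the size 512.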

Step 2: Find the Cluster

The bad block is located at 0x02499800 which is 38377472 in decimal. NTFS is organized in clusters. We need to know the cluster size. We can find it using the ntfsinfo command, like this:

bash# ntfsinfo -m -f sdd3.img
WARNING: Dirty volume mount was forced by the 'force' mount option.
Volume Information 
	Name of device: sdd3.img
	Device state: 3
	Volume Name: Gateway
	Volume State: 27
	Volume Version: 3.1
	Sector Size: 512
	Cluster Size: 4096
	Index Block Size: 4096
	Volume Size in Clusters: 24414061
MFT Information 
	MFT Record Size: 1024
	MFT Zone Multiplier: 0
	MFT Data Position: 24
	MFT Zone Start: 786432
	MFT Zone End: 3838189
	MFT Zone Position: 786432
	Current Position in First Data Zone: 3838189
	Current Position in Second Data Zone: 0
	LCN of Data Attribute for FILE_MFT: 786432
	FILE_MFTMirr Size: 4
	LCN of Data Attribute for File_MFTMirr: 2
	Size of Attribute Definition Table: 2560
FILE_Bitmap Information 
	FILE_Bitmap MFT Record Number: 6
	State of FILE_Bitmap Inode: 80
	Length of Attribute List: 0
	Attribute List: (null)
	Number of Attached Extent Inodes: 0
FILE_Bitmap Data Attribute Information
	Decompressed Runlist: not done yet
	Base Inode: 6
	Attribute Types: not done yet
	Attribute Name Length: 0
	Attribute State: 3
	Attribute Allocated Size: 3055616
	Attribute Data Size: 3051760
	Attribute Initialized Size: 3051760
	Attribute Compressed Size: 0
	Compression Block Size: 0
	Compression Block Size Bits: 0
	Compression Block Clusters: 0

So the cluster size is 4096 bytes. The cluster containing the bad block is then 38377472 / 4096 = 9369.5, which rounds down to cluster 9369.
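
The same arithmetic can be done in the shell instead of by hand. This is a sketch only: both function names are hypothetical, and the awk filter assumes the "Cluster Size:" line looks exactly like the ntfsinfo output shown above.

```bash
#!/bin/bash
# Hypothetical helpers; the names are illustrative only.

# Read ntfsinfo output on stdin and print the "Cluster Size" value.
cluster_size_from_ntfsinfo() {
    awk -F': *' '/Cluster Size:/ { print $2; exit }'
}

# Convert a byte offset (hex or decimal) within the partition
# to a cluster number. Integer division rounds down.
offset_to_cluster() {
    local offset="$1" cluster_size="$2"
    echo $(( offset / cluster_size ))
}

offset_to_cluster 0x02499800 4096    # prints 9369
```

The cluster size could be captured with something like: ntfsinfo -m -f sdd3.img | cluster_size_from_ntfsinfo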

Sometimes I make an image of the whole drive, not just the NTFS partition. In these cases, I need to use losetup to create a loopback device for the NTFS partition, then use ntfsinfo on that loopback device. Here are the relevant commands.

First find the byte offset of the partition using a calculator. For instance, if the starting block of sdd3 is 31664128 and the block size is 512 bytes, then the offset is 31664128 * 512 = 16212033536.

bash# losetup -o 16212033536 /dev/loop5 sdd.img
bash# ntfsinfo -m -f /dev/loop5

... ntfsinfo output ...

bash# losetup -d /dev/loop5
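
The offset passed to losetup can also be computed in the shell rather than with a calculator. A small sketch, using the values from the fdisk output in step 1 (the variable names are my own):

```bash
#!/bin/bash
start_sector=31664128   # Start column for /dev/sdd3 in the fdisk output
sector_size=512         # from "Units = sectors of 1 * 512 = 512 bytes"

# Byte offset of the partition within the whole-disk image.
offset=$(( start_sector * sector_size ))
echo "$offset"          # prints 16212033536
```

The result can then be used as: losetup -o "$offset" /dev/loop5 sdd.img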

Step 3: Find the Filename

The last step is to find which file (if any) uses cluster 9369. We can do that using the ntfscluster command, like this:

bash# ntfscluster -f -c 9369 sdd3.img 2>> /dev/null
Searching for cluster 9369
Inode 89381 /Windows/System32/atidxx64.dll/$DATA

So the problem file in this case is the ATI DirectX driver, C:\Windows\System32\atidxx64.dll. I was able to mount the partition, delete the file, and force a write to the bad block, after which the drive passed its long SMART self-test.

It should also be pretty easy to adapt this procedure into a short script which will process a longer ddrescue log file and give a list of NTFS file names.

Addendum: Alternate Step 2 using LBA from SMART

ddrescue is slow. I usually let it run overnight. rsync would be much faster. Sometimes I clone using ntfsclone because it's faster, but the result is not as useful for file retrieval, and ntfsclone sometimes dies when it hits a read error.

The other issue I wanted to address, but didn't get to last night, is using the LBA number from SMART to find the NTFS filename. This would be faster than the ddrescue or rsync procedures (but won't give me a clone file). In this case the smartctl command gives us an LBA number, such as 0x01e44ccc (31739084 in base 10), which is counted from the beginning of the disk, with LBA 0 as the first block.

If the sdd3 partition starts at LBA 31664128 (see the fdisk output in step 1), then the LBA offset from the beginning of the partition is 31739084 - 31664128 = 74956. With an NTFS cluster size of 4096 and an LBA size of 512, there are 8 LBAs per cluster, so the cluster number is 74956 / 8 = 9369 (rounded down). This is the same cluster number we found in step 2 using ddrescue's logfile. Then proceed with step 3.
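
The LBA arithmetic above can be sketched as a small shell function. The name lba_to_cluster and its argument order are hypothetical; the default sizes are the 512-byte LBA and 4096-byte cluster from this example and should be checked for any other drive.

```bash
#!/bin/bash
# Hypothetical helper: convert a whole-disk LBA (hex or decimal) into an
# NTFS cluster number relative to the start of the partition.
lba_to_cluster() {
    local lba="$1" part_start="$2"
    local sector_size="${3:-512}" cluster_size="${4:-4096}"
    # Byte offset inside the partition, then integer division by cluster size.
    echo $(( (lba - part_start) * sector_size / cluster_size ))
}

lba_to_cluster 0x01e44ccc 31664128    # prints 9369
```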

For scripting, smartctl output would be a little harder to parse than ddrescue log file, but fifteen or twenty minutes tuning a sed filter would probably give me what I want.

Scripts to process large log files

If the logfile is very large then it might be useful to have a script that converts the logfile to a list of damaged files. I made two scripts. The first script takes a logfile and generates a results file. The results file includes all the standard output from ntfscluster for every bad block. The second script takes this results file and generates a non-repeating list of damaged files.

Script to convert logfile to results file


#!/bin/bash

# If the ddrescue is of an entire disk then the script
# needs to calculate an offset (in bytes) in order to convert
# the position recorded in the rescue log to a position within
# the ntfs partition.
# 
# BEGINNINGOFPARTITION is the first block of the partition from gdisk or fdisk
# (set it to 0 if the rescue log is of the partition alone, not the whole disk)
# PARTITIONBLOCKSIZE is the block size from gdisk or fdisk
# CLUSTERSIZE is almost always 4096 but you should check just in case

RECOVERYLOG="recovery.log"
BEGINNINGOFPARTITION="1026048"
PARTITIONBLOCKSIZE="512"
PARTITIONOFFSET=$((BEGINNINGOFPARTITION * PARTITIONBLOCKSIZE))
RESULTSFILENAME="results.txt"
NTFSTARGET="/dev/sda2"
CLUSTERSIZE="4096"

# Reset results file

echo "Beginning of Partition: $BEGINNINGOFPARTITION" > "$RESULTSFILENAME"
echo "Partition block size:   $PARTITIONBLOCKSIZE" >> "$RESULTSFILENAME"
echo "Partition offset:       $PARTITIONOFFSET" >> "$RESULTSFILENAME"
echo "NTFS Target:            $NTFSTARGET" >> "$RESULTSFILENAME"
echo "Cluster Size:           $CLUSTERSIZE" >> "$RESULTSFILENAME"
echo "####" >> "$RESULTSFILENAME"


# read recovery log line by line

FOUNDFIRSTLINE="no"
while IFS= read -r line; do

    # skip the comment lines at the beginning of the file until we
    # find the comment line that matches the header of the error log
    # (if the header is never found, the loop reads to the end of the file)
    if [ "$FOUNDFIRSTLINE" == "no" ] ; then
        if [ "$line" == "#      pos        size  status" ] ; then
            echo "found start of error log"
            FOUNDFIRSTLINE="YES"
        fi
        continue
    fi

    # we have found the beginning of the error log
    # read each line
    # each line has three fields: address (in hex), size (in hex), and status
    # length of address field may vary, fields are separated by whitespace.
    # For instance:
    #   0x00000000   0xA1B65E000  +
    #   0xA1B65E000  0x00000200  -
    #   0xA1B65E200  0x00000C00  /
    #   0xA1B65EE00  0x00000200  -
    #   0xA1B65F000  0x00007000  +
    #   0xA1B666000  0x00000200  -
    #   0xA1B666200  0x00000C00  /
    #   0xA1B666E00  0x00000200  -
    #   0xA1B667000  0x000A7000  +
    # status + means good block
    # status - means failed block, bad sectors
    # status / means failed block, not scraped

    # split the line on whitespace into its three fields
    read -r POSITION SIZE STATUS <<<"$line"
    echo "Text read from file: $line"
    echo "Position: $POSITION   Size: $SIZE   Status: $STATUS"

    if [ "$STATUS" == "-" ] ; then
        # convert the hex fields to decimal (bash arithmetic understands 0x)
        POSITION=$((POSITION))
        SIZE=$((SIZE))
        # make the position relative to the start of the NTFS partition
        POSITION=$((POSITION - PARTITIONOFFSET))
        CLUSTER=$((POSITION / CLUSTERSIZE))
        echo "Position: $POSITION Size: $SIZE Cluster: $CLUSTER" >> "$RESULTSFILENAME"
        ntfscluster -f -c "$CLUSTER" "$NTFSTARGET" >> "$RESULTSFILENAME" 2> /dev/null
    fi

done < "$RECOVERYLOG"
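
Note that the script above looks up only the cluster containing the start of each bad region. A region that is longer than one sector, or that straddles a cluster boundary, may touch further clusters. The following is a minimal sketch of how every affected cluster could be listed; clusters_for_range is a hypothetical helper and is not part of the script above. It expects a position already converted to decimal and made relative to the partition.

```bash
#!/bin/bash
# Hypothetical helper: print every cluster number touched by a bad region.
#   $1 = position in bytes, relative to the start of the partition
#   $2 = size of the region in bytes
#   $3 = cluster size in bytes (defaults to 4096, an assumption to verify)
clusters_for_range() {
    local pos="$1" size="$2" cluster_size="${3:-4096}"
    local first=$(( pos / cluster_size ))
    # The last byte of the region sits at pos + size - 1.
    local last=$(( (pos + size - 1) / cluster_size ))
    local c
    for (( c = first; c <= last; c++ )); do
        echo "$c"
    done
}

clusters_for_range 38377472 512    # prints 9369
```

Each number printed could then be passed to ntfscluster -c in turn, in the same way as $CLUSTER in the script.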

Script to Generate list of names from results file

#!/bin/bash
#
# start with an empty names file
: > names.txt
while IFS= read -r line; do
    if [ "${line:0:5}" == "Inode" ] ; then
        # trim /$DATA from the end if it is there
        line=${line%'/$DATA'}
        # drop the "Inode NNN" prefix, keeping only the path
        echo "$line" | cut -f 3- -d ' ' >> names.txt
    fi
done < results.txt
# sort the names and remove duplicates
sort -u -o names.txt names.txt
cat names.txt