When trying to recover FAT32(?) filesystem, photorec hangs in photorec_find_blocksize()

nyanpasu64
Posts: 4
Joined: 10 Jan 2022, 05:25

#1 Post by nyanpasu64 »

I somehow "corrupted" (it's unreadable on Windows, but I later discovered it mounts fine on Linux) a FAT32 filesystem on an old hard drive. When trying to recover it in TestDisk, I got the message "No file found, filesystem may be damaged". Looking online I found viewtopic.php?t=3052:
Try PhotoRec. In Options enable the expert mode, start a recovery on the Whole space of the FAT32 partition,
when asked, tell PhotoRec to try the unformat method. You may be able to recover your files with the original filenames.
I tried running it on my disk with PhotoRec 7.1 on Windows, 7.1 on Arch Linux, 7.2-WIP, and a git build. Every time, it prints "FAT filesystem was beginning before the actual partition.", and when I confirm with Ok, it hangs, burning a CPU core without performing any disk I/O at all (according to Windows Task Manager and iotop on Arch Linux):

Code:

PhotoRec 7.2-WIP, Data Recovery Utility, May 2021
Christophe GRENIER <grenier@cgsecurity.org>
https://www.cgsecurity.org

Disk /dev/sdc - 500 GB / 465 GiB (RO) - Seagate FreeAgent GoFlex
     Partition                  Start        End    Size in sectors
 1 P FAT32                    0  32 33 12959 179 20  208195584

...
  Stop  
I couldn't debug this on Windows, but on Arch Linux I managed to build PhotoRec with debug symbols and trace it in gdb. I ran PhotoRec up to the "FAT filesystem was beginning before the actual partition." message, then attached gdb and set a breakpoint at photorec_find_blocksize, which produced this stack trace:

Code:

#0  photorec_find_blocksize (params=params@entry=0x7ffca2a4d3d0, options=options@entry=0x7ffca2a4d3b0, list_search_space=list_search_space@entry=0x55acb525a300 <list_search_space>) at phbs.c:74
#1  0x000055acb5225b7e in photorec (params=params@entry=0x7ffca2a4d3d0, options=options@entry=0x7ffca2a4d3b0, list_search_space=list_search_space@entry=0x55acb525a300 <list_search_space>) at phrecn.c:338
#2  0x000055acb5226f9a in menu_photorec (params=params@entry=0x7ffca2a4d3d0, options=options@entry=0x7ffca2a4d3b0, list_search_space=list_search_space@entry=0x55acb525a300 <list_search_space>) at ppartseln.c:288
#3  0x000055acb522265d in photorec_disk_selection_ncurses (list_search_space=0x55acb525a300 <list_search_space>, list_disk=0x55acb6b0c9a0, options=0x7ffca2a4d3b0, params=0x7ffca2a4d3d0) at pdiskseln.c:252
#4  do_curses_photorec (params=params@entry=0x7ffca2a4d3d0, options=options@entry=0x7ffca2a4d3b0, list_disk=list_disk@entry=0x55acb6b0c9a0) at pdiskseln.c:348
#5  0x000055acb51cd3a0 in main (argc=2, argv=0x7ffca2a4d558) at phmain.c:393
Afterwards, I added a breakpoint on line 103 (the beginning of the endless while loop). Before the first loop iteration, list_search_space was (alloc_data_t *) 0x55acb525a300, and current_search_space was (alloc_data_t *) 0x55acb6bfc410. I typed `continue`, which stopped on line 103 again, and current_search_space was still (alloc_data_t *) 0x55acb6bfc410 (unchanged!). If I delete all breakpoints and type `finish`, the function never returns.

I'm not sure how to fix this bug, and didn't debug further than that. I could run commands or supply partial disk images if needed.

----

EDIT: I found what was wrong with my disk: https://bugzilla.gnome.org/show_bug.cgi?id=759916#c21:
In case it can help someone, here's the oneliner I've used to fix the
broken FS on a USB key, based on the comments above:

Code:

$ echo -ne '\xeb\x58\x90' | sudo dd conv=notrunc bs=1 count=3 of=/dev/sdb1
Be careful to target the right partition (/dev/sdb1 in my case,
probably something else in yours) and to first test the command line
on a text file to make sure the hexadecimal is properly interpreted by
your shell (the above works in ZSH, but with other shell, you might
have to double the backslashes).
I ran this command in bash (not zsh), and it fixed a partition on my hard drive with the same issue (previously, Windows and testdisk couldn't recognize the filesystem on the partition). I didn't test on fish though.
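
For context, those three bytes are an x86 short jump (opcode 0xEB plus a one-byte signed displacement) followed by a NOP (0x90). The jump skips over the BIOS Parameter Block, whose length differs between the FAT32 and FAT12/16 layouts. A minimal Python sketch of the arithmetic (the helper name is made up for illustration):

```python
def short_jump_target(boot_sector: bytes) -> int:
    """Return the offset that an x86 short jump (EB xx) at offset 0 lands on."""
    if boot_sector[0] != 0xEB:
        raise ValueError("not a short jump (expected opcode 0xEB)")
    # The displacement is signed and counted from the end of the 2-byte instruction.
    displacement = int.from_bytes(boot_sector[1:2], "little", signed=True)
    return 2 + displacement


print(hex(short_jump_target(bytes([0xEB, 0x58, 0x90]))))  # 0x5a (FAT32 layout)
print(hex(short_jump_target(bytes([0xEB, 0x3C, 0x90]))))  # 0x3e (FAT12/16 layout)
```

So EB 58 90 lands on offset 0x5A (90), just past the FAT32 BPB fields, while EB 3C 90 lands on 0x3E (62), just past the shorter FAT12/16 layout.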

I still think Linux's fsck.vfat needs to be changed to recognize and fix this error, and testdisk should ideally recognize this type of corrupted disk and restore it for you, and photorec shouldn't enter an infinite loop when trying to unformat this type of broken partition.

cgrenier
Site Admin
Posts: 5432
Joined: 18 Feb 2012, 15:08
Location: Le Perreux Sur Marne, France

#2 Post by cgrenier »

Because the FAT boot sector was corrupted, the block size could not be read from it, so photorec_find_blocksize() was called instead.
photorec_find_blocksize() searches for 10 known file headers and uses their locations to guess the block size.
Locating 10 files should not take very long unless there are read errors or there are (almost) no files.
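
The general idea of guessing a block size from header positions can be illustrated with a small Python sketch. This is *not* the code in PhotoRec's phbs.c; `guess_blocksize()` is a hypothetical function that assumes files start on cluster boundaries, so every header offset has the same remainder modulo the cluster size (the data area itself need not be aligned to the partition start). The largest power of two satisfying that is taken as the guess, which is presumably why several headers are needed: with only one or two, large sizes can fit by coincidence.

```python
def guess_blocksize(offsets, sector_size=512, max_blocksize=128 * 512):
    """Guess a cluster size from byte offsets where known file headers were found.

    Hypothetical illustration, not PhotoRec's algorithm: a candidate size fits if
    every offset has the same remainder modulo it, i.e. all files could start on
    a common cluster grid whose origin need not be the partition start.
    """
    if len(offsets) < 2:
        raise ValueError("need at least two header offsets")
    blocksize = max_blocksize
    # Try power-of-two sizes from largest to smallest and keep the first that fits.
    while blocksize > sector_size:
        if len({off % blocksize for off in offsets}) == 1:
            break
        blocksize //= 2
    return blocksize, offsets[0] % blocksize


found = [4096 * 3 + 1024, 4096 * 7 + 1024, 4096 * 10 + 1024]
print(guess_blocksize(found))  # (4096, 1024)
```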

It should have been possible to fix the boot sector using TestDisk: Advanced, Boot, RebuildBS, List... RebuildBS also tries to find the block size, so it may encounter the same problem...

nyanpasu64
Posts: 4
Joined: 10 Jan 2022, 05:25

#3 Post by nyanpasu64 »

cgrenier wrote: 14 Jan 2022, 21:54 photorec_find_blocksize() searches 10 known file header and use their location to guess the blocksize.
Locating 10 files should not be very long unless there are read errors or there was (almost) no file.
From what I could tell, photorec_find_blocksize() seemed to enter an infinite loop where current_search_space was the same on every iteration. If it actually *was* scanning different files on every iteration, possibly it failed to find files because my disk contained large binary files rather than photos. If not, then that indicates a bug where photorec_find_blocksize() entered an infinite loop. How can I tell?

recuperation
Posts: 2735
Joined: 04 Jan 2019, 09:48
Location: Hannover, Deutschland (Germany, Allemagne)

#4 Post by recuperation »

nyanpasu64 wrote: 15 Jan 2022, 11:41
cgrenier wrote: 14 Jan 2022, 21:54 photorec_find_blocksize() searches 10 known file header and use their location to guess the blocksize.
Locating 10 files should not be very long unless there are read errors or there was (almost) no file.
From what I could tell, photorec_find_blocksize() seemed to enter an infinite loop where current_search_space was the same on every iteration. If it actually *was* scanning different files on every iteration, possibly it failed to find files because my disk contained large binary files rather than photos. If not, then that indicates a bug where photorec_find_blocksize() entered an infinite loop. How can I tell?
I can't help you with your coding issue and cannot provide you with a possible "bugfix".
Please incorporate the information given to you by CGrenier into your case!

The boot sector is missing, and the search for 10 known file headers may well fail given your big binary files, whose type we don't know and TestDisk may not know either. As you are into programming and debugging, build a FAT32 boot sector manually yourself!

Here are secondary sources for its structure:
https://www.ntfs.com/fat-partition-sector.htm
https://www.ntfs.com/fat-boot-modif.htm

For comparison purposes, have a similar boot sector produced!

Duplicate your failed drive, or at least the boot sector of your broken FAT32 partition.

Get yourself examples of what FAT32 boot sectors look like as a function of disk size:

1.
Use a small USB stick and put a FAT32 partition on it.
Duplicate that boot sector into a file.
To find the boot sector with a simple hex editor like HxD, search for fixed values in the boot sector, e.g. the jump instruction or the magic value (0x55 0xAA in the last two bytes of the sector).

2.
Get yourself another 500 GB drive and partition it. When Windows asks for the partition table scheme, use MBR (not GPT). Create a FAT32 partition and format it.
The boot sector of this file system is your best reference boot sector!
Duplicate that boot sector into a file.
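
Instead of searching by eye, candidate FAT32 boot sectors can also be located with a short script run against a disk image (or a copy of one). In the hypothetical sketch below, 512-byte sectors are assumed, and a sector counts as a candidate if it ends with the 0x55 0xAA signature and carries the "FAT32   " type string at offset 82. That string is only informational in the FAT specification, so a missing one does not rule a sector out. The first three bytes are deliberately ignored, since the jump instruction is exactly what is damaged in this thread.

```python
def find_fat32_boot_sectors(path, sector_size=512, limit=None):
    """Yield sector numbers (LBA) whose contents look like a FAT32 boot sector.

    The jump instruction (bytes 0-2) is ignored on purpose; only the 0x55 0xAA
    signature at offset 510 and the "FAT32   " string at offset 82 are checked.
    """
    fs_type = b"FAT32".ljust(8)  # the 8-byte, space-padded type string
    with open(path, "rb") as f:
        lba = 0
        while limit is None or lba < limit:
            sector = f.read(sector_size)
            if len(sector) < sector_size:
                break
            if sector[510:512] == b"\x55\xaa" and sector[82:90] == fs_type:
                yield lba
            lba += 1
```

On a healthy FAT32 partition this would normally report two hits, the primary boot sector and its backup a few sectors later (typically 6), which is a quick way to see whether both copies survived.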

Summary:
You now have three boot sectors: your broken one, the one from the USB stick, and the one from your additional 500 GB drive.
Compare the one from the 500 GB drive to the broken boot sector and to the boot sector of the little USB stick.
I think HxD has a nice comparison function.
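
For the comparison, a field-by-field view can be easier to read than a raw hex diff. The sketch below decodes the standard FAT32 boot-sector fields (names, offsets and sizes as given in Microsoft's FAT specification) and lists the fields in which two sectors differ. The function names are hypothetical.

```python
# (name, byte offset, size) of FAT32 boot-sector fields, per Microsoft's FAT specification.
FAT32_FIELDS = [
    ("BS_jmpBoot", 0, 3), ("BS_OEMName", 3, 8),
    ("BPB_BytsPerSec", 11, 2), ("BPB_SecPerClus", 13, 1),
    ("BPB_RsvdSecCnt", 14, 2), ("BPB_NumFATs", 16, 1),
    ("BPB_RootEntCnt", 17, 2), ("BPB_TotSec16", 19, 2),
    ("BPB_Media", 21, 1), ("BPB_FATSz16", 22, 2),
    ("BPB_SecPerTrk", 24, 2), ("BPB_NumHeads", 26, 2),
    ("BPB_HiddSec", 28, 4), ("BPB_TotSec32", 32, 4),
    ("BPB_FATSz32", 36, 4), ("BPB_ExtFlags", 40, 2),
    ("BPB_FSVer", 42, 2), ("BPB_RootClus", 44, 4),
    ("BPB_FSInfo", 48, 2), ("BPB_BkBootSec", 50, 2),
    ("BS_DrvNum", 64, 1), ("BS_BootSig", 66, 1),
    ("BS_VolID", 67, 4), ("BS_VolLab", 71, 11),
    ("BS_FilSysType", 82, 8),
]
# Fields kept as raw byte strings rather than little-endian integers.
RAW_FIELDS = {"BS_jmpBoot", "BS_OEMName", "BS_VolLab", "BS_FilSysType"}


def parse_fat32_boot_sector(sector: bytes) -> dict:
    """Decode the FAT32 boot-sector fields of a 512-byte sector."""
    fields = {}
    for name, offset, size in FAT32_FIELDS:
        raw = bytes(sector[offset:offset + size])
        fields[name] = raw if name in RAW_FIELDS else int.from_bytes(raw, "little")
    return fields


def diff_boot_sectors(a: bytes, b: bytes) -> dict:
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    fields_a, fields_b = parse_fat32_boot_sector(a), parse_fat32_boot_sector(b)
    return {name: (fields_a[name], fields_b[name])
            for name in fields_a if fields_a[name] != fields_b[name]}
```

Against the reference sectors, fields such as BS_VolID or BPB_TotSec32 will naturally differ. A broken sector whose only unexpected difference is BS_jmpBoot would suggest the three-byte fix from post #1.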

General info:

CHS, LBA:
If I remember correctly, the CHS (cylinder/head/sector) figures do not matter except for booting a legacy Windows.
What matters are the LBA (logical block addressing) sector numbers.
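
For reference, the usual CHS-to-LBA conversion needs the geometry's heads per cylinder and sectors per track. Cylinders and heads are 0-based, while sectors are 1-based. Plugging in the geometry TestDisk shows in post #7 for the 512 MB flash drive (CHS 1009 16 62, partition from 2 1 3 to 1003 5 42) reproduces that partition's size of 993280 sectors. This suggests TestDisk's CHS columns follow this convention. The function name is only for illustration.

```python
def chs_to_lba(c: int, h: int, s: int, heads: int, sectors_per_track: int) -> int:
    """Convert a cylinder/head/sector address to a 0-based LBA sector number."""
    if not 1 <= s <= sectors_per_track:
        raise ValueError("sector numbers are 1-based and must fit within the track")
    return (c * heads + h) * sectors_per_track + (s - 1)


# Geometry and partition bounds as printed by TestDisk for the 512 MB drive.
start = chs_to_lba(2, 1, 3, heads=16, sectors_per_track=62)
end = chs_to_lba(1003, 5, 42, heads=16, sectors_per_track=62)
print(start, end, end - start + 1)  # 2048 995327 993280
```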

Clusters:
FAT allocates space in clusters. A cluster is a series of consecutive sectors. The cluster length is fixed within a file system; it is chosen at creation time as a function of the file system size.
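
The number of clusters also decides which FAT variant a volume is. According to Microsoft's FAT specification, the type depends only on the count of data clusters: fewer than 4085 is FAT12, fewer than 65525 is FAT16, otherwise FAT32. A sketch of that calculation follows. The example uses the values TestDisk printed in post #7 (total_sect 993280, reserved 32, fat32_length 976, cluster_size 8). `num_fats=2` is an assumption: it is the common default, but the thread never shows that field.

```python
def fat_type(total_sectors, reserved, num_fats, fat_size, sectors_per_cluster,
             root_entries=0, bytes_per_sector=512):
    """Classify a FAT volume by its number of data clusters (Microsoft FAT spec)."""
    # FAT12/16 keep a fixed-size root directory outside the data area; FAT32 has 0 entries.
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_sectors = total_sectors - reserved - num_fats * fat_size - root_dir_sectors
    clusters = data_sectors // sectors_per_cluster
    if clusters < 4085:
        return "FAT12", clusters
    if clusters < 65525:
        return "FAT16", clusters
    return "FAT32", clusters


# Values from post #7; num_fats=2 is assumed.
print(fat_type(993280, reserved=32, num_fats=2, fat_size=976, sectors_per_cluster=8))
# ('FAT32', 123912)
```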

Root directory:
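To find the root directory of your broken drive, either search the disk for known parts of your file names, or search
for the standard entries at the top of each subdirectory, which look like:
"." and
".."
(Strictly speaking, the root directory itself has no "." and ".." entries. On FAT32 it usually starts at the cluster stored in BPB_RootClus, normally 2, and a subdirectory directly under the root has a ".." entry with cluster number 0.)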

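Searching for those entries can also be scripted. Each FAT directory entry is 32 bytes: the first 11 bytes hold the space-padded 8.3 name, byte 11 holds the attributes (0x10 marks a directory), and the first cluster number is split into a high word at offset 20 and a low word at offset 26. The sketch below reports sectors that start with a "." entry followed by a ".." entry. Its function names are hypothetical, and it assumes 512-byte sectors. Since the "." entry stores the directory's own cluster, each hit pairs a sector number with a cluster number, which is what you need to check a cluster-to-sector conversion.

```python
DOT = b"." + b" " * 10      # "." padded to the 11-byte 8.3 name field
DOTDOT = b".." + b" " * 9   # ".." padded the same way
ATTR_DIRECTORY = 0x10


def entry_cluster(entry: bytes) -> int:
    """First cluster of a 32-byte FAT directory entry (high word at 20, low at 26)."""
    high = int.from_bytes(entry[20:22], "little")
    low = int.from_bytes(entry[26:28], "little")
    return (high << 16) | low


def find_subdirectories(path, sector_size=512):
    """Yield (lba, own_cluster, parent_cluster) for sectors starting with . and .. entries."""
    with open(path, "rb") as f:
        lba = 0
        while True:
            sector = f.read(sector_size)
            if len(sector) < sector_size:
                break
            dot, dotdot = sector[0:32], sector[32:64]
            if (dot[0:11] == DOT and dotdot[0:11] == DOTDOT
                    and dot[11] & ATTR_DIRECTORY and dotdot[11] & ATTR_DIRECTORY):
                yield lba, entry_cluster(dot), entry_cluster(dotdot)
            lba += 1
```
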
Converting a cluster number into a sector number:
There is a formula for calculating the sector number from the cluster number in FAT32. If I remember correctly, the smallest "legal" cluster is no. 2, and its location depends on the length of the structures in front of it.
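
Per Microsoft's FAT specification, the first data sector is reserved sectors + number of FATs * sectors per FAT + root-directory sectors (0 on FAT32). Cluster N then starts at first data sector + (N - 2) * sectors per cluster, counted from the start of the partition. The Python sketch below plugs in values from post #7: reserved 32, fat32_length 976, cluster_size 8. The partition start, LBA 2048, is derived from the CHS start 2 1 3 with the drive's 16-head, 62-sector geometry. `num_fats=2` is an assumption, as the thread never shows that field.

```python
def cluster_to_sector(cluster, reserved, num_fats, fat_size, sectors_per_cluster,
                      root_dir_sectors=0, partition_start=0):
    """Return the absolute sector where a data cluster starts (clusters begin at 2)."""
    if cluster < 2:
        raise ValueError("clusters 0 and 1 are reserved; data clusters start at 2")
    first_data_sector = reserved + num_fats * fat_size + root_dir_sectors
    return partition_start + first_data_sector + (cluster - 2) * sectors_per_cluster


# Values from post #7 for the 512 MB drive; num_fats=2 is assumed.
print(cluster_to_sector(2, reserved=32, num_fats=2, fat_size=976,
                        sectors_per_cluster=8, partition_start=2048))  # 4032
```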

You will probably find enough documentation on FAT and FAT32 online; otherwise, buy e.g. Brian Carrier, "File System Forensic Analysis".
There are a couple of software vendors that sell hex editors that open drives like HxD does but go beyond just showing the hex and ASCII of sectors. They might be helpful.

Use all the information gathered to build your best boot sector, maybe it works.

Good luck!

nyanpasu64
Posts: 4
Joined: 10 Jan 2022, 05:25

#5 Post by nyanpasu64 »

I already fixed my disk by running `echo -ne '\xeb\x58\x90' | sudo dd conv=notrunc bs=1 count=3 of=/dev/sdb1`, so I no longer need disk repair advice. Instead I'm reporting a bug where PhotoRec enters an infinite loop when unformatting this type of disk.

To test (and IMO disprove) your theory that PhotoRec is searching for "10 known file headers" and will exit the loop once it finds them, I took another random flash drive, copied around 20 photos onto it, and replaced the first 3 bytes of the boot sector with 00 00 00 (so only Linux can mount it). I then ran TestDisk (which printed "Invalid FAT boot sector") and PhotoRec (which, when unformatting, printed "FAT filesystem was beginning before the actual partition." and then entered an infinite loop where iotop shows no I/O whatsoever). I even unplugged the disk entirely; PhotoRec still doesn't respond to me pressing Enter to trigger Stop, and never returns from photorec_find_blocksize(). My guess is that your intent was "photorec_find_blocksize() searches 10 known file header and use their location to guess the blocksize", but the function has a bug that causes it to never actually find files.

To create a smaller test case, I generated a 40 megabyte image, attached it as a loop device using GNOME Disks, and by mistake created a FAT16 partition, so the partition begins with EB 3C 90 (not EB 58 90). After I loaded 12 PNG files and zeroed out the first 3 bytes of the partition, TestDisk says "Invalid FAT boot sector", and PhotoRec says "Can't find FAT cluster size" and enters photorec_find_blocksize. I let it run for a few minutes without it making progress (it doesn't respond to pressing Enter to trigger the Stop button either). This indicates to me that the function has a bug and enters an infinite loop; if PhotoRec were instead making forward progress looking for file headers, it would have finished quickly, since reading the entire 40 MB disk image with `cat` takes less than a second.

I then deleted and recreated the disk image, and created a partition and formatted it as FAT32 this time. It begins with EB 58 90, but PhotoRec behaves like my FAT16 image (prints "Can't find FAT cluster size" unlike my FAT32 physical drives, then gets stuck like both my physical drives and artificial disk images).

Unlike my real-world flash drives, these files are small enough that I can upload and share them: https://cdn.discordapp.com/attachments/ ... k-fat16.7z, https://cdn.discordapp.com/attachments/ ... k-fat32.7z

In addition to fixing how PhotoRec processes these disk images, can TestDisk recognize and repair this specific type of disk corruption, by replacing the first 3 bytes on the partition with EB 58 90 (FAT32) or EB 3C 90 (FAT16), if the remainder of the boot sector is valid?
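
A rough Python sketch of the check proposed here follows. Everything in it is an assumption for illustration (the function names, the specific sanity checks, working on an image file); it is not how TestDisk works. It leaves the sector alone unless the jump is invalid, the 0x55AA signature is present, and a few BPB fields look plausible. It chooses EB 58 90 or EB 3C 90 from BPB_FATSz16 (offset 22), which the FAT specification requires to be 0 on FAT32. Anything like this should be run on a copy of the disk, never the only copy.

```python
FAT32_JUMP = b"\xeb\x58\x90"   # jump to offset 0x5A (end of the FAT32 BPB), then NOP
FAT16_JUMP = b"\xeb\x3c\x90"   # jump to offset 0x3E (end of the FAT12/16 BPB), then NOP


def jump_is_valid(sector: bytes) -> bool:
    """Accept the two jump forms the FAT spec allows: EB xx 90 or E9 xx xx."""
    return (sector[0] == 0xEB and sector[2] == 0x90) or sector[0] == 0xE9


def bpb_looks_sane(sector: bytes) -> bool:
    """Example-only sanity checks on the rest of the boot sector."""
    bytes_per_sector = int.from_bytes(sector[11:13], "little")
    sectors_per_cluster = sector[13]
    reserved = int.from_bytes(sector[14:16], "little")
    num_fats = sector[16]
    return (sector[510:512] == b"\x55\xaa"
            and bytes_per_sector in (512, 1024, 2048, 4096)
            and sectors_per_cluster in (1, 2, 4, 8, 16, 32, 64, 128)
            and reserved > 0 and num_fats > 0)


def repair_jump(image_path: str, partition_offset: int) -> bool:
    """Restore a missing jump instruction in place; return True if bytes were written."""
    with open(image_path, "r+b") as f:
        f.seek(partition_offset)
        sector = f.read(512)
        if len(sector) < 512 or jump_is_valid(sector) or not bpb_looks_sane(sector):
            return False
        is_fat32_layout = int.from_bytes(sector[22:24], "little") == 0  # BPB_FATSz16
        f.seek(partition_offset)
        f.write(FAT32_JUMP if is_fat32_layout else FAT16_JUMP)
        return True
```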

recuperation
Posts: 2735
Joined: 04 Jan 2019, 09:48
Location: Hannover, Deutschland (Germany, Allemagne)

#6 Post by recuperation »

nyanpasu64 wrote: 16 Jan 2022, 07:57 I already fixed my disk by running `echo -ne '\xeb\x58\x90' | sudo dd conv=notrunc bs=1 count=3 of=/dev/sdb1`, so I no longer need disk repair advice. Instead I'm reporting a bug where PhotoRec enters an infinite loop when unformatting this type of disk.
Got it.
In addition to fixing how PhotoRec processes these disk images, can TestDisk recognize and repair this specific type of disk corruption, by replacing the first 3 bytes on the partition with EB 58 90 (FAT32) or EB 3C 90 (FAT16), if the remainder of the boot sector is valid?
Testdisk has a function to rebuild the boot sector. Could you try this function using your test image partitions?

nyanpasu64
Posts: 4
Joined: 10 Jan 2022, 05:25

#7 Post by nyanpasu64 »

recuperation wrote: 16 Jan 2022, 10:20 Testdisk has a function to rebuild the boot sector. Could you try this function using your test image partitions?
(tl;dr skip to the bottom.)

Do you mean [Rebuild BS]? Putting it in a Boot menu isn't very discoverable (I've only used Analyse and Advanced -> Undelete before, not Advanced -> Boot), and the documentation confused me when I initially read it.

https://www.cgsecurity.org/wiki/TestDisk_Step_By_Step advises you to use "Analyse", which doesn't work in my case (as I describe below).

https://www.cgsecurity.org/wiki/Advanced_FAT_Repair: All the way at the top, it says "In the Advanced menu, select the partition you want to modify and choose Boot", but I didn't fully read this part prior to the first heading. However, "Repair a FAT boot sector" shows the screen if you select the Advanced menu and *don't* choose Boot. I skipped down to "Rebuild a valid FAT boot sector" which shows a screen from 2005, and didn't explain how to get there.

After writing this post, I found viewtopic.php?t=9690 which explains "advanced tab->boot->rebuild bs". I had opened the Advanced tab before, but didn't look in "Boot" because I wasn't trying to fix an OS boot process (the filesystem isn't bootable), but fix a filesystem's metadata.

Earlier, prior to discovering "Boot":

When analyzing photorec-stuck-fat32.img, I can't get testdisk to show a [Rebuild BS] button. When I enter the main menu and select [ Analyse ], it shows:

Code:

Disk photorec-stuck-fat32.img - 41 MB / 40 MiB - CHS 6 255 63
Current partition structure:
     Partition                  Start        End    Size in sectors

Invalid FAT boot sector
 1 P FAT32 LBA                0   0  2     5  25 20      81919
 1 P FAT32 LBA                0   0  2     5  25 20      81919

Warning: Bad ending cylinder (CHS and LBA don't match)
No partition is bootable
When I perform a [Quick Search] or [Deeper Search], no partitions are found.

In simulated corruption (intact backup BS), Backup BS works and Rebuild BS mostly works

I took an old physical 512MB flash drive, recreated the MBR partition table, created a partition formatted as FAT32, and erased the jump instruction in the filesystem's boot sector. I left the backup boot sector intact. Note that this was *not* the case in the original corruption produced by libparted, where both boot sectors were identically corrupted (fsck found that the only difference between them was the dirty bit).

Here, "Quick Search" doesn't find anything, but "Deeper Search" still finds the intact backup boot sector and allows me to edit the partition. But how do I rebuild the boot sector?

The wiki page at https://www.cgsecurity.org/wiki/Advance ... oot_sector shows a screen from 2005 and didn't explain *how* to reach the [Rebuild BS] screen. I had to look around further (search testdisk "Rebuild BS") to find a questionable tutorial at https://us.informatiweb.net/tutorials/i ... ition.html, which told me to select [ Write ] in order to choose between [Backup BS] or [Rebuild BS].

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62
     Partition                  Start        End    Size in sectors
 1 * FAT32                    2   1  3  1003   5 42     993280 [NO_LABEL]

Boot sector
Bad

Backup boot sector
OK

First sectors (boot code and partition information) are not identical.

A valid FAT Boot sector must be present in order to access
any data; even if the partition is not bootable.
[Backup BS] fixes the issue.

On an empty filesystem, [Rebuild BS] was slow. If you cancel it immediately, it writes a FAT16 boot sector onto a FAT32 partition. If you let it run, it asks whether to include a cluster filled with gibberish filenames (I aborted interactive mode), and ends up generating a bad boot sector:

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62
     Partition                  Start        End    Size in sectors
 1 * FAT32                    2   1  3  1003   5 42     993280 [NO_LABEL]

FAT : 32
cluster_size 8 8
reserved     32 32
total_sect   993280 993240
fat32_length 976 976
root_cluster 0 2
free_count   uninitialised 123906
next_free    uninitialised 2
Extrapolated boot sector and current boot sector are different.
Warning: Extrapolated boot sector have incorrect values.

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62
     Partition                  Start        End    Size in sectors
 1 * FAT32                    2   1  3  1003   5 42     993280 [NO_LABEL]

Boot sector
Bad root_cluster
Bad

Backup boot sector
Bad root_cluster
Bad

Sectors are identical.
On non-empty FAT32 filesystems, "Rebuild BS" is nearly instant and produces better results. I recreated and reformatted the partition, added some files, corrupted the boot sector, and retried "Rebuild BS"; this time it worked better, and my files are accessible on Linux (I didn't test Windows). (fsck still complains that the boot sector volume label is empty, though.)

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62
     Partition                  Start        End    Size in sectors
 1 * FAT32                    2   1  3  1003   5 42     993280 [NO_LABEL]

FAT : 32
cluster_size 8 8
reserved     32 32
total_sect   993280 993240
fat32_length 976 976
root_cluster 2 2
free_count   uninitialised 94099
next_free    uninitialised 29809
Extrapolated boot sector and current boot sector are different.
In actual corruption, Analyse doesn't work and Advanced -> Boot mostly works

My original corrupted disk had identically corrupted primary and backup boot sectors; fsck verified that they were identical except for the dirty bit. To reproduce this, I corrupt the primary boot sector, then use fsck to copy the corruption over the backup boot sector as well (fsck /dev/sda1, 1, 1). When I run TestDisk, Analyse shows:

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62
Current partition structure:
     Partition                  Start        End    Size in sectors

Invalid FAT boot sector
 1 * FAT32                    2   1  3  1003   5 42     993280
 1 * FAT32                    2   1  3  1003   5 42     993280
Neither Quick Search nor Deeper Search can identify the FAT32 partition and filesystem, even though it's perfectly intact except for 6 missing bytes.

If I use the Advanced -> Boot menu instead, [Rebuild BS] behaves as described above (though I performed less testing here).

Can you manually type in the partition position? (no)

Prior to learning that Advanced->Boot is used for filesystem header recovery (rather than boot device management), I discovered that I can open the main menu and select "Analyse", write down the partition printout (above), start and abort Quick Search, press A to add a partition, and retype the information from *before* it tried to search for partitions (2 1 3 1003 5 42). For the type, I found FAT32 in the list and typed 0B, but made a typo; I couldn't press Backspace to retype it, so I got a list of partition types and picked FAT32, but TestDisk said Unknown anyway, so I activated [ Type ] a second time and typed 0B correctly. If you instead type 0B correctly the first time, the next screen shows ">Unknown", but pressing Enter shows FAT32 anyway. Now TestDisk somehow thinks it's an extended partition, which it isn't:

Code:

Disk /dev/sda - 512 MB / 488 MiB - CHS 1009 16 62

     Partition                  Start        End    Size in sectors

 1 E extended                 2   1  2  1003  15 62     993921
 5 L FAT32                    2   1  3  1003   5 42     993280
 
At this point, I have to "Write partition table, confirm ? (Y/N)". And now I'm back in the main menu, having had no opportunity to [Rebuild BS].

Oddly when I restart testdisk, it thinks it's a primary partition again.

tl;dr
Testdisk has a function to rebuild the boot sector.

On disks corrupted by libparted, [Rebuild BS] is inaccessible through the Analyse menu, but accessible through Advanced -> partition -> Boot. It's heuristic-based (even though filesystems corrupted by libparted can be deterministically repaired), and using Rebuild BS causes further unnecessary changes to the boot sector, which could cause problems ranging from cosmetic (missing volume label) to possible corruption (total_sect).

I'm confused by the UI and what "total_sect 993280 993240" means. If I run Rebuild BS a second time, it prints "total_sect 993280" instead. Does that mean that the first Rebuild BS wrote 993280 replacing 993240?

recuperation
Posts: 2735
Joined: 04 Jan 2019, 09:48
Location: Hannover, Deutschland (Germany, Allemagne)

#8 Post by recuperation »

nyanpasu64 wrote: 21 Jan 2022, 14:23 I'm confused by the UI and what "total_sect 993280 993240" means. If I run Rebuild BS a second time, it prints "total_sect 993280" instead. Does that mean that the first Rebuild BS wrote 993280 replacing 993240?
Thank you for this clean, well-formatted and precise post without abbreviations!
You probably know more about Testdisk than I do right now! :geek:

I have no answer for you!
I can only tell you what I would do to find out without having to read the C code (a language I have no programming experience in).

You can calculate a total sector count from the partition table entry for the partition in question.
There is also a total-sectors field in the FAT32 boot sector. I think (I am not sure) that it should equal the number implied by the partition table.
If the boot sector's figure for the total number of sectors is smaller than the partition table's, that would do no harm.
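
In a FAT32 boot sector, that field is BPB_TotSec32, a 4-byte little-endian value at offset 32 according to the FAT specification. Comparing it with the partition size is then a subtraction. The Python sketch below uses a hypothetical function name. Its example uses the numbers from post #7, where the partition table shows 993280 sectors and one of the two total_sect values TestDisk printed is 993240.

```python
def compare_total_sectors(boot_sector: bytes, partition_sectors: int) -> int:
    """Return BPB_TotSec32 minus the partition size (negative = filesystem is smaller)."""
    total_sectors = int.from_bytes(boot_sector[32:36], "little")  # BPB_TotSec32
    return total_sectors - partition_sectors


sector = bytearray(512)
sector[32:36] = (993240).to_bytes(4, "little")
print(compare_total_sectors(bytes(sector), 993280))  # -40
```

A negative result means the filesystem ends before the partition does, the harmless case described above. A positive one would mean the filesystem claims sectors beyond the end of its partition.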

Try to reproduce the situation again!
1. Back up the boot sector and the partition table information for comparison purposes.
2. Use TestDisk and write as you did before.
3. Back up the (possibly modified) boot sector and the partition table information again.
4. Compare the information from 1. and 3. What is the difference?

5. Find out the position of the backup boot sector and calculate its distance from the boot sector. It could be that the backup is supposed to be the last information in the partition. If that is true, this information would also determine the size of the partition in question.
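
For FAT32 specifically, the position can be read rather than guessed. The BPB field BPB_BkBootSec (offset 50, 2 bytes, little-endian) stores the backup boot sector's number relative to the partition start, and the FAT specification recommends the value 6. Keeping the backup in the last sector of the partition is NTFS's convention. The minimal Python sketch below has a hypothetical function name and assumes 512-byte sectors. It reads both copies from an image and reports the byte offsets at which they differ.

```python
def compare_with_backup(image_path, partition_offset, sector_size=512):
    """Return (backup_sector, differing_byte_offsets) for a FAT32 boot sector.

    BPB_BkBootSec (offset 50) gives the backup's sector number relative to the
    partition start; 0 means no backup location is recorded.
    """
    with open(image_path, "rb") as f:
        f.seek(partition_offset)
        primary = f.read(sector_size)
        backup_sector = int.from_bytes(primary[50:52], "little")
        if backup_sector == 0:
            raise ValueError("BPB_BkBootSec is 0; no backup location recorded")
        f.seek(partition_offset + backup_sector * sector_size)
        backup = f.read(sector_size)
    differing = [i for i, (a, b) in enumerate(zip(primary, backup)) if a != b]
    return backup_sector, differing
```

In the libparted case described in this thread, both copies were damaged identically, so an empty list of differences does not by itself mean the boot sector is healthy.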
