Data Set for Photorec Testing .jpg .pdf .zip etc.

SHaran
Posts: 17
Joined: 15 May 2015, 05:17

#1 Post by SHaran »

For research purposes, I am wondering whether there is a downloadable collection of files in which each file represents a unique file format, so that all the file formats supported by Photorec can be tested.

recuperation
Posts: 2719
Joined: 04 Jan 2019, 09:48
Location: Hannover, Deutschland (Germany, Allemagne)

#2 Post by recuperation »

How would you test Photorec with a dataset, knowing that recovery success depends heavily on the degree of on-disk fragmentation?

#3 Post by SHaran »

Fragmentation is one of the things I want to test, to see how Photorec operates on different filesystem types.

I'm also working on a script that runs Photorec on a per-file basis to see whether Photorec can extract an exact match of the source file. The reason is to use Photorec as a kind of file verification tool to detect possible file corruption at the filesystem level. So a dataset of all file types would be helpful.

#4 Post by recuperation »

If just a single data bit of a file flips on your drive, you won't be able to diagnose that without a copy or checksums.
Photorec cannot help in this case.
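
A minimal sketch of the checksum approach mentioned above, assuming Python 3 and SHA-256. The function names (`sha256_of`, `build_manifest`, `changed_files`) are hypothetical helpers for illustration and are not part of Photorec:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file while it is known to be good."""
    return {path: sha256_of(path) for path in paths}


def changed_files(manifest):
    """Return the paths whose current checksum differs from the recorded one."""
    return [path for path, digest in manifest.items()
            if sha256_of(path) != digest]
```

The manifest has to be built before any corruption happens and stored somewhere other than the drive being checked; a later call to `changed_files(manifest)` then reports files whose bytes have changed, including a single flipped bit.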

#5 Post by SHaran »

Yes, false positives are possible, where a source file could still be carved and yet be corrupt. But if Photorec cannot carve the source file, then we can say with certainty the file is corrupt.

A tool like this can be useful to weed out corrupt files after using data recovery tools that use file system reconstruction techniques.

#6 Post by recuperation »

SHaran wrote: 18 Jul 2019, 17:55 But if Photorec can not carve the source file then we can say with certainty the file is corrupt.
No. You just have to delete the FAT on a fragmented FAT drive.
All data remains intact. Photorec finds the first cluster but will fail to determine the following ones.
The file is perfectly OK.
The metadata is gone.
Photorec must fail when it hits fragmentation, except in the obvious case where it determines that the following cluster is the beginning of another file. But even that cluster could be an inner one, so it is always an educated guess.

If the metadata is still there, Photorec won't give you any additional advantage.
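
The fragmentation argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: an 8-byte cluster size, two made-up files delimited by the JPEG start and end markers, and a deliberately naive carver (`carve_contiguous`, a hypothetical helper) that reads straight from the first header to the next footer. It is not Photorec's actual algorithm.

```python
CLUSTER_SIZE = 8          # toy cluster size in bytes (real clusters are far larger)
HEADER = b"\xff\xd8"      # JPEG start-of-image marker
FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def make_toy_file(payload):
    """Wrap a payload in the header and footer markers."""
    return HEADER + payload + FOOTER


def split_clusters(data):
    """Cut data into fixed-size clusters, zero-padding the last one."""
    return [data[i:i + CLUSTER_SIZE].ljust(CLUSTER_SIZE, b"\x00")
            for i in range(0, len(data), CLUSTER_SIZE)]


def interleave(file_a, file_b):
    """Lay out two files with alternating clusters (a fragmented disk)."""
    clusters_a = split_clusters(file_a)
    clusters_b = split_clusters(file_b)
    disk = bytearray()
    for i in range(max(len(clusters_a), len(clusters_b))):
        if i < len(clusters_a):
            disk += clusters_a[i]
        if i < len(clusters_b):
            disk += clusters_b[i]
    return bytes(disk)


def carve_contiguous(disk):
    """Naive carver: bytes from the first header through the next footer."""
    start = disk.find(HEADER)
    if start < 0:
        return None
    end = disk.find(FOOTER, start + len(HEADER))
    if end < 0:
        return None
    return disk[start:end + len(FOOTER)]


file_a = make_toy_file(b"A" * 20)
file_b = make_toy_file(b"B" * 20)

contiguous_disk = b"".join(split_clusters(file_a) + split_clusters(file_b))
fragmented_disk = interleave(file_a, file_b)

print(carve_contiguous(contiguous_disk) == file_a)  # True
print(carve_contiguous(fragmented_disk) == file_a)  # False
```

In the fragmented layout every cluster of the first file is still present and unmodified, yet the carved result does not match it, so a failed carve does not show that the file is corrupt.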

#7 Post by SHaran »

Yes, I understand what you are saying, but fragmentation is not in play if you are running Photorec at the filesystem level against a single file. If Photorec cannot carve an exact copy of the source file, then the source file is corrupt. So, for example, if Photorec is run against HelloWorld.jpg but cannot carve out a recup_dir.1/f00000001.jpg that exactly matches HelloWorld.jpg, then the file is corrupt.

#8 Post by recuperation »

SHaran wrote: 18 Jul 2019, 22:13 if you are running Photorec at the filesystem level against a single file.
There is no such thing as "running Photorec at the filesystem level against a single file".
That would not make sense at all. If the file system is working, there is no need for Photorec.
Photorec operates at the sector level.

Read here:

https://www.cgsecurity.org/wiki/PhotoRe ... oRec_works

#9 Post by SHaran »

In Linux, as they say, everything is a file. And Photorec, when run with the /h option, says it will work on a file or a device. For example, it is common to use Photorec on device image files created via dd or ddrescue.

Code:

# photorec /h
PhotoRec 7.0, Data Recovery Utility, April 2015
Christophe GRENIER <grenier@cgsecurity.org>
http://www.cgsecurity.org

Usage: photorec [/log] [/debug] [/d recup_dir] [file.dd|file.e01|device]
       photorec /version

/log          : create a photorec.log file
/debug        : add debug information

PhotoRec searches various file formats (JPEG, Office...), it stores them
in recup_dir directory.

Here is an example of Photorec successfully verifying the picture of the week from the Hubble Space Telescope, so we can say potw1928a.jpg is likely not corrupt. If Photorec had failed to produce a matching f0000000.jpg, we could say the file is corrupt.

Code:

# wget -q https://cdn.spacetelescope.org/archives/images/screen640/potw1928a.jpg
# photorec /d /tmp/ /cmd potw1928a.jpg search
# ls -l /tmp/recup_dir.1
/tmp/recup_dir.1:
total 56
-rw-r--r-- 1 root root 52547 Jul 19 14:29 f0000000.jpg
-rw-r--r-- 1 root root  1539 Jul 19 14:29 report.xml
# md5sum potw1928a.jpg
410309e83d269275faad0fd5b110170a  potw1928a.jpg
# md5sum /tmp/recup_dir.1/f0000000.jpg
410309e83d269275faad0fd5b110170a  /tmp/recup_dir.1/f0000000.jpg
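
The manual md5sum comparison above could be scripted roughly as follows. This is a sketch, assuming Python 3 and that Photorec has already been run so that a recup_dir directory exists. `same_bytes` and `find_exact_match` are hypothetical helper names, and the script does not invoke Photorec itself.

```python
import os


def same_bytes(path_a, path_b, chunk_size=1 << 20):
    """Compare two files byte for byte, reading them in chunks."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True


def find_exact_match(source_path, recup_dir):
    """Return the first carved file identical to the source, or None."""
    source_size = os.path.getsize(source_path)
    for root, _dirs, files in os.walk(recup_dir):
        for name in sorted(files):
            candidate = os.path.join(root, name)
            # Cheap size check first; report.xml and partial carves drop out here.
            if os.path.getsize(candidate) != source_size:
                continue
            if same_bytes(source_path, candidate):
                return candidate
    return None
```

For the session above, `find_exact_match("potw1928a.jpg", "/tmp/recup_dir.1")` would be expected to return the path of f0000000.jpg, since the two md5sums are identical.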

#10 Post by recuperation »

Now flip a bit in here:
cdn.spacetelescope.org/archives/images/screen640/potw1928a.jpg

How would that affect the outcome of photorec?
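
The experiment proposed here can be sketched in memory. The assumptions are labeled in the comments: the byte string stands in for the downloaded image, and the "carved copy" assumes a carver that returns the bytes exactly as they currently sit on disk. How Photorec itself reacts to a flipped bit depends on where the bit lands, and is not modeled. `flip_bit` and `sha256_hex` are hypothetical helpers.

```python
import hashlib


def flip_bit(data, bit_index):
    """Return a copy of data with exactly one bit inverted."""
    byte_index, bit_in_byte = divmod(bit_index, 8)
    mutated = bytearray(data)
    mutated[byte_index] ^= 1 << bit_in_byte
    return bytes(mutated)


def sha256_hex(data):
    """SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


original = bytes(range(256)) * 4           # stand-in for the downloaded JPEG
recorded_checksum = sha256_hex(original)   # taken while the file was known good

corrupted = flip_bit(original, 1000)       # silent single-bit change on disk
carved_copy = bytes(corrupted)             # assumption: carver returns on-disk bytes

# The carved copy matches the (already corrupt) source...
print(sha256_hex(carved_copy) == sha256_hex(corrupted))   # True
# ...and only the earlier checksum reveals the change.
print(sha256_hex(corrupted) == recorded_checksum)         # False
```

Under these assumptions, a matching carve only shows that the carved bytes equal what is currently on disk; it says nothing about whether those bytes are the ones originally written.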
