
Re: Data Set for Photorec Testing .jpg .pdf .zip etc.

Posted: 19 Jul 2019, 18:45
by SHaran
It depends on which bit you flip. If you flip a bit inside the file type signature, PhotoRec's verification will fail.
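
To illustrate the point about signatures, here is a minimal Python sketch. It is not PhotoRec's code: the `SIGNATURES` table and the `looks_like` and `flip_bit` helpers are made up for this example and only compare the leading magic bytes, which is an assumption about the simplest possible check.

```python
# Hypothetical, simplified signature check. PhotoRec's real validation
# is more involved; this only compares the leading magic bytes.
SIGNATURES = {
    "jpg": b"\xff\xd8\xff",
    "pdf": b"%PDF-",
    "zip": b"PK\x03\x04",
}

def looks_like(data: bytes, kind: str) -> bool:
    """Return True if data starts with the magic bytes for the given kind."""
    return data.startswith(SIGNATURES[kind])

def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with a single bit flipped."""
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)

sample = b"%PDF-1.4\n" + b"x" * 100   # made-up stand-in for a PDF file

print(looks_like(sample, "pdf"))                # True: signature intact
print(looks_like(flip_bit(sample, 1), "pdf"))   # False: bit flipped inside the signature
print(looks_like(flip_bit(sample, 50), "pdf"))  # True: bit flipped in the body goes unnoticed
```

A flip inside the signature is caught, while a flip further into the file passes this kind of check untouched.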

In a data recovery scenario where you are trying to verify files generated by filesystem reconstruction tools, you can open each file manually to check whether it is good. But that is very time-consuming and inefficient when there are thousands of files to verify.

Using PhotoRec to automate file verification is not a perfect solution, but it can be a useful tool for flagging massively corrupt files. You sometimes see these when filesystem reconstruction goes wrong and creates a file with the original name and size but fills it with wrong or corrupt data.

Re: Data Set for Photorec Testing .jpg .pdf .zip etc.

Posted: 19 Jul 2019, 21:08
by recuperation
SHaran wrote: 19 Jul 2019, 18:45 It depends on which bit you flip. If you flip a bit inside the file type signature, PhotoRec's verification will fail.
So, obviously, PhotoRec cannot protect you against data corruption. Copies and checksums will help, or at least indicate that a file has been modified.
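
As a rough sketch of the checksum idea, assuming you can record a hash while the file is still known to be good: the `sha256_of` helper and the throwaway demo file below are made up for illustration, and only Python's standard `hashlib` is used.

```python
import hashlib
import os
import tempfile

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Demo on a throwaway file: record a checksum, flip one bit, compare.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"\x00" * 4096)
    before = sha256_of(path)

    with open(path, "r+b") as f:
        f.seek(1000)
        f.write(b"\x01")        # one byte changes from 0x00 to 0x01: a single flipped bit
    after = sha256_of(path)

print(before == after)  # False: the modification is detected
```

The checksum only tells you that something changed, not what changed or how to repair it, which is why a copy is still needed.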

SHaran wrote: In a data recovery scenario where you are trying to verify files generated by filesystem reconstruction tools, you can open each file manually to check whether it is good. But that is very time-consuming and inefficient when there are thousands of files to verify.
That is a whole different story compared to what you said initially, and I fully agree with that point of view.

SHaran wrote: Using PhotoRec to automate file verification is not a perfect solution, but it can be a useful tool for flagging massively corrupt files. You sometimes see these when filesystem reconstruction goes wrong and creates a file with the original name and size but fills it with wrong or corrupt data.
The way to sort out failed reconstructed files is either

1. To open them automatically with the program that created them and catch any error messages
2. To write an integrity test based on the file format specification for each file type
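
As one illustration of the second approach: ZIP archives store a CRC-32 for every member, and Python's standard `zipfile` module can recheck them with `ZipFile.testzip()`. The `check_zip` helper and the in-memory test archive are made up for this sketch; other file types would each need their own check.

```python
import io
import zipfile
import zlib

def check_zip(data: bytes) -> str:
    """Return 'ok', or a short description of the first problem found."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            bad = zf.testzip()   # re-reads every member and verifies its CRC-32
    except (zipfile.BadZipFile, zlib.error) as exc:
        return f"unreadable: {exc}"
    return "ok" if bad is None else f"bad CRC in member: {bad}"

# Build a small archive in memory, then damage one byte of the stored data.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("hello.txt", b"hello world" * 10)
good = buf.getvalue()

pos = good.index(b"hello world")   # start of the uncompressed payload
damaged = bytearray(good)
damaged[pos + 5] ^= 0x01           # flip one bit in the file content

print(check_zip(good))             # ok
print(check_zip(bytes(damaged)))   # bad CRC in member: hello.txt
print(check_zip(b"\x00" * 64))     # starts with "unreadable:"
```

To run this over thousands of recovered files, you would read each one and log every result other than "ok".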

And even that may fail. I had a case where JPEG files were only partly correct: the lower half of the picture was destroyed.
These files could be opened with a couple of programs without any of them reporting irregularities. Technically they were OK, but looking at them you knew they were broken.

There is an obvious exception CG wrote about:
[...
Some files, such as *.MP3 types, are data streams. In this case, PhotoRec parses the recovered data, then stops the recovery when the stream ends.
...]
Christophe Grenier uses additional information from inside the MP3 file to get better results.
You can do that with a lot of file types, but it requires time-consuming individual programming for each type.

Re: Data Set for Photorec Testing .jpg .pdf .zip etc.

Posted: 19 Jul 2019, 21:26
by SHaran
Thank you for the suggestions. I think we can agree that automated file verification is a complex process with so many different file types to consider. And it appears that Photorec can be at least a part of the solution.