fidentify 7.1 not working if .photorec.sig has offsets other than zero

Claudio
Posts: 3
Joined: 29 Nov 2021, 04:02

#1 Post by Claudio

Hello all,

I'm trying to define custom signatures for PhotoRec, and they work fine as long as the offset is set to 0. However, to make the signatures more specific, for one of the file types I need to look for strings that appear further into the file. When I specify an offset other than zero, fidentify no longer recognizes the file types and falls back to the built-in signatures.

This .photorec.sig works, and fidentify shows the correct file type when tested with appropriate files:

Code:

foamfile 0 "*/--"
obj 0 "# Wavefront OBJ file
pvsm 0 "<ParaView>"

If I now specify an offset of 1 for all file types and drop the first character from each signature, fidentify falls back to the built-in signatures, i.e. my signatures are not recognized in the test files:

Code:

foamfile 1 "/--"  # Recognized as java
obj 1 " Wavefront OBJ file  # Recognized as txt
pvsm 1 "ParaView>"  # Recognized as txt

I've tried the following, without success:
  • different values for the obvious offsets, none of which work
  • writing the offset in hex, e.g. 0x1, which doesn't work either
  • searching this forum for similar issues, without finding anything helpful
  • studying the source code in file_sig.c, but I can't quite figure out how it works and how fidentify.c uses it
Does the offset need to be specified in decimal or hexadecimal? Is it in bytes?

I'm using testdisk 7.1 on macOS High Sierra 10.13.6.

I would be grateful for any suggestions on how to make this work, thanks!

Cheers, Claudio

recuperation
Posts: 2720
Joined: 04 Jan 2019, 09:48
Location: Hannover, Deutschland (Germany, Allemagne)

#2 Post by recuperation

Claudio wrote (03 Dec 2021, 18:01):

Hello all,

I'm trying to define custom signatures for PhotoRec, and they work fine as long as the offset is set to 0. However, to make the signatures more specific, for one of the file types I need to look for strings that appear further into the file. When I specify an offset other than zero, fidentify no longer recognizes the file types and falls back to the built-in signatures.

This .photorec.sig works, and fidentify shows the correct file type when tested with appropriate files:

Code:

foamfile 0 "*/--"
obj 0 "# Wavefront OBJ file
pvsm 0 "<ParaView>"

Where is the closing quotation mark of your "obj" signature? :roll:

If I now specify an offset of 1 for all file types and drop the first character from each signature, fidentify falls back to the built-in signatures, i.e. my signatures are not recognized in the test files:

Code:

foamfile 1 "/--"  # Recognized as java
obj 1 " Wavefront OBJ file  # Recognized as txt
pvsm 1 "ParaView>"  # Recognized as txt

Where is the closing quotation mark of your "obj" signature? :roll:
I don't get what you are trying to do. You are shortening an already short signature. What's the point of doing that?

I've tried the following, without success:
  • different values for the obvious offsets, none of which work
  • writing the offset in hex, e.g. 0x1, which doesn't work either
  • searching this forum for similar issues, without finding anything helpful
  • studying the source code in file_sig.c, but I can't quite figure out how it works and how fidentify.c uses it
Does the offset need to be specified in decimal or hexadecimal? Is it in bytes?
The manual does not say, and I don't know either. You can test it with a fake file that starts with "123456789Claudio is looking for a definition".
Let's call files with the signature "Claudio" at position 10 "clo" files:
clo 10 "Claudio"
and check whether
clo 0xa "Claudio"
makes a difference. The latter is position 10 in hexadecimal notation.
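The experiment can be scripted. Below is a minimal Python sketch that writes the fake file and prints the signature line to put in .photorec.sig. It assumes the offset is a 0-based byte count from the start of the file, which is consistent with Claudio's later finding that 9, not 10, is the offset that works. The file name `clo_test.bin` is an arbitrary, hypothetical choice.

```python
from pathlib import Path


def make_clo_test(directory: Path) -> str:
    """Write the fake 'clo' test file and return a matching .photorec.sig line."""
    data = b"123456789Claudio is looking for a definition"
    (directory / "clo_test.bin").write_bytes(data)
    # bytes.find() returns a 0-based index: "123456789" is 9 bytes long,
    # so "Claudio" starts at index 9, which is the 10th byte of the file.
    offset = data.find(b"Claudio")
    return f'clo {offset} "Claudio"'


if __name__ == "__main__":
    print(make_clo_test(Path(".")))  # prints: clo 9 "Claudio"
```

The generated file can then be handed to fidentify to see which type it reports with and without the custom line.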

Please read the relevant chapter in the manual (not part of your list).
I just skimmed it to be able to answer you.
It states "space or comma delimiters are ignored" with regard to the magic value. Look at your Wavefront Object File definition :!:

Do some testing!
Get yourself a USB stick with no valuable data on it. Have your software write a genuine Wavefront Object File onto the stick.
Use any hex editor, for instance HxD on Windows, to examine the beginning of your file, where the signature resides. Does it match your definition?
You might zero out the first sector of the USB stick.
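Without a hex editor, the same check can be sketched in Python: print the first bytes of a file in hex-editor layout and test whether a given string sits exactly at a given offset. This assumes 0-based byte offsets and compares raw bytes only; it says nothing about which signature fidentify finally picks. The script name `sigcheck.py` is hypothetical.

```python
import sys


def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex bytes, printable ASCII."""
    lines = []
    for start in range(0, len(data), width):
        chunk = data[start:start + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start:08x}  {hex_part:<{width * 3}} {text_part}")
    return "\n".join(lines)


def signature_at(data: bytes, offset: int, magic: bytes) -> bool:
    """True if `magic` appears exactly at the 0-based byte `offset` of `data`."""
    return data[offset:offset + len(magic)] == magic


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print("usage: python sigcheck.py FILE OFFSET STRING")
    else:
        path, offset, magic = sys.argv[1], int(sys.argv[2]), sys.argv[3].encode()
        with open(path, "rb") as f:
            head = f.read(max(64, offset + len(magic)))
        print(hexdump(head))
        print("match" if signature_at(head, offset, magic) else "no match")
```

For example, `python sigcheck.py test1 1 1234567` checks the fooOffset1 signature used later in this thread. Line breaks appear as "." in the ASCII column and as `0a` (LF) or `0d 0a` (CRLF) in the hex column, which makes it easy to see whether they shift the bytes you are counting.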

As your file might also fit the definition of a text file, and as I don't know how PhotoRec handles such definition conflicts, switch off txt and java file recognition in PhotoRec and see what happens!

#3 Post by Claudio

Hello @recuperation and thank you for your quick reply.

Yes, I accidentally left out the closing " on the second line because I retyped the code, which is on another machine. My apologies.
Yes, I did read the relevant section of the TestDisk manual, several times actually. I didn't mention it because it seemed the obvious thing to do. However, it didn't help, because the examples in the manual all use an offset of zero.
recuperation wrote:

Please read the relevant chapter in the manual (not part of your list).
I just skimmed it to be able to answer you.
It states "space or comma delimiters are ignored" with regard to the magic value. Look at your Wavefront Object File definition :!:
If you look at the manual, it shows three ways to define the same signature. Spaces enclosed in double quotes are not ignored. In fact, my signatures are recognized properly even though some contain spaces, as long as no offset is specified.
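To make the quoting rule concrete, here is a hypothetical, simplified tokenizer for one .photorec.sig line in Python. It is not the parser in file_sig.c. It only assumes what this thread states or implies: a line is `extension offset signature`; the offset is decimal (the form that worked in Claudio's tests; hex offsets are rejected here because the thread never confirmed them); spaces and commas between signature parts are ignored; spaces inside double quotes are kept; and `0x...` introduces hex bytes (assumed from the manual's examples). Escape sequences inside quotes are not handled.

```python
from typing import Tuple

HEX_DIGITS = "0123456789abcdefABCDEF"


def parse_sig_line(line: str) -> Tuple[str, int, bytes]:
    """Parse 'extension offset signature' into (extension, offset, magic bytes)."""
    ext, offset_text, rest = line.split(None, 2)
    offset = int(offset_text, 10)  # decimal only in this sketch; "0x1" raises
    magic = bytearray()
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch in " ,\t\r\n":
            i += 1  # separators between signature parts are ignored
        elif ch == '"':
            end = rest.find('"', i + 1)
            if end == -1:
                raise ValueError(f"unterminated string in: {line!r}")
            magic += rest[i + 1:end].encode("utf-8")  # spaces inside quotes are kept
            i = end + 1
        elif rest.startswith("0x", i):
            j = i + 2
            while j < len(rest) and rest[j] in HEX_DIGITS:
                j += 1
            magic += bytes.fromhex(rest[i + 2:j])  # e.g. 0x4f424a -> b"OBJ"
            i = j
        else:
            raise ValueError(f"unexpected character {ch!r} in: {line!r}")
    return ext, offset, bytes(magic)
```

With this sketch, `parse_sig_line('obj 0 "# Wavefront OBJ file"')` keeps the spaces and returns `('obj', 0, b'# Wavefront OBJ file')`, whereas the same line without the closing quote raises `ValueError`. The `# Recognized as ...` notes in the posts are forum annotations and would also be rejected.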

Two of the three examples don't actually need an offset to be recognized as the correct file type. I left them in only to show that the offset doesn't work even with simple signatures. I didn't expect my reason for needing an offset to be questioned, and I didn't want to make things more complicated than necessary. Nevertheless, if you really must know, here it is:

Code:

/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      blockMeshDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

This is a typical header of the files I need to recover. It comes in many different flavours, so the obvious thing to look for is "FoamFile" on line 8. What mostly changes is the version number, so the number of bytes before "FoamFile" (the offset) stays the same. To avoid writing one signature per version number, I thought using an offset was the better approach. The source code also limits the signature length to 512 (presumably bytes), so if you are looking for something beyond byte 512, there is no way around using an offset. My question remains: how can I make it work?
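Instead of counting bytes by hand, the offset can be measured on a known-good sample. A minimal Python sketch, assuming 0-based byte offsets (consistent with the clo test in this thread): it reads a sample file in binary mode and prints a .photorec.sig line for the first occurrence of FoamFile. The extension name `foamfile` is Claudio's; everything else is illustrative. Binary mode matters because a file saved with CRLF line endings has one extra byte per line, so its offset differs from the LF version, and the printed offset only holds for files whose header has exactly the same bytes before FoamFile.

```python
import sys
from pathlib import Path


def sig_line_for(data: bytes, needle: bytes, ext: str) -> str:
    """Return '<ext> <offset> "<needle>"' for the first occurrence of `needle`."""
    offset = data.find(needle)  # 0-based byte index, or -1 if absent
    if offset < 0:
        raise ValueError(f"{needle!r} not found")
    return f'{ext} {offset} "{needle.decode()}"'


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print(f"usage: python {sys.argv[0]} SAMPLE_FILE")
    else:
        # Read in binary mode: CRLF line endings add one byte per line.
        data = Path(sys.argv[1]).read_bytes()
        print(sig_line_for(data, b"FoamFile", "foamfile"))
```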
recuperation wrote:

You can test it with a fake file that starts with "123456789Claudio is looking for a definition".
Let's call files with the signature "Claudio" at position 10 "clo" files:
clo 10 "Claudio"
and check whether
clo 0xa "Claudio"
makes a difference. The latter is position 10 in hexadecimal notation.

Your example works because it consists of a single line without line breaks (except that the offset has to be 9, not 10). However, if I add a second line to the test file, it stops working once it exceeds 73 bytes… So I did some experimenting and observed some strange behaviour when the test file contains a line break. I suspect that is why my original signatures with offsets don't work: all of those test files contain line breaks. I made a simple signature like this:

Code:

fooOffset1 1 "1234567"

I also made test files containing a varying number of 01234567 sequences. The results are below:

Code:

# Test file 1: This works, fidentify recognizes file as fooOffset1
01234567
01234567

# Test file 2: This works
0123456701234567
01234567

# Test file 3: This works too
01234567…01234567  # 11 sequences
01234567

# Test file 4: This doesn't work, fidentify recognizes test file as txt
01234567…01234567  # 12 sequences
01234567

# Test file 5: It works up to 11 sequences + 0 on the first line
01234567…012345670  # 11 sequences + 0 = 88 + 1 Byte + line break
01234567

# Test file 6: Not working with 88 + 2 Bytes on first line
01234567…0123456701  # 11 sequences + 01 = 88 + 2 Bytes + line break
01234567

# Test file 7: Inserted line break after first sequence, still not working
01234567
01234567…0123456701  # 10 sequences + 01 = 80 + 2 Bytes + line break
01234567

# Test file 8: Remove first line and it works again, although second line has same length as in test file 7
01234567…0123456701  # 10 sequences + 01 = 80 + 2 Bytes + line break
01234567

# Test file 9: Start from Test file 6 again and insert a line break after each sequence, not working
01234567
…
01234567  # 11 sequences up to here
01
01234567

I'm trying to see a pattern here, but I'm confused. Why does test file 1 work and test file 9 not, even though the first line is exactly the same, including the line break? Also, why does test file 7 work even with 20 sequences on the first line if I remove the offset like this:

Code:

fooOffset0 0 "0123456"

I suspect there's a bug here. If I understand correctly, once fidentify finds the signature at the specified position near the start of the file, neither the line lengths nor the number of lines that follow should matter. However, my test files show otherwise, and I have reproduced the above on two different machines with the same TestDisk and macOS versions.

I could try your suggestion of turning off txt file recognition; however, I also need to recover text files. So I would rather understand why specifying an offset doesn't work properly.
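For anyone who wants to reproduce these results, here is a Python sketch that writes the nine test files described above, expanding each run of sequences according to the counts given in the annotations. Two things are assumptions not stated in the post: each file uses LF line breaks and ends with a trailing newline, and the file names `test1` to `test9` are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

SEQ = "01234567"


def test_file_lines() -> Dict[int, List[str]]:
    """Map test file number to its lines, following the descriptions above."""
    return {
        1: [SEQ, SEQ],
        2: [SEQ * 2, SEQ],
        3: [SEQ * 11, SEQ],
        4: [SEQ * 12, SEQ],
        5: [SEQ * 11 + "0", SEQ],   # 88 + 1 bytes on the first line
        6: [SEQ * 11 + "01", SEQ],  # 88 + 2 bytes on the first line
        7: [SEQ, SEQ * 10 + "01", SEQ],
        8: [SEQ * 10 + "01", SEQ],
        9: [SEQ] * 11 + ["01", SEQ],
    }


def write_test_files(directory: Path) -> None:
    for number, lines in test_file_lines().items():
        # Assumption: LF line breaks and a trailing newline after the last line.
        text = "\n".join(lines) + "\n"
        (directory / f"test{number}").write_bytes(text.encode("ascii"))


if __name__ == "__main__":
    write_test_files(Path("."))
```

With `fooOffset1 1 "1234567"` in .photorec.sig, each file can then be passed to fidentify, and the byte counts per line can be checked against the annotations.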

cgrenier
Site Admin
Posts: 5432
Joined: 18 Feb 2012, 15:08
Location: Le Perreux Sur Marne, France

#4 Post by cgrenier

Signatures with a lower offset are checked before signatures with a higher offset.
So the txt signature may be evaluated before your custom signature.
If the word "OpenFOAM" is part of the format, check all the recovered files for this word:

Code:

grep -rl OpenFOAM recup_dir.*
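A toy Python model of the ordering described above: candidate signatures are tried in ascending order of offset, and (as the replies suggest) the first match wins. The signature `builtin_demo` and its magic value are hypothetical; PhotoRec's real checks, including its text detection, are far more involved than a byte comparison, so this only illustrates how a match at offset 0 can shadow a custom signature at offset 1.

```python
from typing import List, Optional, Tuple

Signature = Tuple[str, int, bytes]  # (extension, offset, magic bytes)


def identify(data: bytes, signatures: List[Signature]) -> Optional[str]:
    """Return the extension of the first matching signature, lowest offset first."""
    # sorted() is stable, so signatures with equal offsets keep their list order.
    for ext, offset, magic in sorted(signatures, key=lambda sig: sig[1]):
        if data[offset:offset + len(magic)] == magic:
            return ext
    return None


SIGNATURES: List[Signature] = [
    ("fooOffset1", 1, b"1234567"),  # custom signature from this thread
    ("builtin_demo", 0, b"0123"),   # hypothetical built-in check at offset 0
]

if __name__ == "__main__":
    print(identify(b"01234567\n", SIGNATURES))  # builtin_demo: offset 0 is tried first
    print(identify(b"x1234567\n", SIGNATURES))  # fooOffset1: the offset-0 check fails
```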

#5 Post by Claudio

Thank you for your replies. Although I still don't understand why offsets sometimes work and sometimes don't, I was able to recover the files by specifying signatures without offsets. It didn't work for all file types, so I'll try to refine the signatures and give it another go.

What sets your software apart from other recovery tools I've tried is the ability to create custom signatures, which I greatly appreciate. I'm grateful that PhotoRec exists, and as a token of my gratitude I've made a donation.

Thanks and I wish you happy holidays, Claudio

#6 Post by recuperation

Lower offsets are processed first.
If your files match another file type that has an offset of zero, that other file type will be assigned because of its lower offset.
File types are usually mutually exclusive, so this kind of collision rarely happens.
