Hello @recuperation and thank you for your quick reply.
Yes, I accidentally forgot the " at the end of the second line, because I re-typed the code, which is on another machine. My apologies.
Yes, I did read the relevant section of the testdisk manual, several times actually. I didn't mention it because it seemed the obvious thing to do. However, it didn't help, because the examples in the manual all use an offset of zero.
Please read the relevant chapter in the manual (not part of your list).
I just cross-read it to be able to answer you.
It states: "space or comma delimiters are ignored" with regard to the magic value. Look into your Wavefront Object File definition
If you look at the manual, it shows 3 methods to define the same signature. If space characters are enclosed in double quotes, they are not ignored. As a matter of fact, my signatures are recognized properly although some contain spaces, as long as they don't have an offset specified.
Two of the three examples don't actually need an offset to be recognized as the correct file type. I left the offset in anyway, to show that it does not work even with simple signatures. I didn't expect my reason for needing an offset to be questioned, and I didn't want to make the question more complicated than necessary. Nevertheless, if you really must know, here it is:
Code:
/*--------------------------------*- C++ -*----------------------------------*\
========= |
\\ / F ield | OpenFOAM: The Open Source CFD Toolbox
\\ / O peration | Website: https://openfoam.org
\\ / A nd | Version: 8
\\/ M anipulation |
\*---------------------------------------------------------------------------*/
[b]FoamFile[/b]
{
version 2.0;
format ascii;
class dictionary;
object blockMeshDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
This is a typical header of the files I need to recover. The header comes in many different flavours, so the obvious thing to look for is "FoamFile", shown in bold on line 8. What mostly changes is the version number, so the number of bytes before "FoamFile" (the offset) stays the same. To avoid writing as many signatures as there are version numbers, I felt that using the offset was the better way. The source code also limits the signature length to 512 (presumably bytes), which means that if you are looking for something after byte 512, there is no other way than using the offset. My question remains: how can I make it work?
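For reference, a minimal Python sketch of how the 0-based byte offset of "FoamFile" could be measured in a sample file. The sample header below is a shortened, hypothetical stand-in for the real one, `marker_offset` and the `foam` extension name are made up for illustration, and the printed line only mimics the `extension offset "magic"` layout used in the examples in this thread. The offset depends on the line endings (LF vs. CRLF), so it is worth measuring it on an actual file rather than counting characters by hand.

```python
def marker_offset(data: bytes, marker: bytes) -> int:
    """Return the 0-based byte offset of the first occurrence of marker, or -1."""
    return data.find(marker)


# Hypothetical, shortened header; a real file would be read with
# open(path, "rb").read() instead.
header_lf = b"/* banner */\nFoamFile\n{\n"
header_crlf = header_lf.replace(b"\n", b"\r\n")

offset_lf = marker_offset(header_lf, b"FoamFile")
offset_crlf = marker_offset(header_crlf, b"FoamFile")

# "foam" is an assumed extension name, not one taken from the original post.
print(f'foam {offset_lf} "FoamFile"')
print(f"LF offset: {offset_lf}, CRLF offset: {offset_crlf}")
```

With CRLF line endings every preceding line adds one extra byte, so the same header yields a larger offset.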
You can try that out with a fake file that starts like "123456789Claudio is looking for a definition".
Now we call files with the signature "Claudio" at position 10 "clo"- files
clo 10 "Claudio"
and check out if
clo 0xa "Claudio"
makes a difference. The latter is position 10 in hexadecimal language.
Your example works because the file has only one line without line breaks (except that the offset has to be 9, not 10). However, if I add a second line to the test file, it stops working if it exceeds a length of 73 bytes… I therefore did some experimenting and observed some strange behaviour when there's a line break in the test file. I suspect that is why my original signatures with offset above don't work: the test files all contain line breaks. I've made a simple signature like this:
and test files containing a number of 01234567 sequences. The results are below:
Code:
# Test file 1: This works, fidentify recognizes file as fooOffset1
01234567
01234567
# Test file 2: This works
0123456701234567
01234567
# Test file 3: This works too
01234567…01234567 # 11 sequences
01234567
# Test file 4: This doesn't work, fidentify recognizes test file as txt
01234567…01234567 # 12 sequences
01234567
# Test file 5: It works up to 11 sequences + 0 on the first line
01234567…012345670 # 11 sequences + 0 = 88 + 1 Byte + line break
01234567
# Test file 6: Not working with 88 + 2 Bytes on first line
01234567…0123456701 # 11 sequences + 01 = 88 + 2 Bytes + line break
01234567
# Test file 7: Inserted line break after first sequence, still not working
01234567
01234567…0123456701 # 10 sequences + 01 = 80 + 2 Bytes + line break
01234567
# Test file 8: Remove first line and it works again, although second line has same length as in test file 7
01234567…0123456701 # 10 sequences + 01 = 80 + 2 Bytes + line break
01234567
# Test file 9: Start from Test file 6 again and insert a line break after each sequence, not working
01234567
…
01234567 # 11 sequences up to here
01
01234567
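The test files above can be regenerated with a short Python script, which makes it easier to reproduce the results on another machine. This is only a sketch under assumptions: LF line endings, a line break after the last line, and made-up file names (`test1.txt` to `test9.txt`) and directory name (`offset_tests`). The script does not call fidentify; it only writes the files and prints the length of each first line.

```python
from pathlib import Path

SEQ = b"01234567"


def make_file(*lines: bytes) -> bytes:
    """Join lines, ending each one (including the last) with an LF."""
    return b"".join(line + b"\n" for line in lines)


def build_test_files() -> dict:
    """Return the nine test files from the listing as name -> content."""
    return {
        "test1.txt": make_file(SEQ, SEQ),
        "test2.txt": make_file(SEQ * 2, SEQ),
        "test3.txt": make_file(SEQ * 11, SEQ),
        "test4.txt": make_file(SEQ * 12, SEQ),
        "test5.txt": make_file(SEQ * 11 + b"0", SEQ),
        "test6.txt": make_file(SEQ * 11 + b"01", SEQ),
        "test7.txt": make_file(SEQ, SEQ * 10 + b"01", SEQ),
        "test8.txt": make_file(SEQ * 10 + b"01", SEQ),
        "test9.txt": make_file(*([SEQ] * 11), b"01", SEQ),
    }


def write_test_files(directory: str) -> None:
    """Write all test files and report their total and first-line sizes."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in build_test_files().items():
        (out / name).write_bytes(content)
        first_line = content.split(b"\n", 1)[0]
        print(f"{name}: {len(content)} bytes total, first line {len(first_line)} bytes")


if __name__ == "__main__":
    write_test_files("offset_tests")
```

Each generated file can then be checked by hand with fidentify.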
I'm trying to see a pattern here but I'm confused. Why would test file 1 work but not test file 9, even though their first lines are exactly the same, including the line break? Also, why does test file 7 work with even 20 sequences on the first line if I remove the offset like this:
I suspect there's a bug here. If I understand correctly, if fidentify finds the signature as specified within the first characters of the file, then the length of the lines or how many lines come after that shouldn't matter. However, my test files show otherwise, and I have been able to reproduce the above on two different machines with the same testdisk version and the same macOS version.
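To make that expectation concrete, here is a minimal Python sketch of what an offset signature would be expected to mean: compare the bytes at the given offset with the magic value and ignore everything after them. This is only a reading of the intended behaviour, not fidentify's actual implementation, and `matches_at_offset` is a hypothetical helper. It uses the "Claudio" example from earlier in this thread.

```python
def matches_at_offset(data: bytes, offset: int, magic: bytes) -> bool:
    """True if magic occurs in data exactly at the 0-based byte offset."""
    return data[offset:offset + len(magic)] == magic


one_line = b"123456789Claudio is looking for a definition"
# Extra lines after the signature should not change the result.
two_lines = one_line + b"\n" + b"01234567" * 12 + b"\n"

for name, data in (("one line", one_line), ("two lines", two_lines)):
    print(
        name,
        matches_at_offset(data, 9, b"Claudio"),
        matches_at_offset(data, 10, b"Claudio"),
    )
```

Under this reading both files match at offset 9 and neither matches at offset 10, regardless of what follows the first line.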
I could try your suggestion of turning off txt file recognition; however, I also need to recover text files. I would therefore rather understand why specifying an offset isn't working properly.