Feature request: limit file size or skip file Topic is solved

Using PhotoRec to recover lost data
Forum rules
When asking for technical support:
- Search for posts on the same topic before posting a new question.
- Give clear, specific information in the title of your post.
- Include as many details as you can, MOST POSTS WILL GET ONLY ONE OR TWO ANSWERS.
- Post a follow up with a "Thank you" or "This worked!"
- When you learn something, use that knowledge to HELP ANOTHER USER LATER.
Before posting, please read https://www.cgsecurity.org/testdisk.pdf
FransM
Posts: 12
Joined: 19 Dec 2023, 14:24

Re: Feature request: limit file size or skip file

#11 Post by FransM »

Hmm, I spotted a possible issue while reading the code.

In file_riff.c it reads
165 file_size = next_fs;
166 /* align to word boundary */
167 file_size = (file_size&1);

Is this last line correct? It is a bitwise AND with 1, so after it file_size will be 0 or 1.
Shouldn't this line read:
file_size = (file_size&~1);
if rounding down is desired or
file_size = ((file_size+1)&~1);
if we want to round up.

(so add 1 first if rounding up is desired, and AND with ~1 instead of with 1)
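To make the two options concrete, here is a minimal standalone sketch of the alignment helpers described above (my own illustration, not the actual file_riff.c code):

```c
#include <stdint.h>

/* Round a size down to the previous word (2-byte) boundary. */
static uint64_t align_down(uint64_t size)
{
  return size & ~(uint64_t)1;
}

/* Round a size up to the next word (2-byte) boundary. */
static uint64_t align_up(uint64_t size)
{
  return (size + 1) & ~(uint64_t)1;
}
```

For an odd size such as 7, align_down gives 6 and align_up gives 8, while the buggy `size & 1` would give 1.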

As testing takes 10.5 hr I'd rather get some feedback before actually attempting to make this change.

@cgrenier what is your opinion?

EDIT: after writing this I realized that I could of course set a breakpoint at this line, examine file_size, skip the line, and then set file_size to the value it had just before.
The breakpoint was hit a few times; afterwards recovery continued and an avi file was recovered.
Encouraged by this I made the change in the code and, lo and behold, recovery continues.

I did notice a small UI issue after resuming the recovery, though.
The estimated time to completion is far too short; see attachment.
I suspect the estimate takes the elapsed time from the very first sector to the current sector and extrapolates over the remaining number of sectors.
It should probably be based only on the number of sectors read since the recovery was resumed.
Attachments
Screenshot from 2023-12-21 22-51-55.png (70.47 KiB)

User avatar
cgrenier
Site Admin
Posts: 5432
Joined: 18 Feb 2012, 15:08
Location: Le Perreux Sur Marne, France
Contact:

Re: Feature request: limit file size or skip file

#12 Post by cgrenier »

Thanks for spotting the problem. Can you try the following fix?

Code: Select all

diff --git a/home/kmaster/src/git/testdisk/src/file_riff.c b/./file_riff.c
index 4bd69e0..3b68d6e 100644
--- a/home/kmaster/src/git/testdisk/src/file_riff.c
+++ b/./file_riff.c
@@ -153,7 +153,7 @@ static void check_riff_list(file_recovery_t *fr, const unsigned int depth, const
 #ifdef DEBUG_RIFF
       log_riff_list(file_size, depth, list_header);
 #endif
-      check_riff_list(fr, depth+1, file_size + sizeof(riff_list_header_t), next_fs-1);
+      check_riff_list(fr, depth+1, file_size + sizeof(riff_list_header_t), next_fs);
     }
     else
     {
@@ -164,7 +164,7 @@ static void check_riff_list(file_recovery_t *fr, const unsigned int depth, const
     }
     file_size = next_fs;
     /* align to word boundary */
-    file_size = (file_size&1);
+     file_size = (file_size+1) & ~(uint64_t)1;
   }
 }
 
@@ -220,7 +220,7 @@ static void file_check_avi(file_recovery_t *fr)
       return;
     }
     /*@ assert calculated_file_size <= PHOTOREC_MAX_FILE_SIZE; */
-    check_riff_list(fr, 1, file_size + sizeof(riff_list_header_t), calculated_file_size - 1);
+    check_riff_list(fr, 1, file_size + sizeof(riff_list_header_t), calculated_file_size);
     if(fr->offset_error > 0)
     {
       fr->file_size=0;

FransM
Posts: 12
Joined: 19 Dec 2023, 14:24

Re: Feature request: limit file size or skip file

#13 Post by FransM »

I already made the middle change yesterday evening.
That one seemed to fix the issue. That is, photorec continued and recovered more files.
The recovery of the disk is still in progress and will probably complete tomorrow early morning.
Once the recovery of this disk is completed I'll try your patch in combination with the saved .ses file from the hanging situation.

BTW, did you see my report that the estimated time is not calculated properly?

FransM
Posts: 12
Joined: 19 Dec 2023, 14:24

Re: Feature request: limit file size or skip file

#14 Post by FransM »

cgrenier wrote: 22 Dec 2023, 10:38 Thanks for spotting the problem. Can you try the following fix ?

I've applied this patch, restarted with the .ses file from the time it failed, and let it recover for a while; it worked.
With my own version, which had only the 2nd hunk, I managed to recover the whole disk, but I did not have the time or disk space to do another full recovery of the 8TB disk.
So in my opinion this patch is ok.

A few other remarks.

I already reported that when restarting with a .ses file the estimate is wrong.
I found where this happens: it is in phnc.c, where it says:

120 const time_t eta=(partition->part_offset+partition->part_size-1-offset)*elapsed_time/(offset-partition->part_offset);
121 wprintw(window," - Estimated time to completion %uh%02um%02u\n",
122 (unsigned)(eta/3600),
123 (unsigned)((eta/60)%60),
124 (unsigned)(eta%60));

I feel that in line 120 the part
(offset-partition->part_offset)
should be replaced with something like
(offset - offset_where_recovery_started)
However, I did not find a variable that gives the offset where the recovery started.
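The suggested fix can be sketched as a small helper, basing the rate on the distance scanned this session rather than the distance from the partition start. Here `resume_offset` is a hypothetical parameter for the offset at which the resumed recovery started; this is an illustration, not actual phnc.c code:

```c
#include <stdint.h>
#include <time.h>

/* Estimate time to completion after a resumed recovery.
   Rate is computed over the span scanned this session
   (offset - resume_offset), not over the whole partition. */
static time_t eta_from_resume(uint64_t part_offset, uint64_t part_size,
                              uint64_t resume_offset, uint64_t offset,
                              time_t elapsed_time)
{
  const uint64_t done = offset - resume_offset;               /* scanned this session */
  const uint64_t left = part_offset + part_size - 1 - offset; /* still to scan */
  if (done == 0)
    return 0; /* nothing scanned yet: avoid division by zero */
  return (time_t)(left * (uint64_t)elapsed_time / done);
}
```

With the current formula, a recovery resumed at 90% of the disk would divide by the full 90% already "elapsed", producing the far-too-short estimate seen in the screenshot.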

And about the usage:
When photorec completed, it went back to the partition selection screen. It would be nice if the progress screen were still shown;
that way one can still, e.g., take a screenshot of the number of files restored.
Alternatively this info could be written to a photorec.summary file or similar.

And this info from the log:
8471497056 sectors contain unknown data, 22841 invalid files found and rejected.
would also be nice to have even if logging is not enabled; maybe it could be added to the status screen.

Anyway, my friend whose disk this was was quite happy with the recovered data (and he has now learned why backups are important).

FransM
Posts: 12
Joined: 19 Dec 2023, 14:24

Re: Feature request: limit file size or skip file

#15 Post by FransM »

One (probably) final post.
The disk had been hit by ransomware; all files had been put into a .7z file.
However, 7z l -slt retrieves some info from the archive, including each file's name and CRC.
By calculating the CRC of the recovered files and comparing it with the info in the 7z listing, it was possible to restore the original names.
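For reference, the CRC that 7-Zip reports is the standard CRC-32; a minimal bitwise sketch (my own illustration, not tuned for speed) of computing it over a recovered file's bytes:

```c
#include <stddef.h>
#include <stdint.h>

/* Standard CRC-32 (polynomial 0xEDB88320, as used by zip/7z):
   init with all ones, process bit by bit, final complement. */
static uint32_t crc32_bytes(const uint8_t *data, size_t len)
{
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < len; i++)
  {
    crc ^= data[i];
    for (int b = 0; b < 8; b++)
      crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
  }
  return ~crc;
}
```

The well-known check value is crc32("123456789") == 0xCBF43926, which can be used to verify the implementation before comparing against the 7z listing.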

While performing this recovery I realized it would be convenient if, instead of writing the file directly to disk, the file data could be passed to an external command (e.g. through a pipe).
That way the user can decide in that program/script whether to restore the file and where.
E.g. in the case above I could have calculated the CRC in such a script and stored it somewhere.
Or the script could detect that a file was already recovered (because multiple copies of the same file were present), so it does not need to be stored again.
All of this can be done in post-recovery steps, but with this many files, doing it on the fly would be a great time saver.

(And a bit off-topic, but a CRC is not enough to decide on duplication: of my 2.1 million recovered files, about 300 CRCs were calculated from different files, i.e. files with a different sha1 hash and different content.)

Post Reply