My colleague used photorec to recover several thousand administrative files, some dating back to 1990.
Of course, almost all of them had filenames like f26490920.pdf and their creation dates were lost, so they were very difficult to use. So I wrote a small Python program that tries to find each file's date and guess an intelligible name for it. If you want to try it, I'd be happy if it helped someone else:
https://github.com/sanette/rename_by_content
It works by detecting the file type and metadata, then extracting the text content (my colleague had many scanned documents; for those, OCR is performed via tesseract). All files are copied into a new folder with year/month directories, so it is safe to try: the original files are not modified.
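To illustrate the "copy, never modify" idea, here is a minimal sketch of how a file could be copied into a year/month layout. This is not the script's actual code: the function names and the zero-padded `YYYY/MM` directory naming are assumptions for illustration; only the `Unknown_year` fallback folder is taken from the log output shown later in this thread.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Optional


def dated_destination(out_root: Path, name: str, when: Optional[date]) -> Path:
    """Build out_root/YYYY/MM/name, or out_root/Unknown_year/name if no date is known."""
    if when is None:
        return out_root / "Unknown_year" / name
    return out_root / f"{when.year:04d}" / f"{when.month:02d}" / name


def copy_by_date(src: Path, out_root: Path, new_name: str,
                 when: Optional[date]) -> Path:
    """Copy src (never move it), so the original recovered file stays untouched."""
    dest = dated_destination(out_root, new_name, when)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 also tries to preserve timestamps
    return dest
```

Because only a copy is made, a wrong guess for the name or date costs nothing: the source folder can simply be processed again.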
Supported file formats are: pdf, ai, doc, tar, zip, txt, mbox, ods, xls, xlsx, docx, docm, html, rtf, odt, png, jpg, gif, bmp, tif, ppt, pptx, odg.
Of course, feedback is welcome.
PS: date recognition is tailored to the French format, but if anyone is interested in other languages, it should be straightforward to adapt. Just tell me.
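To give an idea of what French-format date recognition can look like, here is a minimal sketch. It is not taken from the script: the patterns, the month table and the name `find_french_date` are assumptions for illustration. Adapting it to another language would mostly mean swapping the month names and, for numeric dates, the day/month order.

```python
import re
from datetime import date
from typing import Optional

# French month names, with and without accents (OCR often drops accents).
FRENCH_MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}

# Matches "14 mars 1998" or "1er janvier 2001".
WORDY = re.compile(r"\b(\d{1,2})(?:er)?\s+([^\W\d_]+)\s+(\d{4})\b")
# Matches "14/03/1998" (day first, as is usual in France).
NUMERIC = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def _safe_date(year: int, month: int, day: int) -> Optional[date]:
    """Return a date, or None if the numbers do not form a valid calendar date."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def find_french_date(text: str) -> Optional[date]:
    """Return the first plausible French-format date found in text, if any."""
    for m in WORDY.finditer(text):
        month = FRENCH_MONTHS.get(m.group(2).lower())
        if month:
            found = _safe_date(int(m.group(3)), month, int(m.group(1)))
            if found:
                return found
    for m in NUMERIC.finditer(text):
        found = _safe_date(int(m.group(3)), int(m.group(2)), int(m.group(1)))
        if found:
            return found
    return None
```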
automatically rename files after photorec
Forum rules
When asking for technical support:
- Search for posts on the same topic before posting a new question.
- Give clear, specific information in the title of your post.
- Include as many details as you can, MOST POSTS WILL GET ONLY ONE OR TWO ANSWERS.
- Post a follow up with a "Thank you" or "This worked!"
- When you learn something, use that knowledge to HELP ANOTHER USER LATER.
Before posting, please read https://www.cgsecurity.org/testdisk.pdf
Re: automatically rename files after photorec
Hello,
I am testing your script on a home-made Bento Openbox flavour of Ubuntu 18.04.1. My first attempt failed while the files were on an external hard drive formatted as ext4. The error messages mentioned something about permissions, so I formatted a hard drive as NTFS, copied all the files there, and I am about to start again. The hard drive is 1.8 TB and the data take up 258 GB, so that should be enough. The drive is plugged into a dock.
Re: automatically rename files after photorec
Hello,
I ran into an error because a Unicode decoder was not found, but I found on the internet that I could install it using pip.
https://github.com/reallistic/BitcasaFi ... /issues/29
Your script does not seem to come with a recursive option? The first time, it was not able to go into the subdirectories created by photorec
(recup_dir.1 recup_dir.20 recup_dir.31 recup_dir.42 recup_dir.53
recup_dir.10 recup_dir.21 recup_dir.32 recup_dir.43 recup_dir.54
recup_dir.11 recup_dir.22 recup_dir.33 recup_dir.44 recup_dir.55
recup_dir.12 recup_dir.23 recup_dir.34 recup_dir.45 recup_dir.56
recup_dir.13 recup_dir.24 recup_dir.35 recup_dir.46 recup_dir.57
recup_dir.14 recup_dir.25 recup_dir.36 recup_dir.47 recup_dir.58
recup_dir.15 recup_dir.26 recup_dir.37 recup_dir.48 recup_dir.6
recup_dir.16 recup_dir.27 recup_dir.38 recup_dir.49 recup_dir.7
recup_dir.17 recup_dir.28 recup_dir.39 recup_dir.5 recup_dir.8
recup_dir.18 recup_dir.29 recup_dir.4 recup_dir.50 recup_dir.9
recup_dir.19 recup_dir.3 recup_dir.40 recup_dir.51
recup_dir.2 recup_dir.30 recup_dir.41 recup_dir.52)
which add up to 258 GB.
Then I tried again on just one directory, copying it into a new test directory.
Here is the content of that new test directory:
Code:
$ ls -l
total 85
-rwxrwxrwx 1 fluffy1 fluffy1 12066 déc. 28 16:01 exiftool.py
-rwxrwxrwx 1 fluffy1 fluffy1 12410 janv. 2 11:54 exiftool.pyc
-rwxrwxrwx 1 fluffy1 fluffy1 534 janv. 2 12:08 log-renamebycontent.txt
-rwxrwxrwx 1 fluffy1 fluffy1 12506 déc. 28 16:06 os.path
drwxrwxrwx 1 fluffy1 fluffy1 0 janv. 2 12:03 recup_dir.1
drwxrwxrwx 1 fluffy1 fluffy1 0 janv. 2 12:06 recup_dir.1-2
-rwxrwxrwx 1 fluffy1 fluffy1 38181 déc. 28 16:01 rename_by_content.py
$
I had copied your script there too, to simplify the command line, then I invoked:
Code:
python ./rename_by_content.py --log log-renamebycontent.txt --output recup_dir.1-2/ recup_dir.1/*
The log file now contains this:
Code:
-------------------------------- Summary of renamed files: --------------------------------
[recup_dir.1/f0018360_pid_0.m2ts] was copied to [recup_dir.1-2/Unknown_year/f0018360_pid_0.mts] ()
[recup_dir.1/f0507438.m2ts] was copied to [recup_dir.1-2/Unknown_year/f0507438.mts] ()
[recup_dir.1/f4521516.m2ts] was copied to [recup_dir.1-2/Unknown_year/f4521516.mts] ()
[recup_dir.1/report.xml] was copied to [recup_dir.1-2/Unknown_year/report.xml] ()
------------------- Done. Copied 4 of 4 files to recup_dir.1-2/ ---------------------
The recup_dir.1-2 directory contains this:
Code:
$ ls -lR
.:
total 0
drwxrwxrwx 1 fluffy1 fluffy1 0 janv. 2 12:08 Unknown_year
./Unknown_year:
total 2263648
-rwxrwxrwx 1 fluffy1 fluffy1 249468928 janv. 1 19:15 f0018360_pid_0.mts
-rwxrwxrwx 1 fluffy1 fluffy1 2050582528 janv. 1 19:17 f0507438.mts
-rwxrwxrwx 1 fluffy1 fluffy1 17903616 janv. 1 19:17 f4521516.mts
-rwxrwxrwx 1 fluffy1 fluffy1 14389 janv. 1 19:17 report.xml
The source directory contains the same files:
Code:
$ ls -l
total 2263648
-rwxrwxrwx 1 fluffy1 fluffy1 249468928 janv. 1 19:15 f0018360_pid_0.m2ts
-rwxrwxrwx 1 fluffy1 fluffy1 2050582528 janv. 1 19:17 f0507438.m2ts
-rwxrwxrwx 1 fluffy1 fluffy1 17903616 janv. 1 19:17 f4521516.m2ts
-rwxrwxrwx 1 fluffy1 fluffy1 14389 janv. 1 19:17 report.xml
I guess I might want to try with another directory containing other types of files.
Just please tell me, did I do something wrong?
Thanks.
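Regarding the missing recursive option: a possible workaround is to collect the files from all recup_dir.* folders first and hand them to the script afterwards. The sketch below only builds that list. `iter_recovered_files` is a hypothetical helper, not part of rename_by_content, and it assumes the photorec output folders all sit directly under one root directory.

```python
import os
from pathlib import Path
from typing import Iterator


def iter_recovered_files(root: Path, prefix: str = "recup_dir.") -> Iterator[Path]:
    """Yield every file found below root's recup_dir.* subdirectories."""
    for sub in sorted(root.iterdir()):
        if not (sub.is_dir() and sub.name.startswith(prefix)):
            continue  # skip scripts, logs and unrelated folders next to the data
        for dirpath, dirnames, filenames in os.walk(sub):
            dirnames.sort()  # make the traversal order deterministic
            for name in sorted(filenames):
                yield Path(dirpath) / name
```

The resulting paths could then be passed, in batches, as the file arguments of the command line shown above.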
Re: automatically rename files after photorec
Hello,
This time, the Python script created yearly subdirectories. Here is the log for this first (half-)successful run.
http://pastebin.fr/55468
Only a few mp3 files got human-readable names; as far as I can tell, the other files/extensions were only sorted by year.
Are any further improvements possible?
Re: automatically rename files after photorec
Hello,
As no answer came to my last three posts, I carried on as I thought best. Now I have a few questions about this Python script, which is an improvement even if it is still far from perfect at turning gibberish filenames into understandable ones.
The program cannot work on all file formats: could that be improved?
The program overwrites the log, so I would like to ask whether an append option could be added so that the log file keeps growing, or whether a new, automatically numbered log file could be created each time the command is run?
If you would like to continue the discussion in French, that is also possible.
Thank you for sharing!
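Until the script offers such an option, the numbered-log idea can be approximated from the outside by choosing a fresh log filename before each run. A minimal sketch follows; `next_log_path` is a hypothetical helper and the `-1`, `-2`, ... naming scheme is an assumption, not something the script provides.

```python
from pathlib import Path


def next_log_path(base: Path) -> Path:
    """Return base if it does not exist yet, else base with -1, -2, ... added."""
    if not base.exists():
        return base
    n = 1
    while True:
        candidate = base.with_name(f"{base.stem}-{n}{base.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

The returned path would then be given to the `--log` option used in the earlier command line, so that previous logs are never overwritten.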
Re: automatically rename files after photorec
Hello again,
Just so you know, the source directory contains more MB than the destination directory once the work is finished.
Code:
[fluffy1@shebang:/media/fluffy1/0BEB84160FCC8CA3]
$ du -csm DATA-Photorec
263383 DATA-Photorec
263383 total
$ du -csm _RECUP_DIR.2/
259478 _RECUP_DIR.2/
259478 total
$ bc
bc 1.07.1
Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'.
263383 - 259478
3905
Best regards,
Mélodie
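One way to narrow down where the 3905 MB difference comes from is to compare file counts and byte totals for the two trees, rather than disk usage alone. A minimal sketch: `tree_stats` is a hypothetical helper, not part of the script. It sums apparent file sizes, whereas `du` without `--apparent-size` reports allocated disk space, so its totals will not match the `du` figures exactly.

```python
import os
from pathlib import Path
from typing import Tuple


def tree_stats(root: Path) -> Tuple[int, int]:
    """Return (number of regular files, total size in bytes) found under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # Ignore symlinks so that nothing is counted twice.
            if path.is_file() and not path.is_symlink():
                count += 1
                total += path.stat().st_size
    return count, total
```

If the file counts differ, some files were not copied; if only the byte totals differ, the gap lies in the copied files themselves.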