Cyrillic filenames displaying strangely after copying
Posted: 25 Nov 2015, 14:28
Hello,
I'm helping a colleague who works in a Russian archive to recover files from a faulty external hard drive (a Freecom Toughdrive).
I've been able to use TestDisk to view the disk and copy the stored files from it (around 150GB in total). Many of the recovered files have Cyrillic filenames. During copying, TestDisk seems to have changed these into 'weird' characters, e.g.:
Копия Свадьба
becomes
ÐšÐ¾Ð¿Ð¸Ñ Ð¡Ð²Ð°Ð´ÑŒÐ±Ð°
This happens with every Cyrillic filename. I've also had similar results with EaseUS, which displays filenames correctly in the Partition Recovery browser, but shows them simply as strings of '___' in the Data Recovery Wizard.
I'm no expert at all, but I believe the UTF-8 bytes of the Cyrillic filenames are being interpreted as Windows-1252 characters. Feeding various altered filenames through an online converter (this one: http://2cyr.com/decode/?lang=en) gives back the correct Cyrillic, which seems to confirm that.
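To illustrate what I think is happening, here is a minimal Python sketch. It is my own illustration, not anything TestDisk does internally, and the function name fix_mojibake is made up for the example. It encodes a Cyrillic name as UTF-8, decodes those bytes as Windows-1252 to reproduce the garbled form, then reverses the step. One assumption: the five bytes that Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are treated as the control characters with the same code point, which I believe is how Windows maps them.

```python
def fix_mojibake(garbled: str) -> str:
    """Undo 'UTF-8 bytes read as Windows-1252' for a single string."""
    raw = bytearray()
    for ch in garbled:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Assumption: an undefined Windows-1252 byte came through as the
            # control character with the same code point (e.g. U+008F).
            if ord(ch) < 0x100:
                raw.append(ord(ch))
            else:
                raise
    # Raises UnicodeDecodeError if the bytes are not valid UTF-8,
    # for example when a character was dropped along the way.
    return raw.decode("utf-8")


if __name__ == "__main__":
    original = "Свадьба"
    # Reproduce the damage: UTF-8 bytes interpreted as Windows-1252.
    garbled = original.encode("utf-8").decode("cp1252")
    print(garbled)
    print(fix_mojibake(garbled))
```

Note that 'я' is the bytes D1 8F in UTF-8, and 0x8F is one of the undefined Windows-1252 bytes. That is probably why the first garbled word above looks one character short; if that character is really missing from the filename on disk rather than just invisible, that name cannot be recovered exactly this way.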
Is there a way to preserve the Cyrillic filenames when extracting them with TestDisk? Or is it that TestDisk can't translate them? If so, are there any other options? Might it be an issue with not having the correct Windows language pack on the PC I'm copying files to?
It's a problem because there are literally thousands of files dating back many years that have been affected, so manually going through and renaming them using the online converter will be a massive headache. Another colleague has offered to write a script that would batch convert them, but I'm hoping there's another potentially more straightforward solution.
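For reference, a batch-conversion script along those lines might look roughly like the Python sketch below. It is untested against the real recovered files and rests on the assumption that every damaged name is UTF-8 misread as Windows-1252. The names repair_name and rename_tree and the example path are hypothetical. It walks the folder tree bottom-up, skips any name it cannot repair cleanly, never overwrites an existing file, and defaults to a dry run that only lists the planned changes.

```python
import os
from pathlib import Path
from typing import List, Optional, Tuple


def repair_name(name: str) -> Optional[str]:
    """Return the repaired name, or None if the name should be left alone."""
    raw = bytearray()
    for ch in name:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) < 0x100:
                # Assumption: undefined Windows-1252 bytes survived as
                # control characters with the same code point.
                raw.append(ord(ch))
            else:
                return None  # e.g. a name that is already real Cyrillic
    try:
        fixed = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not this kind of damage, or a byte was lost
    return fixed if fixed != name else None


def rename_tree(root, dry_run: bool = True) -> List[Tuple[Path, Path]]:
    """Repair file and folder names under root; return (old, new) pairs."""
    changes = []
    # Bottom-up, so a folder is renamed only after its contents are done.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            fixed = repair_name(name)
            if fixed is None:
                continue
            src = Path(dirpath) / name
            dst = Path(dirpath) / fixed
            if dst.exists():
                continue  # never overwrite an existing file or folder
            changes.append((src, dst))
            if not dry_run:
                src.rename(dst)
    return changes


if __name__ == "__main__":
    # Hypothetical location of the recovered files.
    for old, new in rename_tree(r"D:\recovered", dry_run=True):
        print(old, "->", new)
```

Running it with dry_run=True first and checking the printed list before switching to dry_run=False would seem the safer way to use something like this on thousands of files.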
Thanks in advance for any ideas/help you could give.