Strange bad hard drive behavior question

davidt
Posts: 2
Joined: 25 Jul 2018, 18:11

#1 Post by davidt

Here is a weird hard drive problem I have. I am not asking for help with recovery...

My 3TB external USB Seagate Barracuda ST3000DM001, more than 6 years old, had gone bad. I used it infrequently and disconnected both the power and USB cables after each use. The drive had always been flaky on initial connection and occasionally copied files slowly, but otherwise it worked. It finally became unrecognizable by the OS after sitting in storage for over a year. It was originally a single 3TB GPT partition formatted as NTFS. After failing, it showed up with the wrong drive geometry (an NTFS partition of only 375 GB) and could not be mounted or recognized in Windows or OS X.

To troubleshoot and save time, I took the drive out of the enclosure and connected it directly via SATA. When I tried to write back the correct drive geometry, TestDisk 7.1-WIP reported that the drive would not let me rewrite the partition ("Partition: Write error"). However, TestDisk could "see" the data fine after I entered the correct geometry. The only thing left to do was to dump the drive to a 3TB image and recover as much data as possible.

The whole imaging process took 17 days (~407 hours). The destination USB drive was fairly new and in good working condition, so there was no problem there. As far as I could tell, I ended up with a 3TB image containing nearly all the files. There were a few hundred input/output and fixup_warn errors in the log, but I am not sure what they really meant. When I test-read the files in the image, only a handful were corrupted or had "turned binary" (become unreadable binary garbage).

I then left the drive connected via SATA in the computer for two months. Last week I got another external USB drive and retried the image dump so I could unplug and retire the bad drive. To my surprise, not only did the imaging take only about 45 hours, it also recovered a few more files than the previous dump!
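
For scale, here is a rough back-of-the-envelope calculation of each run's average read rate. It is a minimal sketch assuming "3TB" means 3 × 10^12 bytes (decimal, as drive vendors label capacity) and that each run read the whole drive once; the function name is just for illustration.

```python
# Rough average read throughput for the two imaging runs.
# Assumptions: "3TB" = 3 * 10**12 bytes, and each run read the whole drive once.
DRIVE_BYTES = 3 * 10**12


def avg_throughput_mb_s(total_bytes: int, hours: float) -> float:
    """Average throughput in MB/s, where 1 MB = 10**6 bytes."""
    return total_bytes / (hours * 3600) / 10**6


first = avg_throughput_mb_s(DRIVE_BYTES, 407)   # first dump, ~17 days
second = avg_throughput_mb_s(DRIVE_BYTES, 45)   # second dump, ~45 hours

print(f"first dump:  {first:.2f} MB/s")
print(f"second dump: {second:.2f} MB/s")
print(f"speed-up:    {second / first:.1f}x")
```

Under those assumptions the first dump averaged about 2.05 MB/s and the second about 18.52 MB/s, roughly a 9x difference.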

I did a hash comparison against the files from the first dump and found that the second dump was better in every respect. The two dumps were about 99.9999% identical, and the differences were the previously missing, corrupted, or turned-binary files, which were now intact, recognized, and readable.
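
A file-level comparison like this can be sketched in Python, assuming the recovered files from each image have been copied or mounted into two separate directory trees. SHA-256 and the function names below are illustrative choices, not necessarily what was used here.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so large files don't fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(first: Path, second: Path) -> dict[str, list[str]]:
    """Classify files as identical, changed, or present in only one tree."""
    a, b = hash_tree(first), hash_tree(second)
    common = a.keys() & b.keys()
    return {
        "identical": sorted(k for k in common if a[k] == b[k]),
        "changed": sorted(k for k in common if a[k] != b[k]),
        "only_first": sorted(a.keys() - b.keys()),
        "only_second": sorted(b.keys() - a.keys()),
    }
```

For example, `compare_trees(Path("/mnt/dump1"), Path("/mnt/dump2"))`, where both paths are hypothetical mount points; the `changed` and `only_second` lists would hold the files that the second dump repaired or added.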

When I ran a surface scan with TechTool Pro right after the second imaging, the scan took only 26 hours and found no bad sectors. Two months ago the drive was reading extremely slowly (bad sectors?), and a scan would probably have taken weeks to finish. I couldn't wait that long, so I can't give a time comparison for the scan.

The SMART reading for the drive has tons of reallocated sectors and some pending and uncorrectable counts.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   064   006    Pre-fail  Always       -       206712440
  3 Spin_Up_Time            0x0003   094   092   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       430
  5 Reallocated_Sector_Ct   0x0033   100   051   036    Pre-fail  Always       -       1144
  7 Seek_Error_Rate         0x000f   070   060   030    Pre-fail  Always       -       11830910
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4902
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       307
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       7823
188 Command_Timeout         0x0032   100   099   000    Old_age   Always       -       6 6 9
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   027   045    Old_age   Always   In_the_past 36 (Min/Max 35/41 #2152)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       154
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2903
194 Temperature_Celsius     0x0022   036   073   000    Old_age   Always       -       36 (0 18 0 0 0)
197 Current_Pending_Sector  0x0012   100   069   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   069   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       873h+54m+29.237s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       29472053552965
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       259894645155740
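
For reading the table above: as I understand smartmontools, the WHEN_FAILED column compares only the normalized VALUE and WORST columns against THRESH and ignores RAW_VALUE. Here is a minimal sketch of that rule using a few rows copied from the table (the row list and function name are illustrative, and the zero-threshold handling reflects my understanding of smartctl's behavior).

```python
# (ID, name, VALUE, WORST, THRESH) copied from a few rows of the table above.
# RAW_VALUE is vendor-specific and is not used by this rule.
ATTRIBUTES = [
    (1, "Raw_Read_Error_Rate", 116, 64, 6),
    (5, "Reallocated_Sector_Ct", 100, 51, 36),
    (7, "Seek_Error_Rate", 70, 60, 30),
    (10, "Spin_Retry_Count", 100, 100, 97),
    (187, "Reported_Uncorrect", 1, 1, 0),
    (190, "Airflow_Temperature_Cel", 64, 27, 45),
]


def when_failed(value: int, worst: int, thresh: int) -> str:
    """Classify an attribute the way the WHEN_FAILED column appears to (sketch)."""
    if thresh == 0:
        return "-"  # a zero threshold is treated as "no threshold"
    if value <= thresh:
        return "FAILING_NOW"  # current normalized value at or below threshold
    if worst <= thresh:
        return "In_the_past"  # only the worst-ever value crossed the threshold
    return "-"


for attr_id, name, value, worst, thresh in ATTRIBUTES:
    print(f"{attr_id:>3} {name:<24} {when_failed(value, worst, thresh)}")
```

Only 190 Airflow_Temperature_Cel comes out as `In_the_past` (WORST 027 is at or below THRESH 045), matching the table. Note that 187 Reported_Uncorrect has a normalized VALUE of 001 but a THRESH of 000, so this rule never flags it even though its raw count is 7823.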

So what kind of physical drive problem might fit this behavior? I don't understand why the drive was so hard to read two months ago but reads much more smoothly now, even though it is still not trouble-free. Back then I tried creating incomplete images multiple times over several days, and the slowness was always consistent (very slow at certain percentages). Both image dumps and the surface scan were done with the drive connected internally via SATA. Why did the two runs differ so much? Neither the computer nor the drive was running hot, but both were on 24/7. I rebooted the computer maybe twice in between, and the drive was never recognized or mounted, although I am not sure whether the spindle was spinning. As far as I remember, the other hardware and OS variables were pretty much the same.

Thanks.

cgrenier
Site Admin
Posts: 5432
Joined: 18 Feb 2012, 15:08
Location: Le Perreux Sur Marne, France

Re: Strange bad hard drive behavior question

#2 Post by cgrenier

If more sectors have been reallocated, imaging can be faster, but not that much faster. Strange indeed.
Usually you should not use a USB destination if the source is also on USB, as they may share the same internal USB hub.

davidt
Posts: 2
Joined: 25 Jul 2018, 18:11

Re: Strange bad hard drive behavior question

#3 Post by davidt

Yeah, I took the bad drive out of the USB enclosure and did all the tests and dumps as an internal SATA drive. The destination USB 3.0 drives were both new. I have done plenty of backup and src/dst verification on the first USB destination drive since then and it looks to be in perfect working condition, so the slowness probably was not on the destination side.

For pure sick fun, here is the graph of the image read % vs time of the first (17-day) dump...
[Graph: image read % vs. time, first (17-day) dump]

I am going to try wiping the bad 3TB drive if it allows, and use it as a buffer drive or can-die-anytime backup... squeeze the last drop of life out of it. Thanks.
