RAID5 data recovery case on a DELL server


In a RAID5 array, one hard disk went offline and was replaced with a new disk. During data synchronization, the indicator light of a second disk raised an alarm, the synchronization failed, and the array could no longer work normally.

Server failure detection:
The engineer tested the hard disks of the faulty server and found that the first offline disk had an extremely slow access speed, the second offline disk had a small number of bad sectors, and the other disks showed no obvious physical failure. The array contains a single volume group that occupies the entire space of the array, and this volume group holds a single XFS raw partition starting at sector 0.
RAID5 tolerates the failure of only one disk, so the array cannot work normally once a second disk goes offline. In this case the array crash was caused mainly by the second disk going offline, so handling the second disk is the key to recovering the server's data.
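
The single-disk limit follows from how RAID5 parity works: each stripe stores one parity block equal to the XOR of its data blocks. The following is a minimal Python sketch of that idea, not the controller's actual implementation. The 4-byte block values and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Illustrative stripe: three data blocks plus one parity block (values are made up).
d0 = b"\x11\x22\x33\x44"
d1 = b"\xa0\xb0\xc0\xd0"
d2 = b"\x05\x06\x07\x08"
parity = xor_blocks([d0, d1, d2])

# One block lost (d1): XOR of the surviving blocks and parity gives it back.
recovered_d1 = xor_blocks([d0, d2, parity])

# Two blocks lost (d1 and d2): the survivors only yield d1 XOR d2,
# which is not enough to separate the two missing blocks.
combined = xor_blocks([d0, parity])
```

With one block missing, `recovered_d1` equals `d1`. With two blocks missing, `combined` is only the XOR of the two lost blocks, which is why the second offline disk had to be repaired before the array could be rebuilt.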

Server data recovery process:
1. Make a read-only image backup of every hard disk. Image the second offline disk separately, skipping bad sectors during the copy.
2. Run an XOR test on the image files of the 15 hard disks. All of them passed with no obvious errors.
3. Calculate the data of the damaged sectors of the second offline disk and write it into that disk's image file.
4. While the backup is in progress, analyze the original RAID structure, build a virtual RAID environment, and verify that the RAID structure is correct.
5. Copy the image of the second offline disk to a new hard disk, force the new disk online, then replace the first offline disk and let the array synchronize. (Note: all hard disks should be backed up before this operation.)
6. Export the data.
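
The imaging in step 1 can be sketched as a sector-by-sector copy that records unreadable sectors instead of aborting. This is a simplified Python illustration under stated assumptions, not the tool the engineer used: the function name `image_disk`, the 512-byte sector size, and the zero-fill placeholder are all assumptions made for the example.

```python
import io

SECTOR_SIZE = 512  # assumed sector size; some disks use 4096

def image_disk(src, dst, total_sectors, sector_size=SECTOR_SIZE):
    """Copy src to dst sector by sector, zero-filling unreadable sectors.

    Returns the list of sector numbers that could not be read, so that
    they can be rebuilt from the other array members later.
    """
    bad_sectors = []
    for sector in range(total_sectors):
        try:
            src.seek(sector * sector_size)
            data = src.read(sector_size)
            if len(data) != sector_size:
                raise OSError("short read")
        except OSError:
            data = bytes(sector_size)  # placeholder until the sector is rebuilt
            bad_sectors.append(sector)
        dst.write(data)
    return bad_sectors

# In-memory demo; a real source would be a disk opened read-only, e.g. open(path, "rb").
demo_src = io.BytesIO(b"\xaa" * (2 * SECTOR_SIZE))
demo_dst = io.BytesIO()
demo_bad = image_disk(demo_src, demo_dst, total_sectors=2)
```

The returned list of bad sectors is what step 3 needs: it tells the engineer which positions in the image must be recalculated. In practice a dedicated imaging tool such as GNU ddrescue is normally used for this job, since it also handles retries and read timeouts.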

Server data recovery conclusion:
Because the XOR test passed completely, it can be concluded that no new data was written and the structure was not changed after the failure occurred. The data at the bad sectors of the faulty disk can therefore be calculated from the other good disks.
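
The XOR test and the sector rebuild can be sketched in Python as follows. The sketch assumes standard RAID5 XOR parity, where the sectors at the same offset on all member disks XOR to zero regardless of how parity rotates, and it assumes the checked region holds no controller metadata. The function names, the 512-byte sector size, and the demo values are hypothetical.

```python
import io

SECTOR_SIZE = 512  # assumed sector size

def xor_sectors(sectors):
    """XOR equal-length sectors together."""
    result = bytearray(len(sectors[0]))
    for sector in sectors:
        for i, byte in enumerate(sector):
            result[i] ^= byte
    return bytes(result)

def read_sector(image, sector, sector_size=SECTOR_SIZE):
    """Read one sector from an open image file."""
    image.seek(sector * sector_size)
    return image.read(sector_size)

def find_inconsistent_sectors(images, total_sectors, skip=(), sector_size=SECTOR_SIZE):
    """Return sector numbers whose XOR across all member images is non-zero."""
    failures = []
    for sector in range(total_sectors):
        if sector in skip:  # e.g. sectors already known to be unreadable
            continue
        combined = xor_sectors([read_sector(img, sector, sector_size) for img in images])
        if any(combined):
            failures.append(sector)
    return failures

def rebuild_sector(images, damaged_index, sector, sector_size=SECTOR_SIZE):
    """Recompute one sector of the damaged image from all other images and write it back."""
    others = [read_sector(img, sector, sector_size)
              for i, img in enumerate(images) if i != damaged_index]
    repaired = xor_sectors(others)
    images[damaged_index].seek(sector * sector_size)
    images[damaged_index].write(repaired)
    return repaired

# Tiny in-memory demo: two data members plus parity, one sector each (made-up values).
member_a = io.BytesIO(bytes([0x0F]) * SECTOR_SIZE)
member_b = io.BytesIO(bytes(SECTOR_SIZE))         # sector lost during imaging, zero-filled
parity = io.BytesIO(bytes([0x3C]) * SECTOR_SIZE)  # parity as stored on disk
demo_images = [member_a, member_b, parity]
rebuild_sector(demo_images, damaged_index=1, sector=0)
```

A clean result from `find_inconsistent_sectors` on the readable sectors is what justifies trusting the rebuilt values: if data had been written to the array after the failure, the members would no longer XOR to zero and the recalculated sectors could be wrong.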
After the recovery, the directory structure is complete and all important documents are intact. fsck reports no errors, the administrator has accepted the recovered data, and the data recovery is successful.

