Removing Failed RAID Devices
Written by Tom Hirt
Wednesday, 17 June 2009 08:34

How-to Remove Failed RAID Devices

In this KB, we discuss how to recover from the loss of a device in a Linux software RAID array. We demonstrate how to manually fail a disk, remove it, and then add it back and rebuild the array.

It is inevitable that a device in your RAID array will eventually fail. Replace the failed device as soon as possible, because RAID levels differ in how many device losses they can sustain (see our Linux RAID How-to for a description of the different RAID levels and their tolerance of failed devices).

Let's begin by inspecting our array in /proc/mdstat.
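The inspection step can be sketched as follows, assuming the array is md0 as in the rest of this KB. `show_array` is a hypothetical helper name, not part of mdadm; its optional second argument exists only so it can be pointed at a saved copy of /proc/mdstat.

```shell
# Hypothetical helper: print the /proc/mdstat stanza for one array.
# Usage: show_array [array-name] [mdstat-file]
show_array() {
    array="${1:-md0}"
    file="${2:-/proc/mdstat}"
    # Start printing at the "<array> :" line and stop at the next blank line.
    awk -v a="$array" '
        $1 == a && $2 == ":" { printing = 1 }
        printing && NF == 0  { exit }
        printing             { print }
    ' "$file"
}

# On a host with software RAID, run:
#   show_array md0
# or simply:
#   cat /proc/mdstat
```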
In the /proc/mdstat output, UUU indicates that all three devices in the array are up and online. We are now going to simulate a disk failure by manually failing the second device (sdc1) in the array (/dev/md0). Under most circumstances, the Linux RAID subsystem detects a failing device and marks it as failed automatically. You should not have to set a device as failed by hand unless you already suspect problems with the disk.
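A minimal sketch of the manual failure step, using the array and member named above. `fail_member` is a hypothetical wrapper; the mdadm call inside it must run as root against a real array.

```shell
# Names taken from this KB; adjust them for your system.
ARRAY=/dev/md0
MEMBER=/dev/sdc1

# Hypothetical wrapper: mark one member of an array as faulty.
# Usage: fail_member <array> <member>
fail_member() {
    mdadm --manage "$1" --fail "$2"
}

# As root:
#   fail_member "$ARRAY" "$MEMBER"
```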
Note: In order to remove a device from the array, it must first be marked as faulty.

We can now verify that the disk has been marked faulty by inspecting /proc/mdstat.
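One way to run that check from a script is to pull out the members that /proc/mdstat flags with (F). This is only a sketch: `faulty_members` is a hypothetical helper, and the file argument is there so it can be tried on a saved copy of /proc/mdstat.

```shell
# Hypothetical helper: list the members of one array that are flagged (F).
# Usage: faulty_members [array-name] [mdstat-file]
faulty_members() {
    array="${1:-md0}"
    file="${2:-/proc/mdstat}"
    awk -v a="$array" '
        $1 == a && $2 == ":" {
            for (i = 3; i <= NF; i++)
                if ($i ~ /\(F\)$/) {
                    sub(/\[.*/, "", $i)   # strip "[N](F)", keep the device name
                    print $i
                }
        }
    ' "$file"
}

# On the RAID host:
#   faulty_members md0
```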
Note: You will notice the (F) next to the failed device, and the device count now reads 3/2: of the three devices in the array, only two are listed as "U", or up.

We can further inspect the array using the --detail command-line switch of the mdadm command.
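A short sketch of the --detail inspection, filtered down to the state line and the device counters. `array_summary` is a hypothetical helper; the grep patterns assume the usual "State :" and "... Devices :" labels in mdadm's output, and the command needs root.

```shell
# Hypothetical helper: show the state and device counts from mdadm --detail.
# Usage (as root): array_summary [array]
array_summary() {
    mdadm --detail "${1:-/dev/md0}" | grep -E 'State :|Devices :'
}

# For the full report, run the command unfiltered:
#   mdadm --detail /dev/md0
```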
We can now remove the failed device (/dev/sdc1) from the array.
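The removal can be sketched with a guard that refuses to act unless the member is already flagged (F), since a device must be marked faulty before it can be removed. `remove_failed` is a hypothetical helper that takes bare names (md0, sdc1) and assumes they live under /dev.

```shell
# Hypothetical helper: remove a member only if /proc/mdstat flags it (F).
# Usage (as root): remove_failed <array-name> <member-name> [mdstat-file]
remove_failed() {
    array="$1"
    member="$2"
    file="${3:-/proc/mdstat}"
    # Look for "<member>[N](F)" on the "<array> :" line.
    if grep -Eq "^$array : .* $member\[[0-9]+\]\(F\)" "$file"; then
        mdadm --manage "/dev/$array" --remove "/dev/$member"
    else
        echo "$member is not marked faulty in $array; not removing" >&2
        return 1
    fi
}

# As root:
#   remove_failed md0 sdc1
```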
Once the device has been removed, you can replace the faulty disk. I'll caution you that if you plan to hot swap the disk, you could fry your hardware and, even worse, lose the entire array. If at all possible, shut down the array and replace the failed disk with the server powered off.

If you must perform a hot swap, most SCSI controllers should support it (use with caution). SATA support for hot swapping, however, is still limited to a handful of device drivers (see http://linux.yyz.us/sata/sata-status.html for a per-driver status list). That said, SATA hot swapping is strongly discouraged, so proceed with extreme caution.

Adding a Device to a RAID Array

After you have replaced the failed device, you can add the new device to the array, which automatically initiates a rebuild. Begin by creating a primary partition of type fd on the new device.

Note: fd is the "Linux raid autodetect" partition type, which the kernel uses to recognize RAID member partitions at boot.
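The partitioning step could be scripted roughly like this. The sketch assumes a util-linux sfdisk that accepts script input (named fields such as type=), that the replacement disk appears as /dev/sdc, and that a single partition spanning the disk is what you want. `raid_partition_script` is a hypothetical helper that only prints the script; nothing is written until its output is piped to sfdisk.

```shell
# Hypothetical helper: emit an sfdisk script describing one MBR partition
# of type fd (Linux raid autodetect) that uses the default start and size.
raid_partition_script() {
    printf 'label: dos\n'
    printf 'type=fd\n'
}

# As root, against the NEW, empty disk only (this overwrites its partition table):
#   raid_partition_script | sfdisk /dev/sdc
```

Printing the script first makes it easy to review exactly what will be written before touching the disk.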
Once the device has been partitioned, you can add the new partition to the array.
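A minimal sketch of the add step, assuming the replacement partition came up under the same name as the old one (/dev/sdc1). `add_member` is a hypothetical wrapper, and the mdadm call needs root.

```shell
# Hypothetical wrapper: add a partition to an array, which starts the rebuild.
# Usage: add_member <array> <member>
add_member() {
    mdadm --manage "$1" --add "$2"
}

# As root:
#   add_member /dev/md0 /dev/sdc1
```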
Monitor the rebuild of the array by watching /proc/mdstat.
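For scripted monitoring, the progress figure can be picked out of the recovery line. This is a sketch: `rebuild_progress` is a hypothetical helper, and it assumes the usual "recovery = NN.N%" (or "resync = NN.N%") wording in /proc/mdstat.

```shell
# Hypothetical helper: print the rebuild percentage from an mdstat file,
# or nothing if no recovery or resync is in progress.
# Usage: rebuild_progress [mdstat-file]
rebuild_progress() {
    file="${1:-/proc/mdstat}"
    awk '/recovery =|resync =/ {
        for (i = 1; i < NF; i++)
            if ($i == "=") { print $(i + 1); exit }
    }' "$file"
}

# To watch the rebuild live instead, refresh every 5 seconds:
#   watch -n 5 cat /proc/mdstat
```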
Once the rebuild has completed, you should once again be fault protected. Good luck!

Last Updated on Thursday, 18 June 2009 15:35