Friday, November 30, 2007

Replacing Failed Disk in Linux Software Raid


When I worked at the University of Minnesota, we used IBM x86 servers, and since it was the University, we didn't buy many systems with hardware raid. This meant that software raid in Linux was all the rage. Unfortunately, the team I worked with never documented the process very well, so the following write-up takes a practical example and illustrates the steps.

Overview:

 1) Determine which disk has failed.
 2) Remove failed disk from metadevice using mdadm. 
 3) Physically replace disk. 
 4) Partition new disk using sfdisk. 
 5) Add new disk back into raid metadevice using mdadm. 
 6) Confirm array is rebuilding by "cat /proc/mdstat". 

 Example: 

 1) Below, md3 and md4 have a failed device sdc, which contains slices sdc1 and sdc2.
#cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
Event: 5
md2 : active raid1 sdb1[1] sda1[0]
      80192 blocks [2/2] [UU]
      resync=DELAYED
md0 : active raid1 sdb2[1] sda2[0]
      2096384 blocks [2/2] [UU]
      resync=DELAYED
md1 : active raid1 sdb3[1] sda3[0]
      33366912 blocks [2/2] [UU]
      [===>.................] resync = 16.5% (5511296/33366912) finish=46.2min speed=10027K/sec
md3 : active raid5 sde1[2] sdd1[1]
      35342720 blocks level 5, 64k chunk, algorithm 0 [3/2] [_UU]
md4 : active raid5 sde2[2] sdd2[1]
      35744384 blocks level 5, 64k chunk, algorithm 0 [3/2] [_UU]

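On a system with many arrays it can help to pick out the degraded ones automatically. The following is a minimal sketch, not part of the original procedure: the function name degraded_arrays is made up for illustration, and it assumes the usual multi-line /proc/mdstat layout, where each "mdN :" line is followed by a status field such as [_UU] in which an underscore marks a missing member.

```shell
# List md arrays whose status field (e.g. [_UU]) shows a missing member.
# Reads /proc/mdstat by default; pass another file to test against saved output.
degraded_arrays() {
    awk '
        # Remember the array name from its "mdN : ..." header line.
        /^md[0-9]+ :/ { name = $1; next }
        # The first following line with a [UU]-style field carries the status.
        name != "" && match($0, /\[[U_]+\]/) {
            if (index(substr($0, RSTART, RLENGTH), "_")) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the mdstat output in step 1, this should list md3 and md4, the two arrays missing their sdc slices.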
2) We need to use mdadm to remove the failed slices from the arrays. Note that this is not always needed, but we show it here for practical purposes.
#mdadm -r /dev/md3 /dev/sdc1
#mdadm -r /dev/md4 /dev/sdc2

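If a failing slice is still listed as an active member, mdadm will not remove it until it has been marked faulty, so -f (--fail) and -r (--remove) are commonly given together. The sketch below only prints the commands for review and executes nothing; the helper name fail_and_remove_cmds and the ARRAY:MEMBER argument format are assumptions made for this illustration.

```shell
# Print one mdadm command per ARRAY:MEMBER pair that marks the member
# faulty and then removes it. Nothing is executed here; review the
# output and run it as root if it looks right.
fail_and_remove_cmds() {
    for pair in "$@"; do
        array=${pair%%:*}    # text before the first colon, e.g. /dev/md3
        member=${pair#*:}    # text after the first colon, e.g. /dev/sdc1
        echo "mdadm $array --fail $member --remove $member"
    done
}

fail_and_remove_cmds /dev/md3:/dev/sdc1 /dev/md4:/dev/sdc2
```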
3) Physically remove the failed disk(s) from the system and install the replacement.

4) Partition the new disk using the partition table from another member (device) of the array. This example uses /dev/sde: we dump its partition table to a file and then read that file into the new device using sfdisk.
#sfdisk -d /dev/sde > /tmp/partition.out
#sfdisk /dev/sdc < /tmp/partition.out

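Getting the source and destination the wrong way round here would overwrite the partition table of a healthy disk, so it can be worth wrapping the two sfdisk calls in a small guard. This is a sketch under stated assumptions: the function name clone_partition_table and the DRY_RUN variable are hypothetical, and by default the commands are only printed, not run.

```shell
# Copy the partition table from a healthy member disk to the new disk.
# DRY_RUN defaults to 1, so the sfdisk commands are printed, not executed.
clone_partition_table() {
    src=$1
    dst=$2
    dump=${3:-/tmp/partition.out}
    # Refuse missing arguments or an identical source and destination.
    if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
        echo "usage: clone_partition_table SOURCE_DISK NEW_DISK [DUMP_FILE]" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $src > $dump"
        echo "sfdisk $dst < $dump"
        return 0
    fi
    # Dump the source table, then write it to the new disk.
    sfdisk -d "$src" > "$dump" && sfdisk "$dst" < "$dump"
}

clone_partition_table /dev/sde /dev/sdc
```

Setting DRY_RUN=0 would run the commands for real, which requires root and rewrites the partition table on the second disk.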
5) Add the device slices back into corresponding raid metadevice using mdadm.
#mdadm -a /dev/md3 /dev/sdc1
#mdadm -a /dev/md4 /dev/sdc2

6) Check /proc/mdstat for the results.
#cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
Event: 8
md2 : active raid1 sdb1[1] sda1[0]
      80192 blocks [2/2] [UU]
      resync=DELAYED
md0 : active raid1 sdb2[1] sda2[0]
      2096384 blocks [2/2] [UU]
      resync=DELAYED
md1 : active raid1 sdb3[1] sda3[0]
      33366912 blocks [2/2] [UU]
      [==========>..........] resync = 52.8% (17633280/33366912) finish=25.6min speed=10228K/sec
md3 : active raid5 sdc1[3] sde1[2] sdd1[1]
      35342720 blocks level 5, 64k chunk, algorithm 0 [3/2] [_UU]
      [============>........] recovery = 60.9% (10776268/17671360) finish=11.2min speed=10233K/sec
md4 : active raid5 sdc2[3] sde2[2] sdd2[1]
      35744384 blocks level 5, 64k chunk, algorithm 0 [3/2] [_UU]

Note: md4 will not start rebuilding until md3 is complete, since both raids contain slices from the same physical disk.
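
Rather than re-running cat by hand, the rebuild can be polled from a script. The following is a rough sketch, assuming that progress appears in /proc/mdstat as "resync = ...", "recovery = ..." or "resync=DELAYED" lines as in the output above; the function names and the INTERVAL variable are made up for this example.

```shell
# Succeeds (exit status 0) while any array reports a resync or recovery,
# including queued ones shown as resync=DELAYED.
rebuild_in_progress() {
    grep -Eq '(resync|recovery) *=' "${1:-/proc/mdstat}"
}

# Poll every INTERVAL seconds (default 60) until no activity is reported.
wait_for_rebuild() {
    while rebuild_in_progress "$1"; do
        sleep "${INTERVAL:-60}"
    done
    echo "no resync or recovery in progress"
}
```

An array that is queued behind another one, like md4 here, may show no progress line of its own, so it is still worth confirming afterwards that every array reports a full status such as [UUU].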