How to replace a hard or solid state drive in Linux software raid with mdadm


Published on - last updated on September 21st 2023 - Listed in Linux Hardware


I'm constantly monitoring the SMART status of server hard disks, and as error rates increase, the chance of a disk failure rises. I prefer to replace defective hardware as soon as possible, before it actually fails. With a HDD this is usually possible.

The following steps explain how to replace a HDD of a software raid under Linux. The same steps also apply to solid state drives (SSD), of course.

Update February 28th 2013: Added commands for GPT disks.

1. Determine the defective or failing HDD -> in my case I already got that information from my monitoring using SMART data: SDB. If the disk has already failed completely, you can also see that with cat /proc/mdstat.
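Once the suspect disk is known, it helps to see exactly which md arrays it is a member of. A minimal sketch (not from the original article) that parses /proc/mdstat-style text; the function reads from stdin so it can also be run against saved output:

```shell
#!/bin/sh
# Sketch: list the md arrays that contain partitions of a given disk
# by parsing /proc/mdstat-style text (member devices start at field 5).
arrays_with_disk() {
  awk -v d="$1" '/^md/ { for (i = 5; i <= NF; i++) if ($i ~ "^" d) print $1 }'
}

# Demo against a saved sample; on a live system: arrays_with_disk sdb < /proc/mdstat
arrays_with_disk sdb <<'EOF'
md1 : active raid1 sda2[0] sdb2[1]
md0 : active raid1 sda1[0] sdb1[1]
EOF
```

This prints `md1` and `md0` for the sample above, i.e. every array that still references an sdb partition.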

2. Get the current Raid-layout:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[1]
      688009088 blocks [2/2] [UU]

md3 : active raid1 sda5[0] sdb5[1]
      20971392 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      20971456 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      524224 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      2096064 blocks [2/2] [UU]

unused devices: &lt;none&gt;

As you can see, disk SDB is still shown as active in all Raid Arrays.

3. (optional, in case the failing disk is still active in the software raid)
Mark the failing disk (SDB) as failed in the software raid:

# mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1
# mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md2
# mdadm --manage /dev/md3 --fail /dev/sdb5
mdadm: set /dev/sdb5 faulty in /dev/md3
# mdadm --manage /dev/md4 --fail /dev/sdb6
mdadm: set /dev/sdb6 faulty in /dev/md4
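Failing the disk partition by partition is repetitive. With the array/partition pairs of this layout known, a small loop can generate the commands. A hedged sketch, a dry run by design: the echo only prints each mdadm command, remove it to actually execute them.

```shell
#!/bin/sh
# Dry-run sketch: mark every sdb member as failed in its array.
# The md:partition pairs mirror this server's layout - adjust to yours.
# "echo" prints the commands instead of running them; drop it to execute.
for pair in md0:sdb1 md1:sdb2 md2:sdb3 md3:sdb5 md4:sdb6; do
  md=${pair%%:*}     # part before the colon, e.g. md0
  part=${pair##*:}   # part after the colon, e.g. sdb1
  echo mdadm --manage "/dev/$md" --fail "/dev/$part"
done
```

The same pair list can be reused for the remove (-r) and re-add (-a) steps below.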

Now the raid status looks like the following (as if SDB had failed):

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[2](F)
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0] sdb5[2](F)
      20971392 blocks [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[2](F)
      20971456 blocks [2/1] [U_]

md1 : active raid1 sda2[0] sdb2[2](F)
      524224 blocks [2/1] [U_]    

md0 : active raid1 sda1[0] sdb1[2](F)
      2096064 blocks [2/1] [U_]     

unused devices: &lt;none&gt;

4. Remove all SDB partitions from each Raid Array:

# mdadm /dev/md0 -r /dev/sdb1
mdadm: hot removed /dev/sdb1 from /dev/md0
# mdadm /dev/md1 -r /dev/sdb2
mdadm: hot removed /dev/sdb2 from /dev/md1
# mdadm /dev/md2 -r /dev/sdb3
mdadm: hot removed /dev/sdb3 from /dev/md2
# mdadm /dev/md3 -r /dev/sdb5
mdadm: hot removed /dev/sdb5 from /dev/md3
# mdadm /dev/md4 -r /dev/sdb6
mdadm: hot removed /dev/sdb6 from /dev/md4

Verify the current status of the software raid again - all SDB entries are now removed:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0]    
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0]      
      20971392 blocks [2/1] [U_]

md2 : active raid1 sda3[0]      
      20971456 blocks [2/1] [U_]

md1 : active raid1 sda2[0]      
      524224 blocks [2/1] [U_]  

md0 : active raid1 sda1[0]      
      2096064 blocks [2/1] [U_] 

unused devices: &lt;none&gt;

5. (optional) Check that a boot loader is installed on the remaining disk:

# dd if=/dev/sda bs=1024 count=1 2>&1 | strings | egrep -i "lilo|grub"
GRUB
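The same check can be wrapped in a small helper. A sketch, with one variation: it uses grep -a (treat binary data as text) instead of strings, so it also works on systems without binutils; the sample file path in the demo is made up.

```shell
#!/bin/sh
# Sketch: does the first sector of a device (or file) contain a boot
# loader signature? grep -a scans the binary data as if it were text.
has_bootloader() {
  dd if="$1" bs=1024 count=1 2>/dev/null | grep -aEiq 'lilo|grub'
}

# Demo with a fabricated sector image (hypothetical path);
# on a live system: has_bootloader /dev/sda && echo "boot loader found"
printf 'GRUB' > /tmp/mbr_sample
has_bootloader /tmp/mbr_sample && echo "boot loader found"
```

Running it against a real disk, e.g. `has_bootloader /dev/sda`, gives the same yes/no answer as the dd pipeline above.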

6. Shut down the server (if necessary) and replace the drive. Then start the server, which should boot from SDA.

7. Copy SDA's partition table to the new SDB HDD (SDA: good/old disk, SDB: new empty disk; SDA -> SDB).

Note: If you are going to replace the drive with a larger drive and your goal is to extend the size of the raid array, do not copy the partition table. Instead check out this article: Replace hard or solid state drive with a bigger one and grow software (mdadm) raid.

For disks with the MBR Master Boot Record:

# sfdisk -d /dev/sda | sfdisk /dev/sdb

For drives with a GPT partition table (required for drives larger than 2TB):

# sgdisk -R /dev/sdb /dev/sda
# sgdisk -G /dev/sdb
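To verify the copy, comparing the two sfdisk dumps is a quick sanity check. A sketch for MBR disks: the device names and the disk identifier (label-id) legitimately differ between disks, so they are normalized away before comparing.

```shell
#!/bin/sh
# Sketch: normalize an sfdisk dump so dumps of two cloned disks compare equal.
# The device names and the label-id differ per disk; neutralize or drop them.
normalize_dump() {
  sed -e 's|/dev/sd[a-z]|/dev/DISK|g' -e '/^label-id/d' -e '/^device/d'
}

# On a live system:
#   diff <(sfdisk -d /dev/sda | normalize_dump) \
#        <(sfdisk -d /dev/sdb | normalize_dump) && echo "layouts match"
```

An empty diff means both disks carry the same partition layout, so the new partitions can be added back to the arrays.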

8. Insert new SDB to Raid Arrays:

# mdadm /dev/md0 -a /dev/sdb1
# mdadm /dev/md1 -a /dev/sdb2
# mdadm /dev/md2 -a /dev/sdb3
# mdadm /dev/md3 -a /dev/sdb5
# mdadm /dev/md4 -a /dev/sdb6

9. Check the synchronisation:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdb6[2] sda6[0]
      688009088 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sdb5[2] sda5[0]
      20971392 blocks [2/1] [U_]
      [>....................]  recovery =  1.2% (271936/20971392) finish=5.0min speed=67984K/sec

md2 : active raid1 sdb3[2] sda3[0]
      20971456 blocks [2/1] [U_]
        resync=DELAYED

md1 : active raid1 sdb2[2] sda2[0]
      524224 blocks [2/1] [U_]
        resync=DELAYED

md0 : active raid1 sdb1[1] sda1[0]
      2096064 blocks [2/2] [UU]

unused devices: &lt;none&gt;
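Instead of re-running cat /proc/mdstat by hand, a small helper can tell whether the rebuild is done. A sketch (my own addition, not from the original steps): it simply looks for recovery/resync markers and for a "_" inside the [UU] status brackets.

```shell
#!/bin/sh
# Sketch: succeed when an mdstat file shows no ongoing or pending rebuild
# (no recovery/resync lines and no "_" inside the [UU] status brackets).
all_synced() {
  ! grep -Eq 'recovery|resync|\[U*_+U*\]' "$1"
}

# On a live server, poll until everything is in sync:
#   while ! all_synced /proc/mdstat; do sleep 30; done
#   echo "sync finished"
```

Only once this reports success should you move on to reinstalling the boot loader in the next step.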

10. Once the synchronisation has finished, don't forget to install the boot loader on SDB as well. If SDA fails and you reboot the server, SDB would have no boot loader and the server wouldn't start up. With GRUB 2 it's pretty easy:

# grub-install /dev/sdb
Installation finished. No error reported.

# dd if=/dev/sdb bs=1024 count=1 2>&1 | strings | egrep -i "lilo|grub"
GRUB

