
Some notes on how to replace a HDD in software raid
Wednesday - Jul 25th 2012

I'm constantly monitoring the SMART status of server hard disks, and as error rates increase, a disk failure becomes imminent. I prefer to replace defective hardware as soon as possible, ideally before it actually fails. In the case of an HDD, this is usually possible.

The following steps explain how to replace an HDD of a software RAID under Linux.

Update February 28th 2013: Added commands for GPT disks.

1. Determine the defective or failing HDD -> in my case I already had that information from my SMART monitoring: SDB (a quick manual check with smartctl is shown below).
If the disk has already failed completely, you can also see that in the output of cat /proc/mdstat.
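To double-check the health of the suspect disk manually, smartctl from the smartmontools package can be used (a quick sketch; /dev/sdb is the failing disk from this example):

# Overall SMART health verdict of the suspect disk
smartctl -H /dev/sdb

# Detailed SMART attributes - watch values like Reallocated_Sector_Ct
# and Current_Pending_Sector
smartctl -A /dev/sdb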

2. Get the current RAID layout:

# cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[1]
      688009088 blocks [2/2] [UU]

md3 : active raid1 sda5[0] sdb5[1]
      20971392 blocks [2/2] [UU]

md2 : active raid1 sda3[0] sdb3[1]
      20971456 blocks [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
      524224 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      2096064 blocks [2/2] [UU]

unused devices: <none>

As you can see, disk SDB is still shown as active in all RAID arrays.

3. (optional, in case the failing disk is still active in the software RAID)
Mark all partitions of the failing disk (SDB) as "failed" in the software RAID:

# mdadm --manage /dev/md0 --fail /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0
# mdadm --manage /dev/md1 --fail /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md1
# mdadm --manage /dev/md2 --fail /dev/sdb3
mdadm: set /dev/sdb3 faulty in /dev/md2
# mdadm --manage /dev/md3 --fail /dev/sdb5
mdadm: set /dev/sdb5 faulty in /dev/md3
# mdadm --manage /dev/md4 --fail /dev/sdb6
mdadm: set /dev/sdb6 faulty in /dev/md4

Now the RAID status looks as if SDB had failed:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0] sdb6[2](F)
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0] sdb5[2](F)
      20971392 blocks [2/1] [U_]
                               
md2 : active raid1 sda3[0] sdb3[2](F)
      20971456 blocks [2/1] [U_]
                                 
md1 : active raid1 sda2[0] sdb2[2](F)
      524224 blocks [2/1] [U_]    
                                  
md0 : active raid1 sda1[0] sdb1[2](F)
      2096064 blocks [2/1] [U_]     
                                    
unused devices: <none>

4. Remove all SDB partitions from each RAID array:

# mdadm /dev/md0 -r /dev/sdb1           
mdadm: hot removed /dev/sdb1 from /dev/md0
# mdadm /dev/md1 -r /dev/sdb2             
mdadm: hot removed /dev/sdb2 from /dev/md1
# mdadm /dev/md2 -r /dev/sdb3             
mdadm: hot removed /dev/sdb3 from /dev/md2
# mdadm /dev/md3 -r /dev/sdb5             
mdadm: hot removed /dev/sdb5 from /dev/md3
# mdadm /dev/md4 -r /dev/sdb6             
mdadm: hot removed /dev/sdb6 from /dev/md4

Verify the current status of the software RAID again - all SDB entries are now removed:

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sda6[0]    
      688009088 blocks [2/1] [U_]

md3 : active raid1 sda5[0]      
      20971392 blocks [2/1] [U_]

md2 : active raid1 sda3[0]      
      20971456 blocks [2/1] [U_]
                
md1 : active raid1 sda2[0]      
      524224 blocks [2/1] [U_]  
               
md0 : active raid1 sda1[0]      
      2096064 blocks [2/1] [U_] 

unused devices: <none>
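As a side note, steps 3 and 4 can also be combined: mdadm accepts the fail and remove operations for the same partition in a single command (a sketch, shown here for md1; the same pattern applies to the other arrays):

# Mark the partition as faulty and hot-remove it in one go
mdadm --manage /dev/md1 --fail /dev/sdb2 --remove /dev/sdb2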

5. (optional) Check that a boot loader is installed on the remaining disk:

# dd if=/dev/sda bs=1024 count=1 2>&1 | strings | egrep -i "lilo|grub"
GRUB

6. Shut down the server and replace the HDD. Then start the server again, which should boot from SDA.
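When pulling the drive, it helps to have noted the serial number of the failing disk beforehand (run this before shutting down), so the correct physical drive gets replaced - a small sketch using smartctl again:

# Print model and serial number of the failing disk to match it
# against the label on the physical drive
smartctl -i /dev/sdb | egrep -i "model|serial"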

7. Copy SDA's partition table to the new SDB HDD (SDA: good/old disk, SDB: new empty disk, SDA -> SDB).

For disks with an MBR (Master Boot Record) partition table:

sfdisk -d /dev/sda | sfdisk /dev/sdb

For disks with a GPT partition table:

sgdisk -R /dev/sdb /dev/sda
sgdisk -G /dev/sdb
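Note that sgdisk -R expects the target disk first and the source disk second, and sgdisk -G afterwards randomizes the disk and partition GUIDs on SDB so they don't collide with those of SDA. Before adding the new disk to the arrays, it doesn't hurt to verify that the partition layout was copied correctly (a quick sketch; use whichever tool matches your partition table type):

# MBR disks: list both partition tables and compare them by eye
fdisk -l /dev/sda
fdisk -l /dev/sdb

# GPT disks: print the copied layout of the new disk
sgdisk -p /dev/sdb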

8. Add the new SDB partitions to the RAID arrays:

mdadm /dev/md0 -a /dev/sdb1
mdadm /dev/md1 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb3
mdadm /dev/md3 -a /dev/sdb5
mdadm /dev/md4 -a /dev/sdb6

9. Check the synchronisation:

cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md4 : active raid1 sdb6[2] sda6[0]
      688009088 blocks [2/1] [U_]
        resync=DELAYED

md3 : active raid1 sdb5[2] sda5[0]
      20971392 blocks [2/1] [U_]
      [>....................]  recovery =  1.2% (271936/20971392) finish=5.0min speed=67984K/sec

md2 : active raid1 sdb3[2] sda3[0]
      20971456 blocks [2/1] [U_]
        resync=DELAYED

md1 : active raid1 sdb2[2] sda2[0]
      524224 blocks [2/1] [U_]
        resync=DELAYED

md0 : active raid1 sdb1[1] sda1[0]
      2096064 blocks [2/2] [UU]

unused devices: <none>
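The arrays are resynchronised one after another; those marked resync=DELAYED start once the previous one has finished. To follow the progress continuously, the status can simply be refreshed in a loop (a small convenience sketch):

# Refresh the RAID status every 5 seconds until all arrays show [UU]
watch -n 5 cat /proc/mdstat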

10. Once the synchronisation is finished, don't forget to also install the boot loader on SDB. If SDA failed and you rebooted the server, SDB wouldn't have a boot loader and therefore the server wouldn't start up. With Grub V2 it's pretty easy:

# grub-install /dev/sdb
Installation finished. No error reported.

# dd if=/dev/sdb bs=1024 count=1 2>&1 | strings | egrep -i "lilo|grub"
GRUB
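If the system still runs GRUB Legacy instead of Grub V2, the boot loader can be installed on the new disk from the grub shell instead (a sketch assuming /boot lives on the first partition, sdb1/md0, as in this example):

# grub
grub> device (hd1) /dev/sdb
grub> root (hd1,0)
grub> setup (hd1)
grub> quit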

 
