Replace hard or solid state drive with a bigger one and grow software (mdadm) raid


Published on - last updated on April 25th 2022 - Listed in Linux Hardware


In my last post I announced a new release of the check_smart monitoring plugin, which now checks additional SMART attributes (not just Current_Pending_Sector). As soon as I rolled out the new version onto the servers, I was immediately alerted about a failing SSD:

=== START OF INFORMATION SECTION ===
Model Family:     Samsung based SSDs
Device Model:     SAMSUNG SSD PM810 2.5" 7mm 128GB
Serial Number:    XXXXXXXXXXXXXX
LU WWN Device Id: 5 0000f0 000000000
Firmware Version: AXM08D1Q
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS, ATA/ATAPI-7 T13/1532D revision 1
SATA Version is:  SATA 2.6, 3.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Jun  6 20:32:23 2019 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

[...]

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   099   099   ---    Pre-fail  Always       -       16
  9 Power_On_Hours          0x0032   095   095   ---    Old_age   Always       -       25108
 12 Power_Cycle_Count       0x0032   099   099   ---    Old_age   Always       -       890
175 Program_Fail_Count_Chip 0x0032   099   099   ---    Old_age   Always       -       11
176 Erase_Fail_Count_Chip   0x0032   100   100   ---    Old_age   Always       -       0
177 Wear_Leveling_Count     0x0013   075   075   ---    Pre-fail  Always       -       877
178 Used_Rsvd_Blk_Cnt_Chip  0x0013   080   080   ---    Pre-fail  Always       -       396
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   082   082   ---    Pre-fail  Always       -       722
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   082   082   ---    Pre-fail  Always       -       3310
181 Program_Fail_Cnt_Total  0x0032   099   099   ---    Old_age   Always       -       16
182 Erase_Fail_Count_Total  0x0032   100   100   ---    Old_age   Always       -       0
183 Runtime_Bad_Block       0x0013   099   099   ---    Pre-fail  Always       -       16
187 Uncorrectable_Error_Cnt 0x0032   067   067   ---    Old_age   Always       -       33281
195 ECC_Error_Rate          0x001a   001   001   ---    Old_age   Always       -       33281

198 Offline_Uncorrectable   0x0030   100   100   ---    Old_age   Offline      -       0
199 CRC_Error_Count         0x003e   253   253   ---    Old_age   Always       -       0
232 Available_Reservd_Space 0x0013   080   080   ---    Pre-fail  Always       -       1620
241 Total_LBAs_Written      0x0032   037   037   ---    Old_age   Always       -       2708399316
242 Total_LBAs_Read         0x0032   035   035   ---    Old_age   Always       -       2781759092

16 already reallocated sectors (which on this drive are also counted as Program_Fail_Cnt_Total and Runtime_Bad_Block) and more than 33'000 uncorrectable errors! The SSD is, however, quite old: the drive has been running for more than 25'000 hours and it belongs to an older solid state generation.
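To read the full SMART table of a drive manually, the smartctl command from the smartmontools package can be used (the device path is simply the one from this setup):

# smartctl -a /dev/sdc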

That drive is part of a software RAID-1, managed by mdadm, which is presented as a physical volume (PV) to the Logical Volume Manager (LVM):

# cat /proc/mdstat
[...]

md3 : active raid1 sdc1[1] sdb1[0]
      124968256 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# pvs | grep md3
  /dev/md3   vgssd    lvm2 a--  119.18g      0
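Besides /proc/mdstat, the member drives and the current array size can also be looked up with mdadm itself:

# mdadm --detail /dev/md3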

I figured this was a great moment to grow that raid by replacing both 128GB drives with two newer 224GB drives.

Replacing the drives

To keep the data, I first removed the drive /dev/sdc (the one with the huge number of errors in the SMART table above), following a step by step guide I once wrote (Some notes on how to replace a HDD in software raid).
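Essentially this means marking the failing partition as failed and removing it from the array before physically pulling the drive; a minimal sketch (the linked guide has the details):

# mdadm /dev/md3 --fail /dev/sdc1
# mdadm /dev/md3 --remove /dev/sdc1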

After I physically replaced the drive, I did one step differently than in the mentioned guide: Instead of copying the partition table from the still remaining drive (/dev/sdb) I manually created a new partition, filling up the whole drive:

# fdisk /dev/sdc

Welcome to fdisk (util-linux 2.29.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1):
First sector (2048-468877311, default 2048):
Last sector, +sectors or +size{K,M,G,T,P} (2048-468877311, default 468877311):

Created a new partition 1 of type 'Linux' and of size 223.6 GiB.

Command (m for help): t
Selected partition 1
Partition type (type L to list all types): da
Changed type of partition 'Linux' to 'Non-FS data'.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

Note 1: I set the partition type to "Non-FS data" (da), because the previously used partition type for software raids, fd (Linux raid autodetect), is now deprecated.

Note 2: For larger drives (> 2 TB), you need to create a GPT partition table. The DOS partition table created with fdisk does not support large drives. I recommend using the cfdisk command for this.
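If you prefer a non-interactive way, a GPT label and a raid partition spanning the whole drive can also be created with parted; a rough sketch, with the device name only serving as an example:

# parted /dev/sdc mklabel gpt
# parted -a optimal /dev/sdc mkpart primary 0% 100%
# parted /dev/sdc set 1 raid on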

Then I added the new drive /dev/sdc into the still existing raid-1 (md3):

# mdadm /dev/md3 -a /dev/sdc1
mdadm: added /dev/sdc1

Of course this raid device now needs to rebuild:

# cat /proc/mdstat
[...]

md3 : active raid1 sdb1[3] sdc1[2]
      124968256 blocks super 1.2 [2/1] [_U]
      [>....................]  recovery =  0.6% (801792/124968256) finish=10.3min speed=200448K/sec
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

I waited until the raid was rebuilt. At that moment the raid itself of course still runs at the old size, because the older drive (/dev/sdb) is still a 128GB drive.
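The rebuild progress can be followed with watch, or mdadm can simply block until the recovery has finished:

# watch -n 10 cat /proc/mdstat
# mdadm --wait /dev/md3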

Now it's time to replace the second drive (/dev/sdb). I did the exact same steps as before with /dev/sdc, following the mdadm drive replacement guide but with the bigger partition.

Growing the mdadm raid device

Once the raid was rebuilt (once more), it now runs on two 224GB drives, yet the raid device is still limited to the old 128GB size. Growing/expanding a raid device is actually very easy by setting --size=max:

# mdadm --grow /dev/md3 --size=max
mdadm: component size of /dev/md3 has been set to 234372096K

This forces a resync of the raid device:

# cat /proc/mdstat
[...]

md3 : active raid1 sdc1[2] sdb1[3]
      234372096 blocks super 1.2 [2/2] [UU]
      [===============>.....]  resync = 78.2% (183370304/234372096) finish=6.3min speed=134554K/sec
      bitmap: 1/1 pages [4KB], 131072KB chunk

unused devices: <none>

Note the larger block counts (in parentheses) behind the current sync status, compared to the previous rebuild.
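Once the resync is through, the new component and array sizes can also be double-checked with mdadm --detail:

# mdadm --detail /dev/md3 | grep 'Size'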

Growing the LVM physical volume (PV)

Once the resync had completed, the PV could be increased:

# pvresize /dev/md3
  Physical volume "/dev/md3" changed
  1 physical volume(s) resized / 0 physical volume(s) not resized

Voilà, due to the grown PV, the volume group (VG) now has more space available:

# pvs | grep md3
  /dev/md3   vgssd    lvm2 a--  223.51g 104.34g

The additional space in the volume group can now be used to create additional logical volumes or extend existing volumes with lvextend.
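As a quick example, extending a logical volume in the vgssd volume group by some of the newly gained space and growing its ext4 filesystem could look like this (the LV name and size are just placeholders):

# lvextend -L +100G /dev/vgssd/lv_data     # lv_data and +100G are placeholders
# resize2fs /dev/vgssd/lv_data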

Growing the file system on the device

If you do not use LVM (Logical Volume Manager) on the raid device but instead run a filesystem directly on it, you want to run resize2fs to grow the filesystem to the new size.

# resize2fs /dev/md3
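Note that for an offline resize (with the filesystem unmounted), resize2fs typically insists on a forced filesystem check first:

# e2fsck -f /dev/md3
# resize2fs /dev/md3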



Comments (newest first)

Jan Büren from Bonn, Germany wrote on Mar 18th, 2021:

Great tutorial. Just some data and additions from my current experience:

1.) If you remove a RAID device without setting it faulty first, i.e. after removing one raid device while the system is powered off, you get an inactive raid device which has to be started (reassembled) again.

2.) resize2fs has a moderate risk of failing; according to Red Hat documentation it was recommended to do a file system check first and after that an offline (unmounted) resize2fs.


Goal: Increase the raid1 from 4TB to 14TB

Environment: ubuntu 20.04, ext4 filesystem on a raid1

I did all the steps as described by Claudio, but fdisk showed the raid type with the code 29, so I used this:
29 Linux RAID A19D880F-05FC-4D3B-A006-743F0F84911E

Furthermore my disk is larger than 2TB, therefore I created a GPT table first.
The first new device was therefore created like this:

Disklabel type: gpt
Disk identifier: C9FCA2BD-E444-DD44-8CAD-E36C1FCAB676

Device Start End Sectors Size Type
/dev/sdc1 2048 27344764894 27344762847 12.8T Linux RAID


After powering off and changing the second raid disk, the raid went inactive with the state (S).
Stopping (!) the raid and reassembling it got it right again though:
mdadm --stop /dev/md20
mdadm --assemble /dev/md20

The growing of the raid devices was as easy as described by Claudio.
I was heavily unsure about resizing the file system, though. Googling turned up some threads saying that it could take hours and even fail.


I stopped all processes which had access to the file system, unmounted the device and ran a fsck like this:

e2fsck -f /dev/md20

fsck said it would like to optimize some trees and I agreed.
fsck took about 20 minutes (4TB).
After that I did a backup of the important data of this file system.
Then I checked that no "reboot required" security upgrades were still pending in the system (unattended-upgrades is allowed to reboot at night time).

I decided to go with the online resize option because this would avoid system downtime and the procedure (needs kernel > 2.6) seemed stable enough.
The hdds have a throughput of about 200MB/sec:

hdparm -tT /dev/sdc
Timing buffered disk reads: 668 MB in 3.00 seconds = 222.36 MB/sec

Power loss would probably be a trouble maker; using a small APC UPS, you can get the battery data with apcaccess:

apcaccess
BCHARGE : 100.0 Percent
TIMELEFT : 35.7 Minutes

Ok, prepared with paranoia mode, here we go:

resize2fs /dev/md20

DONE after 5 Minutes!!!
Great.
9TB more size allocated.

I'd assume that resize2fs with the grow option is in a nutshell just a mkfs.ext for the newly allocated sectors and therefore as fast as formatting the new space.

I hope this helps someone else, because most of the online resources for resize2fs were quite outdated or reported a negative outcome ...





ck from Switzerland wrote on Sep 28th, 2020:

Mino, yes this makes sense when your filesystem is directly running on the mdadm device (md2 in your case). If you do not use LVM on the raid device, obviously you do not need to grow the PV. I see your point now, I will add this as a note in the article. Thanks for the hint!


Mino from wrote on Sep 28th, 2020:

I was growing an mdadm raid device (see the name from my comment - md2). I extended the partitions sda3 and sdb3 using fdisk as in the article, then after adding them to mdadm (md2) I still had the old size, but "mdadm --grow /dev/md2 --size=max" from your article solved it. But the "df" command still reported the old small size - that was solved by "resize2fs /dev/md2"...

Filesystem at /dev/md2 is mounted on /data; on-line resizing required
old_desc_blocks = 7, new_desc_blocks = 14
Performing an on-line resize of /dev/md2 to 55905686 (4k) blocks.
The filesystem on /dev/md2 is now 55905686 blocks long.


ck from Switzerland wrote on Sep 28th, 2020:

Hi Mino. resize2fs only applies to logical volumes, not to the volume group. This article was about growing a software raid (using mdadm) and then increasing the physical volume (PV), which is the mdadm device. The volume group (VG) was then increased automatically. Yes, the last step would be to increase the logical volumes. See the article Online resizing of an LXC container on LVM volume for more information and an example of how to extend a logical volume (lvextend) and then the filesystem (resize2fs).


Mino from wrote on Sep 28th, 2020:

Hi, thanks for the great article, but the last step with "pvresize" did not work for me. First I had to install the LVM2 package (because the pvresize command was not found), but then "pvresize /dev/md2" caused an error:

No physical volume label read from /dev/md2
Failed to read physical volume "/dev/md2"
0 physical volume(s) resized / 0 physical volume(s) not resized

So in my case I had to use "resize2fs /dev/md2", which worked well. Maybe you should mention in the article what the difference between pvresize and resize2fs is and when which one should be used.


Cristian from wrote on Oct 16th, 2019:

Thanks,

works like a charm

