grub2 boot issues after hdd replacement


Listed in Linux Shell Hardware Rant

A couple of weeks ago, I ran into a very strange and, at first sight, complicated problem. A physical server running Debian Squeeze with software RAID didn't start up anymore after a reboot. Troubleshooting was made harder by the fact that I had no console access to this server, so I was more or less troubleshooting blind.

At first I thought (and hoped) that fsck was simply still running, since the server wasn't up yet. I gave it adequate time before I contacted someone in the data center to physically take a look at the console. Then the answer from the data center guy came back: The server doesn't boot and hangs on the grub boot screen. Oh golly...

I booted the server into rescue mode with SSH enabled so I could at least take a look at the current grub configuration. I had already run into grub2 issues in the past (see Kernel upgrade problem on Debian Squeeze), so I was already acquainted with the "" file. I mounted the boot file system and took a look at it:

root@rescue /mnt/boot/grub # cat
(hd0)   /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F08LYF
(hd1)   /dev/disk/by-id/ata-ST3000DM001-9YN166_S1F03NRC

These entries tell grub2 which disks to look for when booting. I remembered that I had replaced a defective HDD a couple of weeks earlier, and that one of these entries was probably still from the old HDD. So I needed to replace the stale entries with new ones. I decided to completely remove the grub2 bootloader and reinstall it, to make sure grub is also written to the first sectors of the disks:

root@rescue ~ # mkdir /mnt/rescue
root@rescue ~ # mkdir /mnt/rescue/boot
root@rescue ~ # mount /dev/vg0/root /mnt/rescue/
root@rescue ~ # mount /dev/md1 /mnt/rescue/boot
root@rescue ~ # mount /dev/vg0/var /mnt/rescue/var
root@rescue ~ # mount --bind /dev /mnt/rescue/dev/
root@rescue ~ # mount --bind /proc /mnt/rescue/proc/
root@rescue ~ # mount --bind /sys /mnt/rescue/sys/
root@rescue ~ # chroot /mnt/rescue /bin/bash

In case you wonder why I mounted the var file system: it is needed in order to use apt-get. And that's what I did:

root@rescue / # apt-get remove grub; apt-get purge grub; apt-get install grub

It was necessary to use "purge"; otherwise some of the grub config files were still hanging around... After I answered the install questions (I installed grub on /dev/sda, /dev/sdb and /dev/md1 which, as you can see, was my boot file system), I checked the file again:

root@rescue / # cat /boot/grub/
(hd0)   /dev/disk/by-id/scsi-35000c5005271765b
(hd1)   /dev/disk/by-id/scsi-35000c5004a2aa7ce
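Before rebooting out of the rescue system, it is cleaner to leave the chroot (with exit) and unmount everything in the reverse order of the mount commands used earlier. The following is only a sketch under that assumption: the function names run and unmount_rescue and the DRY_RUN variable are made up for illustration and are not part of the original session.

```shell
#!/bin/sh
# Hypothetical helper: run a command, or only print it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Unmount the chroot tree in the reverse order of the mounts:
# sys, proc, dev, var, boot and finally the root file system.
# $1 is the chroot root, for example /mnt/rescue.
unmount_rescue() {
    root=$1
    for sub in sys proc dev var boot; do
        run umount "$root/$sub" || return 1
    done
    run umount "$root"
}
```

Calling it as "DRY_RUN=1; unmount_rescue /mnt/rescue" only prints the umount commands, which is a cheap way to check the order before running it for real.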

After these changes, the system booted again.

But how could this happen? I investigated on another, very similar server which also had a recent disk replacement. The file there also contained one old HDD entry, so I ran update-grub to update the grub configuration.


But no changes were made to the file; the old HDD entry was still there. If I rebooted that server, it probably wouldn't start up anymore either!
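A quick way to spot such a leftover entry is to test whether each device path listed in the file still exists. This is a minimal sketch and not something I ran at the time; the function name stale_entries is made up for illustration, and the path of the file is simply passed in as an argument.

```shell
#!/bin/sh
# Print every "(hdN) <device>" entry whose device path no longer exists.
# $1 is the path of the file to check.
stale_entries() {
    # Read "(hdN) device" pairs; also handle a last line without a newline.
    while read -r name dev || [ -n "$name" ]; do
        case "$name" in
            ''|'#'*) continue ;;   # skip blank lines and comments
        esac
        # -e follows symlinks, so a dangling by-id link counts as missing.
        [ -e "$dev" ] || printf '%s %s\n' "$name" "$dev"
    done < "$1"
}
```

If the function prints nothing, every listed disk is still present. Any line it prints points at a disk that is gone.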

I ran some more tests and noticed that if I removed the file and _then_ launched update-grub, the file was recreated by update-grub and the entries were _now_ correct.
So if you replace an HDD, make sure you delete the /boot/ file before launching the update-grub command!
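A slightly more careful variant of this advice is to move the file aside instead of deleting it, so the old entries can still be compared afterwards. This is only a sketch: the function name refresh_map and the .bak suffix are my own choices for illustration, and the path of the file has to be supplied as an argument.

```shell
#!/bin/sh
# Move the given file aside and let update-grub regenerate the configuration.
# $1 is the path of the file; the old copy is kept as <file>.bak.
refresh_map() {
    map=$1
    if [ -f "$map" ]; then
        mv "$map" "$map.bak" || return 1
    fi
    # update-grub is the stock Debian command used throughout this post.
    update-grub
}
```

Afterwards, a diff between the regenerated file and the .bak copy shows exactly which entries changed.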

Shortly after this discovery, I filed a Debian bug report, which can be seen here: grub-update does not update when hdd was replaced. Hopefully this bug will be fixed soon, or has already been fixed, as Debian Squeeze uses grub2 package version 1.98 and Wheezy uses 1.99.

Update March 5th 2013: I had a similar issue today after updating a Debian Squeeze system with the latest patches, including a kernel upgrade. The update itself went through without any errors, but on reboot grub didn't start up correctly. Besides the steps mentioned in this post, I additionally had to manually reinstall grub on the disks:

grub-install /dev/sda; grub-install /dev/sdb
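With more RAID members, the same step can be written as a loop. This is a minimal sketch: the function name install_grub_on is made up for illustration, and skipping anything that is not a block device is an added safety check, not part of what I originally ran.

```shell
#!/bin/sh
# Run grub-install on every disk given as an argument.
# Arguments that are not block devices are skipped with a warning.
install_grub_on() {
    for disk in "$@"; do
        if [ ! -b "$disk" ]; then
            echo "skipping $disk: not a block device" >&2
            continue
        fi
        # Stop at the first failure so a broken disk does not go unnoticed.
        grub-install "$disk" || return 1
    done
}
```

For the server in this post, the call would be: install_grub_on /dev/sda /dev/sdb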

I've had several boot issues after Debian updates (not even a distro upgrade!) now... I'm kind of getting scared :-/

Update February 2nd 2014:
Wow - one year later and I've had a similar experience on Debian Wheezy. See Debian not booting: ALERT /dev/disk/by-uuid does not exist.
