What a black weekend that was! I know it is already Thursday, but it is still worth a belated blog entry.
I had planned a maintenance window to upgrade 4 HP Blade servers (480c) from ESX 3.5 U2 to ESX 4.0 build 171294. The upgrade test last Friday went well, so it was a GO for the production servers on the weekend.
The first two servers didn't cause any problems at all. It just took a LONG time to upgrade all the VMs to Virtual Machine Version 7 afterwards.
Let me give you a hint here. Once the host is up and running on ESX 4.0 again, start the VMs and note down their current network settings. Then upgrade VMware Tools and reboot the VM. After that, upgrade the virtual hardware and boot the VM again. Depending on the VM, it will install new hardware once you log in and then reboot (Windows VMs did that). When the VM is finally up again, VERIFY the network settings. It is possible that they were overwritten or set to DHCP.
OK, let's continue...
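The verify step is easier if you write the settings down in a structured way before and after. Here is a minimal Python sketch of that comparison. It is only an illustration, not what I used that weekend: the VM names, adapter names, addresses and the `diff_network_settings` helper are all made up, and how you collect the values (by hand or with a tool of your choice) is up to you.

```python
def diff_network_settings(before, after):
    """Compare two snapshots of per-VM network settings.

    Both arguments map VM name -> adapter name -> dict of settings.
    Returns a list of human-readable differences (empty if nothing changed).
    """
    problems = []
    for vm, old_nics in before.items():
        new_nics = after.get(vm)
        if new_nics is None:
            problems.append(f"{vm}: no settings recorded after the upgrade")
            continue
        for nic, old in old_nics.items():
            new = new_nics.get(nic)
            if new is None:
                problems.append(f"{vm}/{nic}: adapter missing after the upgrade")
                continue
            # Check every setting that was recorded before the upgrade.
            for key in sorted(old):
                if old[key] != new.get(key):
                    problems.append(
                        f"{vm}/{nic}: {key} changed from {old[key]!r} to {new.get(key)!r}"
                    )
    return problems


# Hypothetical example data: settings noted before and after the upgrade.
before = {
    "web01": {"nic0": {"dhcp": False, "ip": "192.0.2.11", "gateway": "192.0.2.1"}},
    "db01": {"nic0": {"dhcp": False, "ip": "192.0.2.12", "gateway": "192.0.2.1"}},
}
after = {
    "web01": {"nic0": {"dhcp": True, "ip": "192.0.2.57", "gateway": "192.0.2.1"}},
    "db01": {"nic0": {"dhcp": False, "ip": "192.0.2.12", "gateway": "192.0.2.1"}},
}

problems = diff_network_settings(before, after)
for line in problems:
    print(line)
```

In this made-up example only web01 is reported, because its adapter was switched to DHCP and got a different address. An empty list means the settings survived the upgrade.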
With the next server the problems started. After a BIOS upgrade the server couldn't boot anymore. There was obviously a problem with the motherboard and the new BIOS version detected it. HP support was called and they finally delivered the new motherboard at 1am.
The next day it was time to upgrade the last server. Everything seemed to go smoothly: installation, boot, everything. Once the server had booted ESX 4.0 and was available in VirtualCenter (now called vCenter Server), I saw two VMs which were "inaccessible". These VMs were stored on a SAN datastore which was no longer mounted as VMFS3. I'm still asking myself why! So I added the same LUN to another ESX server, which could mount the VMFS datastore after assigning it a new signature. When I browsed through the datastore I saw the two VMs. Yes!
But then the shock: I did the same thing on the ESX server that still showed the VMs as "inaccessible". I manually mounted the VMFS datastore just as I had on the other ESX server, but it didn't get mounted! It got formatted as VMFS3! My VMs ... gone... aahhh!!!! :-/
After one hour on the phone with VMware, the only thing they could say was "sorry" and that they didn't know why this had happened. My guess is that the ESX upgrade somehow changed the LUN identification number, although the installation is not supposed to do that. That's only a theory, though.
At least everything is OK again now since, fortunately, the VMs were not THAT business critical.
What do we learn from this? You must create a backup of your VMs before upgrading to ESX 4.0!!! Definitely do that or you might experience the same black weekend as I did!