It has been a while since my last angry entry about HP Blade Servers, but now I have another reason. This weekend I planned a small maintenance on an HP Blade 480c server. Mission: simply replace a memory DIMM which had been flagged as degraded by System Management Homepage and also by our Nagios monitoring (which uses the same SNMP info as SMH).
I had received the new memory DIMM a while ago, and with the maintenance now planned, the defective one was ready to be replaced. Shut down the server, replace the defective DIMM with the new one, boot the server. FAIL. As soon as I inserted the blade back into the chassis, it wasn't detected by the chassis anymore and the power LED was blinking red. Having had enough experience with failing HP blades, I knew the motherboard needed to be replaced.
As always, I opened a new support case (luckily we have a 4h contract, otherwise the normal support doesn't work on weekends), and the new motherboard and the technician arrived 3.5 hours later. Replace the motherboard, re-insert the blade into the chassis, boot. FAIL. Now what?! We tried it with the minimal memory configuration (2 memory DIMMs) and now the server booted successfully. I then had to shut down and boot again and again, adding 2 DIMMs per boot, until all the memory was installed again.
Now some people might say 'this can happen, it is hardware'. Yes and no. This is why I'm upset:
- For a long time the server reported memory errors (and only that). I had already replaced one DIMM, and a new memory error appeared right after that so-called defective one had been swapped (first DIMM#6 and now DIMM#5). No motherboard errors or warnings were ever shown.
- When I opened the support case to replace the defective memory, I told HP that I was pretty sure it was a motherboard problem and not a memory problem. But HP still didn't want to send me a new motherboard, only a new memory DIMM. Replacing the motherboard back then would have prevented the crash on Saturday.
- The replacement part (the new motherboard) was delivered very fast. Respect! But the HP technician came almost one hour later than planned. He told me he couldn't find a parking space...
- It was a Saturday morning, and instead of doing the planned maintenance (30 min) I stayed in the data center for around 6 hours.