check_ilo2_health 1.58 fixes drive check for multiple backplanes




This week an HP ProLiant server reported a disk with SMART errors. Such errors are normally detected by my monitoring plugin of choice for HP ProLiant servers: check_ilo2_health.pl. But in this case the Smart Error was not detected, although it appeared in the iLO XML output:

      <BACKPLANE>
           <FIRMWARE_VERSION VALUE="1.14"/>
           <ENCLOSURE_ADDR VALUE="226"/>
           <DRIVE_BAY VALUE = "5"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "6"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "7"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "8"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Smart Error"/>
               <UID_LED VALUE = "On"/>
      </BACKPLANE>

But the plugin (here version 1.56) still reported that everything was running smoothly:

ILO2_HEALTH OK - No faults detected

I know from past notifications that Smart Errors are usually detected, so it seemed very odd that the plugin didn't recognize the Smart Error on this particular server.

I contacted the plugin developer, Alexander Greiner-Bär, and as always he wrote back very quickly and found the issue (probably) immediately: this particular server uses more than two backplanes for hard drives. The first two backplanes number their disks 1-8, which makes sense, but the next two backplanes number their disks 1-8 as well. To illustrate:

Backplane 1 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 2 --> Disk 5, Disk 6, Disk 7, Disk 8

Backplane 3 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 4 --> Disk 5, Disk 6, Disk 7, Disk 8

Because the disk failure happened on Backplane 2/Disk 8, its STATUS VALUE was overwritten when "Disk 8" came up a second time (Backplane 4/Disk 8).
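The overwrite can be reproduced with a minimal sketch. This is not the plugin's actual code (the plugin is written in Perl). It is a Python illustration that assumes the statuses are stored in a map keyed by the drive bay number. The DRIVES wrapper element, the second enclosure address and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened iLO-style output: two backplanes that both
# contain a drive bay numbered 8. Only the first one has a Smart Error.
SAMPLE = """
<DRIVES>
  <BACKPLANE>
    <ENCLOSURE_ADDR VALUE="226"/>
    <DRIVE_BAY VALUE="8"/>
    <STATUS VALUE="Smart Error"/>
  </BACKPLANE>
  <BACKPLANE>
    <ENCLOSURE_ADDR VALUE="228"/>
    <DRIVE_BAY VALUE="8"/>
    <STATUS VALUE="Ok"/>
  </BACKPLANE>
</DRIVES>
"""

def collect_status(xml_text, per_backplane):
    """Map each drive to its status.

    DRIVE_BAY and STATUS are siblings in the output, so the current bay
    is remembered while walking the children of each BACKPLANE.
    """
    statuses = {}
    root = ET.fromstring(xml_text)
    for index, backplane in enumerate(root.iter("BACKPLANE"), start=1):
        bay = None
        for element in backplane:
            if element.tag == "DRIVE_BAY":
                bay = element.get("VALUE")
            elif element.tag == "STATUS" and bay is not None:
                # Keying by bay alone lets a later backplane overwrite
                # an earlier one; adding the backplane index avoids that.
                key = (index, bay) if per_backplane else bay
                statuses[key] = element.get("VALUE")
    return statuses

def faulty(statuses):
    """Return only the drives whose status is not "Ok"."""
    return {key: value for key, value in statuses.items() if value != "Ok"}

naive = collect_status(SAMPLE, per_backplane=False)
fixed = collect_status(SAMPLE, per_backplane=True)
print(faulty(naive))  # {}
print(faulty(fixed))  # {(1, '8'): 'Smart Error'}
```

With the bay number as the only key, the "Ok" from the later backplane replaces the "Smart Error" and the check sees no fault. Including the backplane in the key keeps both entries apart.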

The newest version 1.58 now fixes this.



Comments (newest first)

Claudio from Switzerland wrote on Aug 9th, 2013:

In case you can't find version 1.58 of check_ilo2_health, here you have a mirror link.