check_ilo2_health 1.58 fixes drive check for multiple backplanes
Friday - Aug 9th 2013

This week an HP ProLiant server detected a disk with SMART errors. Such errors are normally caught by my monitoring plugin of choice for HP ProLiant servers: check_ilo2_health.pl. In this case, however, the Smart Error was not detected, although it appeared in the iLO XML output:

      <BACKPLANE>
           <FIRMWARE_VERSION VALUE="1.14"/>
           <ENCLOSURE_ADDR VALUE="226"/>
           <DRIVE_BAY VALUE = "5"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "6"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "7"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "8"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Smart Error"/>
               <UID_LED VALUE = "On"/>
      </BACKPLANE>

But the plugin (here version 1.56) still reported that everything was working smoothly:

ILO2_HEALTH OK - No faults detected

I know from past notifications that Smart Errors are usually detected, so it seemed very odd to me that the plugin didn't recognize the Smart Error on this particular server.

I contacted the plugin developer, Alexander Greiner-Bär. As always, he wrote back very quickly and found the issue, probably right away: this particular server uses more than two backplanes for its hard drives. The first two backplanes number their disks 1-8, which makes sense, but the next two backplanes number their disks 1-8 as well. Laid out schematically:

Backplane 1 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 2 --> Disk 5, Disk 6, Disk 7, Disk 8

Backplane 3 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 4 --> Disk 5, Disk 6, Disk 7, Disk 8

Because the disk failure happened in Backplane 2/Disk 8, its STATUS VALUE was overwritten when "Disk 8" came up a second time (Backplane 4/Disk 8).

The newest version 1.58 now fixes this.
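
To illustrate the overwrite described above, here is a minimal Python sketch. It is not the plugin's code (check_ilo2_health.pl is written in Perl, and I don't know how 1.58 implements the fix). The <DRIVES> wrapper element, the reduced two-backplane sample and the function names are assumptions made for the example. It only shows why keying drive status by bay number alone loses the Smart Error, and how keying by backplane and bay keeps it.

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical sample: two backplanes that both contain bay 8.
# The <DRIVES> wrapper is an assumption so the snippet parses on its own.
SAMPLE = """
<DRIVES>
  <BACKPLANE>
    <DRIVE_BAY VALUE = "8"/>
      <STATUS VALUE = "Smart Error"/>
  </BACKPLANE>
  <BACKPLANE>
    <DRIVE_BAY VALUE = "8"/>
      <STATUS VALUE = "Ok"/>
  </BACKPLANE>
</DRIVES>
"""


def collect_statuses(root, per_backplane):
    """Map each drive to its STATUS value.

    With per_backplane=False the key is the bay number only, so a later
    backplane with the same bay number overwrites the earlier entry.
    With per_backplane=True the key is (backplane index, bay number).
    """
    statuses = {}
    for index, backplane in enumerate(root.iter("BACKPLANE"), start=1):
        bay = None
        # DRIVE_BAY and STATUS are siblings, so walk the children in order
        # and attach each STATUS to the most recent DRIVE_BAY.
        for child in backplane:
            if child.tag == "DRIVE_BAY":
                bay = child.get("VALUE")
            elif child.tag == "STATUS" and bay is not None:
                key = (index, bay) if per_backplane else bay
                statuses[key] = child.get("VALUE")
    return statuses


def faulty(statuses):
    """Return the keys of all drives whose status is not "Ok"."""
    return [key for key, status in statuses.items() if status != "Ok"]


root = ET.fromstring(SAMPLE)
print(faulty(collect_statuses(root, per_backplane=False)))  # [] -> error is lost
print(faulty(collect_statuses(root, per_backplane=True)))   # [(1, '8')]
```

With the bay number as the only key, the "Ok" status from the later backplane replaces the "Smart Error" from the earlier one, which matches the "No faults detected" result seen with version 1.56.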

 

Comments (newest first):

Claudio from Switzerland wrote on Aug 9th, 2013:
In case you can't find version 1.58 of check_ilo2_health, here you have a mirror link.

