check_ilo2_health 1.58 fixes drive check for multiple backplanes




This week an HP ProLiant server detected a disk with SMART errors. Normally such errors are picked up by my monitoring plugin of choice for HP ProLiant servers: check_ilo2_health.pl. But in this case the Smart Error was not detected, although it appeared in the iLO XML output:

      <BACKPLANE>
           <FIRMWARE_VERSION VALUE="1.14"/>
           <ENCLOSURE_ADDR VALUE="226"/>
           <DRIVE_BAY VALUE = "5"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "6"/>
               <PRODUCT_ID VALUE = "EH0072FARWC    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "7"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Ok"/>
               <UID_LED VALUE = "Off"/>
           <DRIVE_BAY VALUE = "8"/>
               <PRODUCT_ID VALUE = "EH0072FAWJA    "/>
               <STATUS VALUE = "Smart Error"/>
               <UID_LED VALUE = "On"/>
      </BACKPLANE>

But the plugin (here version 1.56) still reported that everything was running smoothly:

ILO2_HEALTH OK - No faults detected

I know from past notifications that Smart Errors are usually detected, so it seemed very odd to me that the plugin didn't recognize the Smart Error on this particular server.

I contacted the plugin developer, Alexander Greiner-Bär, and as always he wrote back very quickly and (probably immediately) found the issue: this particular server uses more than two backplanes for hard drives. The first two backplanes number their disks 1-8, which makes sense, but the second pair of backplanes numbers its disks 1-8 as well. To show this graphically:

Backplane 1 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 2 --> Disk 5, Disk 6, Disk 7, Disk 8

Backplane 3 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 4 --> Disk 5, Disk 6, Disk 7, Disk 8

Because the disk failure happened on Backplane 2/Disk 8, its STATUS VALUE was overwritten when "Disk 8" came up a second time (Backplane 4/Disk 8), this time with a healthy status.
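
To illustrate the collision, here is a minimal Python sketch. It is not the plugin's actual code (the plugin is written in Perl), and it does not claim to show how version 1.58 solves the problem. The shortened sample XML, the DRIVES wrapper element and all function names are assumptions made up for this example. It only shows why keying drive states by bay number alone loses the Smart Error, while keying by backplane and bay keeps it:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened iLO-style output: two backplanes that both
# report a bay numbered "8". The <DRIVES> wrapper is an assumption.
SAMPLE = """
<DRIVES>
  <BACKPLANE>
    <DRIVE_BAY VALUE="8"/>
    <STATUS VALUE="Smart Error"/>
  </BACKPLANE>
  <BACKPLANE>
    <DRIVE_BAY VALUE="8"/>
    <STATUS VALUE="Ok"/>
  </BACKPLANE>
</DRIVES>
"""


def iter_drives(xml_text):
    """Yield (backplane_index, bay, status) for every drive bay found."""
    root = ET.fromstring(xml_text)
    for index, backplane in enumerate(root.iter("BACKPLANE"), start=1):
        bay = None
        # DRIVE_BAY and STATUS are siblings, so remember the last bay seen.
        for element in backplane:
            if element.tag == "DRIVE_BAY":
                bay = element.get("VALUE")
            elif element.tag == "STATUS" and bay is not None:
                yield index, bay, element.get("VALUE")


def status_by_bay(xml_text):
    """Collision-prone: a later bay "8" overwrites an earlier bay "8"."""
    return {bay: status for _, bay, status in iter_drives(xml_text)}


def status_by_backplane_and_bay(xml_text):
    """Collision-free: the backplane index is part of the key."""
    return {(index, bay): status for index, bay, status in iter_drives(xml_text)}


def faulty(statuses):
    """Return the sorted keys of all drives whose status is not "Ok"."""
    return sorted(key for key, status in statuses.items() if status != "Ok")


if __name__ == "__main__":
    print(faulty(status_by_bay(SAMPLE)))
    print(faulty(status_by_backplane_and_bay(SAMPLE)))
```

With the bay-only key, the second "Disk 8" replaces the first one and no fault is left to report. With the combined key, the failed disk on the first backplane in the sample remains visible.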

The newest version 1.58 now fixes this.



Comments (newest first)

Claudio from Switzerland wrote on Aug 9th, 2013:

In case you can't find version 1.58 of check_ilo2_health, here you have a mirror link.

