This week an HP ProLiant server reported a disk with SMART errors. Normally my monitoring plugin of choice for HP ProLiant servers, check_ilo2_health.pl, detects this. In this case, however, the Smart Error was not detected, although it appeared in the iLO XML output:
<BACKPLANE>
<FIRMWARE_VERSION VALUE="1.14"/>
<ENCLOSURE_ADDR VALUE="226"/>
<DRIVE_BAY VALUE = "5"/>
<PRODUCT_ID VALUE = "EH0072FARWC "/>
<STATUS VALUE = "Ok"/>
<UID_LED VALUE = "Off"/>
<DRIVE_BAY VALUE = "6"/>
<PRODUCT_ID VALUE = "EH0072FARWC "/>
<STATUS VALUE = "Ok"/>
<UID_LED VALUE = "Off"/>
<DRIVE_BAY VALUE = "7"/>
<PRODUCT_ID VALUE = "EH0072FAWJA "/>
<STATUS VALUE = "Ok"/>
<UID_LED VALUE = "Off"/>
<DRIVE_BAY VALUE = "8"/>
<PRODUCT_ID VALUE = "EH0072FAWJA "/>
<STATUS VALUE = "Smart Error"/>
<UID_LED VALUE = "On"/>
</BACKPLANE>
But the plugin (here version 1.56) still reported that everything was running smoothly:
ILO2_HEALTH OK - No faults detected
I know from past notifications that Smart Errors are usually detected, so it was very odd to me that the plugin didn't recognize the Smart Error on this particular server.
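For a manual cross-check of the raw output, here is a minimal Python sketch that walks a backplane fragment like the one above and lists every drive whose status is not "Ok". It is not part of check_ilo2_health.pl. The function name `failed_bays` and the shortened sample are my own assumptions, and it relies on each STATUS element following the DRIVE_BAY element it belongs to, as in the excerpt.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the backplane excerpt (bays 7 and 8 only).
ILO_XML = """
<BACKPLANE>
<FIRMWARE_VERSION VALUE="1.14"/>
<ENCLOSURE_ADDR VALUE="226"/>
<DRIVE_BAY VALUE = "7"/>
<PRODUCT_ID VALUE = "EH0072FAWJA "/>
<STATUS VALUE = "Ok"/>
<UID_LED VALUE = "Off"/>
<DRIVE_BAY VALUE = "8"/>
<PRODUCT_ID VALUE = "EH0072FAWJA "/>
<STATUS VALUE = "Smart Error"/>
<UID_LED VALUE = "On"/>
</BACKPLANE>
"""


def failed_bays(xml_text):
    """Return (enclosure_addr, bay, status) for every drive that is not 'Ok'."""
    backplane = ET.fromstring(xml_text.strip())
    enclosure = None
    bay = None
    failures = []
    # The elements are flat siblings, so remember the most recent
    # enclosure address and drive bay while walking them in order.
    for elem in backplane:
        value = elem.get("VALUE", "").strip()
        if elem.tag == "ENCLOSURE_ADDR":
            enclosure = value
        elif elem.tag == "DRIVE_BAY":
            bay = value
        elif elem.tag == "STATUS" and value != "Ok":
            failures.append((enclosure, bay, value))
    return failures


print(failed_bays(ILO_XML))  # [('226', '8', 'Smart Error')]
```

Reporting the enclosure address together with the bay number keeps two disks that share a bay number apart.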
I contacted the plugin developer, Alexander Greiner-Bär. As always he wrote back very quickly, and he found the (probable) cause almost immediately: this particular server uses more than two backplanes for its hard drives. The first two backplanes number their disks 1-8, which makes sense, but the next two backplanes number their disks 1-8 as well. To illustrate:
Backplane 1 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 2 --> Disk 5, Disk 6, Disk 7, Disk 8
Backplane 3 --> Disk 1, Disk 2, Disk 3, Disk 4
Backplane 4 --> Disk 5, Disk 6, Disk 7, Disk 8
Because the disk failure happened on Backplane 2/Disk 8, its STATUS VALUE was overwritten when "Disk 8" came up a second time (Backplane 4/Disk 8).
The newest version, 1.58, fixes this.
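The overwrite is easy to reproduce in a few lines. The following Python sketch is only an illustration of the problem, not the plugin's actual Perl code and not necessarily how version 1.58 solves it. The sample readings and all function names are hypothetical.

```python
# Hypothetical readings in the order a parser would meet them:
# (backplane, drive bay, status)
READINGS = [
    (1, 1, "Ok"),
    (2, 8, "Smart Error"),
    (3, 1, "Ok"),
    (4, 8, "Ok"),
]


def index_by_bay(readings):
    """Key only on the bay number: a later bay 8 replaces an earlier one."""
    status = {}
    for _backplane, bay, state in readings:
        status[bay] = state
    return status


def index_by_backplane_and_bay(readings):
    """Key on (backplane, bay): every physical disk keeps its own entry."""
    status = {}
    for backplane, bay, state in readings:
        status[(backplane, bay)] = state
    return status


def faults(status):
    """Return only the entries whose state is not 'Ok'."""
    return {key: state for key, state in status.items() if state != "Ok"}


print(faults(index_by_bay(READINGS)))                # {}
print(faults(index_by_backplane_and_bay(READINGS)))  # {(2, 8): 'Smart Error'}
```

With the bay number as the only key, the "Ok" from Backplane 4 hides the Smart Error from Backplane 2. Adding the backplane to the key is one way to keep both readings.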
Claudio from Switzerland wrote on Aug 9th, 2013:
In case you can't find version 1.58 of check_ilo2_health, here is a mirror link.