After several months of collaboration, development, internal testing and a production monitoring rollout together with Swisscom, I am more than happy to announce a new monitoring plugin: check_ilorest.
check_ilorest is an open source monitoring plugin for monitoring the hardware health of HPE ProLiant servers. The Python script uses the ilorest command in the background (hence the name), which is an open source project from HPE. check_ilorest can be run locally on an HPE server to obtain the current health status of the hardware.
The check_ilorest monitoring plugin can be found on the public GitHub repository. Contributions and feedback are of course welcome.
If you are a long-time user of HP/HPE servers, you may remember monitoring servers using SNMP and the HP Insight Management Agents on top of it (particularly the HP System Management Homepage, SMH for short). The HP software consisted of multiple packages and had to be installed on the OS. Several agents/daemons were then started, listening on specific ports. In the monitoring world, Gerhard Lausser's Nagios plugin check_hpasm connected to these agents and reported the current hardware health of each HP server very nicely.
Fast forward to 2025. The mentioned HP packages (HP Management Agents, System Management Homepage) no longer exist. Some (now historic) pages might still be online, but they link to pages that no longer exist.
But wait, there's ILO! As a matter of fact, ILO (Integrated Lights Out) can be queried on its dedicated network interface. This also includes getting the health status of all discovered hardware elements. Alexander Greiner-Baer's plugin check_ilo2_health was made for this purpose and queries the ILO XMLRPC API. This works very well and I have personally used this monitoring plugin successfully on hundreds of HPE servers in the past. The problem with this way of hardware monitoring? The monitoring server(s) need access to the ILO network interface. In high-security environments, the ILO management cards are often connected to non-routed and highly secured out-of-band management networks. Access to an ILO NIC is not always possible, due to technical reasons or security policies.
And this is where check_ilorest comes into play. check_ilorest can be executed locally, inside the installed operating system, on the HPE server. There is no need to connect to the ILO network interface. This method is called "in-band management". In this situation, ilorest accesses the BMC (Baseboard Management Controller) of the server and uses a virtual communication path between the OS and the ILO of the server.
check_ilorest is easy to use: you execute the plugin directly on the HPE server you want to monitor, without installing and running additional agents. Yet you still obtain detailed and up-to-date information directly from ILO.
check_ilorest was developed with both classic and modern monitoring and observability stacks in mind.
The plugin can be executed as a classic monitoring plugin and integrated into monitoring tools such as Nagios or Icinga:
root@mintp:~# /usr/lib/nagios/plugins/check_ilorest.py
ILOREST HARDWARE OK: Hardware is healthy. Server: HPE ProLiant DL380 Gen11, S/N: XXXXXXXXXX, System BIOS: U54 v2.16 (03/01/2024) []
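For readers new to classic monitoring plugins: Nagios and Icinga read a single status line from the plugin and judge the result by its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The following is only a minimal sketch of that convention and is not taken from check_ilorest; the function `summarize`, the severity ordering and the example component states are assumptions made for illustration.

```python
# Hypothetical sketch of the classic plugin convention, not check_ilorest's code.

# Exit codes as defined by the Nagios plugin guidelines.
EXIT_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}

# Assumed ordering from least to most severe (a design choice for this sketch).
SEVERITY = ["OK", "WARNING", "UNKNOWN", "CRITICAL"]


def summarize(component_states):
    """Reduce a dict of component name -> state to (status, message)."""
    if not component_states:
        return "UNKNOWN", "No hardware components reported"
    # Any state that is not a known plugin status is treated as UNKNOWN.
    normalized = {
        name: (state if state in EXIT_CODES else "UNKNOWN")
        for name, state in component_states.items()
    }
    # The most severe component state determines the overall result.
    worst = max(normalized.values(), key=SEVERITY.index)
    if worst == "OK":
        return worst, "Hardware is healthy"
    affected = sorted(name for name, state in normalized.items() if state != "OK")
    return worst, "Problem with: " + ", ".join(affected)


# Made-up example states; a real plugin would read them from the hardware.
status, message = summarize({"Fans": "OK", "Memory": "OK", "PowerSupplies": "WARNING"})
print(f"HARDWARE {status}: {message}")
# A real plugin would now end with: sys.exit(EXIT_CODES[status])
```

The monitoring core only needs the first output line and the exit code, which is why a plugin like this can be dropped into almost any Nagios-compatible system.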
The plugin also supports output in the Prometheus metrics exposition format (-o prometheus). Each detected hardware element is then shown as a metric:
root@mintp:~# /usr/lib/nagios/plugins/check_ilorest.py -o prometheus
#HELP ilorest_hardware_health Overall health status of server. 0=OK, 1=Warning, 2=Critical
#TYPE ilorest_hardware_health gauge
ilorest_hardware_health 0
#HELP ilorest_hardware_health_biosorhardwarehealth Health status of hardware component BiosOrHardwareHealth
#TYPE ilorest_hardware_health_biosorhardwarehealth gauge
ilorest_hardware_health_biosorhardwarehealth 0
#HELP ilorest_hardware_health_fans Health status of hardware component Fans
#TYPE ilorest_hardware_health_fans gauge
ilorest_hardware_health_fans 0
#HELP ilorest_hardware_health_memory Health status of hardware component Memory
#TYPE ilorest_hardware_health_memory gauge
ilorest_hardware_health_memory 0
#HELP ilorest_hardware_health_network Health status of hardware component Network
#TYPE ilorest_hardware_health_network gauge
ilorest_hardware_health_network 0
#HELP ilorest_hardware_health_powersupplies Health status of hardware component PowerSupplies
#TYPE ilorest_hardware_health_powersupplies gauge
ilorest_hardware_health_powersupplies 0
#HELP ilorest_hardware_health_processors Health status of hardware component Processors
#TYPE ilorest_hardware_health_processors gauge
ilorest_hardware_health_processors 0
#HELP ilorest_hardware_health_smartstoragebattery Health status of hardware component SmartStorageBattery
#TYPE ilorest_hardware_health_smartstoragebattery gauge
ilorest_hardware_health_smartstoragebattery 0
#HELP ilorest_hardware_health_storage Health status of hardware component Storage
#TYPE ilorest_hardware_health_storage gauge
ilorest_hardware_health_storage 0
#HELP ilorest_hardware_health_temperatures Health status of hardware component Temperatures
#TYPE ilorest_hardware_health_temperatures gauge
ilorest_hardware_health_temperatures 0
In combination with the Prometheus Script Exporter you can run this plugin and scrape the plugin's output via Script Exporter's API, using a scraper such as Prometheus, Grafana Alloy or OpenTelemetry Collector. With these metrics you can create fancy dashboards, representing the hardware health of your HPE server(s).
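If you want to process the metrics yourself instead of (or in addition to) scraping them, the text format is simple to parse. The sketch below is an illustration only: the sample text is shortened and its values are made up, the helper names `parse_metrics` and `unhealthy` are hypothetical, and it assumes that the per-component metrics use the same 0/1/2 scale that the overall metric documents in its help line.

```python
# Hypothetical helper for reading "-o prometheus" style output; sample values are made up.
SAMPLE = """\
#HELP ilorest_hardware_health Overall health status of server. 0=OK, 1=Warning, 2=Critical
#TYPE ilorest_hardware_health gauge
ilorest_hardware_health 1
#HELP ilorest_hardware_health_fans Health status of hardware component Fans
#TYPE ilorest_hardware_health_fans gauge
ilorest_hardware_health_fans 1
#HELP ilorest_hardware_health_memory Health status of hardware component Memory
#TYPE ilorest_hardware_health_memory gauge
ilorest_hardware_health_memory 0
"""

# Assumption: components use the same scale as the overall metric.
STATUS_NAMES = {0: "OK", 1: "Warning", 2: "Critical"}


def parse_metrics(text):
    """Return a dict of metric name -> integer value, skipping comment lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Each sample line has the form "<metric_name> <value>".
        name, _, value = line.rpartition(" ")
        metrics[name] = int(float(value))
    return metrics


def unhealthy(metrics, prefix="ilorest_hardware_health_"):
    """Return component -> status name for every component that is not OK."""
    return {
        name[len(prefix):]: STATUS_NAMES.get(value, "Unknown")
        for name, value in metrics.items()
        if name.startswith(prefix) and value != 0
    }


metrics = parse_metrics(SAMPLE)
print("overall:", STATUS_NAMES.get(metrics["ilorest_hardware_health"], "Unknown"))
print("unhealthy components:", unhealthy(metrics))
```

Because every component is its own metric, the same per-component breakdown can be built in a dashboard by matching on the metric name prefix.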
This monitoring plugin was developed at and for Swisscom. Therefore I'd like to send my thanks and congratulations to Swisscom for their decision to open source this monitoring plugin and make it publicly available. Special thanks go to Iva (Product Owner) and Tom (Product Manager) of the Continuous Monitoring and Observability team.