check_ilorest: A new hardware monitoring plugin for HPE ProLiant servers



After several months of collaboration, development, internal testing and a production monitoring rollout together with Swisscom, I am more than happy to announce a new monitoring plugin: check_ilorest.

check_ilorest is an open source monitoring plugin for monitoring the hardware health of HPE ProLiant servers. The Python script uses the ilorest command in the background (hence the name), an open source project from HPE. check_ilorest can be run locally on an HPE server to obtain the current health status of the hardware.

The check_ilorest monitoring plugin can be found in its public GitHub repository. Contributions and feedback are of course welcome.

check_ilorest fills the monitoring gap

If you are a long-time user of HP/HPE servers, you may remember server monitoring via SNMP and the overlaying HP Insight Management Agents (in particular the HP System Management Homepage, SMH for short). The HP software consisted of multiple packages that had to be installed in the OS. Several agents/daemons were then started, listening on specific ports. In the monitoring world, Gerhard Lausser's Nagios plugin check_hpasm connected to these agents and reported the current hardware health of each HP server in great detail.

Fast forward to 2025: the mentioned HP packages (HP Management Agents, System Management Homepage) no longer exist. Some (now historic) documentation pages might still be around, but they link to pages that are gone.

But wait, there's ILO! As a matter of fact, ILO (Integrated Lights Out) can be queried on its dedicated network interface, and this includes retrieving the health status of all discovered hardware elements. Alexander Greiner-Baer's plugin check_ilo2_health was made exactly for this purpose and queries the ILO XMLRPC API. This works very well, and I have personally used this monitoring plugin successfully on hundreds of HPE servers in the past. The problem with this way of hardware monitoring? The monitoring server(s) need access to the ILO network interface. In highly secure environments, the ILO management cards are often connected to non-routed, strictly secured out-of-band management networks. Access to an ILO NIC is therefore not always possible, whether for technical reasons or because of security policies.

And this is where check_ilorest comes into play. check_ilorest is executed locally, inside the installed operating system, on the HPE server. There is no need to connect to the ILO network interface - this method is called "in-band management". In this setup, ilorest accesses the BMC (Baseboard Management Controller) of the server and uses a virtual communication path between the OS and the ILO of the server.

check_ilorest gives you ease of use: you can execute the plugin directly on the HPE server you want to monitor, without installing and running additional agents. Yet you still obtain detailed and up-to-date information directly from ILO.
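The core idea behind such a plugin can be sketched in a few lines. This is a hypothetical simplification (the function and dictionary names are my own, not taken from check_ilorest): collect a health string per hardware component, map it to the standard Nagios/Icinga exit codes, and report the worst state found.

```python
#!/usr/bin/env python3
"""Minimal sketch: map per-component health states (as a plugin like
check_ilorest collects them from ilorest) to Nagios exit codes.
All names here are illustrative, not the real plugin internals."""

# Classic monitoring plugin exit code convention
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Redfish-style health strings as reported per component
HEALTH_TO_EXIT = {"OK": OK, "Warning": WARNING, "Critical": CRITICAL}

def overall_status(components: dict) -> int:
    """Return the worst (highest) exit code across all components."""
    codes = [HEALTH_TO_EXIT.get(h, UNKNOWN) for h in components.values()]
    return max(codes, default=UNKNOWN)

if __name__ == "__main__":
    sample = {"Fans": "OK", "Memory": "OK", "PowerSupplies": "Warning"}
    print(overall_status(sample))  # worst state wins -> prints 1 (WARNING)
```

The "worst state wins" rule is what makes a single plugin invocation usable as an overall hardware health check.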

Ready for Prometheus (metrics exposition)

check_ilorest was developed with both classic and modern monitoring and observability stacks in mind.

The plugin can be executed as a classic monitoring plugin and integrated into monitoring tools such as Nagios or Icinga:

root@mintp:~# /usr/lib/nagios/plugins/check_ilorest.py
ILOREST HARDWARE OK: Hardware is healthy. Server: HPE ProLiant DL380 Gen11, S/N: XXXXXXXXXX, System BIOS: U54 v2.16 (03/01/2024) []
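Integration into Icinga 2 could then look like the following sketch. The object names, host name and check interval are assumptions for illustration; only the plugin path matches the example above.

```
// Hypothetical Icinga 2 configuration for check_ilorest
object CheckCommand "check_ilorest" {
  command = [ "/usr/lib/nagios/plugins/check_ilorest.py" ]
}

object Service "hardware-health" {
  host_name = "hpe-server01"        // assumed host object name
  check_command = "check_ilorest"
  check_interval = 5m
}
```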

But the plugin also supports output in the Prometheus metrics exposition format (-o prometheus). Each detected hardware element is then exposed as its own metric:

root@mintp:~# /usr/lib/nagios/plugins/check_ilorest.py -o prometheus
# HELP ilorest_hardware_health Overall health status of server. 0=OK, 1=Warning, 2=Critical
# TYPE ilorest_hardware_health gauge
ilorest_hardware_health 0
# HELP ilorest_hardware_health_biosorhardwarehealth Health status of hardware component BiosOrHardwareHealth
# TYPE ilorest_hardware_health_biosorhardwarehealth gauge
ilorest_hardware_health_biosorhardwarehealth 0
# HELP ilorest_hardware_health_fans Health status of hardware component Fans
# TYPE ilorest_hardware_health_fans gauge
ilorest_hardware_health_fans 0
# HELP ilorest_hardware_health_memory Health status of hardware component Memory
# TYPE ilorest_hardware_health_memory gauge
ilorest_hardware_health_memory 0
# HELP ilorest_hardware_health_network Health status of hardware component Network
# TYPE ilorest_hardware_health_network gauge
ilorest_hardware_health_network 0
# HELP ilorest_hardware_health_powersupplies Health status of hardware component PowerSupplies
# TYPE ilorest_hardware_health_powersupplies gauge
ilorest_hardware_health_powersupplies 0
# HELP ilorest_hardware_health_processors Health status of hardware component Processors
# TYPE ilorest_hardware_health_processors gauge
ilorest_hardware_health_processors 0
# HELP ilorest_hardware_health_smartstoragebattery Health status of hardware component SmartStorageBattery
# TYPE ilorest_hardware_health_smartstoragebattery gauge
ilorest_hardware_health_smartstoragebattery 0
# HELP ilorest_hardware_health_storage Health status of hardware component Storage
# TYPE ilorest_hardware_health_storage gauge
ilorest_hardware_health_storage 0
# HELP ilorest_hardware_health_temperatures Health status of hardware component Temperatures
# TYPE ilorest_hardware_health_temperatures gauge
ilorest_hardware_health_temperatures 0
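Once these metrics land in Prometheus, alerting on them is straightforward. A hypothetical alerting rule sketch (group name, alert names and durations are my assumptions) that fires on the overall health gauge could look like this:

```yaml
# Hypothetical Prometheus alerting rules built on the metrics shown above
groups:
  - name: hpe-hardware
    rules:
      - alert: HpeHardwareWarning
        expr: ilorest_hardware_health == 1
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "HPE hardware health WARNING on {{ $labels.instance }}"
      - alert: HpeHardwareCritical
        expr: ilorest_hardware_health == 2
        labels:
          severity: critical
        annotations:
          summary: "HPE hardware health CRITICAL on {{ $labels.instance }}"
```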

In combination with the Prometheus Script Exporter you can run this plugin and scrape the plugin's output via the Script Exporter's API, using a scraper such as Prometheus, Grafana Alloy or the OpenTelemetry Collector. With these metrics you can build dashboards representing the hardware health of your HPE server(s).
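A minimal Script Exporter setup for this could look like the sketch below. The script name, target host and port are assumptions; verify the exact config keys and the /probe endpoint against the Script Exporter version you deploy.

```yaml
# Script Exporter side (hypothetical config.yaml)
scripts:
  - name: check_ilorest
    command: /usr/lib/nagios/plugins/check_ilorest.py -o prometheus

# Prometheus side: scrape the script's output via the exporter's probe endpoint
scrape_configs:
  - job_name: "script_exporter"
    metrics_path: /probe
    params:
      script: ["check_ilorest"]
    static_configs:
      - targets: ["hpe-server01.example.com:9469"]  # assumed exporter address
```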

Thanks and acknowledgements

This monitoring plugin was developed at and for Swisscom. I'd therefore like to send my thanks and congratulations to Swisscom for the decision to open source this monitoring plugin and make it publicly available. Special thanks go to Iva (Product Owner) and Tom (Product Manager) of the Continuous Monitoring and Observability team.

(Screenshot: Making the GitHub repository public)

