check_ilorest 1.2.1 adds power usage metrics for HPE ProLiant servers



As previously announced, check_ilorest is a modern monitoring plugin for classic monitoring systems such as Icinga or Nagios, but it also supports telemetry metrics output in the Prometheus format. This makes it what I like to call a "hybrid" monitoring plugin.

Version 1.2.1 of check_ilorest released 

Today a new version (1.2.1) of check_ilorest was released on the GitHub repository. The changes are:

  • The execution of the ilorest command in the background has been moved into a separate function inside the Python script. This makes it easier to implement additional checks which require a further execution of ilorest; such checks might be added in the future. It also keeps the code shorter, as the function handles this recurring operation (a minimal sketch of such a helper follows after this list).
  • The --power argument was added. It enhances the monitoring plugin with additional metrics related to the server's power usage. --power must be used in combination with -m / --metrics.
  • Bugfix in the label naming of the newly added power supply metrics.
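
For illustration, here is a minimal sketch of what such an extracted helper could look like. The function name, the --json flag and the error handling are assumptions on my part, not necessarily the plugin's actual code:

import json
import subprocess
import sys

def run_ilorest(args):
    """Run an ilorest subcommand and return its parsed JSON output.

    Centralizing the subprocess call avoids repeating the same
    boilerplate for every check that needs another ilorest execution.
    """
    try:
        result = subprocess.run(
            ["ilorest"] + args + ["--json"],  # --json assumed for machine-readable output
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as err:
        # Exit code 3 means UNKNOWN in Nagios/Icinga plugin terms
        print(f"ILOREST UNKNOWN: ilorest execution failed: {err}")
        sys.exit(3)
    return json.loads(result.stdout)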

Show me your power!

As energy prices are through the roof (especially in Europe), it makes sense to keep an extra eye on the power consumption of your HPE ProLiant servers. This is where the newly added --power argument helps: it retrieves the metrics from each power supply as well as the total power consumption of the server.

In combination with the already existing -m / --metrics parameter, check_ilorest now shows the current power consumption of the whole server, as well as of each detected power supply.

Here's the plugin showing the server's power usage in Nagios performance data style:

root@proliant ~ # ./check_ilorest.py -o nagios -m --power
ILOREST HARDWARE OK: Hardware is healthy. Server: HPE ProLiant DL380 Gen11, S/N: XXXXXXXXXX, System BIOS: U54 v2.16 (03/01/2024) | AvgCPU0Freq=66;;;; AvgCPU1Freq=112;;;; CPU0Power=78;;;; CPU1Power=75;;;; CPUICUtil=0;;;0;100 CPUUtil=0;;;0;100 IOBusUtil=0;;;0;100 JitterCount=0;;;; MemoryBusUtil=0;;;0;100 PowerUsage=327;;;; PS1Usage=128;;;; PS2Usage=199;;;;

You can spot the PowerUsage and the per power supply metrics (PS1Usage and PS2Usage) at the end of the Nagios performance data output.
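
Each token in the performance data follows the well-known Nagios plugin format label=value[;warn;crit;min;max]. If you want to post-process these readings outside of your monitoring system, a minimal parsing sketch (not part of the plugin itself) could look like this:

# Parse Nagios performance data tokens such as "PowerUsage=327;;;;".
# Per-token format: label=value[;warn;crit;min;max]
perfdata = ("AvgCPU0Freq=66;;;; CPU0Power=78;;;; PowerUsage=327;;;; "
            "PS1Usage=128;;;; PS2Usage=199;;;;")

metrics = {}
for token in perfdata.split():
    label, values = token.split("=", 1)
    metrics[label] = float(values.split(";")[0])  # first field is the current value

print(metrics["PowerUsage"])                      # 327.0
print(metrics["PS1Usage"] + metrics["PS2Usage"])  # 327.0

Note how the two power supply readings add up to the total PowerUsage value.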

When the Prometheus metrics exposition format is used (-o prometheus), the power-related metrics can be found at the bottom:

root@proliant ~ # ./check_ilorest.py -o prometheus -m --power
# HELP ilorest_hardware_health Overall health status of server. 0=OK, 1=Warning, 2=Critical
# TYPE ilorest_hardware_health gauge
ilorest_hardware_health 0
# HELP ilorest_hardware_health_biosorhardwarehealth Health status of hardware component BiosOrHardwareHealth
# TYPE ilorest_hardware_health_biosorhardwarehealth gauge
ilorest_hardware_health_biosorhardwarehealth 0
# HELP ilorest_hardware_health_fans Health status of hardware component Fans
# TYPE ilorest_hardware_health_fans gauge
ilorest_hardware_health_fans 0
# HELP ilorest_hardware_health_memory Health status of hardware component Memory
# TYPE ilorest_hardware_health_memory gauge
ilorest_hardware_health_memory 0
# HELP ilorest_hardware_health_network Health status of hardware component Network
# TYPE ilorest_hardware_health_network gauge
ilorest_hardware_health_network 0
# HELP ilorest_hardware_health_powersupplies Health status of hardware component PowerSupplies
# TYPE ilorest_hardware_health_powersupplies gauge
ilorest_hardware_health_powersupplies 0
# HELP ilorest_hardware_health_processors Health status of hardware component Processors
# TYPE ilorest_hardware_health_processors gauge
ilorest_hardware_health_processors 0
# HELP ilorest_hardware_health_smartstoragebattery Health status of hardware component SmartStorageBattery
# TYPE ilorest_hardware_health_smartstoragebattery gauge
ilorest_hardware_health_smartstoragebattery 0
# HELP ilorest_hardware_health_storage Health status of hardware component Storage
# TYPE ilorest_hardware_health_storage gauge
ilorest_hardware_health_storage 0
# HELP ilorest_hardware_health_temperatures Health status of hardware component Temperatures
# TYPE ilorest_hardware_health_temperatures gauge
ilorest_hardware_health_temperatures 0
# HELP ilorest_hardware_AvgCPU0Freq
# TYPE ilorest_hardware_AvgCPU0Freq gauge
ilorest_hardware_AvgCPU0Freq 89
# HELP ilorest_hardware_AvgCPU1Freq
# TYPE ilorest_hardware_AvgCPU1Freq gauge
ilorest_hardware_AvgCPU1Freq 97
# HELP ilorest_hardware_CPU0Power
# TYPE ilorest_hardware_CPU0Power gauge
ilorest_hardware_CPU0Power 75
# HELP ilorest_hardware_CPU1Power
# TYPE ilorest_hardware_CPU1Power gauge
ilorest_hardware_CPU1Power 73
# HELP ilorest_hardware_CPUICUtil
# TYPE ilorest_hardware_CPUICUtil gauge
ilorest_hardware_CPUICUtil 0
# HELP ilorest_hardware_CPUUtil
# TYPE ilorest_hardware_CPUUtil gauge
ilorest_hardware_CPUUtil 0
# HELP ilorest_hardware_IOBusUtil
# TYPE ilorest_hardware_IOBusUtil gauge
ilorest_hardware_IOBusUtil 0
# HELP ilorest_hardware_JitterCount
# TYPE ilorest_hardware_JitterCount gauge
ilorest_hardware_JitterCount 0
# HELP ilorest_hardware_MemoryBusUtil
# TYPE ilorest_hardware_MemoryBusUtil gauge
ilorest_hardware_MemoryBusUtil 0
# HELP ilorest_hardware_powerusage
# TYPE ilorest_hardware_powerusage gauge
ilorest_hardware_powerusage 327
# HELP ilorest_hardware_power_supply_usage
# TYPE ilorest_hardware_power_supply_usage gauge
ilorest_hardware_power_supply_usage{psu="1"} 127
ilorest_hardware_power_supply_usage{psu="2"} 200

The metric names are:

  • ilorest_hardware_powerusage (current total server consumption)
  • ilorest_hardware_power_supply_usage (each detected PSU is represented by its own 'psu' label)
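
Prometheus itself scrapes such output directly, but for quick ad-hoc processing the exposition text can also be parsed with the prometheus_client Python library. A small sketch (my own example, not something the plugin requires):

import subprocess
from prometheus_client.parser import text_string_to_metric_families

# Run the plugin and capture its Prometheus exposition output
output = subprocess.run(
    ["./check_ilorest.py", "-o", "prometheus", "-m", "--power"],
    capture_output=True, text=True, check=True,
).stdout

# Print only the power-related samples
for family in text_string_to_metric_families(output):
    for sample in family.samples:
        if sample.name.startswith("ilorest_hardware_power"):
            print(sample.name, sample.labels, sample.value)

# With the example output above, this prints:
# ilorest_hardware_powerusage {} 327.0
# ilorest_hardware_power_supply_usage {'psu': '1'} 127.0
# ilorest_hardware_power_supply_usage {'psu': '2'} 200.0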

Why is this cool?

Because, with these added metrics, your dashboard can now show the live and historical power consumption of the HPE ProLiant server!

[Screenshot: Power usage of an HPE ProLiant server in a Grafana dashboard]

Thinking further, you can now create a visualization to show and compare the power consumption of all your ProLiant servers. You could use this to identify misbehaving hardware or over-used systems. FinOps teams will also enjoy such metrics, as they can now calculate the (live) power costs of each server.
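
As a quick worked example of such a cost calculation (the electricity price is an assumption; adjust it to your contract):

# Rough power cost estimate from the current reading (327 W in the example above)
watts = 327
price_per_kwh = 0.30  # EUR per kWh, assumed

kwh_per_day = watts / 1000 * 24          # 7.848 kWh
cost_per_day = kwh_per_day * price_per_kwh
cost_per_month = cost_per_day * 30

print(f"{cost_per_day:.2f} EUR/day")      # 2.35 EUR/day
print(f"{cost_per_month:.2f} EUR/month")  # 70.63 EUR/month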

The metrics are there; use them for your use case.

