How to monitor and graph power usage (consumption) of HP Proliant servers

Listed in Hardware Monitoring


HP Proliant servers allow remote monitoring and integration into monitoring software, such as Nagios, Icinga or Naemon, in two ways:

  • Through the SMH (System Management Homepage) using SNMP. This requires the HP SMH package to be installed and the SMH daemons/processes to be running in the operating system.
  • Through the ILO (Integrated Lights-Out) interface, reading XML data via HTTPS.

For many years we've been using the monitoring plugin check_ilo2_health, which makes use of the second method. This gives us a quick overview whenever hardware problems are detected.

Note: For storage drives (hard drives or solid state drives) relying only on check_ilo2_health is not a wise idea. Read the article "Multiple ways to monitor physical hard drives with very different results" to find out why.

check_ilo2_health also outputs helpful performance data if enabled with the -d / --perfdata parameter. By parsing this performance data, historical graphs can be created. However, until now the plugin mainly showed performance data for the different temperature sensors.

$ /usr/lib/nagios/plugins/check_ilo2_health.pl -H iloip -u admin -p pass -3 -c -o -a -d
ILO2_HEALTH OK - (Board-Version: ILO>=3) Temperatures: Temp_1 (OK): 26, Temp_2 (OK): 40, Temp_3 (OK): 40, Temp_4 (OK): 39, Temp_5 (OK): 39, Temp_6 (OK): 43, Temp_7 (OK): 42, Temp_8 (OK): 51, Temp_9 (OK): 45, Temp_10 (OK): 52, Temp_11 (OK): 43, Temp_12 (OK): 51, Temp_19 (OK): 29, Temp_20 (OK): 37, Temp_21 (OK): 37, Temp_22 (OK): 35, Temp_23 (OK): 46, Temp_24 (OK): 42, Temp_25 (OK): 41, Temp_26 (OK): 42, Temp_29 (OK): 35, Temp_30 (OK): 77 | Temp_1=26;41;45 Temp_2=40;82;83 Temp_3=40;82;83 Temp_4=39;87;92 Temp_5=39;87;92 Temp_6=43;87;92 Temp_7=42;87;92 Temp_8=51;90;95 Temp_9=45;65;70 Temp_10=52;90;95 Temp_11=43;70;75 Temp_12=51;90;95 Temp_19=29;70;75 Temp_20=37;70;75 Temp_21=37;80;85 Temp_22=35;80;85 Temp_23=46;77;82 Temp_24=42;70;75 Temp_25=41;70;75 Temp_26=42;70;75 Temp_29=35;60;65 Temp_30=77;110;115
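
To illustrate what parsing this performance data involves, here is a minimal Python sketch. It assumes the usual monitoring plugin output layout (status text, a "|" separator, then space-separated label=value;warn;crit entries) and numeric values without a unit suffix, as in the output above. The function name parse_perfdata is hypothetical and not part of the plugin.

```python
def parse_perfdata(plugin_output):
    """Split plugin output at '|' and return {label: (value, warn, crit)}."""
    if "|" not in plugin_output:
        return {}
    _, perfdata = plugin_output.split("|", 1)
    metrics = {}
    for entry in perfdata.split():
        label, _, rest = entry.partition("=")
        fields = rest.split(";")
        # Pad to value;warn;crit, since trailing fields may be missing.
        fields += [""] * (3 - len(fields))
        # Empty fields (e.g. "power=166;;") become None.
        value, warn, crit = (float(f) if f else None for f in fields[:3])
        metrics[label] = (value, warn, crit)
    return metrics


# Shortened sample based on the plugin output shown above.
sample = "ILO2_HEALTH OK - Temperatures: Temp_1 (OK): 26 | Temp_1=26;41;45 Temp_2=40;82;83"
print(parse_perfdata(sample))
```

The sketch does not handle quoted labels containing spaces or values with units. Monitoring cores such as Icinga 2 already do this parsing themselves, so this is only meant to show the structure of the data.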

Wouldn't it be nice to also read the current power consumption of the server?

Introducing power consumption monitoring

The good news is that ILO's XML output also contains the current power consumption/usage of the server. It represents the same value as shown in ILO's user interface under Power Meter (Present Power Reading).

The XML export can be seen by applying the -v parameter three times to the plugin:

$ /usr/lib/nagios/plugins/check_ilo2_health.pl -H iloip -u admin -p pass -3 -c -o -a -d -v -v -v
[...]
    <POWER_SUPPLIES>
       <POWER_SUPPLY_SUMMARY>
            <PRESENT_POWER_READING VALUE = "182 Watts"/>
            <POWER_MANAGEMENT_CONTROLLER_FIRMWARE_VERSION VALUE = "1.6"/>
            <HIGH_EFFICIENCY_MODE VALUE = "Balanced"/>
       </POWER_SUPPLY_SUMMARY>
       <SUPPLY>
            <LABEL VALUE = "Power Supply 1"/>
            <STATUS VALUE = "OK"/>
chunk: 003
chunk size: 3
       </SUPPLY>
Head:
chunk: 1ff
chunk size: 511
       <SUPPLY>
            <LABEL VALUE = "Power Supply 2"/>
            <STATUS VALUE = "OK"/>
       </SUPPLY>
    </POWER_SUPPLIES>
[...]

As check_ilo2_health uses Perl's XML::Simple module to read data from the output, we adjusted the plugin to read the value of this XML field (PRESENT_POWER_READING):

  my $powerusage=$xml->{'POWER_SUPPLIES'}[0]->{'POWER_SUPPLY_SUMMARY'}[0]->{'PRESENT_POWER_READING'}[0]->{'VALUE'};
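
For readers who do not use Perl, the same lookup can be sketched in Python. This is not the plugin's code; it only mirrors the element path of the line above using the standard library, with a trimmed sample based on the XML shown earlier. The helper name present_power_watts and the wrapping HEALTH element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed sample; the <HEALTH> wrapper is a placeholder, not ILO's real root element.
SAMPLE = """<HEALTH>
    <POWER_SUPPLIES>
       <POWER_SUPPLY_SUMMARY>
            <PRESENT_POWER_READING VALUE = "182 Watts"/>
       </POWER_SUPPLY_SUMMARY>
    </POWER_SUPPLIES>
</HEALTH>"""


def present_power_watts(xml_text):
    """Return the present power reading as an integer number of watts, or None."""
    root = ET.fromstring(xml_text)
    node = root.find("./POWER_SUPPLIES/POWER_SUPPLY_SUMMARY/PRESENT_POWER_READING")
    if node is None:
        return None
    # The attribute holds a string such as "182 Watts"; keep the leading number.
    parts = node.get("VALUE", "").split()
    if not parts or not parts[0].isdigit():
        return None
    return int(parts[0])


print(present_power_watts(SAMPLE))
```

Returning None for a missing or non-numeric reading is a design choice of this sketch; it lets a caller skip the power value instead of failing the whole check.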

To make this more user-friendly, we contributed a new option to the plugin (created and maintained by Alexander Greiner-Baer): -W / --powerusage. Either form tells the plugin to output the server's current power usage in watts:

$ /usr/lib/nagios/plugins/check_ilo2_health.pl -H iloip -u admin -p pass -3 -c -o -a -W
ILO2_HEALTH OK - (Board-Version: ILO>=3) Power Usage: 176 Watts, Temperatures: Temp_1 (OK): 26, Temp_2 (OK): 40, Temp_3 (OK): 40, Temp_4 (OK): 39, Temp_5 (OK): 40, Temp_6 (OK): 44, Temp_7 (OK): 43, Temp_8 (OK): 51, Temp_9 (OK): 45, Temp_10 (OK): 52, Temp_11 (OK): 43, Temp_12 (OK): 51, Temp_19 (OK): 29, Temp_20 (OK): 37, Temp_21 (OK): 37, Temp_22 (OK): 36, Temp_23 (OK): 46, Temp_24 (OK): 42, Temp_25 (OK): 41, Temp_26 (OK): 42, Temp_29 (OK): 35, Temp_30 (OK): 77

In combination with -d / --perfdata, the power usage is also added to the performance data:

$ /usr/lib/nagios/plugins/check_ilo2_health.pl -H iloip -u admin -p pass -3 -c -o -a -W -d
ILO2_HEALTH OK - (Board-Version: ILO>=3) Power Usage: 166 Watts, Temperatures: Temp_1 (OK): 26, Temp_2 (OK): 40, Temp_3 (OK): 40, Temp_4 (OK): 39, Temp_5 (OK): 40, Temp_6 (OK): 44, Temp_7 (OK): 42, Temp_8 (OK): 51, Temp_9 (OK): 45, Temp_10 (OK): 53, Temp_11 (OK): 43, Temp_12 (OK): 51, Temp_19 (OK): 29, Temp_20 (OK): 37, Temp_21 (OK): 37, Temp_22 (OK): 35, Temp_23 (OK): 46, Temp_24 (OK): 42, Temp_25 (OK): 41, Temp_26 (OK): 42, Temp_29 (OK): 35, Temp_30 (OK): 76 | power=166;; Temp_1=26;41;45 Temp_2=40;82;83 Temp_3=40;82;83 Temp_4=39;87;92 Temp_5=40;87;92 Temp_6=44;87;92 Temp_7=42;87;92 Temp_8=51;90;95 Temp_9=45;65;70 Temp_10=53;90;95 Temp_11=43;70;75 Temp_12=51;90;95 Temp_19=29;70;75 Temp_20=37;70;75 Temp_21=37;80;85 Temp_22=35;80;85 Temp_23=46;77;82 Temp_24=42;70;75 Temp_25=41;70;75 Temp_26=42;70;75 Temp_29=35;60;65 Temp_30=76;110;115

Our contribution to the code was accepted by Alexander and is available in check_ilo2_health version 1.66, which has already been released on Nagios Exchange.
At the time of writing, there is no public code repository for the plugin yet, so we used our own repository to document the changes between versions 1.65 and 1.66. As soon as an official public repository for the plugin is available, we will update this article.

Historical Graphing

The main purpose of this added power consumption monitoring is to create historical graphs to see the power consumption over time. In our Icinga 2 monitoring, the plugin's performance data is read and entered into an InfluxDB time series database. This data is read by Grafana and voilà, historical graphs of the server's power consumption are at your service.

In our case we run check_ilo2_health every 2 hours as a regular hardware check, which is why the values in the graph change in a "jumpy" way. For a finer resolution, you could run the plugin more often.
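
To show roughly what a single data point looks like on its way into InfluxDB, the following Python sketch formats one power reading in InfluxDB line protocol. In an Icinga 2 setup this hand-off can be handled by Icinga 2's own InfluxDB writer feature, so no custom code is required. The measurement name, tag names and host name below are illustrative assumptions and do not necessarily match what Icinga 2 writes.

```python
import time


def to_line_protocol(measurement, tags, value, timestamp_ns=None):
    """Format one point as 'measurement,tag=val value=<float> <timestamp_ns>'."""
    if timestamp_ns is None:
        timestamp_ns = time.time_ns()
    # Tags are sorted by key; this sketch does not escape spaces or commas.
    tag_part = "".join(f",{key}={val}" for key, val in sorted(tags.items()))
    return f"{measurement}{tag_part} value={float(value)} {timestamp_ns}"


line = to_line_protocol(
    "check_ilo2_health",                          # hypothetical measurement name
    {"hostname": "server01", "metric": "power"},  # hypothetical tags
    166,                                          # watts, as in the perfdata above
    timestamp_ns=1500000000000000000,             # fixed timestamp for the example
)
print(line)
```

With one such point every 2 hours, Grafana can only draw a coarse curve; a shorter check interval simply produces more points of the same shape.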
