check_esxi_hardware

Last update: March 11, 2019

This script is a Nagios/monitoring plugin that monitors the hardware of ESX and ESXi servers. It queries the CIM (Common Information Model) server running on the ESXi server to retrieve the current status of all discovered hardware parts. The plugin can also be used as a standalone script to check the hardware; Nagios or Icinga is not required to run it. The plugin is written in Python and uses the pywbem module. See Requirements for more information.

Download

Download check_esxi_hardware.py

Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards adjust the permissions (usually chmod 755).

Community contributions are welcome on the GitHub repository.

Version history / Changelog

20080820 Initial release by David Ligeret
20080821 Add verbose mode by David Ligeret
20090219 Add try/except to catch AuthError and CIMError by Joshua Daniel Franklin
20100202 Added HP Support (HealthState) by Branden Schneider
20100512 Combined different versions (Joshua and Branden) and added hardware type switch
20100628 Outputs server model, s/n and bios version and set Unknown as default exit code by Samir Ibradzic
20100702 GlobalStatus was incorrectly getting (re)set to OK with every CIM element check by Aaron Rogers
20100705 After last version all Dell servers return UNKNOWN instead of OK, added Aaron's logic for Dell checks as well
20101028 Changed text in Usage and Example so people don't forget to use https://
20110110 If Dell Blade Servers were used, Serial Number of Chassis instead of Blade was returned - by Ludovic Hutin
20110207 Bugfix/new feature for Intel server systems by Carsten Schoene
20110215 Plugin now catches Socket Error (Timeout Error) and added a timeout parameter by Ludovic Hutin
20110221 Removed recently added timeout parameter due to incompatibility on Windows systems
20110221 Changed plugin name from check_esxi_wbem.py to check_esxi_hardware.py
20110426 Added 'ibm' hardware type (compatible to Dell output). Tested by Keith Erekson on an IBM x3550
20110503 Plugin rewritten, added automatic hardware detection, opt params, perfdata and much more by Phil Randal
20110504 Some minor code changes, removed typo, bugfix for voltage sensors on IBM server by Phil Randal
20110505 Added possibility to use first line of a file as password (file:) by Fredrik Åslund
20110507 A lot of bugfixes and enhancements from Phil Randal (see changelog in plugin for details)
20110520 Bugfix for IBM Blade Servers by Bertrand Jomin
20110614 Rewrote external file handling, file can now be used for password AND username
20111003 Added ignore option to ignore certain elements by Ian Chard
20120402 Making plugin GPL compatible (Copyright) and preparing for OpenBSD port
20120405 Fix lookup of warranty info for Dell by Phil Randal
20120501 Bugfix in manufacturer discovery when cim entry not found or empty by Craig Hart
20121027 Workaround for Dell PE x620 for Riser Config Err 0: Connected element (wrong return code)
20130424 Another workaround for Dell systems "System Board 1 LCD Cable Pres 0: Connected"
20130702 Improving wrong authentication timeout and exit UNKNOWN by Carl R. Friend
20130725 Fix lookup of warranty info for Dell by Phil Randal
20140319 Another workaround for Dell systems "System Board 1 VGA Cable Pres 0: Connected"
20150109 Output serial number of chassis if a blade server is checked
20150119 Fix NoneType element bug by Andreas Gottwald
20150626 Added support for patched pywbem 0.7.0 and new version 0.8.0, handle SSL error exception
20150710 Exit Unknown instead of Critical for timeouts and auth errors by Stanislav German-Evtushenko
20151111 Cleanup and define variables by Stefan Roos
20160411 Distinguish between/add support for minor versions of pywbem 0.7 and 0.8
20160531 Add parameter for variable CIM port (useful when behind NAT)
20161013 Added support for pywbem 0.9.x (and upcoming releases)
20170905 Added option to ignore LCD/Display related elements (--no-lcd)
20180329 Try to use internal pywbem function to determine version
20180411 Throw an unknown if we can't fetch the data for some reason by Peter Newman
20181001 python3 compatibility

Frequently Asked Questions (FAQ)

The FAQ has grown considerably. To support questions and comments, it now has its own dedicated FAQ page.

Requirements

  • Python must be installed (both Python 2 and Python 3 are supported)
  • The Python extension pywbem must be installed
  • If there is a firewall between your monitoring server and the ESXi server, open TCP port 5989 (or the port you define with -C)
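
To verify the firewall requirement before running the plugin, a quick TCP reachability test can help. The following is a minimal sketch, not part of the plugin; the function name cim_port_reachable is made up for this illustration, and the check only confirms that the port accepts connections, not that the CIM service answers correctly.

```python
import socket


def cim_port_reachable(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; 5989 is the default CIM port
    named in the requirements above.
    """
    try:
        # create_connection performs the TCP handshake and raises
        # OSError (refused, unreachable, timeout, DNS failure) on error.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling cim_port_reachable("esxiserver1") from the monitoring server (esxiserver1 being a placeholder host name) returns False if the port is filtered or closed.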

How to install PyWBEM

check_esxi_hardware.py uses the functions of the Python module PyWBEM, so this module must be installed. Most distributions already offer pywbem as a package.

Windows users: an (old) step-by-step guide explains how to install Python and PyWBEM on a Windows server.

DEB based installation (Debian, Ubuntu, Linux Mint, ...):

apt-get install python-pywbem

YUM based installation (RedHat, CentOS, Fedora, ...):

yum install pywbem

Zypper based installation (SuSE):

zypper install python-pywbem

The Python way using pip (platform independent):

pip install pywbem

pip3 install pywbem
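
After installation it can be useful to confirm that the interpreter that will run the plugin actually sees the module. The sketch below is an assumption-laden illustration (Python 3.8 or newer because of importlib.metadata; the helper name installed_version is hypothetical) and is not how the plugin itself determines the pywbem version.

```python
import importlib.util
from importlib import metadata


def installed_version(dist_name, module_name=None):
    """Return the version of an installed distribution.

    Returns None if the module cannot be imported at all, and the
    string "unknown" if it is importable but carries no package
    metadata. Hypothetical helper for illustration only.
    """
    module_name = module_name or dist_name
    # find_spec returns None when the top-level module is not importable.
    if importlib.util.find_spec(module_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "unknown"
```

For example, installed_version("pywbem") returns None when the module is missing for this interpreter, which usually means it was installed for a different Python version.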

Definition of the parameters

Short  Long                 Description
-H     --host               Hostname or IP address of the ESX/ESXi server
-U     --user               Username to check (must be a local user on the target host)
                            Note: If you don't want to use your root user, use this workaround
                            Use file:/path/to/.file to use the first string as username
-P     --password           Password for the given user
                            Use file:/path/to/.file to use the second string as password; this keeps the password out of the server's process list
-C     --cimport            CIM port (default 5989)
-V     --vendor (hw_type)   Vendor of the server: auto, dell, hp, ibm, intel, unknown (default)
                            auto: The plugin tries to determine the hardware itself by using CIM entries
                            unknown: If no hw/vendor type is given, unknown is used (behaves like auto)
-i     --ignore             Comma-separated list of elements to ignore
-v     --verbose            Verbose/detailed output for debugging
-p     --perfdata           Show performance data to create graphs (mainly temperature sensors and fan rpm)
-I     --html               Add web links to hardware manuals for Dell servers (use your country extension)
-t     --timeout            Timeout in seconds
                            Note: Some server models take a long time to return all results. Set this parameter accordingly.
N/A    --no-power           Do not collect power performance data
N/A    --no-volts           Do not collect voltage performance data
N/A    --no-current         Do not collect current performance data
N/A    --no-temp            Do not collect temperature performance data
N/A    --no-fan             Do not collect fan performance data
N/A    --no-lcd             Do not collect LCD/front display status data
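
For readers who wrap the plugin in their own tooling, the parameter list above can be mirrored with Python's argparse. This is only a sketch of the documented interface, not the plugin's own option parsing; build_parser and split_ignore are hypothetical names, and because no default timeout is documented here, the sketch leaves it unset.

```python
import argparse

VENDORS = ["auto", "dell", "hp", "ibm", "intel", "unknown"]


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="check_esxi_hardware.py")
    parser.add_argument("-H", "--host", required=True, help="hostname or IP address")
    parser.add_argument("-U", "--user", required=True, help="user name or file:PATH")
    parser.add_argument("-P", "--password", required=True, help="password or file:PATH")
    parser.add_argument("-C", "--cimport", type=int, default=5989, help="CIM port")
    parser.add_argument("-V", "--vendor", choices=VENDORS, default="unknown")
    parser.add_argument("-i", "--ignore", default="", help="comma-separated element names")
    parser.add_argument("-v", "--verbose", action="store_true")
    parser.add_argument("-p", "--perfdata", action="store_true")
    parser.add_argument("-I", "--html", metavar="COUNTRY", help="country extension")
    # No default timeout is documented above, so None means "not set" here.
    parser.add_argument("-t", "--timeout", type=int, default=None)
    # The long-only switches that disable individual data collections.
    for kind in ("power", "volts", "current", "temp", "fan", "lcd"):
        parser.add_argument("--no-" + kind, action="store_true")
    return parser


def split_ignore(value):
    """Turn the comma-separated --ignore value into a list of names."""
    return [item.strip() for item in value.split(",") if item.strip()]
```

Parsing ["-H", "10.0.0.200", "-U", "root", "-P", "mypass", "-V", "dell", "-p"] with this parser yields the default CIM port 5989 and perfdata enabled.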

File handling for user and password

Since version 20110505 it is possible to use a file as the password source. The string in the given file is used as the password.

Since version 20110614 the file can be used for both username and password.

This enhances security! If no file is used, the username and password are shown in cleartext in the process list while the plugin is running.

Please watch out for the correct usage!

Example 1: You want to use a file (/home/nagios/.esxipass) which contains both username and password. Note that the two strings are separated by a space:

# cat /home/nagios/.esxipass
root mypass123

# ./check_esxi_hardware.py -H 172.17.16.131 -U file:/home/nagios/.esxipass -P file:/home/nagios/.esxipass -V dell

Example 2: You only want to use a file for the password. Note that there is only one string in the file:

# cat /home/nagios/.esxipass
mypass123

# ./check_esxi_hardware.py -H 172.17.16.131 -U root -P file:/home/nagios/.esxipass -V dell
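
The file: convention from the two examples above can be sketched in a few lines of Python. This illustrates the documented behavior only and is not the plugin's actual code; resolve_credentials is a hypothetical name, and edge cases such as a two-string file combined with a literal username may be handled differently by the real plugin.

```python
FILE_PREFIX = "file:"


def _first_line_fields(value):
    """Read the first line of the referenced file and split it on whitespace."""
    path = value[len(FILE_PREFIX):]
    with open(path, "r") as handle:
        fields = handle.readline().split()
    if not fields:
        raise ValueError("credentials file %s is empty" % path)
    return fields


def resolve_credentials(user, password):
    """Resolve 'file:' prefixed values as described in the examples above.

    -U file:PATH takes the first string of the file as the username.
    -P file:PATH takes the second string as the password, or the only
    string if the file contains just one (Example 2).
    """
    if user.startswith(FILE_PREFIX):
        user = _first_line_fields(user)[0]
    if password.startswith(FILE_PREFIX):
        fields = _first_line_fields(password)
        password = fields[1] if len(fields) > 1 else fields[0]
    return user, password
```

Because the secret is read from the file at run time, only the file: reference appears in the process list, not the password itself.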

Usage / running the plugin on the command line

Usage:

./check_esxi_hardware.py -H esxi-server-ip -U username -P mypass [-C -V -i -v -p -I xx]

./check_esxi_hardware.py --host esxi-server-ip --user username --password mypass [--cimport --vendor --ignore --verbose --perfdata --html xx]

./check_esxi_hardware.py -H esxi-server-ip -U username -P file:/path/to/.passwdfile [--vendor --ignore --verbose --perfdata --html xx]

./check_esxi_hardware.py -H esxi-server-ip -U file:/path/to/.passwdfile -P file:/path/to/.passwdfile [--vendor -i -v -p --html xx]

Examples:

./check_esxi_hardware.py -H 10.0.0.200 -U root -P mypass -V dell -p -I de
./check_esxi_hardware.py --host esxiserver1 --user root --password mypass --vendor hp --perfdata
./check_esxi_hardware.py --host esxiserver2 --user root --password mypass --vendor dell --html us
./check_esxi_hardware.py -H esxiserver1 -U root -P file:/root/.esxipass -V dell
./check_esxi_hardware.py -H esxiserver1 -U file:/root/.esxipass -P file:/root/.esxipass -V dell
./check_esxi_hardware.py -H esxiserver1 -U root -P mypass -V dell -i "IPMI SEL"
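
When the plugin is used standalone from another script, the result can be read from its exit code, which follows the usual monitoring plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The wrapper below is a sketch under that assumption; build_command and run_check are hypothetical helpers, and Python 3.7 or newer is assumed for subprocess.run with capture_output.

```python
import subprocess

# Standard monitoring plugin exit codes.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def build_command(plugin, host, user, password, vendor="auto", extra=()):
    """Assemble the argument list for one plugin run (no shell involved)."""
    return [plugin, "-H", host, "-U", user, "-P", password, "-V", vendor, *extra]


def run_check(command, timeout=60):
    """Run the plugin and return (state name, first line of its output)."""
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Plugin missing, not executable, or too slow: report UNKNOWN.
        return "UNKNOWN", str(exc)
    state = STATES.get(result.returncode, "UNKNOWN")
    lines = result.stdout.strip().splitlines()
    summary = lines[0] if lines else ""
    return state, summary
```

A call such as run_check(build_command("./check_esxi_hardware.py", "esxiserver1", "root", "file:/root/.esxipass", "dell")) returns the state together with the plugin's summary line.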

Command definition

Command definition in Nagios, Icinga 1.x, Shinken, Naemon

# 'check_esxi_hardware' command definition (basic)
define command{
  command_name check_esxi_hardware
  command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$
}

# 'check_esxi_hardware' command definition (with appended optional parameter)
define command{
  command_name check_esxi_hardware
  command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -V $ARG3$ $ARG4$
}

Command definition in Icinga 2.x

The command definition in Icinga 2.x is already prepared because check_esxi_hardware is part of the ITL plugins. See https://www.icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#esxi_hardware. For the sake of completeness:

object CheckCommand "esxi_hardware" {
  import "plugin-check-command"

  command = [ PluginContribDir + "/check_esxi_hardware.py" ]

  arguments = {
    "-H" = {
      value = "$esxi_hardware_host$"
      description = "report on HOST"
    }
    "-U" = {
      value = "$esxi_hardware_user$"
      description = "user to connect as"
    }
    "-P" = {
      value = "$esxi_hardware_pass$"
      description = "password"
    }
    "-C" = {
      value = "$esxi_hardware_port$"
      description = "cim port"
    }
    "-V" = {
      value = "$esxi_hardware_vendor$"
      description = "Vendor code: auto, dell, hp, ibm, intel, or unknown"
    }
    "-I" = {
      value = "$esxi_hardware_html$"
      description = "generate html links for country XX"
    }
    "-i" = {
      value = "$esxi_hardware_ignore$"
      description = "comma-separated list of elements to ignore"
    }
    "-p" = {
      set_if = "$esxi_hardware_perfdata$"
      description = "collect performance data for pnp4nagios"
    }
    "--no-power" = {
      set_if = "$esxi_hardware_nopower$"
      description = "don't collect power performance data"
    }
    "--no-volts" = {
      set_if = "$esxi_hardware_novolts$"
      description = "don't collect voltage performance data"
    }
    "--no-current" = {
      set_if = "$esxi_hardware_nocurrent$"
      description = "don't collect current performance data"
    }
    "--no-temp" = {
      set_if = "$esxi_hardware_notemp$"
      description = "don't collect temperature performance data"
    }
    "--no-fan" = {
      set_if = "$esxi_hardware_nofan$"
      description = "don't collect fan performance data"
    }
  }

  vars.esxi_hardware_host = "$address$"
  vars.esxi_hardware_port = 5989
  vars.esxi_hardware_perfdata = false
  vars.esxi_hardware_nopower = false
  vars.esxi_hardware_novolts = false
  vars.esxi_hardware_nocurrent = false
  vars.esxi_hardware_notemp = false
  vars.esxi_hardware_nofan = false
}

Service definition

Service definition in Nagios, Icinga 1.x, Shinken, Naemon

Basic check on an HP server:

# Check HP Server hardware
define service{
use generic-service
host_name esxi1
service_description Hardware
check_command check_esxi_hardware!root!mypass!hp
}

Service check on a DELL Server with perfdata:

# Check DELL Server hardware
define service{
use generic-service
host_name esxi2
service_description Hardware
check_command check_esxi_hardware!root!mypass!dell!--perfdata
}

Service check on an HP Server with user and password read from file:

# Check HP Server hardware
define service{
use generic-service
host_name esxi1
service_description Hardware
check_command check_esxi_hardware!file:/home/nagios/.esxipass!file:/home/nagios/.esxipass!hp
}

Service check on an IBM Server where System Event Log alerts should be ignored:

# Check IBM Server hardware
define service{
use generic-service
host_name esxi3
service_description Hardware
check_command check_esxi_hardware!root!mypass!ibm!-i "IPMI SEL"
}
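
To make the link between these service definitions and the command definition (the variant with $ARG4$) explicit, the macro substitution can be imitated in Python. This is a simplified sketch with a hypothetical function name; it ignores escaping rules and only shows how the values separated by ! fill $ARG1$, $ARG2$ and so on, with unused argument macros expanding to nothing.

```python
import re


def expand_check_command(check_command, command_line, macros):
    """Expand $ARGn$ and named macros in a command_line template.

    check_command: e.g. "check_esxi_hardware!root!mypass!dell!--perfdata"
    command_line:  the template from the command definition
    macros:        values for other macros such as USER1 and HOSTADDRESS
    """
    name, *args = check_command.split("!")
    expanded = command_line
    for index, value in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % index, value)
    # Argument macros without a value expand to an empty string.
    expanded = re.sub(r"\$ARG\d+\$", "", expanded)
    for macro, value in macros.items():
        expanded = expanded.replace("$%s$" % macro, value)
    return name, expanded.strip()
```

With the DELL example above and $USER1$ set to /usr/lib/nagios/plugins, the fourth argument --perfdata ends up appended to the plugin's command line.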

Service object definition in Icinga 2.x

The following example is a standard service object checking a Dell server with perfdata enabled and using a file for the password:

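# Hardware Check
object Service "Hardware" {
  import "generic-service"
  host_name = "myesxiserver1"
  check_command = "esxi_hardware"
  # Variable names must match those used by the CheckCommand "esxi_hardware"
  vars.esxi_hardware_user = "root"
  vars.esxi_hardware_pass = "file:/var/lib/nagios/.esxipass"
  vars.esxi_hardware_vendor = "dell"
  vars.esxi_hardware_perfdata = true
}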

Apply rule in Icinga 2.x

The next example is more interesting as it uses apply rules (a feature of Icinga 2.x). Let's assume we have a couple of ESXi hosts already defined with custom attributes:

object Host "myesxiserver1" {
  import "generic-host"
  address = "10.50.8.71"
  vars.os = "ESXi"
  vars.server.vendor = "Cisco"
}

Using these custom attributes (vars.os and vars.server.vendor) we can now create an apply rule:

# Hardware Checks of Cisco UCS ESX servers
apply Service "Hardware" {
  import "generic-service"

  check_command = "esxi_hardware"
  vars.esxi_hardware_user = "root"
  vars.esxi_hardware_pass = "file:/var/lib/nagios/.esxipass"
  vars.esxi_hardware_vendor = "auto"
  vars.esxi_hardware_perfdata = true

  assign where host.address && host.vars.os == "ESXi" && host.vars.server.vendor == "Cisco"
}

This "Hardware" service object will now be applied to all hosts which have the custom host attributes host.vars.os set to "ESXi" and host.vars.server.vendor set to "Cisco".

Screenshots

check_esxi_hardware.py disk array critical
check_esxi_hardware.py System Event Log critical
check_esxi_hardware.py Raid controller battery warning
check_esxi_hardware.py hardware ok