Nagios/Monitoring Plugin check_esxi_hardware FAQ

Written by Claudio Kuenzler - 157 comments

Published on December 5th 2012 - last updated on November 18th 2024 - Listed in VMware Nagios Icinga Virtualization Monitoring Hardware

Since the initial release of the Nagios/Icinga/Monitoring plugin check_esxi_hardware back in August 2008 by David Ligeret (at that time under the name check_esx_wbem) the script has been downloaded several thousand times and many people have since then worked to improve the script.

As there are often questions about the plugin, most of them hardware related questions, I think it's useful to have a Frequently Asked Questions overview. The question you're burning to ask was probably asked at least once. But note: This FAQ does not replace the documentation. You still need to read that to understand and correctly use the plugin. ;-)

Newest FAQ entries are found at the bottom.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin is executed fine but then it hangs on a certain CIM element on a DELL server. Verbose output:

20111104 18:21:56 Connection to https://esxi-001
20111104 18:21:56 Check classe OMC_SMASHFirmwareIdentity
20111104 18:21:57 Element Name = System BIOS
20111104 18:21:57 VersionString = 1.3.6
20111104 18:21:57 Check classe CIM_Chassis
20111104 18:21:57 Element Name = Chassis
20111104 18:21:57 Manufacturer = Dell Inc.
20111104 18:21:57 SerialNumber = xxxxx
20111104 18:21:57 Model = PowerEdge R710
20111104 18:21:57 Element Op Status = 0
20111104 18:21:57 Check classe CIM_Card
20111104 18:21:58 Element Name = unknown
20111104 18:21:58 Element Op Status = 0
20111104 18:21:58 Check classe CIM_ComputerSystem
CRITICAL: Execution time too long!

A: According to user feedback and lots of tests, such problems are related to the Dell OMSA Offline Bundle. OMSA version 6.5 in particular caused problems on ESXi 5.x servers.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I get the following error message from the plugin:

(0, 'Socket error: [Errno 111] Connection refused')

A: Make sure the Monitoring Server is able to access tcp port 5989 (cim) on the ESX(i) server. Alternatively you can also set a different port with the -C parameter if you have a special DNAT or port forwarding in place.
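
To rule out a firewall or routing problem before digging into the plugin itself, you can test the TCP connection from the monitoring server. The following Python sketch only relies on the default CIM port 5989 mentioned above; the function name cim_port_reachable is made up for this illustration and is not part of the plugin.

```python
import socket


def cim_port_reachable(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and name resolution failures.
        return False


# Example call (replace "myesxi" with your own host name):
# cim_port_reachable("myesxi")
# cim_port_reachable("nat-address", port=15989)  # hypothetical forwarded port
```

If this returns False, the problem lies between the monitoring server and the ESXi host (firewall, DNAT rule, disabled CIM service) and not in the plugin.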

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: How do I use the -i parameter to ignore certain alarms?

A: As written in the documentation, the -i parameter expects a comma-separated list of elements to ignore. The "tricky" part is to find the correct element names (they can be pretty long sometimes). Run the plugin in verbose mode to get a list of all CIM elements. Here's an example of how to ignore several elements:

./check_esxi_hardware.py -H myesxi -U root -P mypass -V dell -i "IPMI SEL","Power Supply 2 Status 0: Failure status","System Board 1 Riser Config Err 0: Config Error"
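
If you generate the check command from a script, the important detail is that all element names end up in one single argument, joined by commas. Here is a small Python sketch of that idea; the helper build_check_command is hypothetical and only uses the parameters shown in the example above.

```python
import shlex


def build_check_command(host, user, password, vendor, ignore_elements):
    """Build the argument list for the plugin, including the ignore list."""
    argv = ["./check_esxi_hardware.py",
            "-H", host, "-U", user, "-P", password, "-V", vendor]
    if ignore_elements:
        # -i takes ONE argument: all element names joined by commas.
        argv += ["-i", ",".join(ignore_elements)]
    return argv


ignored = [
    "IPMI SEL",
    "Power Supply 2 Status 0: Failure status",
    "System Board 1 Riser Config Err 0: Config Error",
]
argv = build_check_command("myesxi", "root", "mypass", "dell", ignored)

# Print a copy-and-paste friendly shell command line.
print(" ".join(shlex.quote(arg) for arg in argv))
```

Passing the list as argv (for example to subprocess.run) avoids shell quoting problems with the spaces and colons in the element names.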

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I have the following warning showing up but my server shows all sensors green:

WARNING : System Board 1 Riser Config Err 0: Connected - Server: Dell Inc. PowerEdge R620 s/n: xxxxxxx System BIOS: 1.1.2 2012-03-08

A: All Dell PowerEdge x620 servers seem to be affected; this looks like a BMC firmware bug to me. A workaround for this bug has been in place since version 20121027. Please check this post for detailed information.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin returns the following output:

Authentication Error! - Server:

A: There are several answers to that:
1. Make sure you are either using the ESXi root user or that you create a user who is a member of the root group. See this workaround for a short description of how to do that.
2. The password you are using has some special characters like a question mark and you need to quote them.
3. The password you are using has a Dollar sign ($) which you need to single-quote.

Generally, always put quotes around your password as this assures the content is handled as string.
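
To see what correct quoting looks like for a given password, Python's shlex module can do the work for you. This is just a sketch; the password used here is an invented example containing a Dollar sign and a question mark.

```python
import shlex


def quote_for_shell(password):
    """Return the password quoted so that a POSIX shell passes it on unchanged."""
    return shlex.quote(password)


# Invented example password with "$" and "?" in it.
print("-P " + quote_for_shell("pa$$word?"))
# prints: -P 'pa$$word?'
```

Without the single quotes the shell would expand $$ to its own process ID and might treat the question mark as a wildcard, so the ESXi server would receive a different password and answer with an authentication error.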

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Can the plugin also monitor other stuff like VMFS disk usage or cpu/memory usage?

A: No. The plugin makes use of the CIM (Common Information Model) API. The so-called CIM elements cover hardware only.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Some hardware is not being monitored by the plugin.

A: The plugin can only monitor the hardware which is "shown" by the server via the CIM API. If the hardware vendor does not include a certain hardware element into the CIM elements, then this piece of hardware can not be monitored. In all the years I've only seen this on no-name machines (and SUN) though.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin is so slow that a timeout occurs.

A: In such cases always verify how the behavior is on your vSphere client in the Hardware tab. Click on the "Update" link and then "Refresh". Are they fast or do they also take a long time to update?
In ESXi 5.0 Update 1, a bug was causing slow hardware discovery/checks. See this article for more information. This was quite an annoying bug and I got bombed with e-mails... -_-

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin suddenly times out, but it was working fine before. The plugin returns the following output:

UNKNOWN: (0, 'Socket error: [Errno 110] Connection timed out')

A: In rare cases the sfcbd-watchdog service running on the ESXi server stops working correctly. Follow VMware KB entry 1013080 and restart the service by logging in to the ESXi server via SSH and running the following command:

/etc/init.d/sfcbd-watchdog restart

If this still doesn't resolve your issue, a manual restart of the "CIM Server" could help. This option is found under the "Configuration" tab -> "Security Profile". Click on "Service ... Properties".

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: After an update of the pywbem package the plugin doesn't work anymore. The following output is shown in verbose mode:

Unknown CIM Error: (0, 'SSL error: certificate verify failed')

A: This was seen in SLES 11 SP3 after an update of the package python-pywbem from 0.7-6.13 to 0.7-6.22. After reverting to the older version, the plugin worked again.

Update September 9th 2014: This error will be fixed in a future release of check_esxi_hardware.py, but it depends on the release of the new pywbem upstream version.
See https://github.com/Napsty/check_esxi_hardware/issues/7.

Update June 26th 2015: This issue was fixed in version 20150626.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: On an IBM server with the ESXi image from IBM the following error appears, but the plugin works fine with the regular image from VMware:

Traceback (most recent call last):
File "./check_esxi_hardware.py", line 625, in <module>
verboseoutput(" Element Name = "+elementName)
TypeError: cannot concatenate 'str' and 'NoneType' objects

A: The CIM definition coming from the IBM image seems to be lacking some information. Version 20150119 fixes this issue.
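
The traceback boils down to concatenating a string with None when a CIM instance has no element name. A defensive way to handle this looks roughly like the following Python sketch. This is not necessarily how version 20150119 solved it; the function name and the placeholder text are assumptions for illustration.

```python
def element_name_line(element_name):
    """Build the verbose line for an element, tolerating a missing name."""
    if element_name is None:
        # Hypothetical placeholder for instances without an element name.
        element_name = "(no element name)"
    return " Element Name = " + element_name


print(element_name_line("System BIOS"))
print(element_name_line(None))
```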

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I updated my Ubuntu 14.04 and pywbem package 0.7.0-4ubuntu1~14.04.1 was installed. Since then I get the following error when the plugin is run:

Traceback (most recent call last):
File "/usr/local/bin/check_esxi_hardware.py", line 619, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 421, in EnumerateInstances
    **params)
File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 183, in imethodcall
    no_verification = self.no_verification)
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 268, in wbem_request
    h.endheaders()
File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 115, in send
    self.connect()
File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 167, in connect
    except ( Err.SSLError, SSL.SSLError, SSL.SSLTimeoutError
AttributeError: 'module' object has no attribute 'SSLTimeoutError'

A: It seems that Ubuntu did the same as SUSE, Red Hat and CentOS in the past: the pywbem package was patched without changing the upstream version number. This goes in the same direction as issue #7 (https://github.com/Napsty/check_esxi_hardware/issues/7). A temporary fix is to manually install the older pywbem package like this:

aptitude install python-pywbem=0.7.0-4

Update June 26th 2015: This issue was fixed in version 20150626.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I use python3 but the plugin throws an error:

File "./check_esxi_hardware.py3", line 440
print "%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message)
^
SyntaxError: invalid syntax

A: An issue was opened on github (https://github.com/Napsty/check_esxi_hardware/issues/13) to address this compatibility issue.

Update: This issue was fixed in version 20181001.
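
For background: the syntax error comes from the Python 2 print statement, which no longer exists in Python 3, where print is a function. The following sketch shows the Python 3 form of the failing line. The helper format_verbose is an invented name; only the timestamp format is taken from the error output above.

```python
import time


def format_verbose(message, now=None):
    """Return 'YYYYmmdd HH:MM:SS message', the format of the verbose lines."""
    if now is None:
        now = time.localtime()
    return "%s %s" % (time.strftime("%Y%m%d %H:%M:%S", now), message)


# Python 2 statement:  print "%s %s" % (...)
# Python 3 function:   print("%s %s" % (...))
print(format_verbose("Connection to https://esxi-001"))
```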

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I sometimes get the following error on an ESXi host:

CRITICAL: (0, 'Socket error: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol')

A: After a lot of debugging and testing with a plugin user, we came to the conclusion that this problem arises from the ESXi host, not the plugin.
A tcpdump revealed that the ESXi host sent a TCP Reset packet rather than starting to submit data. A reboot of the affected ESXi host resolved the problem.

Update October 17th, 2019: Such situations can (sometimes) also be confirmed in the vSphere Client UI using the Monitor -> Hardware Health window. A click on the "REFRESH" button results in an error in the recent tasks list:

A general system error occurred: Server closed connection after 0 response bytes read;

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I have several ESXi hosts behind the same IP (NAT). How can I use the check_esxi_hardware?

A: Since version 20160531 it is possible to manually define the CIM port (which defaults to 5989). So if you set up port forwarding (DNAT) you can now monitor all ESXi servers behind the same NAT-address. The parameter you want in this case is "-C" (or --cimport).
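
As an illustration, here is a Python sketch that builds one plugin call per ESXi server behind the same NAT address. The address 203.0.113.10, the host labels and the forwarded ports are invented examples; only the -H, -U, -P and -C parameters come from the plugin.

```python
def commands_behind_nat(nat_address, port_map, user, password):
    """Return one argument list per ESXi server, keyed by a host label.

    port_map maps a label to the external port which is forwarded (DNAT)
    to tcp port 5989 of that ESXi server.
    """
    commands = {}
    for label, port in sorted(port_map.items()):
        commands[label] = ["./check_esxi_hardware.py",
                           "-H", nat_address,
                           "-U", user, "-P", password,
                           "-C", str(port)]
    return commands


# Invented example: two ESXi servers behind one NAT address.
forwarded_ports = {"esxi-a": 15989, "esxi-b": 25989}
for label, argv in commands_behind_nat("203.0.113.10", forwarded_ports,
                                       "root", "mypass").items():
    print(label, " ".join(argv))
```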

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Is the plugin compatible with ESXi 6.x?

A: Yes. Please note that starting with ESXi 6.5 you might have to enable the CIM/WBEM services first, as they are disabled by default. Refer to https://kb.vmware.com/s/article/2148910.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I can't execute the plugin and get the following error message. Permissions are correct however (e.g. 755).

execvpe(/usr/lib64/nagios/plugins/check_esxi_hardware.py) failed: Permission denied

A: This error comes from SELinux. You need to write an allow rule for it.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin reports the following problem with memory, but no memory hardware issues can be found on the server:

CRITICAL : Memory - Server: HP ProLiant DL380p Gen8 s/n....

A: It is possible that an alert needs to be cleared in the server's IPMI log first. To do that, log in to your ESXi server with SSH and run the following commands:

esxiserver ~ # localcli hardware ipmi sel clear
esxiserver ~ # /sbin/services.sh restart

This might affect other CIM entries as well. So it's a wise idea to clear the IPMI system event log (sel) first before investigating further.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Certain hardware elements show incorrect health/operational states, e.g. "Cooling Unit 1 Fans":

20190205 00:26:26 Element Name = Cooling Unit 1 Fans
20190205 00:26:26 Element HealthState = 10
20190205 00:26:26 Global exit set to WARNING

A: Certain server models might show false hardware alarms when these particular hardware elements were disabled in BIOS, are idle or have disabled sensors. From the HP FAQ:

PR 2157501: You might see false hardware health alarms due to disabled or idle Intelligent Platform Management Interface (IPMI) sensors. Disabled IPMI sensors, or sensors that do not report any data, might generate false hardware health alarms.

In this case it makes sense to ignore these elements using the -i parameter.
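
To make clear why a HealthState of 10 ends up as WARNING and what ignoring an element changes, here is a simplified Python sketch. The HealthState values are the ones defined in the DMTF CIM schema. The mapping to exit codes and the severity order are my assumptions for this illustration; the plugin's own code may differ in the details.

```python
# Nagios/Icinga plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# CIM HealthState values (DMTF CIM schema) and an assumed exit code for each
HEALTH_STATE_TO_EXIT = {
    0: UNKNOWN,    # Unknown
    5: OK,         # OK
    10: WARNING,   # Degraded/Warning
    15: WARNING,   # Minor failure
    20: CRITICAL,  # Major failure
    25: CRITICAL,  # Critical failure
    30: CRITICAL,  # Non-recoverable error
}

# Assumed severity order, from least to most severe
SEVERITY = [OK, UNKNOWN, WARNING, CRITICAL]


def global_exit(elements, ignore=()):
    """Return the worst exit code of all elements which are not ignored.

    elements is a list of (element name, HealthState) tuples.
    """
    worst = OK
    for name, health_state in elements:
        if name in ignore:
            continue  # same idea as the -i parameter
        state = HEALTH_STATE_TO_EXIT.get(health_state, UNKNOWN)
        if SEVERITY.index(state) > SEVERITY.index(worst):
            worst = state
    return worst


elements = [("System BIOS", 5), ("Cooling Unit 1 Fans", 10)]
print(global_exit(elements))                                  # 1 (WARNING)
print(global_exit(elements, ignore=["Cooling Unit 1 Fans"]))  # 0 (OK)
```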

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The check_esxi_hardware plugin is not working (anymore) since ESXi 6.7 U2/U3 on DELL servers.

A: The issue seems to be the "OpenManage" VIB. This can be verified by checking the list of installed VIBs on an ESXi server:

esxiserver ~ # esxcli software vib list
Name Version Vendor
[...]
OpenManage 9.3.0.ESXi670-3465 Dell
[...]

After uninstalling the OpenManage VIB, the plugin works again. According to DELL, ESXi 6.7 U2 is not yet officially supported (as of July 2019) by OpenManage:

OpenManage Integration for VMware vCenter v4.3.1 (Initial 4.3 Download) (4.3.1 Release Notes) (4.3 Manuals)
Does not add official 6.7 U2 support (support for 6.7 U2 will come in the fall with the next major release)

See also the official VMware KB entry 74696 for this.

Update October 15th 2019: OMSA 9.3.1 fixes this issue.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I am using Icinga 2 and the check output shows a timeout error message.

A: This timeout comes from Icinga 2 itself and means that the plugin's process was killed during its runtime. You should increase the timeout of the Service object or of the CheckCommand object. The default is 1 minute, but some servers with a lot of CIM sensors might need longer to respond.

Update November 5th 2019: It is also possible that the CIM queries take a very long time due to a full System Event Log (SEL), which causes the timeout to be exceeded. This can be verified by looking for log entries like the following in the ESXi syslog:

2019-11-05T11:41:16Z sfcb-vmw_ipmi[2125408]: tool_mm_realloc_or_die: memory re-allocation failed(orig=1789600 new=1790000 msg=Cannot allocate memory, aborting

In this case the SEL needs to be cleared on the ESXi server with the following command:

esxiserver ~ # localcli hardware ipmi sel clear

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Is the plugin compatible with the ESXi version XY?

A: Please check out the check_esxi_hardware compatibility matrix to see which combinations are successfully tested. Hint: Most combos work, there are very few compatibility issues reported.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin's output contains Python dependency warnings:

/usr/lib/python3/dist-packages/requests/__init__.py:89: RequestsDependencyWarning: urllib3 (1.26.6) or chardet (3.0.4) doesn't match a supported version!
warnings.warn("urllib3 ({}) or chardet ({}) doesn't match a supported "

A: This actually has nothing to do with the plugin itself. Check out this blog post to see how to solve this (Python is missing a module).

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin shows CRITICAL on non-existing disks on my HP server:

CRITICAL : Disk 1 on HPSA1 : Port 1I Box 3 Bay 4 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 4 on HPSA1 : Port Box 0 Bay 41 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 5 on HPSA1 : Port Box 0 Bay 43 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 6 on HPSA1 : Port Box 0 Bay 46 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 7 on HPSA1 : Port Box 0 Bay 49 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 8 on HPSA1 : Port Box 0 Bay 79 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 9 on HPSA1 : Port Box 0 Bay 105 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 10 on HPSA1 : Port Box 0 Bay 107 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 11 on HPSA1 : Port Box 0 Bay 108 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 12 on HPSA1 : Port Box 0 Bay 113 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 13 on HPSA1 : Port Box 0 Bay 185 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 14 on HPSA1 : Port Box 0 Bay 221 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 15 on HPSA1 : Port Box 0 Bay 225 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 16 on HPSA1 : Port Box 0 Bay 226 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 17 on HPSA1 : Port Box 0 Bay 227 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 18 on HPSA1 : Port Box 0 Bay 233 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 19 on HPSA1 : Port Box 0 Bay 249 : 0GB : Unconfigured Disk : Disk Error - Server: HPE ProLiant DL380 Gen10 s/n: SN System BIOS: U30 2021-09-03

A: This is a bug in HP software, more precisely in the "smx-providers" VIB. See HP Advisory a00117054 for more information.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin does not work with a non-root user. Can I use a different user than root?

A: This used to work in earlier versions of ESXi with a user created in the user interface. According to user feedback this workaround stopped working with ESXi 6.5. However there is a newer workaround possible, by creating an additional local system user on the ESXi server using the esxcli command. Note that group membership of the "root" group is required (unfortunately) to be able to query the CIM server.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: After upgrading from vSphere 6.x to 7.x the plugin stopped working and shows an authentication error:

python3 /usr/lib64/nagios/plugins/check_esxi_hardware.py --vendor dell -H esxserver -U file:/.auth/nagios_user -P file:/.auth/nagios_user --verbose
20230724 15:26:04 LCD Status: True
20230724 15:26:04 Chassis Intrusion Status: True
20230724 15:26:04 Connection to https://esxserver
20230724 15:26:04 Found pywbem version 1.6.1
20230724 15:26:04 Check classe OMC_SMASHFirmwareIdentity
20230724 15:26:05 Global exit set to UNKNOWN
UNKNOWN: Authentication Error

A: By upgrading the vSphere/ESXi version, manually added users (such as the "Nagios" user used above) are removed. You will have to manually re-create the additional local system users on the ESXi server(s).

--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: Will check_esxi_hardware be compatible with vSphere 9.x?

A: Not sure. The vSphere (ESXi) 8.0 release notes mention the following:

Deprecation of Common Information Model (CIM) and Service Location Protocol (SLP): Support for CIM and SLP is deprecated in ESXi 8.0 due to security issues and will be removed in a future release. As an alternative, consider using the Daemon Software Development Kit (DSDK) for solutions that rely on CIM, such as the CIM Provider Development Kit (CIMPDK) and the vSphere APIs for I/O Filtering (VAIO) Development Kit. No CIMPDK is released for vSphere 8.0, but CIM Providers for ESXi 7.x. continue to work on ESXi 8.0 to support a smooth upgrade process.

This means that CIM support will be removed in a future release, which most likely will be vSphere/ESXi 9.x. What exactly this means and if the plugin will stop working is not clear to me yet. Maybe the plugin also needs a rewrite for future ESXi versions. We'll see.

Update November 2024: The removal of CIM services in the next major release (ESXi 9.x) was confirmed by Broadcom.

Comments (newest first)

sag from USA wrote on Oct 18th, 2022:

Traceback (most recent call last):
line 805, in
except pywbem._cim_operations.CIMError as args:
AttributeError: 'module' object has no attribute '_cim_operations'

Bab from AUS wrote on Dec 20th, 2021:

I downloaded latest check_esxi_hardware.py and my os info is :
Red Hat Enterprise Linux Server release 6.4 (Santiago)
python-pywbem-0.7.1-5.1.x86_64

I have also created a user with the read-only role on my ESXi host. When I run this command on Nagios, it shows the following error:

[root@nagios ~]# /usr/local/nagios/libexec/check_esxi_hardware.py -H host1.opr.dsa -U mon -P Mon@123
Traceback (most recent call last):
File "/usr/local/nagios/libexec/check_esxi_hardware.py", line 721, in <module>
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), no_verification=True)
TypeError: __init__() got an unexpected keyword argument 'no_verification'

But when I run it with the root user, it shows the output without a problem:

[root@nagios ~]# /usr/local/nagios/libexec/check_esxi_hardware.py -H host1.opr.dsa -U root -P password -V 'hp'
OK - Server: HPE ProLiant DL580 Gen10 s/n: CN78390000 System BIOS: U34 2021-05-24

ck from Switzerland wrote on Oct 20th, 2021:

Hello Vijay. Thank you for sharing this information!

Vijay wrote on Oct 20th, 2021:

Hello CK,

ESXi 7.0 Update 3 | 05 OCT 2021 | ISO Build 18644231 fixes this. https://docs.vmware.com/en/VMware-vSphere/7.0/rn/vsphere-esxi-703-release-notes.html#resolvedissues

The sensord daemon fails to report ESXi host hardware status
A logic error in the IPMI SDR validation might cause sensord to fail to identify a source for power supply information. As a result, when you run the command vsish -e get /power/hostStats, you might not see any output.

Workaround: None

Thanks for your suggestion.

ck from Switzerland wrote on Oct 17th, 2021:

Hi Vijay. Yes, that is what I meant. Meanwhile there is also ESXi 7.0 U3 available. Maybe this contains a fix to the problem you are seeing? A bug in the HPE Offline Bundle cannot be ruled out either. Happened already in the past (to both DELL and HP).

Vijay wrote on Oct 15th, 2021:

Hello CK,

Thanks for your reply. Are you talking about this ? We already have HP addons included after the upgrade.

smx-provider 700.03.16.00.12-14828939 HPE VMwareAccepted 2021-03-29

This is the latest version on ESXi 7.0u2 as well. I don't see any other issue. Our check has been working for years and now it's not.

ck from Switzerland wrote on Oct 15th, 2021:

Hi Vijay. My guess is that you did not update the HP CIM Offline Bundle installation. It is a separate package and once you upgraded ESXi, maybe there are version conflicts. Make sure to update the HP CIM Offline Bundle to the right version as well. Or use the HPE customized ESXi ISO as installation source. See HPE Custom ESXi images and downloads.

Vijay wrote on Oct 14th, 2021:

Hello ,
The WBEM providers are enabled and the CIM service is running. We're running the check from a Linux server; it used to work for years and is now not working. The only change is that ESXi was upgraded to 7.0u2.

What we noticed is HealthState is reported as 30, which is a “non-recoverable error”, while iLO shows them fine.

Any help would be appreciated. Thanks.

[root@testesxiserver01 ~]# ./python3 /root/check_esxi_hardware.py -H testesxiserver01 -U $USERNAME -P $PASSWORD -V hp
CRITICAL : Power Supply 1 CRITICAL : Power Supply 2 - Server: HP ProLiant DL360 Gen9 s/n: XXXXXXXX System BIOS: P89 2020-10-16

192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml- Power Supply 1
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml:
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml- 30
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_0.xml-
--
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml- Power Supply 2
1192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml:
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml- 30
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml-
192.168.0.10 /root#cimv2/OMC_PowerSupply/instance_1.xml-

ck from Switzerland wrote on Jul 19th, 2021:

Hi Xavier. You could simply add these classes into the plugin and then run verbose mode to see what attributes are found. Or you could use the VMware CIM Browser to manually browse through the CIM elements.

Xavier Van Dessel from Belgium wrote on Jul 18th, 2021:

Thanks Claudio. I fetched the CIM_ManagedElement list and I obtained some classes that may be interesting to monitor. On the VMware site I found the classes and their attributes, but these only deal with CIM, OMC and VMware classes. This server has classes like IODM_FCAdapter or IODM_SCSILocalDevice. Any idea how I can obtain more detailed info about these classes and what attributes I could examine for these?

ck from Switzerland wrote on Jul 14th, 2021:

Xavier, you could try it with "CIM_ManagedElement" as top-level (tree) element. And see what is reported afterwards.

Xavier Van Dessel from Belgium wrote on Jul 12th, 2021:

I'm trying to use the plugin to monitor an ESXi that runs on NEC/Stratus FT (Fault Tolerant) server.
The Classes for BIOS, Chassis, CIM_card, CIM_computersystem, CPU and cache memory seem to work.
However, there are no "Element" entries under OMC_fan, OMC_PowerSupply or storage related classes.

Is there a way to dump the complete CIM tree so that I can know the Classes to check?
If there is, I will gladly provide you with the details for this type of server.

Xavier

ck from Switzerland wrote on May 26th, 2021:

Vacheslav, you are right, the Icinga 2 ITL does not contain the new -S parameter. I will have to make a PR for this. But you can overwrite the existing command definition or add your own (simply call it "check_esxi_hardware" instead), based on the example Icinga 2 command definition in the documentation. Once you have done that, you can for example use:

vars.esxi_hardware_sslproto = "TLSv1.3"

Vacheslav wrote on May 26th, 2021:

well actually Armstrong was in some Hollywood studio... however, in the Icinga 2 service definition there is an error:
# Hardware Check
object Service "Hardware" {
import "generic-service"
host_name "myesxiserver1"
check_command = "esxi_hardware"
vars.esxi_pass = "file:/var/lib/nagios/.esxipass"
vars.esxi_vendor = "dell"
}
missing = in host_name "myesxiserver1"
i.e
host_name = "myesxiserver1"

my problem is how to define the service or command with -S in icinga2?

Claudio Kuenzler from Switzerland wrote on May 15th, 2021:

Peter, compare the output of the plugin with the information in the vSphere UI hardware view of that server. Is there an error too? If yes, then it looks like a bug in the HP CIM Offline Bundle. Maybe you need to update this one too?

Peter Roddan from Dartford, UK wrote on May 13th, 2021:

Hi CK,

Thanks very much for this great plugin.
I have been using it on my farm of HP servers, and I have noticed an issue with the latest controller firmware.
I have a number of DL360 and DL380 Gen 10s, all which have HPE Smart Array P408i-a SR Gen10 controllers.
If I upgrade the controller firmware to the latest version, 3.53 May 2021, I get the "Unconfigured disk" error noted below, when there are no disks plugged in on those ports - in the example below, the server only has 4 disks in it, the error starts at disk 5. If I downgrade the firmware back to 3.00 the plugin then works fine.

I'm not sure if this is a firmware bug or not - have you seen this problem?

Many Thanks.

Peter

CRITICAL : Disk 5 on HPSA1 : Port 2I Box 0 Bay 6 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 6 on HPSA1 : Port Box 0 Bay 18 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 7 on HPSA1 : Port Box 0 Bay 20 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 8 on HPSA1 : Port Box 0 Bay 22 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 9 on HPSA1 : Port Box 0 Bay 26 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 10 on HPSA1 : Port Box 0 Bay 28 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 11 on HPSA1 : Port Box 0 Bay 30 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 12 on HPSA1 : Port Box 0 Bay 34 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 13 on HPSA1 : Port Box 0 Bay 36 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 14 on HPSA1 : Port Box 0 Bay 38 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 15 on HPSA1 : Port Box 0 Bay 42 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 16 on HPSA1 : Port Box 0 Bay 44 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 17 on HPSA1 : Port Box 0 Bay 46 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 18 on HPSA1 : Port Box 0 Bay 52 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 19 on HPSA1 : Port Box 0 Bay 53 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 20 on HPSA1 : Port Box 0 Bay 86 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 21 on HPSA1 : Port Box 0 Bay 116 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 22 on HPSA1 : Port Box 0 Bay 120 : 0GB : Unconfigured Disk : Disk Error CRITICAL : Disk 23 on HPSA1 : Port Box 0 Bay 121 : 0GB : Unconfigured Disk : Disk Error - Server: HPE ProLiant DL360 Gen10 s/n: xxxxxxxxxx System BIOS: U32 2021-01-23

ck from Switzerland wrote on Apr 29th, 2021:

Jespo and Guillaume, please create an issue in the code repository on Github. Thx.

Guillaume wrote on Apr 26th, 2021:

Hi

By running the command I get the following error, which I cannot find in the other topics :(
File "./check_esxi_hardware.py", line 668
with open(sslconfpath, 'w') as config_file:
^
SyntaxError: invalid syntax

Thanks!

Jospo from china wrote on Apr 20th, 2021:

hello:
According to the configuration of the tutorial, I have encountered such an error now. I don't know how to solve it. I searched for the problem on the Internet, but I didn't see a solution to the same problem as mine. Can you help me solve it? thank you very much!
"UNKNOWN: (0, 'Socket error: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)')"

ck from Switzerland wrote on Apr 14th, 2021:

Hi Bapt. The question should probably be asked in the Centreon community. The message "No output returned from plugin" seems to come from Centreon. Maybe the enforced critical state sends wrong parameters to the plugin, not sure here.

Bapt from France wrote on Apr 13th, 2021:

Not exactly.
In Centreon GUI, when I force the check to the status "Critical" by submitting manually a status and it becomes effectively "Critical", I get a "(No output returned from plugin)" in the Informations column of the view (which corresponds to $SERVICEOUTPUT$ item I see in the notification mail command and the test mail sent, and to the output I see in Centreon CLI and logs).
The goal is to have the Centreon "usual output" that I get from the modified .py script and which is displayed when Centreon state is "OK", by example "20210413 17:03:48 Other 1 01-Inlet Ambient = xx.000000" (where xx is <= 26°C when Centreon state is "OK", and would be >= 28°C when Centreon state becomes "Critical") being displayed as well in Informations column in the Centreon GUI view of resources, in the notification mail as $SERVICEOUTPUT$, and in Centreon CLI/logs, when the state is "critical", instead of this "(No output returned from plugin)".

BR,
Baptiste

Claudio Kuenzler from Switzerland wrote on Apr 13th, 2021:

Hello Bapt. Not sure if I understand your comment correctly, but are you saying that the plugin does not show any performance data when NOT being in an OK state? If yes, then please open an issue on the GitHub repo.

Bapt from France wrote on Apr 13th, 2021:

Hello,

I modified the script in order to display only one specific temperature in Centreon, which works well as long as the state stays OK (i.e. up to 26°C; between 26°C and 28°C the state becomes Warning, and at 28°C and above the state is Critical, which triggers a notification mail).
However, I cannot get anything other than "No output returned from plugin" displayed (in the Centreon GUI, in the Centreon CLI and in the logs) when I simulate/force the Critical status in the Centreon GUI. The goal is to keep the temperature displayed in the Centreon GUI and in the notification mail, as a kind of $SERVICEOUTPUT$ value, which is in fact an item of the notification mail command. I tried many combinations in the command definition (with and without $_SERVICEWARNING$ $_SERVICECRITICAL$, $SERVICEWARNING$ $SERVICECRITICAL$ or even $WARNING$ $CRITICAL$, and the same when writing the .cfg), in the service definition (again with and without these macros), in user management (root permissions as well as centreon-engine permissions on /usr/lib64/nagios/plugins, where the .py script is located), and so on.
I even tried to hack the script a little by altering its "Critical" HealthState and elementStatus handling (it is an HP check), even though I know this does not correspond to forcing the state in Centreon.
So, is there a way to have that temperature displayed even when the check is Critical in Centreon, please?

BR,
Baptiste

Claudio Kuenzler from Switzerland wrote on Mar 26th, 2021:

Ian, you might want to take a look at this link concerning the package to install.
Also to rule out an authentication problem, try the plugin with the root user.

Ian from United Kingdom wrote on Mar 26th, 2021:

The machine was built using the HPE ESXi image and it looks like I have the smx-provider 650.03.16.0.0.4-4240417 package installed. Does that seem right or is there another package I need for this DL360 Gen9? From the logs it looks like the account is being locked, but I'm certain of the username and password and have added it to the root group.

ck from Switzerland wrote on Mar 26th, 2021:

Hi Ian. The plugin works fine with newer ESXi versions. I run the plugin currently on dozens of ESXi 6.7 servers. This error seems to come from the HP CIM Bundle. Make sure you have the correct HP CIM Bundle for ESXi installed. There are multiple versions and it is sometimes tricky to find the right one.

Ian from United Kingdom wrote on Mar 25th, 2021:

I'm getting an error with version 20200710 querying ESXi version 6.5 on HP Proliant hardware from an Ubuntu host:

20210325 11:40:11 LCD Status: True
20210325 11:40:11 Chassis Intrusion Status: True
20210325 11:40:11 Connection to https://hostnameredacted
20210325 11:40:11 Found pywbem version 0.8.0~dev
20210325 11:40:11 Check classe OMC_SMASHFirmwareIdentity
Traceback (most recent call last):
  File "/usr/lib/nagios/plugins/check_esxi_hardware.py", line 785, in <module>
    except pywbem._exceptions.ConnectionError as args:
AttributeError: 'module' object has no attribute '_exceptions'

It works fine against our ESXi 6.0 hosts, but obviously we're looking to upgrade soon.

Phil from Hereford, UK wrote on Jun 7th, 2019:

I had to uninstall OpenManage from my ESXi 6.7U2 hosts otherwise CIM queries timed out.

bridrod from United States wrote on May 17th, 2019:

This started happening after applying Update 2 to v6.7 (v6.7U2). Working fine on v5.0, v6.0, v6.5 or even v6.7U1:
Traceback (most recent call last):
  File "/tmp/check_esxi_hardware.py", line 717, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib64/python2.7/site-packages/pywbem/cim_operations.py", line 1585, in EnumerateInstances
    **extra)
  File "/usr/lib64/python2.7/site-packages/pywbem/cim_operations.py", line 914, in _imethodcall
    recorder=self.operation_recorder)
  File "/usr/lib64/python2.7/site-packages/pywbem/cim_http.py", line 756, in wbem_request
    client.endheaders()
  File "/usr/lib64/python2.7/httplib.py", line 1038, in endheaders
    self._send_output(message_body)
  File "/usr/lib64/python2.7/httplib.py", line 882, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.7/site-packages/pywbem/cim_http.py", line 446, in send
    self.connect()
  File "/usr/lib64/python2.7/site-packages/pywbem/cim_http.py", line 550, in connect
    ret = self.sock.connect_ssl()
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 295, in connect_ssl
    return m2.ssl_connect(self.ssl, self._timeout)
M2Crypto.SSL.SSLError: (104, 'Connection reset by peer')

ck from Switzerland wrote on Mar 6th, 2019:

Hello Yannick. There are several possible causes for this.

1) Make sure you are using the correct and newest CIM Bundle for IBM Servers. I was not able to find a download for newer ESXi versions on the IBM site, only for older 4.x and 5.x versions. You should contact your IBM support.

2) Another reason (but I doubt it) could be a problem in the CIM server itself. Look at the existing FAQ and try restarting all the relevant services (both in the vSphere UI and on the CLI). Also try a reboot, just to rule that out.

Yannick from wrote on Mar 6th, 2019:

Hello,

I've just tried your script but face an issue with timeouts / slowness.
Do you have an idea of what's going on?

Note that
* I've tried with the local ESXi root user to avoid permission issues.
* I've killed the command due to the very long duration.
* ESXi version 6.0.0, 5572656

[root@monitorsrv1 plugins]# ./check_esxi_hardware.py -H -U root -P -V ibm -I LU -p -v -t 3600
20190306 10:29:31 LCD Status: True
20190306 10:29:31 Connection to https://
20190306 10:29:31 Found pywbem version 0.7.0
20190306 10:29:31 Connection error, disable SSL certification verification (probably patched pywbem)
20190306 10:29:31 Check classe OMC_SMASHFirmwareIdentity
20190306 10:29:31 Element Name = System BIOS
20190306 10:29:31 VersionString = -[G0E183BUS-1.83]-
20190306 10:29:31 Check classe CIM_Chassis
20190306 10:29:32 Element Name = Chassis
20190306 10:29:32 Manufacturer = IBM
20190306 10:29:32 SerialNumber =
20190306 10:29:32 Model = System x3850 X5 -[7143YSB]-
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Check classe CIM_Card
20190306 10:29:32 Element Name = Processor/Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = I/O Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Daughter Board
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Element Name = Memory Module
20190306 10:29:32 Element Op Status = 0
20190306 10:29:32 Check classe CIM_ComputerSystem
20190306 10:39:33 Unknown CIM Error: (1, u'Timeout (or other socket error) waiting for response from provider')
20190306 10:39:33 Check classe CIM_NumericSensor
20190306 10:49:33 Unknown CIM Error: (1, u'Timeout (or other socket error) waiting for response from provider')
20190306 10:49:33 Check classe CIM_Memory
20190306 11:07:58 Unknown CIM Error: (1, u'Timeout (or other socket error) waiting for response from provider')
20190306 11:07:58 Check classe CIM_Processor
^CTraceback (most recent call last):
  File "./check_esxi_hardware.py", line 708, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib/python2.7/site-packages/pywbem/cim_operations.py", line 420, in EnumerateInstances
    **params)
  File "/usr/lib/python2.7/site-packages/pywbem/cim_operations.py", line 182, in imethodcall
    no_verification = self.no_verification)
  File "/usr/lib/python2.7/site-packages/pywbem/cim_http.py", line 298, in wbem_request
    response = h.getresponse()
  File "/usr/lib64/python2.7/httplib.py", line 1113, in getresponse
    response.begin()
  File "/usr/lib64/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib64/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 228, in read
    return self._read_bio(size)
  File "/usr/lib64/python2.7/site-packages/M2Crypto/SSL/Connection.py", line 213, in _read_bio
    return m2.ssl_read(self.ssl, size, self._timeout)
KeyboardInterrupt
[root@monitorsrv1 plugins]#

BR,
Yannick

ck from Switzerland wrote on Feb 16th, 2019:

Sebbo, by default the variable "esxi_hardware_perfdata" is set to false. See https://icinga.com/docs/icinga2/latest/doc/10-icinga-template-library/#esxi_hardware . This means you either have to set the variable to "true" manually in the command definition or override the command definition's default in the service object.

Sebbo from Germany wrote on Feb 15th, 2019:

Hi Claudio, thank you for answering. Actually, I realized that the plugin is working properly right after I posted here. :D
So I have to check why no perfdata is collected. If I read the Icinga 2 esxi_hardware command correctly, it should work out of the box. But that seems to be a question for the Icinga community. Thanks anyway. ;)

Claudio Kuenzler from Switzerland wrote on Feb 15th, 2019:

Hello Sebbo. No, the plugin works in general for all hardware vendors, as long as the hardware reports its status via CIM. As long as the hardware reports OK, you will only see the server model and BIOS version as output. The plugin has a parameter for performance data, see the documentation.

Sebbo from Germany wrote on Feb 15th, 2019:

Hi, does the script only work for the vendors dell, hp, ibm and intel? I want to use it in Icinga 2 and got it running, but it only shows me server, serial and BIOS version information. Also, no perfdata is collected in Grafana.
When I run the script in a shell with --verbose, I get a lot of data.
Did I forget something needed to collect the data correctly with Icinga?

ck from Switzerland wrote on Nov 19th, 2018:

Hello Marcus. According to VMware KB 2001549 (https://kb.vmware.com/s/article/2001549?lang=en_US), you have to check on the Broadcom website for newer CIM bundles for the RAID controller. I just checked on https://www.broadcom.com/support/download-search/?pg=Storage+Adapters,+Controllers,+and+ICs&pf=RAID+Controller+Cards&pn=&pa=&po=&dk= and only found "SMIS Providers" for ESXi 5.x, 6.0 and 6.5 but none for 6.7.

MarcusCaepio from Germany wrote on Nov 19th, 2018:

We are using LSI/MegaRAID RAID controllers. In the past, after installing the CIMs on the ESXi server, we got the RAID status both in VMware and in your plugin. Now, with ESXi 6.7, it seems that the CIMs are not working anymore.
We tried CIMs of https://docs.broadcom.com/docs/VMW-ESX-6.5.0-lsiprovider-500.04.V0.71-0004-9942010.zip
and http://www.58support.nec.co.jp/global/download/042581-G02/VMW-ESX-6.0.0-lsiprovider-500.04.V0.69-0004-offline_bundle-8676001.zip according to the forum thread at https://communities.vmware.com/thread/587140?start=15&tstart=0.
Have you already stumbled upon this problem?

ck from Switzerland wrote on Nov 12th, 2018:

Dave, make sure you install the CIM Offline Bundle from the HP website. For vSphere 6.0 this seems to be: https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_6f4112854ceb47818294159196

Dave from Montana, USA wrote on Nov 12th, 2018:

Hello !
I'm running version 20181001 to check an HP G7 box with ESXi 6.0. It is working except not reporting any information on storage devices:
TBOX$ ./check_esxi_hardware.py -v -H 10.70.70.17 -U monitoresxi -P 'XXXXXXX' -V hp

20181111 17:10:24 LCD Status: True
20181111 17:10:24 Connection to https://10.70.70.17
20181111 17:10:24 Found pywbem version 0.8.0~dev
20181111 17:10:24 Check classe OMC_SMASHFirmwareIdentity
20181111 17:10:24 Element Name = System BIOS
20181111 17:10:24 VersionString = P68
20181111 17:10:24 Check classe CIM_Chassis
20181111 17:10:24 Element Name = Chassis
20181111 17:10:24 Manufacturer = HP
20181111 17:10:24 SerialNumber = XXXXXXXXXX
20181111 17:10:24 Model = ProLiant DL360 G7
.
.
.
20181111 17:10:28 Check classe OMC_PowerSupply
20181111 17:10:28 Element Name = Power Supply 1
20181111 17:10:28 Element HealthState = 5
20181111 17:10:28 Element Name = Power Supply 2
20181111 17:10:28 Element HealthState = 5
20181111 17:10:28 Element Name = Power Supply 4
20181111 17:10:28 Element HealthState = 5
20181111 17:10:28 Element Name = Power Supply 5
20181111 17:10:28 Element HealthState = 5
20181111 17:10:28 Check classe VMware_StorageExtent
20181111 17:10:28 Check classe VMware_Controller
20181111 17:10:29 Check classe VMware_StorageVolume
20181111 17:10:29 Check classe VMware_Battery
20181111 17:10:29 Check classe VMware_SASSATAPort
OK - Server: HP ProLiant DL360 G7 s/n: XXXXXXX System BIOS: P68 2011-05-05

Is there something missing in my ESXi installation or what am I doing wrong?
THANKS for this nice PY script.
-Dave

Sebastien from Canada wrote on May 23rd, 2018:

Hi, I have the same error as Daniel with a different HP ProLiant server. I'm using the latest version of the script. Any idea?

Pedro from Portugal wrote on Apr 20th, 2018:

Hi, in the Icinga2 frontend I'm receiving this error:

UNKNOWN: (0, 'Socket error: [Errno 13] Permission denied')

From command line it works well.

Any idea why?

Claudio from Miami, USA wrote on Mar 13th, 2018:

Hi Daniel. This is interesting. Maybe the server's firmware still has a defective state somewhere. Did you already reboot the server, including a full power off (remove power)? And at boot, let the memory check complete; don't skip it.

Daniel from Australia wrote on Mar 12th, 2018:

We had a memory RDIMM fail on our HP server, and this monitor within Nagios correctly showed CRITICAL.

When we replaced the RDIMM, the error didn't go away:
CRITICAL : Memory - Server: HP ProLiant ML350p Gen8 s/n: XXXX System BIOS: P72 2015-07-01

When looking closer and running in verbose mode, I can see the error is:
30 : ExitCritical, # Non-recoverable Error
when using vendor hp
or
7 : ExitCritical, # Non-recoverable Error
when using vendor auto

It isn't clear what the error is: HP iLO shows no issues, VMware Hardware shows no issues...

I have changed the code so that these two are ExitOK, so that if Memory ends up Warning or Critical (with a different code) Nagios will report accordingly...

Not sure why this is happening; I can't see anything in the HP or VMware logs.

If you have any ideas, please let me know.

Claudio from Miami, USA wrote on Mar 6th, 2018:

Rico, the connection was not successful. See error message "Socket error: [Errno 111] Connection refused". Check your firewall logs and verify your monitoring server can access tcp 5989 of the ESXi server.
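
A quick way to rule out a firewall problem is to test the TCP connection to the CIM port directly from the monitoring server. The following is only a minimal sketch using the Python standard library; the function name cim_port_reachable and the 5-second timeout are illustrative choices and not part of the plugin. Port 5989 is the CIM port mentioned above.

```python
import socket

def cim_port_reachable(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and unreachable hosts
        return False

# Example call (address taken from Rico's report):
# print(cim_port_reachable("10.120.80.84"))
```

If this returns False while the ESXi host answers on other ports, a firewall rule or a stopped CIM service on the ESXi side is a likely candidate.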

Rico from Germany wrote on Mar 6th, 2018:

Hello, I have a new system with ESXi 6.5 that I have to monitor. After installing the plugin I executed the check; here is the output:
./check_esxi_hardware.py -H 10.120.80.84 -U root -P secret -v
20180306 13:08:31 LCD Status: True
20180306 13:08:31 Connection to https://10.120.80.84
20180306 13:08:31 Found pywbem version 0.8.0-dev
20180306 13:08:31 Check classe OMC_SMASHFirmwareIdentity
UNKNOWN: (0, 'Socket error: [Errno 111] Connection refused')
What is going wrong? The connection is established, and then?

Thanks in advance for your answer.

Regards Rico

ck from Switzerland wrote on Jul 27th, 2017:

Ricou, if possible use the latest version and update your pywbem. If for whatever reason you must use an old version of check_esxi_hardware, you have to give it up to 2 minutes to run the hardware checks. It depends on the server model you want to check but I have seen very long checks (up to 2 min) on HP DL380 Gen8 servers. So increase the timeout in the Nagios settings.
To find out if the plugin hangs somewhere, use the "-v" switch for verbose mode.

Ricou from wrote on Jul 21st, 2017:

I found an old version of your script and I think the command works, but I get these errors:

- When i use a r/o user : "UNKNOWN: Authentication Error"
- When i use root : CRITICAL: Execution time too long!

ck from Switzerland wrote on Jul 19th, 2017:

The easiest method is to use pip. See https://www.claudiokuenzler.com/blog/671/new-version-check_esxi_hardware-20161013-support-pywbem-0.9.x.

Ricou from wrote on Jul 19th, 2017:

OK, I currently have pywbem 0.7.0.
How can I install version 0.8.x?

ck from Switzerland wrote on Jul 19th, 2017:

Ricou, try a newer pywbem version (at least 0.8.x).
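
Ricou's traceback shows that the installed pywbem does not accept the no_verification keyword. As a rough sketch of how a script could cope with both variants, the hypothetical helper below first tries the keyword and falls back to the plain call on a TypeError. The name open_wbem_connection and the factory parameter are assumptions made for illustration; with pywbem installed one would pass pywbem.WBEMConnection as the factory. This is not the plugin's actual code.

```python
def open_wbem_connection(factory, hosturl, user, password):
    """Create a connection, skipping certificate checks when supported.

    `factory` is any callable with the signature of pywbem.WBEMConnection
    (hypothetical parameter, used so the sketch runs without pywbem).
    """
    try:
        # Newer or patched pywbem versions accept no_verification
        return factory(hosturl, (user, password), no_verification=True)
    except TypeError:
        # Older pywbem: the keyword is unknown, so call without it
        return factory(hosturl, (user, password))

# Example call (requires pywbem; host and credentials are placeholders):
# import pywbem
# client = open_wbem_connection(pywbem.WBEMConnection,
#                               "https://esxi-001", "root", "secret")
```

Upgrading pywbem, as suggested, remains the cleaner fix; the fallback only avoids the crash on old installations.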

Ricou from wrote on Jul 19th, 2017:

Hi,

I have an issue when I launch the command:
./check_esxi_hardware.py -H $HOSTADDRESS$ -U $USER6$ -P $USER7$ -i "IPMI SEL" -V hp -t 45

It returns this error:
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 617, in <module>
    wbemclient = pywbem.WBEMConnection(hosturl, (user,password), no_verification=True)
TypeError: __init__() got an unexpected keyword argument 'no_verification'

I have CentOS 6.5, Python 2.6.6 and pywbem.
Do you have any idea?

Wouter from wrote on Jun 7th, 2017:

Robin, I think this will solve your problem: http://www.squidworks.net/2017/02/vmware-esxi-6-5-cim-data-disabled-by-default/
I had the same problem on 6.5.

ck from Switzerland wrote on Apr 28th, 2017:

Robin, please check out https://www.vmware.com/resources/compatibility/pdf/vi_cim_guide.pdf for a list of hardware that is officially compatible with VMware and has CIM providers.

Robin from Deutschland wrote on Apr 28th, 2017:

The hardware model is Supermicro.
Which server do you mean? The server with ESXi version 6.5 is a new server; it wasn't updated.

Claudio from Switzerland wrote on Apr 28th, 2017:

Vladimir, you could write a wrapper script around check_esxi_hardware and just grep for CPU. Or you fork the plugin and use only the CIM tables related to CPU temperature.
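
Such a wrapper could look roughly like the sketch below. It runs the plugin in verbose mode and keeps only the lines that contain a keyword. The script path, the helper names filter_lines and run_filtered, and the keyword "CPU" are assumptions for illustration; the actual element names depend on what the hardware reports via CIM, and the sketch assumes the verbose lines are written to standard output.

```python
import subprocess

def filter_lines(output, keyword="CPU"):
    """Return only the lines of `output` that contain `keyword`."""
    return [line for line in output.splitlines() if keyword in line]

def run_filtered(plugin_args, keyword="CPU",
                 plugin="./check_esxi_hardware.py"):
    """Run the plugin with -v and return (exit_code, matching_lines)."""
    result = subprocess.run([plugin, "-v"] + list(plugin_args),
                            capture_output=True, text=True)
    return result.returncode, filter_lines(result.stdout, keyword)

# Example call (host and credentials are placeholders):
# code, lines = run_filtered(["-H", "esxi-001", "-U", "root", "-P", "secret"])
# print("\n".join(lines))
# raise SystemExit(code)
```

Passing the plugin's exit code through keeps the wrapper usable as a check in Nagios or Icinga.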

ck from Switzerland wrote on Apr 28th, 2017:

Robin, what kind of hardware model is this? Did you upgrade the hardware vendor's cim offline bundle, too? Also check out https://monitoring-portal.org/index.php?thread/39393-check-esxi-hardware-timeout-seit-version-6-0-0-build-4600944

Robin from Deutschland wrote on Apr 28th, 2017:

Hello,

I have a question.
The check_esxi_hardware.py works fine on ESXi 5.x.
But now we have ESXi 6.5, and there the check doesn't work.
Then I tried a newer version of the check (20161013) and updated pywbem to 0.8.0.
But now I get the following error on an ESXi 6.5 host.

./check_esxi_hardware.py -H IP-Address -U root -P 'password' -v
20170428 14:03:34 Connection to https://172.18.50.220
20170428 14:03:34 Found pywbem version 0.8.0-dev
20170428 14:03:34 Check classe OMC_SMASHFirmwareIdentity
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 669, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_operations.py", line 1018, in EnumerateInstances
    **params)
  File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_operations.py", line 592, in imethodcall
    timeout=self.timeout)
  File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_http.py", line 547, in wbem_request
    raise ConnectionError("Socket error: %s" % exc)
pywbem.cim_http.ConnectionError: Socket error: [Errno 111] Connection refused

Can you help me ?
Thanks

Vladimir from wrote on Apr 20th, 2017:

Hello CK! How can I get only the CPU temperature value (or that of another hardware element) in the output with the -v option?

ck from wrote on Dec 14th, 2016:

You only need the connection to the CIM port (default 5989). I will adapt the information. Here's a tcpdump I just did when the plugin is launched. As you can see, only the CIM port is accessed (no 443):

08:15:40.234718 IP monitoring.53568 > esxserver.5989: Flags [S], seq 2030707798, win 29200, options [mss 1460,sackOK,TS val 16733847 ecr 0,nop,wscale 7], length 0
08:15:40.237942 IP esxserver.5989 > monitoring.53568: Flags [S.], seq 2366951929, ack 2030707799, win 65535, options [mss 1460,nop,wscale 9,sackOK,TS val 854991276 ecr 16733847], length 0
08:15:40.237959 IP monitoring.53568 > esxserver.5989: Flags [.], ack 1, win 229, options [nop,nop,TS val 16733848 ecr 854991276], length 0
08:15:40.238176 IP monitoring.53568 > esxserver.5989: Flags [P.], seq 1:296, ack 1, win 229, options [nop,nop,TS val 16733848 ecr 854991276], length 295
08:15:40.248606 IP esxserver.5989 > monitoring.53568: Flags [P.], seq 1:1457, ack 296, win 130, options [nop,nop,TS val 854991277 ecr 16733848], length 1456
08:15:40.248625 IP monitoring.53568 > esxserver.5989: Flags [.], ack 1457, win 251, options [nop,nop,TS val 16733850 ecr 854991277], length 0
08:15:40.249236 IP monitoring.53568 > esxserver.5989: Flags [P.], seq 296:422, ack 1457, win 251, options [nop,nop,TS val 16733850 ecr 854991277], length 126
08:15:40.253590 IP esxserver.5989 > monitoring.53568: Flags [P.], seq 1457:1683, ack 422, win 130, options [nop,nop,TS val 854991277 ecr 16733850], length 226

Pap from Switzerland wrote on Dec 13th, 2016:

Hello ck,

thank you for your answer. I thought this would be the problem, because I get the following error:

(0, 'Socket error: [Errno 111] Connection refused')

And the FAQ told me to free port 443. So I guess it's some other problem. But thank you for the help and the awesome plugin.

Greetings

Patrick

ck from Switzerland wrote on Dec 13th, 2016:

Hello Pap. You actually only need to care about the CIM port. Port 443 will not be used.

Pap from Switzerland wrote on Dec 13th, 2016:

Hello,

Maybe I'm stupid, but I need to forward port 443 and the CIM port. The problem is that there are multiple hosts behind the NAT. Is there any possibility to use another port instead of 443? For instance externalIP:5001, which gets mapped to internalIP:443? I only see an option to change the CIM port, not the HTTPS port.

Greetings

Patrick

ck from Switzerland wrote on Oct 27th, 2016:

Hi Adi, the plugin runs fine on ESXi 6 servers, too. I'm successfully checking ESXi 6.0.0 build 4192238 servers. These are Cisco UCS servers.

Adi from Israel wrote on Oct 26th, 2016:

Does this plugin need something extra to work on ESXi 6.0?
The ESXi image has been taken from the HP site (https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI60U2-HPE&productId=491) but it gets stuck forever on:
Found pywbem version 0.9.0
Check classe OMC_SMASHFirmwareIdentity

The same plugin works on 5.5.

Lars from Germany wrote on Aug 26th, 2016:

Hi CK,
the update of pywbem did the trick. Thanks a lot. For those running SLES like me: to get the updated pywbem running, I had to uninstall the python-pywbem package from the distribution.

Thanks again for the ultrafast help

Lars

ck from Switzerland wrote on Aug 26th, 2016:

Hi Lars. Please check out http://www.claudiokuenzler.com/blog/542/installing-testing-pywbem-0.8-development-version where I documented how to install pywbem manually. Or make sure you have the "pip" program installed and then launch "pip install pywbem". That's by far the easiest and fastest way.

Lars from Germany wrote on Aug 26th, 2016:

Hi CK,

I restarted these services, but nothing changed. My pywbem version is 0.7-6.22.1.
VMware completely disabled SSLv3 with this update. I think this has to do with it, because on an older version of ESXi 5.5 the plugin still works fine. I have re-enabled SSLv3 on the services hostd (port 443) and sfcb (port 5989), but this does not help. Where can I get a newer version of pywbem? My Nagios runs on SLES11 SP4, and there are no newer packages available in the repositories.

ck from Switzerland wrote on Aug 26th, 2016:

Hi Lars. Try the following in this order:
- Restart CIM Server on the ESXi server
- Restart sfcbd-watchdog
- Try a newer version of pywbem (which version are you currently using?)

Lars from Germany wrote on Aug 26th, 2016:

Hello All,

I have a question. After updating my ESXi hosts to 5.5 build 4179633 the plugin stopped working.
Output at command line:
Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 665, in <module>
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib64/python2.6/site-packages/pywbem/cim_operations.py", line 421, in EnumerateInstances
    **params)
  File "/usr/lib64/python2.6/site-packages/pywbem/cim_operations.py", line 183, in imethodcall
    no_verification = self.no_verification)
  File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 266, in wbem_request
    h.endheaders()
  File "/usr/lib64/python2.6/httplib.py", line 914, in endheaders
    self._send_output()
  File "/usr/lib64/python2.6/httplib.py", line 786, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 115, in send
    self.connect()
  File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 163, in connect
    if not check(self.sock.get_peer_cert(), self.host):
  File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Checker.py", line 66, in __call__
    raise NoCertificate('peer did not return certificate')
M2Crypto.SSL.Checker.NoCertificate: peer did not return certificate

Does anyone know how to fix it?
Thanks in advance

regards

Lars

Roland Sommer from wrote on Aug 23rd, 2016:

Yeah, that looks good now!

CRITICAL : HP Smart Array P812 Controller : Slot 3 : HPSA2 CRITICAL : HP Smart Array P812 Controller : Slot 3 : HPSA2

Many thanks!

ck from Switzerland wrote on Aug 23rd, 2016:

Sorry Roland, I meant "-V hp" (capital letter V).

Roland Sommer from wrote on Aug 23rd, 2016:

Hi ck, thanks for your answer. The element is shown in the vSphere hardware Tab and the failure is detected. Adding -v hp to the command definition reports:
20160823 08:27:29 Check classe VMware_Battery
20160823 08:27:29 Element Name = Battery on HPSA1
20160823 08:27:29 Element Name = Battery on HPSA2

but final message is OK and exit code is 0.

ck from Switzerland wrote on Aug 22nd, 2016:

Hi Roland, as you can see on the screenshot (http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php#Screenshots) problems with the RAID battery are also reported.
Make sure you have installed the newest vendor CIM bundle and that you use the correct hardware vendor in your command definition (-v hp in your case). Verify in vsphere hardware tab, if the element is shown.

Roland Sommer from wrote on Aug 22nd, 2016:

check_esxi_hardware.py (20160531) does not report failure of HW RAID cache battery. It's reported via CLI:

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show

Smart Array P812 in Slot 3 (sn: ***************)

CACHE STATUS PROBLEM DETECTED: The cache on this controller has a problem.
To prevent data loss, configuration changes to
this controller are not allowed.
Please replace the cache to be able to continue
to configure this controller.

and it is shown in the "Hardware Status" tab in the vSphere Client.

Is this a missing CIM element? Or does the script not handle this CIM element?

Claudio Kuenzler from Switzerland wrote on May 24th, 2016:

Dan, the mail address you left me was not working (Host or domain name not found. Name service error for name=cpdotomac.com type=A: Host not found). Please leave a working mail address in the form.

Dan from Maryland wrote on May 24th, 2016:

Just circling back with you, have not heard back!

ck from Switzerland wrote on Apr 28th, 2016:

Dan, let's continue the research off the comments. Please send me a mail (see contact form) and let's figure this out together. thanks.

Dan from Maryland wrote on Apr 28th, 2016:

I should have specified - I have already input that info and still get the same error. I did it again for good measure:

EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ESX1.redacted.com;ESXI Check Hardware;1461856963
Warning: Return code of 255 for check of service 'ESXI Check Hardware' on host 'ESX1.redacted.com' was out of bounds.
SERVICE NOTIFICATION: nagiosadmin;ESX1.redacted.com;ESXI Check Hardware;CRITICAL;notify-service-by-email;(Return code of 255 is out of bounds)

Claudio Kuenzler from Switzerland wrote on Apr 28th, 2016:

Can you run the plugin the exact same way as Nagios does it? Also run it as the nagios user, not as root.

Dan from Maryland wrote on Apr 28th, 2016:

ck from Switzerland wrote on Apr 28th, 2016:

Dan, you're missing some important variable declarations.

"command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ –P $ARG2$ dell;"
-> Here you miss the -V parameter before "dell"

"check_command check_esxi_hardware;"
-> Here you miss the actual arguments for user and password.
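
A small check like the following sketch can catch such mistakes in a command_line before Nagios runs it: a vendor name without its -V switch, and dash-like characters that are not the plain ASCII hyphen (the "–P" quoted above looks like an en dash, although that may only be an artifact of how the comment was rendered). The function name lint_command_line and the lookalike table are assumptions for illustration; the vendor names are the ones mentioned on this page.

```python
# Vendor names mentioned on this page
VENDORS = {"auto", "dell", "hp", "ibm", "intel"}
# Characters that look like "-" but are not accepted as an option prefix
DASH_LOOKALIKES = {"\u2013": "en dash", "\u2014": "em dash",
                   "\u2212": "minus sign"}

def lint_command_line(command_line):
    """Return a list of human-readable problems found in a command_line."""
    problems = []
    tokens = command_line.split()
    for i, token in enumerate(tokens):
        for char, name in DASH_LOOKALIKES.items():
            if char in token:
                problems.append("%s in '%s', expected ASCII '-'" % (name, token))
        # Only the short switch -V is handled in this sketch
        word = token.rstrip(";")
        if word in VENDORS and (i == 0 or tokens[i - 1] != "-V"):
            problems.append("vendor '%s' without preceding -V" % word)
    return problems
```

An empty list means neither of the two problems was found; it does not prove that the command definition is otherwise correct.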

Dan from Maryland wrote on Apr 27th, 2016:

Hi Guys:
I am able to run the command from root (and nagios user) prompt via ssh:
./check_esxi_hardware.py -H 1.1.1.1 -U root -P Password

Works just fine, I get a response for all three of my esxi hosts. When I try to add the
Define command, and the define service, I get the same error as indicated below by Jetblack.

define command{
command_name check_esxi_hardware;
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ –P $ARG2$ dell;
}
define service{
use generic-service;
hostgroup_name hostgroup1;
service_description Hardware;
check_command check_esxi_hardware;
contact_groups +admins;
}

I have tried the ARGS as variables listed in the resource file, with nothing at all, and with the user/password typed directly in the service definition. I also verified that the nagios user has permissions to the script.

The error I keep getting is:
Warning: Return code of 255 for check of service check esxi hardware

I've been beating my head about this for a couple days, figured I would reply and see what you may be able to do for me.

ck from Switzerland wrote on Apr 4th, 2016:

Hi Jerry. You can tell Nagios/Icinga to set the timeout for this command (in the command definition). The plugin itself runs as long as it has to (or it gets interrupted by something). For some servers I monitor I use a timeout of 120s/2min. Depending on the hardware and/or network connection this can differ.
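
To pick a sensible timeout value, it helps to measure how long a manual run of the plugin really takes against a given server. The sketch below times an arbitrary command with the Python standard library; the helper name time_command and the 300-second upper limit are assumptions for illustration.

```python
import subprocess
import time

def time_command(command, limit=300):
    """Run `command` (a list) and return (exit_code, elapsed_seconds).

    The exit code is None if the command exceeds `limit` seconds.
    """
    start = time.monotonic()
    try:
        code = subprocess.run(command, capture_output=True,
                              timeout=limit).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired
        code = None
    return code, time.monotonic() - start

# Example call (host and credentials are placeholders):
# print(time_command(["./check_esxi_hardware.py", "-H", "esxi-001",
#                     "-U", "root", "-P", "secret"]))
```

The timeout configured in Nagios/Icinga should then sit comfortably above the measured duration.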

Jerry from wrote on Apr 4th, 2016:

Is there a way to extend the time out?

ck from wrote on Feb 28th, 2016:

Hello Yury.
The warning comes from the element 'Memory':
20160228 13:19:21 Element Name = Memory
20160228 13:19:21 Element HealthState = 15
20160228 13:19:21 GLobal exit set to WARNING

It could either be a bug in the CIM implementation of this hardware element or it's not checked by the vSphere client. I suggest you run hardware diagnosis on this server (with HP SUM for example). If you don't find anything, you can ignore this element with "-i 'Memory'".
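
For context, the HealthState numbers in the verbose output follow the DMTF CIM schema. The sketch below shows how such values can be mapped to Nagios exit codes. Only some of the mappings are visible on this page (15 leads to WARNING here, 30 is treated as critical in another comment, and elements reporting 0 or 5 do not raise an alert); the remaining entries and the names HEALTH_STATE_EXIT and exit_code_for are assumptions for illustration, not a copy of the plugin's table.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios exit codes

# CIM HealthState values (DMTF) mapped to Nagios states
HEALTH_STATE_EXIT = {
    0: OK,         # Unknown
    5: OK,         # OK
    10: WARNING,   # Degraded/Warning (assumed mapping)
    15: WARNING,   # Minor failure
    20: CRITICAL,  # Major failure (assumed mapping)
    25: CRITICAL,  # Critical failure (assumed mapping)
    30: CRITICAL,  # Non-recoverable error
}

def exit_code_for(health_states):
    """Return the worst Nagios exit code for a list of HealthState values."""
    # Values missing from the table are treated as OK in this sketch
    return max((HEALTH_STATE_EXIT.get(s, OK) for s in health_states),
               default=OK)
```

With the values from Yury's output, a single element reporting 15 among many reporting 0 is enough to raise the overall result above OK.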

Yury from Belarus wrote on Feb 28th, 2016:

Hello ck!
Thank you for your response. I moved my Nagios to a new server (CentOS 6.7 x64). Here I have Python 2.6.6 installed. I've installed pywbem 0.8. Now the plugin is working, but it returns a warning:

./check_esxi_hardware.py -H 192.168.3.252 -U nagios -P MySecretPass
CRITICAL : Memory - Server: HP ProLiant DL360p Gen8 s/n: CZJ32102FW System BIOS: P71 2013-03-01

I've checked through iLO and VMWare Console, there are no errors.
I use VMware ESXi 5.1.0 build-1065491 (Update 1).
I've not found related information in FAQ.
Can you help me?

Here is full output:
20160228 13:19:16 Connection to https://192.168.3.252
20160228 13:19:16 Found pywbem version 0.8.0rc3
20160228 13:19:16 Check classe OMC_SMASHFirmwareIdentity
20160228 13:19:17 Element Name = System BIOS
20160228 13:19:17 VersionString = P71
20160228 13:19:17 Check classe CIM_Chassis
20160228 13:19:17 Element Name = Chassis
20160228 13:19:17 Manufacturer = HP
20160228 13:19:17 SerialNumber = CZJ32102FW
20160228 13:19:17 Model = ProLiant DL360p Gen8
20160228 13:19:17 Check classe CIM_Card
20160228 13:19:17 Check classe CIM_ComputerSystem
20160228 13:19:18 Element Name = System Board 7:1
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:2
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:3
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:4
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:5
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:6
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:7
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:8
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board

ck from Switzerland wrote on Feb 22nd, 2016:

Hello Yury. Please try it with the 0.8.0 version of pywbem and see if it works. CentOS applied some patches of their own, which seem to break stuff in pywbem. Check this out for more info: http://www.claudiokuenzler.com/blog/542/installing-testing-pywbem-0.8-development-version

Yury from Belarus wrote on Feb 19th, 2016:

Hello! Thank you very much for this great plugin and great work!
My ESXi is 5.1 on ProLiant DL360p Gen8.

I am trying to use your plugin on my Nagios server (CentOS 5.9 i386).
Python version is:
# python -V
Python 2.7

Extension pywbem (v. 0.7.0) has been successfully installed.

#which python
/usr/local/bin/python
I've pointed to this location in the header of check_esxi_hardware.py.

Then I changed line 593 to:
"wbemclient = pywbem.WBEMConnection(hosturl, (user,password))" to avoid an authentication error.

Now I get the following error:
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 618, in <module>
c=wbemclient.EnumerateInstances('CIM_Chassis')
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
**params)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 168, in imethodcall
verify_callback = self.verify_callback)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_http.py", line 184, in wbem_request
h.putheader('Content-length', len(data))
File "/usr/local/lib/python2.7/httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, int found

How to fix the error?
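
The last frames of the traceback show httplib joining the header values with str.join, which only accepts strings, while pywbem 0.7.0 passes len(data) as an integer. A minimal sketch of that failure and of the string conversion that avoids it; format_header is a hypothetical stand-in for httplib's putheader, not code taken from pywbem or the plugin:

```python
def format_header(name, *values):
    """Hypothetical stand-in: join header values the way the traceback shows."""
    return '%s: %s' % (name, '\r\n\t'.join(values))

data = b'<xml/>'

# Passing the raw integer length reproduces the TypeError from the traceback.
try:
    format_header('Content-length', len(data))
    failed = False
except TypeError:
    failed = True

# Converting the length to a string first lets the join succeed.
fixed = format_header('Content-length', str(len(data)))
```

This only illustrates why the call fails; whether patching the installed library or switching to another pywbem version is the better route depends on the environment.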

Louis from South Africa wrote on Jan 11th, 2016:

It's at times like this when I feel truly embarrassed! After confirming with the guy who set up the boxes, he used a different password for root.

So it works communicating directly to ESXi 6.0.0 boxes with username root.

Thanks for the help!

ck from Switzerland wrote on Jan 7th, 2016:

When you connected the ESXi server into vsphere, you needed to enter the root password. Use root and the root password to see if it works. If it does, then you can try to make a less elevated user. I haven't had the chance yet to install ESXi/vsphere 6, so I can't tell whether check_esxi_hardware works on it or not.

Louis from South Africa wrote on Jan 7th, 2016:

We have two hosts, managed by VSphere on a Windows server. VSphere lists a few LOCAL users (one of which is Administrator) but I can't find where to change this list now. All other users now seem to be AD users.

As a side note: no matter what username or password I use, I get the same messages as below.

DOES the plugin work on ESXi 6.0.0? I saw a comment on the exchange.nagios.org page that said 6 was not supported.

ck from Switzerland wrote on Jan 7th, 2016:

Hi Louis, I have actually never tried to authenticate with an AD-user, so I don't know if it works or if it is supposed to work. Have you already tried it with a "local" user, meaning the user exists in /etc/passwd on the target ESXi server?

Louis from South Africa wrote on Jan 7th, 2016:

(I missed an important line in my previous post - please ignore it and use this one instead.)

Running on RedHat 7.2, fully patched, with
python-2.7.5-34.el7.x86_64
pywbem-0.7.0-25.20130827svn625.el7.noarch

When I run the command against VMWare 6.0.0 (with VSphere integrated into Active Directory) it fails even though the password is correct:

# ./check_esxi_hardware.py -H 172.25.2.13 -U ADdomain\\ADuser -P "ADpassword" -V hp -v

20160107 16:54:35 Connection to https://172.25.2.13
20160107 16:54:35 Found pywbem version 0.7.0
20160107 16:54:35 Connection error, disable SSL certification verification (probably patched pywbem)
20160107 16:54:35 Check classe OMC_SMASHFirmwareIdentity
20160107 16:54:37 Global exit set to UNKNOWN
UNKNOWN: Authentication Error

Any thoughts?

ck from St. Gallen, Switzerland wrote on Nov 17th, 2015:

Hi Craig. Try it with newer pywbem 0.8.0.

Craig Hart from Australia wrote on Nov 17th, 2015:

windows os
python 2.7.10
pywbem 0.7.0

run script returns the error:

Traceback (most recent call last):
File "C:\MyTechAgent\ESXi_HardwareStatus\check_esxi_hardware.py", line 593, in <module>
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), no_verification=True)
TypeError: __init__() got an unexpected keyword argument 'no_verification'

If I edit line 594 and change it to wbemclient = pywbem.WBEMConnection(hosturl, (user,password)) it works.

So there seems to be an issue with the if-then-else logic around the 0.7.0 connection tests: it's using the version "with" no_verification=True when it shouldn't.
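
One way to handle both pywbem variants is to try the call with no_verification=True first and fall back to the plain call when the keyword is rejected. The sketch below illustrates that pattern only; OldConnection and PatchedConnection are hypothetical stand-ins for the two WBEMConnection signatures, and this is not the plugin's actual version check:

```python
class OldConnection(object):
    """Hypothetical stand-in for a constructor without no_verification."""
    def __init__(self, url, creds):
        self.url, self.creds, self.verify = url, creds, True

class PatchedConnection(object):
    """Hypothetical stand-in for a constructor that accepts no_verification."""
    def __init__(self, url, creds, no_verification=False):
        self.url, self.creds, self.verify = url, creds, not no_verification

def connect(conn_class, url, creds):
    """Try the keyword first; fall back if the constructor rejects it."""
    try:
        return conn_class(url, creds, no_verification=True)
    except TypeError:
        # The keyword is unknown to this constructor: call it without.
        return conn_class(url, creds)

old = connect(OldConnection, 'https://esxi-001', ('user', 'secret'))
patched = connect(PatchedConnection, 'https://esxi-001', ('user', 'secret'))
```

Catching TypeError this way avoids guessing the library's behaviour from its version string, at the cost of also swallowing unrelated TypeErrors raised inside the constructor.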

ck from Wil, Switzerland wrote on Aug 31st, 2015:

Hi Frank, you can actually trust both pieces of information. The vSphere client reads the CIM HealthState, while the plugin currently reads the CIM OperationalState for every server except HP. This is where the difference comes from. Your RAID controller (Controller 5003005700C3BEA0) actually has a non-OK operational state, which might indicate a hardware failure or (as I've seen with Dell servers) a bug in the firmware. You can switch to -V hp if you want to have the same "view" as in the vSphere client.

Frank Wein from wrote on Aug 31st, 2015:

What info should I trust when there is a contradiction between the hardware status in vSphere client and this script? Recently I got this error on a server (a few days after I started using this script):
CRITICAL : Controller 5003005700C3BEA0 (RAID Ctrl SAS 6G 5/6 512MB (D2616)) CRITICAL : Controller 5003005700C3BEA0 (RAID Ctrl SAS 6G 5/6 512MB (D2616)) - Server: FUJITSU PRIMERGY RX300 S6 s/n: ... System BIOS: 6.00 Rev. 1.07.2619.N1 2010-08-16

Jim Caldwell from wrote on Jun 5th, 2015:

Will this script ever be updated to work with Python 3? It gives errors starting on line 440 related to the print statement. I'm using FreeBSD 10 and I don't think 2.7 will ever get the pywbem update to fix the SSL problem, which makes this plugin useless to me now. I miss it greatly.

Duncan Carter from England wrote on Mar 30th, 2015:

Hi,
I'm having the same 'CRITICAL : Memory - Server: HP ProLiant DL120 G7' error that others have seen. I've gone through the BIOS with a fine-tooth comb and have reset all the logs I can find. Any suggestions?
Many thanks in advance,

ck from Switzerland wrote on Feb 3rd, 2015:

Concerning the "EOF occurred in violation of protocol" error: I think the connection is being cut before the plugin can run through everything. For debugging, you should run the plugin regularly with a cronjob, saving the verbose output in a file. When you see that error appear in Nagios, check the log file to see whether the same issue happened when the plugin was launched by cron. This may give you a hint as to whether there was a timeout or a problem with the connection.
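
The logging part of this debugging approach could be sketched as a small wrapper that cron calls instead of the plugin itself. This is a minimal sketch under assumptions: run_and_log and the log format are hypothetical, and the command list passed to it would be whatever plugin call (with -v) is already used in Nagios:

```python
import subprocess
import time

def run_and_log(command, logfile):
    """Run a command, append its output and exit code to logfile, return the exit code."""
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    with open(logfile, 'a') as log:
        # One header line per run makes it easy to match Nagios alert times.
        log.write('=== %s exit=%d ===\n'
                  % (time.strftime('%Y-%m-%d %H:%M:%S'), result.returncode))
        log.write(result.stdout)
    return result.returncode
```

Comparing the timestamps in the log with the time of the Nagios alert then shows whether the cron-launched run failed in the same way.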

Julian from España wrote on Feb 3rd, 2015:

Hi Claudio,

I'm afraid that I'm suffering from the same problem as James and Jan:

CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")

My command is this:

check_esxi_hardware.py -H X.X.X.X -U xxx -P pppp -V hp

When I execute it in verbose mode I get this:
20150203 10:19:48 Connection to https://X.X.X.X
20150203 10:19:48 Check classe OMC_SMASHFirmwareIdentity
20150203 10:19:48 Element Name = System BIOS
20150203 10:19:48 VersionString = P70
20150203 10:19:48 Check classe CIM_Chassis
20150203 10:19:49 Element Name = Chassis
20150203 10:19:49 Manufacturer = HP
20150203 10:19:49 SerialNumber = CZ2334234
20150203 10:19:49 Model = ProLiant DL360p Gen8
20150203 10:19:49 Check classe CIM_Card
20150203 10:19:49 Check classe CIM_ComputerSystem
CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")

The odd thing is that I have a similar machine in a cluster and that one never fails. Could it be related to the VMware configuration?

Thank you in advance!!

Jan from wrote on Jan 28th, 2015:

Hello,
I have the same issue as James:
CRITICAL: (0, Socket error: [Errno 8] _ssl.c:504: EOF occurred in violation of protocol)
on HP DL380 G8 working with ESX 5.5

Monitoring worked for months, but suddenly it started throwing this failure.
Is there any solution?

Thanks

sacke from Sverige wrote on Nov 27th, 2014:

Thanks for great help regarding HP array monitoring.

It seems that if I specify -V hp, it works correctly, but if I don't specify the vendor, it only detects broken RAID volumes on HP, not broken disks.

// Stefan

ck from Switzerland wrote on Nov 27th, 2014:

Hi sacke. The important part is the element status code the CIM element returns. If the number is anything other than 5 (for HP/HealthState), you can use this to be alerted, too. See the following table (it's part of the plugin):

0 : ExitOK, # Unknown
5 : ExitOK, # OK
10 : ExitWarning, # Degraded
15 : ExitWarning, # Minor
20 : ExitCritical, # Major
25 : ExitCritical, # Critical
30 : ExitCritical, # Non-recoverable Error
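
Read as code, the status table is a lookup from the CIM status number to a Nagios exit code. A minimal sketch, assuming the standard Nagios exit codes (0 OK, 1 WARNING, 2 CRITICAL) and assuming the overall result is simply the worst state seen; the names below are illustrative and not copied from the plugin:

```python
EXIT_OK, EXIT_WARNING, EXIT_CRITICAL = 0, 1, 2

# HealthState number -> Nagios exit code, following the table above.
HEALTH_STATE_TO_EXIT = {
    0: EXIT_OK,         # Unknown
    5: EXIT_OK,         # OK
    10: EXIT_WARNING,   # Degraded
    15: EXIT_WARNING,   # Minor
    20: EXIT_CRITICAL,  # Major
    25: EXIT_CRITICAL,  # Critical
    30: EXIT_CRITICAL,  # Non-recoverable Error
}

def global_exit(health_states):
    """Return the worst exit code among the given HealthState values."""
    return max((HEALTH_STATE_TO_EXIT[s] for s in health_states),
               default=EXIT_OK)
```

With this reading, an element reporting 15 is enough to turn an otherwise healthy host into WARNING, and any value of 20 or above results in CRITICAL.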

sacke from Sverige wrote on Nov 27th, 2014:

Hello.
Running the script against HP servers.
Works great, but only alerts on failed disks in logical volumes.
We have a RAID setup with spares, and when one disk fails, the script alerts.
But since we have spare disks, the script stops alerting once the RAID has been rebuilt.
It won't alert on a broken disk that is not included in a RAID volume.
It doesn't alert on "predictive failures" on disks either.

If I run the script with -v, I see the disks as Unconfigured Disk : Predictive Failure:
Element Name = Disk 4 on HPSA2 : Port 1E Box 1 Bay 4 : 68GB : Unconfigured Disk : Predictive Failure

Can I adjust the script to alert on this as well?

ck from Switzerland wrote on Oct 15th, 2014:

Hi James. I got some e-mails concerning the same error you describe, but unfortunately I did not get the final solution as a follow-up from these users. I highly suspect a network connectivity issue as source of the error. To be proved/proved wrong :)

james from Belgium wrote on Oct 15th, 2014:

Hi Claudio,

When I run check_esxi_hardware.py, I sometimes get the message "CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")"
and other times it runs fine.

ck from Switzerland wrote on Oct 5th, 2014:

Hello Jetblack, please post your commands and service definition. Maybe there's a problem in there or some incompatibility. If you prefer, you can also send the definitions directly to my mail.

Jetblack from London wrote on Oct 3rd, 2014:

Hi Claudio,

Thanks for the answer. I have disabled the verbose mode in definitions and tried all possible options. I manage to run the command properly under my root and my nagios user. The result of the echo $? command is a 0 which should get properly accepted by nagios. Some forum answers (about the same error but with NRPE) seem to point at an access rights problem on the memory sector/file/location where the output (0/1/2/3 in our case for Nagios) is stored/sent.
I can't find anything relevant as to where that should be, but it's probably because I have only very basic knowledge of standard linux/python command outputs.
Do you have an idea ?

Thanks.

Claudio Kuenzler from Switzerland wrote on Oct 1st, 2014:

Hi Jetblack, Maybe you have configured the -v (verbose) parameter into your command or service definition?
That'd explain it.

Jetblack from London wrote on Oct 1st, 2014:

Hi,

This script is absolutely amazing. I am trying to make it work with a Nagios 4.0.8 test environment.
I am running it fine in command line and get the expected return.
However in Nagios I get a return code of 255, probably a return code that is too long.
I have little to no knowledge in programming in Python and am wondering how I could trim this return code to fit my very basic Nagios.

Thanks.

AV from Paris/France wrote on Sep 9th, 2014:

Thank you for your answer but I had already tried that without any success

ck from Switzerland wrote on Sep 9th, 2014:

Hi AV, it's written above in the FAQ: You might have to restart the sfcbd-watchdog.

AV from Paris/France wrote on Sep 9th, 2014:

Hi,
I am trying to get this plugin working.
I get a timeout error when I try to run it:
[root@nagios libexec]# ./check_esxi_hardware.py -H xx.xx.xx.xx -U root -P xxxxxx -V hp -v
20140909 15:52:59 Connection to https://xx.xx.xx.xx
20140909 15:52:59 Check classe OMC_SMASHFirmwareIdentity
CRITICAL: (0, "Socket error: (110, 'Connection timed out')")

It does not get any further than this ..
and I get the same result when I enter a wrong password, as if it didn't even get to the point where it checks the user/password.

Thank you for any help

AV

ck from Wil, Switzerland wrote on Sep 2nd, 2014:

Hello Oliver,
Thanks so much for that hint! I was looking for a way to disable the certificate validation, but my research led nowhere. I will take your code into the next release after testing. Thanks again!

oliver from Germany wrote on Sep 2nd, 2014:

Hello,
I had issues with "Unknown CIM Error: (0, 'SSL error: certificate verify failed')" on CentOS. After a little bit of investigating, I changed line 561 to:

# connection to host
verboseoutput("Connection to "+hosturl)
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), NS, no_verification=True)

Without the verification checks, everything works smoothly.

Regards Oliver

Chris B from Germany wrote on Aug 25th, 2014:

Hi Martin,

Sorry, I was on vacation.
Check this screen: http://www.evernote.com/shard/s221/sh/479a7f2d-e9b0-4d79-9ade-5e6467c98720/eb7f7522f47b68f769fab1ed728e540a

Martin.N from Germany wrote on Jul 25th, 2014:

I'm also getting "UNKNOWN: Authentication Error" while checking ESXi 5.5.

@Chris B
What checkbox do you mean?

Chris B from Germany wrote on Jul 18th, 2014:

Hi,

We have been using this plugin for years with an ESXi 4.1 Supermicro server. Works well, thanks.

Yesterday I got a new machine and installed ESXi 5.5 on it. -> UNKNOWN: Authentication Error
Does the script work on 5.5?

chris from wrote on Jul 17th, 2014:

Hi,

When can we use the plugin with pywbem-0.7.0-25?

ck from Switzerland wrote on Apr 8th, 2014:

Hi Jon, can you run the plugin with the verbose option to see where the script stops and send me the full output (by mail would be best)? Thanks.

Jon Tan from Los Angeles wrote on Apr 8th, 2014:

Thanks for creating this check. I currently get the error below. Any help would be appreciated. Thank you.

/usr/lib/python2.6/site-packages/pywbem/cim_types.py:39: DeprecationWarning: object.__init__() takes no parameters
int.__init__(self, arg, base)
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 662, in <module>
sensorType = instance[u'sensorType']
File "/usr/lib/python2.6/site-packages/pywbem/cim_obj.py", line 620, in __getitem__
def __getitem__(self, key): return self.properties[key].value
KeyError: u'sensorType'

Shannon Young from Canada wrote on Mar 24th, 2014:

I thought it may have been pnp4nagios causing it so I've disabled all performance collection against the service.

Here's my service definition:
define service {
host_name esx-hostname
service_description Hardware
check_command check_esxi_hardware
max_check_attempts 3
check_interval 5
retry_interval 1
notification_interval 30
check_period 24x7
}

I've tried to make my check as simple as possible with the unfortunate same result :(

(No output on stdout) stderr: ENV: 'NAGIOS_HOSTNAME'='esx-hostname'

When looking at the status information against the service I see there's a bunch of data there (not sure if this will help).

I can run it as the nagios user without any troubles:

[nagios@nagios01 ~]$ /usr/local/nagios/libexec/check_esxi_hardware.py -H 10.9.22.99 -U root -P 'mypa$$word' -V dell
OK - Server: Dell Inc. PowerEdge 2950 s/n: XXXXXXX System BIOS: 2.6.1 2009-04-20

ck from Switzerland wrote on Mar 22nd, 2014:

Hi Shannon. How does your service definition look? Also make sure that you can run the plugin with the exact same parameters as your nagios user (not root) on your Nagios server.

Shannon Young from Canada wrote on Mar 21st, 2014:

I recently upgraded my release to 4.0.4 and came across your plugin. I installed without any issues and can run the script no worries. However once I configure Nagios to use it I now get a Critical:
(No output on stdout) stderr: ENV: 'NAGIOS_HOSTNAME'='esxserver' with a whole bunch of variables. Hoping it's not an issue with Nagios 4.0.4 as it doesn't seem to work :( thoughts?

This is what I have defined in commands

define command{
command_name check_esxi_hardware
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U root -P 'mypa$$word' -V dell
}

Any help greatly appreciated and happy to provide further information if needed.

kornflex from wrote on Mar 5th, 2014:

I was wrong, but I was in the right directory.

I found my error :
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

has to be :
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ $ARG3$
}

$ARG3$ could be: -i "ignore word"
But I can't define it as empty like this: -i or -i "" or -i ''

I have to put the whole option into the argument :/

ck from Switzerland wrote on Mar 5th, 2014:

You have just answered your own question now:

"where $USER1$=/usr/local/nagios/libexec ( resource.cfg )" vs. "The file is present : /usr/lib/nagios/plugins/check_esxi_hardware.py with permissions"

kornflex from wrote on Mar 5th, 2014:

It works with su - nagios

commands.cfg :
define command {
command_name check_esxi_hardware
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

where $USER1$=/usr/local/nagios/libexec ( resource.cfg )

services.cfg

define service {
service_description check_esxi_hardware
check_command check_esxi_hardware!root!PASSWD!""
host_name pcbu-vm2
check_period 24x7
notification_period 24x7
contact_groups admins
event_handler_enabled 0
notification_interval 1440
notification_options w,u,c,r
max_check_attempts 5
check_interval 10
retry_interval 2
use notification_default_24h
}

ck from Switzerland wrote on Mar 5th, 2014:

Make sure you use "su - nagios", not just "su nagios" to change the environment, too. This is where it could be failing. If that's not it, can you show the service definition?

kornflex from wrote on Mar 5th, 2014:

ps aux | grep nagios :
nagios 28610 0.0 0.0 6188 1524 ? SNs 15:16 0:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg

So with nagios user account ( su nagios ) :
./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD

All is OK

ck from Switzerland wrote on Mar 5th, 2014:

Did you run "./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD" as root or as the user under which your nagios installation is running? Make sure you become the correct user first. Assuming your nagios installation runs under the user "nagios", do "su - nagios" and then execute the plugin again.

kornflex from wrote on Mar 5th, 2014:

Hi,

I use $USER1$ in my commands.cfg. I've just replaced the string with the correct value :)

I can launch in command line:
./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD

The result :
OK - Server: Dell Inc. PowerEdge T110 II s/n: C5SJ95J System BIOS: 1.2.4 2011-09-19

ck from Switzerland wrote on Mar 5th, 2014:

Hi kornflex. Make sure you can execute the plugin on the command line.
In the commands definition usually you use $USER1$/check_esxi_hardware.py as path.
It is also possible that the output of the plugin is too big to be handled by nagios. In this case manually launch the plugin to verify the output.

kornflex from wrote on Mar 5th, 2014:

Hi,

No problem on the command line, but I get "Return code of 127 is out of bounds - plugin may be missing" in the Nagios web GUI :/

commands.cfg :
define command {
command_name check_esxi_hardware
command_line /usr/lib/nagios/plugins/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

The file is present : /usr/lib/nagios/plugins/check_esxi_hardware.py with permissions :
rwxr-xr-x 1 root root 31654 24 avril 2013 check_esxi_hardware.py

I can launch the command line as nagios or www-data or root.

Can you help me ?

Thanks

dmitrylnx from At work where else i can be :) wrote on Feb 8th, 2014:

There is a way to make the script connect to remote hosts through port forwarding.
1. Make a copy of the original script and rename it to check_esxi_remote.py or whatever.
2. Edit line 509 of the new script
from hosturl = 'https://' + hostname
to hosturl = 'https://' + hostname + ':7443'
where 7443 is the NAT port from which you redirect to your ESXi host. In my case:
WAN:7443->LAN:esxi:443
Now if you put https:// in front of the remote host IP in the new script, it will automatically add :7443 to the destination address. Works great for me: one script for local, one for remote.
Example: ./check_esxi_remote.py -H https://remote-ip -U root -P somepassword

Thank you very much for this script , really great tool.
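
The port-forwarding change described in this comment can also be expressed as a small helper instead of a second copy of the script. A minimal sketch, where build_hosturl and its port parameter are hypothetical and not an option of the plugin:

```python
def build_hosturl(host, port=None):
    """Prefix https:// when no scheme is given and append an optional port."""
    if not host.startswith(('http://', 'https://')):
        host = 'https://' + host
    if port is not None:
        # Append the forwarded (NAT) port, e.g. 7443 for WAN:7443 -> LAN:esxi:443.
        host = '%s:%d' % (host, port)
    return host

local_url = build_hosturl('esxi-001')
remote_url = build_hosturl('remote-ip', 7443)
```

Keeping the port as a parameter means local and remote hosts can share one script, with only the remote checks passing a port.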

Claudio from Switzerland wrote on Dec 19th, 2013:

Try giving the port 444 in the hostname, like this:

/check_esxi_hardware.py -H xxx.xxx.xxx.xxx:444 -U root -P mypass -V hp

This is untested however so I'm not sure if it will work or not.

Gijsbert from Netherlands wrote on Dec 19th, 2013:

Hi all, I set up port forwarding in the router from 444 to 443 and can connect with the vSphere client to ip:444, but when I run check_esxi_hardware.py ip:444 I get errors. When I try a working server on 443 and run check_esxi_hardware.py ip:443, it gives the same errors, so I think the script doesn't understand the :44x addition, or I don't understand how to do it.

Claudio from Switzerland wrote on Dec 10th, 2013:

Hi Jake. You can't change the CIM port 5989 but you can change the management port (443) to a different one. Please see http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1021199 for a howto.

Jake from Kansas City wrote on Dec 10th, 2013:

Is there any way to configure check_esxi_hardware.py to use non-default ports? I'm monitoring some remote ESXi servers behind a firewall with only one static IP and have to use port-forwarding to permit access. I can't set up both ESXi servers with the same port on the firewall. Thanks!

Claudio from Switzerland wrote on Nov 13th, 2013:

Hello Shawn. Make sure the CIMProviders are activated in your ESXi (check Advanced Software Settings in vSphere client).

Shawn from Germany wrote on Nov 12th, 2013:

Hello,

I am trying to run this from the command line from Nagios. I am receiving (0, 'Socket error: [Errno 111] Connection refused'). The monitoring server and the ESX server are on the same LAN, so no firewall is blocking the connection. I checked and CIM is allowing both port 443 and 5989. Any ideas?

./check_esxi_hardware.py -H esx_ip -U root -P password

Claudio from Switzerland wrote on Nov 7th, 2013:

Hi Mike. Run the plugin as nagios (or whatever user you use for your monitoring application). You have to verify that all permissions and paths are correct.

Mike from USA, Erie, Pennsylvania wrote on Nov 7th, 2013:

I can run this fine from command line, but when nagios runs the service check, I get the following error:

(No output on stdout) stderr: Traceback (most recent call last):
File "/usr/local/nagios/libexec/check_esxi_hardware.py", line 543, in <module>
getopts()
File "/usr/local/nagios/libexec/check_esxi_hardware.py", line 529, in getopts
filename = open(filextract, 'r')
IOError: [Errno 2] No such file or directory: ':/usr/local/nagios/sbin/.esxipass'

Any help is greatly appreciated.

Claudio from Switzerland wrote on Sep 5th, 2013:

Hello Matt. I do not know how Centreon handles performance data, but the plugin returns all perf data in one return. Example:

OK - Server: Supermicro X8SIE s/n: 0123456789 System BIOS: 1.0c 2010-05-27|P2Vol_0_System_Board_34_CPU_Vcore=0.87;1.34;1.4 P2Vol_1_System_Board_35_+3.3VCC=3.29;3.58;3.64 P2Vol_2_System_Board_35_+3.3VSB=3.29;3.58;3.64 P2Vol_3_System_Board_35_AVCC=3.29;3.58;3.64 P2Vol_4_System_Board_35_VBAT=3.15;3.58;3.64 P2Vol_5_System_Board_37_+12_V=12.03;13.09;13.19 P2Vol_6_System_Board_38_CPU_DIMM=1.55;1.76;1.77 P2Vol_7_System_Board_39_+5_V=5.05;5.34;5.6 P2Vol_8_System_Board_40_-12_V=-12.29;-11.71;-11.51 P4Tem_0_System_Board_28_System_Temp=39;75;77 P5Fan_0_System_Board_31_FAN_2=8725;29260;29815 P5Fan_1_System_Board_31_FAN_3=9280;29260;29815 P5Fan_2_System_Board_31_FAN_4=8725;29260;29815 P5Fan_3_System_Board_32_FAN_1=9280;29260;29815

So it is up to your graphing tool to handle the performance data all in one. I personally use nagiosgraph for such graphing but pnp4nagios is also capable of creating multiple graphs from one perfdata output.
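
Since all values arrive in a single perfdata string, one plugin run per server can be split afterwards by label prefix (P2Vol, P4Tem, P5Fan in the example output). A minimal sketch of that splitting, assuming the part after the pipe is a space-separated list of label=value;... items where the first field is the measured value; parse_perfdata is a hypothetical helper, not part of the plugin or of Centreon:

```python
def parse_perfdata(plugin_output):
    """Group perfdata values by the label prefix (text before the first '_')."""
    _, _, perf = plugin_output.partition('|')
    groups = {}
    for item in perf.split():
        label, _, fields = item.partition('=')
        value = float(fields.split(';')[0])  # first field: measured value
        kind = label.split('_', 1)[0]        # e.g. P2Vol, P4Tem, P5Fan
        groups.setdefault(kind, {})[label] = value
    return groups

sample = ('OK - Server: Supermicro X8SIE|'
          'P2Vol_0_System_Board_34_CPU_Vcore=0.87;1.34;1.4 '
          'P2Vol_8_System_Board_40_-12_V=-12.29;-11.71;-11.51 '
          'P4Tem_0_System_Board_28_System_Temp=39;75;77 '
          'P5Fan_0_System_Board_31_FAN_2=8725;29260;29815')
groups = parse_perfdata(sample)
```

Each probe could then read only its own group (for example the P5Fan entries) from the result of a single run, rather than calling the plugin once per element type.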

Matt from China wrote on Sep 5th, 2013:

Hi Claudio,

Thank you a lot for this script. We can monitor all of our ESX servers in a reliable way.

But I've one question.

Our current ESX version is 5.1 on Dell servers (R610, R620) and we're getting more and more servers every day.

We also use Centreon with nagios to poll servers.

In order to graph each type of element clearly, I've defined different probes (fan, voltage, current, etc.), but each probe executes the check_esxi_hardware.py script every five minutes, so the poller server gets overloaded (long delay to get feedback from the script).

Is there any way to get all the information in one run for each server and grep the useful information through different probes?

Thank you in advance.

Claudio from Switzerland wrote on Aug 8th, 2013:

Martin, you need to define at least a host and a service using this defined host. Check out the official Nagios documentation.

Martin from Netherland wrote on Aug 8th, 2013:

Thank you for this plug-in,
Works for me, I can do the check etc.
but I need a little help :'(

I'm a newbie at Nagios, so I put the command definition(s) in command.cfg, that was easy.
I created a new cfg file and put it under object/VMware.cfg.
I defined the VMware.cfg in Nagios.cfg as a config file.

And I put the "service check" in the VMware.cfg file.
But when I reload Nagios, it gives me a config error!

Do I need to define a "Host" in my Vmware.cfg file?
Hope you can help, much appreciated

Claudio from Switzerland wrote on Aug 5th, 2013:

The plugin "requires" that you run ESX/ESXi and that the monitoring server has Python with the pywbem extension installed (see requirements). It doesn't matter if you use the licensed or free ESXi version. The hardware agents are optional; however, I suggest you install the so-called Offline Bundles, which you can find on the HP and Dell websites.

Sarita Gupta from India wrote on Aug 5th, 2013:

What are the prerequisites for this plugin? Does it monitor both the free and the licensed ESX? Does it require installation of the hardware agents (like Dell OMSA, HP SIM etc.)?