Nagios/Monitoring Plugin check_esxi_hardware FAQ
Wednesday - Dec 5th 2012 - by - (112 comments)

Since David Ligeret first released the Nagios/Icinga/Monitoring plugin check_esxi_hardware in August 2008 (at that time under the name check_esx_wbem), the script has been downloaded several thousand times and many people have worked to improve it.

Questions about the plugin come up often, most of them hardware related, so I think a Frequently Asked Questions overview is useful. The question you're burning to ask has probably been asked at least once. But note: this FAQ does not replace the documentation. You still need to read that to understand and correctly use the plugin. ;-)

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: The plugin starts fine but then hangs on a certain CIM element on a Dell server. Verbose output:

20111104 18:21:56 Connection to https://esxi-001
20111104 18:21:56 Check classe OMC_SMASHFirmwareIdentity
20111104 18:21:57   Element Name = System BIOS
20111104 18:21:57     VersionString = 1.3.6
20111104 18:21:57 Check classe CIM_Chassis
20111104 18:21:57   Element Name = Chassis
20111104 18:21:57     Manufacturer = Dell Inc.
20111104 18:21:57     SerialNumber = xxxxx
20111104 18:21:57     Model = PowerEdge R710
20111104 18:21:57     Element Op Status = 0
20111104 18:21:57 Check classe CIM_Card
20111104 18:21:58   Element Name = unknown
20111104 18:21:58     Element Op Status = 0
20111104 18:21:58 Check classe CIM_ComputerSystem
CRITICAL: Execution time too long!

A: According to user feedback and lots of tests, such problems are related to the Dell OMSA Offline Bundle. Version 6.5 in particular caused problems on ESXi 5.x servers.

--- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I get the following error message from the plugin:

(0, 'Socket error: [Errno 111] Connection refused')

A: Make sure the monitoring server can reach TCP port 5989 (CIM) on the ESX(i) server. Alternatively, you can set a different port with the -C parameter if you have DNAT or port forwarding in place.
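
Before digging into the plugin itself, a quick reachability test from the monitoring server can help. The following is only a minimal sketch using the Python standard library; the function name cim_port_reachable is made up for this example and is not part of the plugin.

```python
import socket


def cim_port_reachable(host, port=5989, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection performs the TCP handshake and raises OSError on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused" (errno 111), timeouts and name resolution errors
        return False


# Example call (the host name is a placeholder):
# cim_port_reachable("esxi-001")
```

If this returns False, the problem is most likely a firewall, a NAT rule or a stopped CIM service rather than the plugin.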

--- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: How do I use the -i parameter to ignore certain alarms?

A: As written in the documentation, the -i parameter expects a comma-separated list of elements to ignore. The "tricky" part is finding the correct element names (they can be pretty long sometimes). Run the plugin in verbose mode to get a list of all CIM elements. Here's an example of how to ignore several elements:

./check_esxi_hardware.py -H myesxi -U root -P mypass -V dell -i "IPMI SEL","Power Supply 2 Status 0: Failure status","System Board 1 Riser Config Err 0: Config Error"
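
To see why the quoting in this example works, here is a small Python sketch. It uses shlex.split, which follows POSIX shell quoting rules, to show that the three quoted names arrive as one single -i argument. The comma split and the is_ignored helper are assumptions made for illustration; they are not taken from the plugin's source.

```python
import shlex

command = (
    './check_esxi_hardware.py -H myesxi -U root -P mypass -V dell '
    '-i "IPMI SEL","Power Supply 2 Status 0: Failure status",'
    '"System Board 1 Riser Config Err 0: Config Error"'
)

# Adjacent quoted strings and the commas between them form one shell word
argv = shlex.split(command)
ignore_value = argv[argv.index("-i") + 1]

# Hypothetical parsing step: one element name per comma-separated field
ignore_list = ignore_value.split(",")


def is_ignored(element_name, ignore_list):
    """Exact, case-sensitive match against the ignore list (an assumption)."""
    return element_name in ignore_list
```

With this kind of parsing the names have to match exactly as they appear in the verbose output, spaces and colons included.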

--- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: I have the following warning showing up but my server shows all sensors green:

WARNING : System Board 1 Riser Config Err 0: Connected - Server: Dell Inc. PowerEdge R620 s/n: xxxxxxx System BIOS: 1.1.2 2012-03-08  

A: All Dell PowerEdge x620 servers seem to be affected; it looks like a BMC firmware bug to me. A workaround for this bug has been in place since version 20121027. Please check this post for detailed information.

--- --- --- --- --- --- --- --- --- --- --- --- --- ---

Q: The plugin returns the following output:

Authentication Error! - Server:

A: There are several answers to that:
1. Make sure you are either using the ESXi root user or a user you created that is a member of the root group. See this post for a short description of how to do that.
2. The password contains special characters such as a question mark, which need to be quoted.
3. The password contains a dollar sign ($), which needs to be single-quoted.

Generally, always put quotes around your password; this ensures the content is handled as a string.
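
If you generate the command line from a script, the quoting can be left to the standard library. This is just a sketch: the password is a made-up example, and shlex.quote targets POSIX shells.

```python
import shlex

password = "pa$$w?rd"  # made-up example password

# shlex.quote wraps the value in single quotes when it contains
# characters that the shell would otherwise interpret
quoted = shlex.quote(password)
command_line = "./check_esxi_hardware.py -H myesxi -U root -P " + quoted
print(command_line)

# Splitting the line the way a POSIX shell would gives back the original password
roundtrip = shlex.split(command_line)[-1]
```

Inside single quotes neither the dollar sign nor the question mark has a special meaning for the shell, which is why single quotes are the safe default.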

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: Can the plugin also monitor other stuff like VMFS disk usage?

A: No. The plugin makes use of the CIM (Common Information Model) API. The so-called CIM elements cover hardware only.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: Some hardware is not being monitored by the plugin.

A: The plugin can only monitor the hardware which is "shown" by the server via the CIM API. If the hardware vendor does not include a certain hardware element in the CIM elements, then this piece of hardware cannot be monitored. In all these years I've only seen this on no-name machines (and SUN) though.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: The plugin is so slow that a timeout occurs. 

A: In such cases, always check how the vSphere client behaves in the Hardware tab. Click on the "Update" link and then on "Refresh". Are they fast, or do they also take a long time to update?
In ESXi 5.0 Update 1, a bug caused slow hardware discovery/checks. See this article for more information. This was quite an annoying bug and I got bombed with e-mails... -_-
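
To find out how long a check really takes before you raise the timeout in Nagios, you can time it by hand. Below is a minimal sketch of such a wrapper; the function name timed_check and the 120 second default are my own choices for this example, only the exit code 3 for UNKNOWN follows the usual Nagios plugin convention.

```python
import subprocess
import time

UNKNOWN = 3  # standard Nagios plugin exit code for UNKNOWN


def timed_check(argv, timeout_seconds=120):
    """Run a plugin command and return (exit_code, elapsed_seconds, first_output_line)."""
    started = time.monotonic()
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising TimeoutExpired
        elapsed = time.monotonic() - started
        return UNKNOWN, elapsed, "UNKNOWN: no result within %d seconds" % timeout_seconds
    elapsed = time.monotonic() - started
    lines = result.stdout.splitlines()
    return result.returncode, elapsed, (lines[0] if lines else "")


# Example call (host and credentials are placeholders):
# timed_check(["./check_esxi_hardware.py", "-H", "myesxi", "-U", "root", "-P", "mypass"])
```

The measured time gives you a realistic value for the timeout in your command definition.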

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: The plugin suddenly times out, but it was working fine before. The plugin returns the following output:

UNKNOWN: (0, 'Socket error: [Errno 110] Connection timed out')

A: In rare cases the sfcbd-watchdog service running on the ESXi server stops working correctly. Follow VMware KB entry 1013080 and restart the service: log in to the ESXi server via SSH and run the following command:

/etc/init.d/sfcbd-watchdog restart

If this still doesn't resolve your issue, a manual restart of the "CIM Server" could help. This option is found under the "Configuration" tab -> "Security Profile". Click on "Service ... Properties".

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: After an update of the pywbem package the plugin doesn't work anymore. The following output is shown in verbose mode:

Unknown CIM Error: (0, 'SSL error: certificate verify failed')

A: This was seen in SLES 11 SP3 after an update of the package python-pywbem from 0.7-6.13 to 0.7-6.22. After reverting to the older version, the plugin worked again.

Update September 9th 2014: This error will be fixed in a future release of check_esxi_hardware.py, but it depends on the release of the new pywbem upstream version.
See https://github.com/Napsty/check_esxi_hardware/issues/7.

Update June 26th 2015: This issue was fixed in version 20150626.
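
The tracebacks quoted on this page show that some pywbem builds accept a no_verification keyword while others reject it. One way to cope with both is sketched below. This is an illustration under assumptions and not necessarily how version 20150626 handles it; the function name open_wbem_connection is made up, and the connection class is passed in as a parameter so the logic can be tried without pywbem installed.

```python
def open_wbem_connection(connection_class, hosturl, user, password):
    """Open a WBEM connection, skipping certificate verification when supported.

    connection_class is expected to be pywbem.WBEMConnection.
    """
    try:
        # pywbem builds that know the keyword
        return connection_class(hosturl, (user, password), no_verification=True)
    except TypeError:
        # Builds without the keyword fail with
        # "__init__() got an unexpected keyword argument 'no_verification'"
        return connection_class(hosturl, (user, password))


# Example call (requires pywbem; URL and credentials are placeholders):
# import pywbem
# client = open_wbem_connection(pywbem.WBEMConnection, "https://myesxi:5989", "root", "mypass")
```

Note that catching TypeError this broadly can hide unrelated errors, so a check of the installed pywbem version is the cleaner solution where the version number can be trusted.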

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: On an IBM server with the ESXi image from IBM, the following error appears, while the plugin works fine with the regular image from VMware:

Traceback (most recent call last):
  File "./check_esxi_hardware.py", line 625, in
    verboseoutput("  Element Name = "+elementName)
TypeError: cannot concatenate 'str' and 'NoneType' objects

A: The CIM definition coming from the IBM image seems to be lacking some information. Version 20150119 fixes this issue.  
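
The TypeError comes from concatenating a string with None. A guard like the following avoids it; this is only a sketch of the idea, not necessarily the change made in version 20150119, and the placeholder text "unknown" is my own choice.

```python
def element_name_line(element_name):
    """Build the verbose "Element Name" line without failing on None."""
    # "unknown" is a placeholder chosen for this sketch
    shown = element_name if element_name is not None else "unknown"
    return "  Element Name = " + shown
```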

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: I updated my Ubuntu 14.04 and pywbem package 0.7.0-4ubuntu1~14.04.1 was installed. Since then I get the following error when the plugin is run:

Traceback (most recent call last):
  File "/usr/local/bin/check_esxi_hardware.py", line 619, in
    instance_list = wbemclient.EnumerateInstances(classe)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 421, in EnumerateInstances
    **params)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_operations.py", line 183, in imethodcall
    no_verification = self.no_verification)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 268, in wbem_request
    h.endheaders()
  File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 115, in send
    self.connect()
  File "/usr/lib/pymodules/python2.7/pywbem/cim_http.py", line 167, in connect
    except ( Err.SSLError, SSL.SSLError, SSL.SSLTimeoutError
AttributeError: 'module' object has no attribute 'SSLTimeoutError'

A: It seems that Ubuntu did the same as SUSE, Red Hat and CentOS did in the past: the pywbem package was patched without changing the upstream version number. This points in the same direction as issue #7 (https://github.com/Napsty/check_esxi_hardware/issues/7). A temporary fix is to manually install the older pywbem package like this:

aptitude install python-pywbem=0.7.0-4

Update June 26th 2015: This issue was fixed in version 20150626.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: I use python3 but the plugin throws an error:

  File "./check_esxi_hardware.py3", line 440
    print "%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message)
                ^
SyntaxError: invalid syntax

A: An issue was opened on github (https://github.com/Napsty/check_esxi_hardware/issues/13) to address this compatibility issue.
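
The syntax error is caused by the Python 2 print statement, which Python 3 no longer accepts. Written as a function call with a single argument, the same line works in both. The sketch below assumes that the failing line belongs to the plugin's verboseoutput function, which the error message itself does not show.

```python
import time


def verboseoutput(message):
    """Print a timestamped line; valid in Python 2.6+ and Python 3."""
    # With one parenthesized argument, print(...) gives the same result
    # as a statement (Python 2) and as a function call (Python 3)
    print("%s %s" % (time.strftime("%Y%m%d %H:%M:%S"), message))
```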

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: I sometimes get the following error on an ESXi host:

 CRITICAL: (0, 'Socket error: [Errno 8] _ssl.c:510: EOF occurred in violation of protocol')

A: After a lot of debugging and testing with a plugin user, we came to the conclusion that this problem arises from the ESXi host, not the plugin.
A tcpdump revealed that the ESXi host sent a TCP reset packet rather than starting to send data. A reboot of the affected ESXi host resolved the problem.

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: I have several ESXi hosts behind the same IP (NAT). How can I use the check_esxi_hardware?

A: Since version 20160531 it is possible to manually define the CIM port (which defaults to 5989). So if you set up port forwarding (DNAT) you can now monitor all ESXi servers behind the same NAT-address. The parameter you want in this case is "-C" (or --cimport).
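
As an illustration, here is how the commands for two hosts behind one NAT address could be put together. Everything in this sketch is hypothetical (host names, forwarded ports, the 203.0.113.10 documentation address and the credentials); only the -H, -U, -P and -C parameters are the plugin's own.

```python
PLUGIN = "./check_esxi_hardware.py"

# Hypothetical DNAT table: forwarded port on the NAT address -> ESXi host behind it
FORWARDED_PORTS = {
    15989: "esxi-001",
    25989: "esxi-002",
}


def build_command(nat_address, port, user, password):
    """Return the argument list for one host reached through a forwarded port."""
    return [PLUGIN, "-H", nat_address, "-U", user, "-P", password, "-C", str(port)]


# One command per ESXi host, all pointing at the same NAT address
commands = {
    name: build_command("203.0.113.10", port, "root", "mypass")
    for port, name in sorted(FORWARDED_PORTS.items())
}
```

Each forwarded port must be mapped to port 5989 on the respective ESXi host on the NAT device.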

--- --- --- --- --- --- --- --- --- --- --- --- --- --- 

Q: Is the plugin compatible with ESXi 6.x?

A: Yes.

 


Comments (newest first):

ck from Switzerland wrote on Jul 27th, 2017:
Ricou, if possible use the latest version and update your pywbem. If for whatever reason you must use an old version of check_esxi_hardware, you have to give it up to 2 minutes to run the hardware checks. It depends on the server model you want to check but I have seen very long checks (up to 2 min) on HP DL380 Gen8 servers. So increase the timeout in the Nagios settings.
To find out if the plugin hangs somewhere, use the "-v" switch for verbose mode.

Ricou wrote on Jul 21st, 2017:
I found an old version of your script and I think the command works, but I get these errors:

- When i use a r/o user : "UNKNOWN: Authentication Error"
- When i use root : CRITICAL: Execution time too long!

ck from Switzerland wrote on Jul 19th, 2017:
The easiest method is to use pip. See https://www.claudiokuenzler.com/blog/671/new-version-check_esxi_hardware-20161013-support-pywbem-0.9.x.

Ricou wrote on Jul 19th, 2017:
OK, i have actually python 0.7.0.
How can install the version 0.8.x ?

ck from Switzerland wrote on Jul 19th, 2017:
Ricou, try a newer pywbem version (at least 0.8.x).

Ricou wrote on Jul 19th, 2017:
Hi,

I have an issue when i launch the command :
./check_esxi_hardware.py -H $HOSTADDRESS$ -U $USER6$ -P $USER7$ -i "IPMI SEL" -V hp -t 45

It returns this error :
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 617, in
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), no_verification=True)
TypeError: __init__() got an unexpected keyword argument 'no_verification'

I have CentOS 6.5, python 2.6.6 and pywbem.
Have you an idea ?

Wouter from - wrote on Jun 7th, 2017:
Robin, I think this will solve your problem: http://www.squidworks.net/2017/02/vmware-esxi-6-5-cim-data-disabled-by-default/
I had the same problem on 6.5.

ck from Switzerland wrote on Apr 28th, 2017:
Robin, please check out this list https://www.vmware.com/resources/compatibility/pdf/vi_cim_guide.pdf for a list of officially compatible VMware hardware with CIM providers.

Robin from Deutschland wrote on Apr 28th, 2017:
The hardware model is Supermicro.
On which Server do you mean ? The Server with esxi Version 6.5 is a new Server, it weren't updated.

Claudio from Switzerland wrote on Apr 28th, 2017:
Vladimir, you could write a wrapper script around check_esxi_hardware and just grep for CPU. Or you fork the plugin and use only the CIM tables related to CPU temperature.

ck from Switzerland wrote on Apr 28th, 2017:
Robin, what kind of hardware model is this? Did you upgrade the hardware vendor's cim offline bundle, too? Also check out https://monitoring-portal.org/index.php?thread/39393-check-esxi-hardware-timeout-seit-version-6-0-0-build-4600944

Robin from Deutschland wrote on Apr 28th, 2017:
Hello,

i have a question.
The check_esxi_hardware.py works fine on esxi 5.X.
But now we have esxi 6.5, there the check didn't work.
Then i tried a newer Version of the Check ( 20161013) and updated the pywbem to 0.8.0
But now i get the following error on an esxi 6.5.

./check_esxi_hardware.py -H IP-Address -U root -P 'password' -v
20170428 14:03:34 Connection to https://172.18.50.220
20170428 14:03:34 Found pywbem version 0.8.0-dev
20170428 14:03:34 Check classe OMC_SMASHFirmwareIdentity
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 669, in
instance_list = wbemclient.EnumerateInstances(classe)
File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_operations.py", line 1018, in EnumerateInstances
**params)
File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_operations.py", line 592, in imethodcall
timeout=self.timeout)
File "/usr/local/lib/python2.7/dist-packages/pywbem/cim_http.py", line 547, in wbem_request
raise ConnectionError("Socket error: %s" % exc)
pywbem.cim_http.ConnectionError: Socket error: [Errno 111] Connection refused

Can you help me ?
Thanks

Vladimir wrote on Apr 20th, 2017:
Hello CK! How can i get CPU temperature value (or other HW) only in output with -v option?

ck wrote on Dec 14th, 2016:
You only need the connection to the CIM port (default 5989). I will adapt the information. Here's a tcpdump I just did when the plugin is launched. As you can see, only the CIM port is accessed (no 443):

08:15:40.234718 IP monitoring.53568 > esxserver.5989: Flags [S], seq 2030707798, win 29200, options [mss 1460,sackOK,TS val 16733847 ecr 0,nop,wscale 7], length 0
08:15:40.237942 IP esxserver.5989 > monitoring.53568: Flags [S.], seq 2366951929, ack 2030707799, win 65535, options [mss 1460,nop,wscale 9,sackOK,TS val 854991276 ecr 16733847], length 0
08:15:40.237959 IP monitoring.53568 > esxserver.5989: Flags [.], ack 1, win 229, options [nop,nop,TS val 16733848 ecr 854991276], length 0
08:15:40.238176 IP monitoring.53568 > esxserver.5989: Flags [P.], seq 1:296, ack 1, win 229, options [nop,nop,TS val 16733848 ecr 854991276], length 295
08:15:40.248606 IP esxserver.5989 > monitoring.53568: Flags [P.], seq 1:1457, ack 296, win 130, options [nop,nop,TS val 854991277 ecr 16733848], length 1456
08:15:40.248625 IP monitoring.53568 > esxserver.5989: Flags [.], ack 1457, win 251, options [nop,nop,TS val 16733850 ecr 854991277], length 0
08:15:40.249236 IP monitoring.53568 > esxserver.5989: Flags [P.], seq 296:422, ack 1457, win 251, options [nop,nop,TS val 16733850 ecr 854991277], length 126
08:15:40.253590 IP esxserver.5989 > monitoring.53568: Flags [P.], seq 1457:1683, ack 422, win 130, options [nop,nop,TS val 854991277 ecr 16733850], length 226


Pap from Switzerland wrote on Dec 13th, 2016:
Hello ck,

thank you for you answer. I thought this would be the problem, because i get the following error:

(0, 'Socket error: [Errno 111] Connection refused')

And the faq told me to free port 443. So, i guess its some other problem. But thank you for the help and the awesome plugin.

Greetings

Patrick

ck from Switzerland wrote on Dec 13th, 2016:
Hello Pap. You actually only need to care about the CIM port. Port 443 will not be used.

Pap from Switzerland wrote on Dec 13th, 2016:
Hello,

maybe im stupid, but i need to forward port 443 and the CIM Port. But the problem is, there are multiple Hosts behind the NAT. Is there any possibilty to use another port and not 443? For instance externalIp:5001 which get mapped to internalIP:443? I only see an option to change the CIM port, not the HTTPS

Greetings

Patrick

ck from Switzerland wrote on Oct 27th, 2016:
Hi Adi, the plugin runs fine on ESXi 6 servers, too. I'm successfully checking ESXi 6.0.0 build 4192238 servers. These are Cisco UCS servers.

Adi from Israel wrote on Oct 26th, 2016:
Does this plugin needs something extra to work on ESXi 6.0?
the esx image has been taken from HP site (https://my.vmware.com/web/vmware/details?downloadGroup=OEM-ESXI60U2-HPE&productId=491) but it stuck forever on:
Found pywbem version 0.9.0
Check classe OMC_SMASHFirmwareIdentity

same plugin works on 5.5

Lars from Germany wrote on Aug 26th, 2016:
Hi CK,
the update of pywbem did the trick. Thanks a lot. For those running SLES like me: to get the updated pywbem running I had to uninstall the python-pywbem package from the distribution.

Thanks again for the ultrafast help

Lars

ck from Switzerland wrote on Aug 26th, 2016:
Hi Lars. Please check out http://www.claudiokuenzler.com/blog/542/installing-testing-pywbem-0.8-development-version where I documented how to install pywbem manually. Or make sure you have the "pip" program installed and then launch "pip install pywbem". That's by far the easiest and fastest way.

Lars from Germany wrote on Aug 26th, 2016:
Hi CK,

did restart of this services, did not change. My pywbem vesion is 0.7-6.22.1 .
Vmware did completely disable the SSLv3 with this update, i think this has to do with it, because on older version of ESXI 5.5 plugin still works fine. I have reenabled SSLv3 on service hostd (port 443) and sfcb (port 5989), but this does not help. Where can i get a newer version of pywbem, my nagios runs on SLES11 SP4, there are no newer packages in the repositories available.

ck from Switzerland wrote on Aug 26th, 2016:
Hi Lars. Try the following in this order:
- Restart CIM Server on the ESXi server
- Restart sfcbd-watchdog
- Try a newer version of pywbem (which version are you currently using?)

Lars from Germany wrote on Aug 26th, 2016:
Hello All,

i have a question. After updating my ESXi Hhosts to 5.5 build 4179633 the plugin stopped working.
Output at command line:
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 665, in
instance_list = wbemclient.EnumerateInstances(classe)
File "/usr/lib64/python2.6/site-packages/pywbem/cim_operations.py", line 421, in EnumerateInstances
**params)
File "/usr/lib64/python2.6/site-packages/pywbem/cim_operations.py", line 183, in imethodcall
no_verification = self.no_verification)
File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 266, in wbem_request
h.endheaders()
File "/usr/lib64/python2.6/httplib.py", line 914, in endheaders
self._send_output()
File "/usr/lib64/python2.6/httplib.py", line 786, in _send_output
self.send(msg)
File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 115, in send
self.connect()
File "/usr/lib64/python2.6/site-packages/pywbem/cim_http.py", line 163, in connect
if not check(self.sock.get_peer_cert(), self.host):
File "/usr/lib64/python2.6/site-packages/M2Crypto/SSL/Checker.py", line 66, in __call__
raise NoCertificate('peer did not return certificate')
M2Crypto.SSL.Checker.NoCertificate: peer did not return certificate

Does anyone know how to fix it?
Thanks in advance

regards

Lars

Roland Sommer wrote on Aug 23rd, 2016:
Yeah, that looks good now!

CRITICAL : HP Smart Array P812 Controller : Slot 3 : HPSA2 CRITICAL : HP Smart Array P812 Controller : Slot 3 : HPSA2

Many thanks!

ck from Switzerland wrote on Aug 23rd, 2016:
Sorry Roland, I meant "-V hp" (capital letter V).

Roland Sommer wrote on Aug 23rd, 2016:
Hi ck, thanks for your answer. The element is shown in the vSphere hardware Tab and the failure is detected. Adding -v hp to the command definition reports:
20160823 08:27:29 Check classe VMware_Battery
20160823 08:27:29 Element Name = Battery on HPSA1
20160823 08:27:29 Element Name = Battery on HPSA2

but final message is OK and exit code is 0.

ck from Switzerland wrote on Aug 22nd, 2016:
Hi Roland, as you can see on the screenshot (http://www.claudiokuenzler.com/nagios-plugins/check_esxi_hardware.php#Screenshots) problems with the RAID battery are also reported.
Make sure you have installed the newest vendor CIM bundle and that you use the correct hardware vendor in your command definition (-v hp in your case). Verify in vsphere hardware tab, if the element is shown.

Roland Sommer wrote on Aug 22nd, 2016:
check_esxi_hardware.py (20160531) does not report failure of HW RAID cache battery. It's reported via CLI:

/opt/hp/hpssacli/bin # ./hpssacli ctrl all show

Smart Array P812 in Slot 3 (sn: ***************)

CACHE STATUS PROBLEM DETECTED: The cache on this controller has a problem.
To prevent data loss, configuration changes to
this controller are not allowed.
Please replace the cache to be able to continue
to configure this controller.

and is shown in the "Hardware Status" tab in the vSphere Client.

Is this a missing CIM element? Or does the script not handle this CIM element?

Claudio Kuenzler from Switzerland wrote on May 24th, 2016:
Dan, the mail address you left me was not working (Host or domain name not found. Name service error for name=cpdotomac.com type=A: Host not found). Please leave a working mail address in the form.

Dan from Maryland wrote on May 24th, 2016:
Just circling back with you, have not heard back!

ck from Switzerland wrote on Apr 28th, 2016:
Dan, let's continue the research off the comments. Please send me a mail (see contact form) and let's figure this out together. thanks.

Dan from Maryland wrote on Apr 28th, 2016:
I should have specified - I have already input that info and still get the same error. I did it again for good measure:

EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ESX1.redacted.com;ESXI Check Hardware;1461856963
Warning: Return code of 255 for check of service 'ESXI Check Hardware' on host 'ESX1.redacted.com' was out of bounds.
SERVICE NOTIFICATION: nagiosadmin;ESX1.redacted.com;ESXI Check Hardware;CRITICAL;notify-service-by-email;(Return code of 255 is out of bounds)

Claudio Kuenzler from Switzerland wrote on Apr 28th, 2016:
Can you run the plugin the exact same way as Nagios does it? Also run it as the nagios user, not as root.

Dan from Maryland wrote on Apr 28th, 2016:
I should have specified - I have already input that info and still get the same error. I did it again for good measure:

EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;ESX1.redacted.com;ESXI Check Hardware;1461856963
Warning: Return code of 255 for check of service 'ESXI Check Hardware' on host 'ESX1.redacted.com' was out of bounds.
SERVICE NOTIFICATION: nagiosadmin;ESX1.redacted.com;ESXI Check Hardware;CRITICAL;notify-service-by-email;(Return code of 255 is out of bounds)

ck from Switzerland wrote on Apr 28th, 2016:
Dan, you're missing some important variable declarations.

"command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ –P $ARG2$ dell;"
-> Here you miss the -V parameter before "dell"

"check_command check_esxi_hardware;"
-> Here you miss the actual arguments for user and password.

Dan from Maryland wrote on Apr 27th, 2016:
Hi Guys:
I am able to run the command from root (and nagios user) prompt via ssh:
./check_esxi_hardware.py -H 1.1.1.1 -U root -P Password

Works just fine, I get a response for all three of my esxi hosts. When I try to add the
Define command, and the define service, I get the same error as indicated below by Jetblack.

define command{
command_name check_esxi_hardware;
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ –P $ARG2$ dell;
}
define service{
use generic-service;
hostgroup_name hostgroup1;
service_description Hardware;
check_command check_esxi_hardware;
contact_groups +admins;
}

I have changed the ARGS from variables listed in resource. I have changed to nothing, and typing the user/password in the service definition as well. I also verified that the nagios user has permissions to the script.

The error I keep getting is:
Warning: Return code of 255 for check of service check esxi hardware

I've been beating my head about this for a couple days, figured I would reply and see what you may be able to do for me.

ck from Switzerland wrote on Apr 4th, 2016:
Hi Jerry. You can tell Nagios/Icinga to set the timeout for this command (in the command definition). The plugin itself runs as long as it has to (or it gets interrupted by something). For some servers I monitor I use a timeout of 120s/2min. Depending on the hardware and/or network connection this can differ.

Jerry wrote on Apr 4th, 2016:
Is there a way to extend the time out?

ck wrote on Feb 28th, 2016:
Hello Yury.
The warning comes from the element 'Memory':
20160228 13:19:21 Element Name = Memory
20160228 13:19:21 Element HealthState = 15
20160228 13:19:21 GLobal exit set to WARNING

It could either be a bug in the CIM implementation of this hardware element or it's not checked by the vSphere client. I suggest you run hardware diagnosis on this server (with HP SUM for example). If you don't find anything, you can ignore this element with "-i 'Memory'".

Yury from Belarus wrote on Feb 28th, 2016:
Hello ck!
Thank you for your response. I moved my nagios to the new server (Centos 6.7 x64). Here I have python 2.6.6 installed. I've installed pywbem 0.8. Now plugin is working, but returns warning:

./check_esxi_hardware.py -H 192.168.3.252 -U nagios -P MySecretPass
CRITICAL : Memory - Server: HP ProLiant DL360p Gen8 s/n: CZJ32102FW System BIOS: P71 2013-03-01

I've checked through iLO and VMWare Console, there are no errors.
I use VMware ESXi 5.1.0 build-1065491 (Update 1).
I've not found related information in FAQ.
Can you help me?

Here is full output:
20160228 13:19:16 Connection to https://192.168.3.252
20160228 13:19:16 Found pywbem version 0.8.0rc3
20160228 13:19:16 Check classe OMC_SMASHFirmwareIdentity
20160228 13:19:17 Element Name = System BIOS
20160228 13:19:17 VersionString = P71
20160228 13:19:17 Check classe CIM_Chassis
20160228 13:19:17 Element Name = Chassis
20160228 13:19:17 Manufacturer = HP
20160228 13:19:17 SerialNumber = CZJ32102FW
20160228 13:19:17 Model = ProLiant DL360p Gen8
20160228 13:19:17 Check classe CIM_Card
20160228 13:19:17 Check classe CIM_ComputerSystem
20160228 13:19:18 Element Name = System Board 7:1
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:2
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:3
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:4
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:5
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:6
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:7
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board 7:8
20160228 13:19:18 Element HealthState = 0
20160228 13:19:18 Element Name = System Board

ck from Switzerland wrote on Feb 22nd, 2016:
Hello Yury. Please try it with the 0.8.0 version of pywbem and see if it works. CentOS applied some patches on their own which seems to break stuff in pywbem. Check this out for more info: http://www.claudiokuenzler.com/blog/542/installing-testing-pywbem-0.8-development-version

Yury from Belarus wrote on Feb 19th, 2016:
Hello! Thank you very much for this great plugin and great work!
My ESXi is 5.1 on ProLiant DL360p Gen8.

I trying to use your plugin on my nagios server (Centos 5.9 i386).
Python version is:
# python -V
Python 2.7

Extension pywbem (v. 0.7.0) have been succesfully installed.

#which python
/usr/local/bin/python
I've pointed to this location in header of check_esxi_hardware.py.

Then I changed line 593 to:
"wbemclient = pywbem.WBEMConnection(hosturl, (user,password))" to avoid authentication error.

Now I have following error:
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 618, in
c=wbemclient.EnumerateInstances('CIM_Chassis')
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 404, in EnumerateInstances
**params)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_operations.py", line 168, in imethodcall
verify_callback = self.verify_callback)
File "/usr/local/lib/python2.7/site-packages/pywbem/cim_http.py", line 184, in wbem_request
h.putheader('Content-length', len(data))
File "/usr/local/lib/python2.7/httplib.py", line 924, in putheader
str = '%s: %s' % (header, '\r\n\t'.join(values))
TypeError: sequence item 0: expected string, int found

How to fix the error?

Louis from South Africa wrote on Jan 11th, 2016:
It's at times like this when I feel truly embarrassed! After confirming with the guy who set up the boxes, he used a different password for root.

So it works communicating directly to ESXi 6.0.0 boxes with username root.

Thanks for the help!

ck from Switzerland wrote on Jan 7th, 2016:
When you connected the ESXi server into vsphere, you needed to enter the root password. Use root and the root password to see if it works. If it does, then you can try to make a less elevated user. I haven't had the chance yet to install ESXi/vsphere 6, so I can't tell whether check_esxi_hardware works on it or not.

Louis from South Africa wrote on Jan 7th, 2016:
We have two hosts, managed by VSphere on a Windows server. VSphere lists a few LOCAL users (one of which is Administrator) but I can't find where to change this list now. All other users now seem to be AD users.

As a side note: not matter what username or password I use, I get the same messages as below.

DOES the plugin work on ESXi 6.0.0? I saw a comment on the exchange.nagios.org page that said 6 was not supported.

ck from Switzerland wrote on Jan 7th, 2016:
Hi Louis, I have actually never tried to authenticate with an AD-user, so I don't know if it works or if it is supposed to work. Have you already tried it with a "local" user, meaning the user exists in /etc/passwd on the target ESXi server?

Louis from South Africa wrote on Jan 7th, 2016:
(I missed an important line in my previous post - please ignore it and use this one instead.)

Running on RedHat 7.2, fully patched, with
python-2.7.5-34.el7.x86_64
pywbem-0.7.0-25.20130827svn625.el7.noarch

When I run the command against VMWare 6.0.0 (with VSphere integrated into Active Directory) it fails even though the password is correct:

# ./check_esxi_hardware.py -H 172.25.2.13 -U ADdomain\\ADuser -P "ADpassword" -V hp -v

20160107 16:54:35 Connection to https://172.25.2.13
20160107 16:54:35 Found pywbem version 0.7.0
20160107 16:54:35 Connection error, disable SSL certification verification (probably patched pywbem)
20160107 16:54:35 Check classe OMC_SMASHFirmwareIdentity
20160107 16:54:37 Global exit set to UNKNOWN
UNKNOWN: Authentication Error

Any thoughts?

ck from St. Gallen, Switzerland wrote on Nov 17th, 2015:
Hi Craig. Try it with newer pywbem 0.8.0.

Craig Hart from Australia wrote on Nov 17th, 2015:
windows os
python 2.7.10
pywbem 0.7.0

run script returns the error:

Traceback (most recent call last):
File "C:\MyTechAgent\ESXi_HardwareStatus\check_esxi_hardware.py", line 593, in
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), no_verification=True)
TypeError: __init__() got an unexpected keyword argument 'no_verification'


If I edit line 594 and change to wbemclient = pywbem.WBEMConnection(hosturl, (user,password)) it works.

So there seems to be an issue with the if-then-else logic around the 0.7.0 connection tests: it uses the variant with no_verification=True when it shouldn't.
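
One way to avoid depending on a version check at all is to try the keyword first and fall back when the installed pywbem rejects it. The following is only a sketch and not the plugin's actual code: `connect_with_fallback` and its `factory` parameter are hypothetical names, and in real use `factory` would be `pywbem.WBEMConnection`. The two stand-in factories exist purely to demonstrate the behaviour.

```python
def connect_with_fallback(factory, hosturl, creds):
    """Call factory with no_verification=True, retrying without it on TypeError.

    `factory` is a stand-in for pywbem.WBEMConnection. Assumption: an
    unpatched pywbem 0.7.0 raises TypeError for the unknown keyword, as
    shown in the traceback above.
    """
    try:
        return factory(hosturl, creds, no_verification=True)
    except TypeError:
        # The keyword does not exist in this pywbem build.
        return factory(hosturl, creds)


# Stand-ins for the two pywbem flavours, for demonstration only.
def old_factory(url, creds):
    return ("old", url, creds)


def new_factory(url, creds, no_verification=False):
    return ("new", url, creds, no_verification)
```

A caveat of this approach: a TypeError raised for any other reason inside the constructor would also trigger the fallback, so it trades precision for simplicity.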




ck from Wil, Switzerland wrote on Aug 31st, 2015:
Hi Frank, actually you can trust both pieces of information. The vSphere client reads the CIM HealthState, while the plugin currently reads the CIM OperationalState for every server except HP; that is where the difference comes from. Your RAID controller (Controller 5003005700C3BEA0) actually has a non-OK operational state, which might indicate a hardware failure or (as I've seen with Dell servers) a bug in the firmware. You can switch to -V hp if you want the same "view" as in the vSphere client.

Frank Wein wrote on Aug 31st, 2015:
What info should I trust when there is a contradiction between the hardware status in vSphere client and this script? Recently I got this error on a server (a few days after I started using this script):
CRITICAL : Controller 5003005700C3BEA0 (RAID Ctrl SAS 6G 5/6 512MB (D2616)) CRITICAL : Controller 5003005700C3BEA0 (RAID Ctrl SAS 6G 5/6 512MB (D2616)) - Server: FUJITSU PRIMERGY RX300 S6 s/n: ... System BIOS: 6.00 Rev. 1.07.2619.N1 2010-08-16

Jim Caldwell wrote on Jun 5th, 2015:
Will this script ever be updated to work with Python 3? It gives errors starting on line 440 related to the print statement. I'm using FreeBSD 10 and I don't think 2.7 will ever get the pywbem update to fix the SSL problem, which makes this plugin useless to me now. I miss it greatly.

Duncan Carter from England wrote on Mar 31st, 2015:
Hi,
Having the same 'CRITICAL : Memory - Server: HP ProLiant DL120 G7' error that others have seen, I've gone through the BIOS with a fine tooth comb and have reset all the logs I can find, any suggestions?
Many thanks in advance,

ck from Switzerland wrote on Feb 3rd, 2015:
Concerning the "EOF occurred in violation of protocol" error: I think the connection is being cut before the plugin can run through everything. For debugging, run the plugin regularly from a cron job and save the verbose output to a file. When the error appears in Nagios, check the log file to see whether the same issue occurred when the plugin was launched by cron. This may give you a hint as to whether there was a timeout or a problem with the connection.
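
The cron-based debugging described above could be done with a small wrapper like the one below. This is a minimal sketch assuming Python 3.5 or newer on the monitoring server; `run_and_log` and the log file path in the usage note are hypothetical, and the plugin arguments are the ones already shown in this thread.

```python
import datetime
import subprocess
import sys


def run_and_log(command, logfile):
    """Run command, append its output and exit code to logfile, return the exit code."""
    proc = subprocess.run(command, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as fh:
        # One header line per run makes it easy to match against Nagios alerts.
        fh.write("=== %s exit=%d ===\n" % (stamp, proc.returncode))
        fh.write(proc.stdout)
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: wrapper.py LOGFILE COMMAND [ARGS...]
    sys.exit(run_and_log(sys.argv[2:], sys.argv[1]))
```

Called from cron as, for example, `wrapper.py /tmp/check_esxi.log ./check_esxi_hardware.py -H X.X.X.X -U xxx -P pppp -V hp -v`, each run leaves a timestamped entry that can be compared with the time of the Nagios alert.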

Julian from España wrote on Feb 3rd, 2015:
Hi Claudio,

I'm afraid I'm suffering from the same problem as James and Jan:

CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")


My command is this:

check_esxi_hardware.py -H X.X.X.X -U xxx -P pppp -V hp

When I execute it in verbose mode I get this:
20150203 10:19:48 Connection to https://X.X.X.X
20150203 10:19:48 Check classe OMC_SMASHFirmwareIdentity
20150203 10:19:48 Element Name = System BIOS
20150203 10:19:48 VersionString = P70
20150203 10:19:48 Check classe CIM_Chassis
20150203 10:19:49 Element Name = Chassis
20150203 10:19:49 Manufacturer = HP
20150203 10:19:49 SerialNumber = CZ2334234
20150203 10:19:49 Model = ProLiant DL360p Gen8
20150203 10:19:49 Check classe CIM_Card
20150203 10:19:49 Check classe CIM_ComputerSystem
CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")

The issue is that I have a similar machine in a cluster and the other one never fails. Would it be something about VMware configuration?

Thank you in advance!!

Jan wrote on Jan 28th, 2015:
Hello,
I have the same issue as James:
CRITICAL: (0, Socket error: [Errno 8] _ssl.c:504: EOF occurred in violation of protocol)
on HP DL380 G8 working with ESX 5.5

Monitoring worked for months, but suddenly it started throwing this error.
Is there any solution ?

Thanks

sacke from Sverige wrote on Nov 27th, 2014:
Thanks for the great help regarding HP array monitoring.

It seems that if I specify -V hp, it works correctly, but if I don't specify the vendor, it only detects broken RAID volumes on HP, not broken disks.

// Stefan

ck from Switzerland wrote on Nov 27th, 2014:
Hi sacke. The important part is the element status code the CIM element returns. If the number is anything other than 5 (for HP/HealthState), you can use this to be alerted, too. See the following table (it's part of the plugin):

0  : ExitOK,        # Unknown
5  : ExitOK,        # OK
10 : ExitWarning,   # Degraded
15 : ExitWarning,   # Minor
20 : ExitCritical,  # Major
25 : ExitCritical,  # Critical
30 : ExitCritical,  # Non-recoverable Error
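
Read as a lookup, the table above maps a HealthState number to a Nagios exit code. The sketch below restates it as a standalone Python dictionary; the names `HEALTH_STATE_EXIT` and `exit_code_for` are illustrative and not taken from the plugin, and mapping unlisted values to UNKNOWN is an assumption.

```python
# Standard Nagios plugin exit codes.
EXIT_OK, EXIT_WARNING, EXIT_CRITICAL, EXIT_UNKNOWN = 0, 1, 2, 3

# HealthState values exactly as listed in the table above.
HEALTH_STATE_EXIT = {
    0: EXIT_OK,         # Unknown
    5: EXIT_OK,         # OK
    10: EXIT_WARNING,   # Degraded
    15: EXIT_WARNING,   # Minor
    20: EXIT_CRITICAL,  # Major
    25: EXIT_CRITICAL,  # Critical
    30: EXIT_CRITICAL,  # Non-recoverable Error
}


def exit_code_for(health_state):
    """Return the exit code for a HealthState.

    Assumption: values not in the table are reported as UNKNOWN.
    """
    return HEALTH_STATE_EXIT.get(health_state, EXIT_UNKNOWN)
```

Note that a HealthState of 0 ("Unknown") is treated as OK in the table, so an element that reports no health information does not raise an alert.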


sacke from Sverige wrote on Nov 27th, 2014:
Hello.
Running the script against HP servers.
Works great, but it only alerts on failed disks in logical volumes.
We have a RAID setup with spares, and when one disk fails, the script alerts.
But since we have spare disks, the script stops alerting once the RAID has rebuilt.
It won't alert on a broken disk that is not part of a RAID volume.
It doesn't alert on "predictive failures" on disks either.

If I run the script with -v, I see the disk listed as an unconfigured disk with a predictive failure:
Element Name = Disk 4 on HPSA2 : Port 1E Box 1 Bay 4 : 68GB : Unconfigured Disk : Predictive Failure

Can I adjust the script to alert on this as well?

ck from Switzerland wrote on Oct 15th, 2014:
Hi James. I got some e-mails concerning the same error you describe, but unfortunately I did not get the final solution as a follow-up from these users. I highly suspect a network connectivity issue as source of the error. To be proved/proved wrong :)

james from Belgium wrote on Oct 15th, 2014:
Hi Claudio,

When I run check_esxi_hardware.py, I sometimes get the message "CRITICAL: (0, "Socket error: (8, 'EOF occurred in violation of protocol')")"
and other times it runs fine?

ck from Switzerland wrote on Oct 5th, 2014:
Hello Jetblack, please post your commands and service definition. Maybe there's a problem in there or some incompatibility. If you prefer, you can also send the definitions directly to my mail.

Jetblack from London wrote on Oct 3rd, 2014:
Hi Claudio,

Thanks for the answer. I have disabled the verbose mode in definitions and tried all possible options. I manage to run the command properly under my root and my nagios user. The result of the echo $? command is a 0 which should get properly accepted by nagios. Some forum answers (about the same error but with NRPE) seem to point at an access rights problem on the memory sector/file/location where the output (0/1/2/3 in our case for Nagios) is stored/sent.
I can't find anything relevant as to where that should be, but it's probably because I have only very basic knowledge of standard linux/python command outputs.
Do you have an idea ?

Thanks.

Claudio Kuenzler from Switzerland wrote on Oct 1st, 2014:
Hi Jetblack, Maybe you have configured the -v (verbose) parameter into your command or service definition?
That'd explain it.

Jetblack from London wrote on Oct 1st, 2014:
Hi,

This script is absolutely amazing. I am trying to make it work with a Nagios 4.0.8 test environment.
I am running it fine in command line and get the expected return.
However in Nagios I get a return code of 255, probably a return code that is too long.
I have little to no knowledge in programming in Python and am wondering how I could trim this return code to fit my very basic Nagios.

Thanks.

AV from Paris/France wrote on Sep 9th, 2014:
Thank you for your answer but I had already tried that without any success


ck from Switzerland wrote on Sep 9th, 2014:
Hi AV, it's written above in the FAQ: You might have to restart the sfcbd-watchdog.

AV from Paris/France wrote on Sep 9th, 2014:
Hi,
I am trying to get this plugin working.
I get a timeout error when I try to run it:
[root@nagios libexec]# ./check_esxi_hardware.py -H xx.xx.xx.xx -U root -P xxxxxx -V hp -v
20140909 15:52:59 Connection to https://xx.xx.xx.xx
20140909 15:52:59 Check classe OMC_SMASHFirmwareIdentity
CRITICAL: (0, "Socket error: (110, 'Connection timed out')")

It does not get any further than this ..
and I get the same result when I enter a wrong password, as if it didn't even get to the point where it checks the user/password.

Thank you for any help

AV

ck from Wil, Switzerland wrote on Sep 2nd, 2014:
Hello Oliver,
Thanks so much for that hint! I was looking for a way to disable the certificate validation, but my research led into nirvana. I will take your code into the next release after testing. Thanks again!

oliver from Germany wrote on Sep 2nd, 2014:
Hello,
I had issues with "Unknown CIM Error: (0, 'SSL error: certificate verify failed')" on CentOS. After a little investigating, I changed line 561 to:

# connection to host
verboseoutput("Connection to "+hosturl)
wbemclient = pywbem.WBEMConnection(hosturl, (user,password), NS, no_verification=True)

Without the verification checks, everything works smoothly.

Regards Oliver

Chris B from Germany wrote on Aug 26th, 2014:
Hi Martin,

sry was on Vacation.
Check this screen: http://www.evernote.com/shard/s221/sh/479a7f2d-e9b0-4d79-9ade-5e6467c98720/eb7f7522f47b68f769fab1ed728e540a

Martin.N from Germany wrote on Jul 25th, 2014:
I'm also getting "UNKNOWN: Authentication Error" while checking ESXi 5.5.

@Chris B
What checkbox do you mean?

Chris B from Germany wrote on Jul 18th, 2014:
Hi,

We have been using this plugin for years with an ESXi 4.1 Supermicro server. Works well, thanks.

Yesterday I got a new machine and installed ESXi 5.5 on it. -> UNKNOWN: Authentication Error
Does the script work on 5.5?

chris wrote on Jul 17th, 2014:
Hi,

When can we use the plugin with pywbem-0.7.0-25?

ck from Switzerland wrote on Apr 8th, 2014:
Hi Jon, can you run the verbose option to see where the script stops and send me the full output (by mail would be best). Thanks.

Jon Tan from Los Angeles wrote on Apr 8th, 2014:
Thanks for creating this check. I currently get the error below. Any help would be appreciated. Thank you.

/usr/lib/python2.6/site-packages/pywbem/cim_types.py:39: DeprecationWarning: object.__init__() takes no parameters
int.__init__(self, arg, base)
Traceback (most recent call last):
File "./check_esxi_hardware.py", line 662, in
sensorType = instance[u'sensorType']
File "/usr/lib/python2.6/site-packages/pywbem/cim_obj.py", line 620, in __getitem__
def __getitem__(self, key): return self.properties[key].value
KeyError: u'sensorType'

Shannon Young from Canada wrote on Mar 24th, 2014:
I thought it may have been pnp4nagios causing it so I've disabled all performance collection against the service.

Here's my service definition:
define service {
host_name esx-hostname
service_description Hardware
check_command check_esxi_hardware
max_check_attempts 3
check_interval 5
retry_interval 1
notification_interval 30
check_period 24x7
}

I've tried to make my check as simple as possible with the unfortunate same result :(

(No output on stdout) stderr: ENV: 'NAGIOS_HOSTNAME'='esx-hostname'

When looking at the status information against the service I see there's a bunch of data there (not sure if this will help).

I can run it as the nagios user without any troubles:

[nagios@nagios01 ~]$ /usr/local/nagios/libexec/check_esxi_hardware.py -H 10.9.22.99 -U root -P 'mypa$$word' -V dell
OK - Server: Dell Inc. PowerEdge 2950 s/n: XXXXXXX System BIOS: 2.6.1 2009-04-20

ck from Switzerland wrote on Mar 22nd, 2014:
Hi Shannon. What does your service definition look like? Also make sure that you can run the plugin with the exact same parameters as your nagios user (not root) on your Nagios server.

Shannon Young from Canada wrote on Mar 22nd, 2014:
I recently upgraded my release to 4.0.4 and came across your plugin. I installed without any issues and can run the script no worries. However once I configure Nagios to use it I now get a Critical:
(No output on stdout) stderr: ENV: 'NAGIOS_HOSTNAME'='esxserver' with a whole bunch of variables. Hoping it's not an issue with Nagios 4.0.4 as it doesn't seem to work :( thoughts?

This is what I have defined in commands


define command{
command_name check_esxi_hardware
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U root -P 'mypa$$word' -V dell
}

Any help greatly appreciated and happy to provide further information if needed.

kornflex wrote on Mar 5th, 2014:
I was wrong but I was in a good directory.

I found my error :
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

has to be :
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ $ARG3$
}

$ARG3$ could be: -i "ignore word"
But I can't leave it empty like this: -i or -i "" or -i ''

I have to put everything in the argument :/


ck from Switzerland wrote on Mar 5th, 2014:
You have just answered your own question now:

"where $USER1$=/usr/local/nagios/libexec ( resource.cfg )" vs. "The file is present : /usr/lib/nagios/plugins/check_esxi_hardware.py with permissions"


kornflex wrote on Mar 5th, 2014:
It works with su - nagios

commands.cfg :
define command {
command_name check_esxi_hardware
command_line $USER1$/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

where $USER1$=/usr/local/nagios/libexec ( resource.cfg )

services.cfg

define service {
service_description check_esxi_hardware
check_command check_esxi_hardware!root!PASSWD!""
host_name pcbu-vm2
check_period 24x7
notification_period 24x7
contact_groups admins
event_handler_enabled 0
notification_interval 1440
notification_options w,u,c,r
max_check_attempts 5
check_interval 10
retry_interval 2
use notification_default_24h
}

ck from Switzerland wrote on Mar 5th, 2014:
Make sure you use "su - nagios", not just "su nagios" to change the environment, too. This is where it could be failing. If that's not it, can you show the service definition?

kornflex wrote on Mar 5th, 2014:
ps aux | grep nagios :
nagios 28610 0.0 0.0 6188 1524 ? SNs 15:16 0:00 /usr/sbin/nagios3 -d /etc/nagios3/nagios.cfg

So with nagios user account ( su nagios ) :
./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD

All is OK

ck from Switzerland wrote on Mar 5th, 2014:
Did you run "./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD" as root or as the user under which your nagios installation is running? Make sure you become the correct user first. Assuming your nagios installation runs under the user "nagios", do "su - nagios" and then execute the plugin again.

kornflex wrote on Mar 5th, 2014:
Hi,

I use $USER1$ in my commands.cfg. I've just replaced the string with the correct value :)

I can launch in command line:
./check_esxi_hardware.py -H pcbu-vm2 -U root -P PASSWD

The result :
OK - Server: Dell Inc. PowerEdge T110 II s/n: C5SJ95J System BIOS: 1.2.4 2011-09-19


ck from Switzerland wrote on Mar 5th, 2014:
Hi kornflex. Make sure you can execute the plugin on the command line.
In the commands definition usually you use $USER1$/check_esxi_hardware.py as path.
It is also possible that the output of the plugin is too big to be handled by nagios. In this case manually launch the plugin to verify the output.
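
For "out of bounds" messages it can help to know what the number means before digging into Nagios itself. The sketch below encodes the standard Nagios plugin states (0 to 3) plus the usual shell conventions for 126 and 127; the function names are hypothetical, and the check should be run as the user Nagios runs under.

```python
import os

# Exit codes a Nagios plugin is expected to return.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def explain_return_code(code):
    """Describe a plugin return code in plain words."""
    if code in NAGIOS_STATES:
        return NAGIOS_STATES[code]
    if code == 126:
        # Shell convention: file exists but cannot be executed.
        return "command found but not executable"
    if code == 127:
        # Shell convention: the command could not be found at all.
        return "command not found (check the path in command_line)"
    return "out of bounds for a Nagios plugin"


def plugin_is_runnable(path):
    """Return True if path is an existing file the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

For example, `plugin_is_runnable("/usr/local/nagios/libexec/check_esxi_hardware.py")` checks the path that `$USER1$` expands to in this thread; if it returns False, a return code of 127 is the expected result.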

kornflex wrote on Mar 5th, 2014:
Hi,

No problem on the command line, but the Nagios web GUI shows "Return code of 127 is out of bounds - plugin may be missing" :/

commands.cfg :
define command {
command_name check_esxi_hardware
command_line /usr/lib/nagios/plugins/check_esxi_hardware.py -H $HOSTADDRESS$ -U $ARG1$ -P $ARG2$ -i $ARG3$
}

The file is present : /usr/lib/nagios/plugins/check_esxi_hardware.py with permissions :
rwxr-xr-x 1 root root 31654 24 avril 2013 check_esxi_hardware.py

I can launch the command line as nagios or www-data or root.

Can you help me ?

Thanks

dmitrylnx from At work where else i can be :) wrote on Feb 8th, 2014:
There is a way to make the script connect to remote hosts through port forwarding.
1. Make a copy of the original script and rename it to check_esxi_remote.py or whatever.
2. Edit line 509 of the new script
from hosturl = 'https://' + hostname
to hosturl = 'https://' + hostname + ':7443'
where 7443 is the NAT port from which you redirect to your ESXi host. In my case:
WAN:7443->LAN:esxi:443
Now if you prepend https:// to the remote host IP in the new script, it will automatically add :7443 to the destination address. Works great for me: one script for local, one for remote.
example: ./check_esxi_remote.py -H https://remote-ip -U root -P somepasssword
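
Instead of keeping two copies of the script, the URL construction from step 2 could take an optional port. This is only a sketch of that idea: `build_host_url` and its `port` parameter are hypothetical and not part of the plugin, which (per the lines quoted above) builds the URL as `'https://' + hostname`.

```python
def build_host_url(hostname, port=None):
    """Return an https URL for hostname, appending :port when one is given."""
    if not hostname.startswith("https://"):
        hostname = "https://" + hostname
    if port is not None:
        # Only add a port when the caller asked for a non-default one.
        hostname = "%s:%d" % (hostname, port)
    return hostname
```

With this, `build_host_url("remote-ip", 7443)` yields the forwarded address, while `build_host_url("esxi")` keeps the default behaviour for local hosts.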

Thank you very much for this script , really great tool.

Claudio from Switzerland wrote on Dec 19th, 2013:
Try giving port 444 in the hostname like this:

/check_esxi_hardware.py -H xxx.xxx.xxx.xxx:444 -U root -P mypass -V hp


This is untested, however, so I'm not sure whether it will work.

Gijsbert from Netherlands wrote on Dec 19th, 2013:
Hi all, I set up port forwarding in the router from 444 to 443 and can connect with the vSphere client to ip:444, but when I run check_esxi_hardware.py against ip:444 I get errors. When I try a working server on 443 and run check_esxi_hardware.py against ip:443, it gives the same errors, so I think the script doesn't understand the :44x addition, or I don't understand how to do it.

Claudio from Switzerland wrote on Dec 10th, 2013:
Hi Jake. You can't change the CIM port 5989 but you can change the management port (443) to a different one. Please see http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&externalId=1021199 for a howto.

Jake from Kansas City wrote on Dec 10th, 2013:
Is there any way to configure check_esxi_hardware.py to use non-default ports? I'm monitoring some remote ESXi servers behind a firewall with only one static IP and have to use port-forwarding to permit access. I can't set up both ESXi servers with the same port on the firewall. Thanks!

Claudio from Switzerland wrote on Nov 13th, 2013:
Hello Shawn. Make sure the CIMProviders are activated in your ESXi (check Advanced Software Settings in vSphere client).

Shawn from Germany wrote on Nov 12th, 2013:
Hello,

I am trying to run this from the command line from Nagios. I am receiving (0, 'Socket error: [Errno 111] Connection refused'). The monitoring server and the ESX server are on the same LAN, so no firewall is blocking the connection. I checked and CIM is allowing both port 443 and 5989. Any ideas?

./check_esxi_hardware.py -H esx_ip -U root -P password
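
A "Connection refused" can be narrowed down by testing the CIM port directly from the monitoring server, independent of the plugin and pywbem. The sketch below uses only the Python 3 standard library; `port_open` is a hypothetical helper, and 5989 is the CIM port mentioned elsewhere in these comments.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

If `port_open("esx_ip", 5989)` returns False while port 443 answers, the CIM service on the ESXi host is a more likely culprit than the network.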

Claudio from Switzerland wrote on Nov 7th, 2013:
Hi Mike. Run the plugin as nagios (or whatever user you use for your monitoring application). You have to verify that all permissions and paths are correct.

Mike from USA, Erie, Pennsylvania wrote on Nov 7th, 2013:
I can run this fine from command line, but when nagios runs the service check, I get the following error:

(No output on stdout) stderr: Traceback (most recent call last):
File "/usr/local/nagios/libexec/check_esxi_hardware.py", line 543, in
getopts()
File "/usr/local/nagios/libexec/check_esxi_hardware.py", line 529, in getopts
filename = open(filextract, 'r')
IOError: [Errno 2] No such file or directory: ':/usr/local/nagios/sbin/.esxipass'

Any help is greatly appreciated.

Claudio from Switzerland wrote on Sep 5th, 2013:
Hello Matt. I do not know how Centreon handles performance data, but the plugin returns all perf data in one return. Example:

OK - Server: Supermicro X8SIE s/n: 0123456789 System BIOS: 1.0c 2010-05-27|P2Vol_0_System_Board_34_CPU_Vcore=0.87;1.34;1.4 P2Vol_1_System_Board_35_+3.3VCC=3.29;3.58;3.64 P2Vol_2_System_Board_35_+3.3VSB=3.29;3.58;3.64 P2Vol_3_System_Board_35_AVCC=3.29;3.58;3.64 P2Vol_4_System_Board_35_VBAT=3.15;3.58;3.64 P2Vol_5_System_Board_37_+12_V=12.03;13.09;13.19 P2Vol_6_System_Board_38_CPU_DIMM=1.55;1.76;1.77 P2Vol_7_System_Board_39_+5_V=5.05;5.34;5.6 P2Vol_8_System_Board_40_-12_V=-12.29;-11.71;-11.51 P4Tem_0_System_Board_28_System_Temp=39;75;77 P5Fan_0_System_Board_31_FAN_2=8725;29260;29815 P5Fan_1_System_Board_31_FAN_3=9280;29260;29815 P5Fan_2_System_Board_31_FAN_4=8725;29260;29815 P5Fan_3_System_Board_32_FAN_1=9280;29260;29815

So it is up to your graphing tool to handle the performance data all in one. I personally use nagiosgraph for such graphing but pnp4nagios is also capable of creating multiple graphs from one perfdata output.
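
To feed several graphs or probes from a single plugin run, the one-line output can be split into individual metrics. The sketch below assumes the format shown in the example above: everything after the `|` is a space-separated list of `label=value;warn;crit` items with no spaces inside labels and no unit suffixes. `parse_perfdata` is a hypothetical helper, not part of the plugin.

```python
def parse_perfdata(plugin_output):
    """Split plugin output at '|' and return {label: (value, warn, crit)}."""
    if "|" not in plugin_output:
        return {}
    perf = plugin_output.split("|", 1)[1]
    metrics = {}
    for item in perf.split():
        # Labels may contain '+' or '-', so split on the first '=' only.
        label, _, numbers = item.partition("=")
        value, warn, crit = (float(n) for n in numbers.split(";"))
        metrics[label] = (value, warn, crit)
    return metrics
```

Running the plugin once, caching its output, and letting each probe pick its own labels from the resulting dictionary would avoid querying the ESXi host once per sensor type.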

Matt from China wrote on Sep 5th, 2013:
Hi Claudio,

Thank you a lot for this script. We can monitor all of our ESX servers in a reliable way.

But I've one question.

Our current ESX version is 5.1 on Dell servers (R610, R620) and we're getting more and more servers everyday.

We also use Centreon with nagios to poll servers.

In order to graph each type of element clearly, I've defined different probes (fan, voltage, current, etc.), but each probe executes the check_esxi_hardware.py script every five minutes, so the poller server gets overloaded (long delay to get feedback from the script).

Is there any way to get all the information for each server in one run and grep the useful information through different probes?

Thank you in advance.

Claudio from Switzerland wrote on Aug 8th, 2013:
Martin, you need to define at least a host and a service using this defined host. Check out the official Nagios documentation.

Martin from Netherland wrote on Aug 8th, 2013:
Thank you for this plug-in,
Works for me, I can do the check etc.
but I need a little help :'(

I'm a newbie at Nagios, so I put the command definition(s) in command.cfg, that was easy.
I created a new cfg file and put it under object/VMware.cfg.
I defined the VMware.cfg in Nagios.cfg as a config file.

And I put the "service check" in the VMware.cfg file.
But when I reload Nagios, it gives me a config error!

Do I need to define a "Host" in my VMware.cfg file?
Hope you can help, much appreciated

Claudio from Switzerland wrote on Aug 5th, 2013:
The plugin "requires" that you run ESX/ESXi and that the monitoring server has Python with the pywbem extension installed (see requirements). It doesn't matter if you use the licensed or free ESXi version. The hardware agents are optional; however, I suggest you install the so-called Offline Bundles, which you can find on the HP and Dell websites.

Sarita Gupta from India wrote on Aug 5th, 2013:
What are the pre-requisites for this plugin? Does it monitor both the free as well as the licensed ESX? Does it require installation of the hardware agents (like dell OMSA, HP SIM etc)?

chris from maastricht wrote on Jul 11th, 2013:
Hi Claudio,

i want to thank you so much. This was the solution!!

chris from maastricht wrote on Jul 11th, 2013:
Hi claudio,

Thanks i will try that. When i am home i will unplug a hdd.

Thanx.

Claudio from Switzerland wrote on Jul 11th, 2013:
For HP servers, argument '-V hp' is mandatory. Try it with that again.

chris from maastricht wrote on Jul 11th, 2013:
So what happens is that the script shows that an HDD (RAID) is in interim recovery, but in Nagios the plugin still shows OK. With verbose mode enabled in Nagios, it shows the interim HDD, but the plugin still says all OK??

chris from maastricht wrote on Jul 11th, 2013:
Hi claudio,

I use an HP ProLiant DL360 G5. I have installed the offline bundle and see all hardware in vSphere manager. That is all working fine. This is the verbose output of the script (I have not unplugged the HDD this time; I'm at work now).

20130711 10:57:19 Connection to https://192.168.2.2
20130711 10:57:19 Check classe OMC_SMASHFirmwareIdentity
20130711 10:57:19 Element Name = System BIOS
20130711 10:57:19 VersionString = P58
20130711 10:57:19 Check classe CIM_Chassis
20130711 10:57:19 Element Name = Chassis
20130711 10:57:19 Manufacturer = HP
20130711 10:57:19 SerialNumber = CZJ64700PD
20130711 10:57:19 Model = ProLiant DL360 G5
20130711 10:57:19 Element Op Status = 0
20130711 10:57:19 Check classe CIM_Card
20130711 10:57:20 Check classe CIM_ComputerSystem
20130711 10:57:20 Element Name = System Board 7:1
20130711 10:57:20 Element Op Status = 0
20130711 10:57:20 Element Name = System Board 7:2
20130711 10:57:20 Element Op Status = 0
20130711 10:57:20 Element Name = System Internal Expansion Board 16:1
20130711 10:57:20 Element Op Status = 0
20130711 10:57:20 Element Name = esxi
20130711 10:57:20 Element Name = Hardware Management Controller (Node 0)
20130711 10:57:20 Element Op Status = 0
20130711 10:57:20 Element Name = HP Smart Array P400i Controller : Embedded : HPAS1
20130711 10:57:20 Check classe CIM_NumericSensor
20130711 10:57:20 Element Name = System Board 2 Power Meter
20130711 10:57:20 sensorType = 4 - Current
20130711 10:57:20 BaseUnits = 7
20130711 10:57:20 Scaled by = 0.010000
20130711 10:57:20 Current Reading = 284.000000
20130711 10:57:20 Element Op Status = 2
20130711 10:57:20 Element Name = Processor 6 Temp 7
20130711 10:57:20 sensorType = 2 - Temperature
20130711 10:57:20 BaseUnits = 2
20130711 10:57:20 Scaled by = 0.010000
20130711 10:57:20 Current Reading = 35.000000
20130711 10:57:20 Upper Threshold Critical = 95.0

Claudio from Switzerland wrote on Jul 11th, 2013:
Hi Chris, What server model do you use? Did you install any available Offline Bundles to enhance the CIM information? Can you show the relevant part of the verbose output here?

chris from maastricht wrote on Jul 11th, 2013:
Hello,

The plugin is great, however in Nagios, if I pull out a disk, the plugin will not go into a warning state?? In verbose mode it shows the error (interim recovery). Do more people have this problem, and is there a fix?

Mark Hughes from UK wrote on Mar 6th, 2013:
Q: Some HP Servers show Smart Array Controller in a warning state after upgrading to ESXi 5.1 with the latest ESXi offline bundle.
A: The latest ESXi bundle has a bug, download version 1.3.5 from here http://bit.ly/13knFyc, copy to “/var/log/vmware/” on the ESXi server and install through the command line with “esxcli software vib install -d hp-esxi5.0uX-bundle-1.3.5-3.zip”

