Nagios Plugin: Dell Equallogic

Last Update: July 26, 2014

This is a plugin to monitor a Dell Equallogic with Nagios. It is written in bash, so it should run on almost all Linux/Unix-based systems. It uses SNMP (v2) to query information from the Equallogic device. Before using the script, please also check the requirements below.

The plugin has been successfully tested on the following Dell Equallogic devices:
PS100E (70-0011) with Firmware 3.3.x, 4.3.x
PS4000/E/XV (70-0120) with Firmware 4.3.x, 5.0.x, 5.2.x
PS4100/X (70-0476) with Firmware 5.1.x, 5.2.x, 7.0.x
PS5000XV (70-0111) with Firmware 4.1.x, 4.3.x, 5.0.x
PS5000E (70-0115) with Firmware 3.3.x, 4.0.x, 4.1.x, 4.2.x, 4.3.x, 5.0.x
PS6000E (70-0202) with Firmware 4.1.x, 4.2.x, 4.3.x, 5.0.x, 5.2.x, 6.0.x
PS6000XVx (70-0202) with Firmware 4.3.x, 5.0.x, 5.1.x, 5.2.x, 6.0.x
PS6010 (70-0300) with Firmware 4.3.x, 5.0.x, 5.1.x, 5.2.x
PS6100X (70-0400) with Firmware 5.2.1
PS6110/E/XV (70-0477) with Firmware 6.0.x
Please let me know if you have another Equallogic model and/or another firmware running.

Download check_equallogic
Download the plugin and save it in your Nagios plugin folder (e.g. /usr/local/nagios/libexec).

Version History
20091109 Started script programming with the checks: health, disk, raid, uptime, ps, info
20091112 Added ethif, conn
20091118 Added diskusage
20091119 Bugfix on Outputs (removed Pipes)
20091121 Public Release
20091204 Bugfix (removed IP addresses)
20091206 Bugfix (removed SNMP community names)
20091222 Fixed raid, ps, health and diskusage checks when multiple member devices exist. By Mathias Sundman.
20100112 Successful tests on PS5000XV - thanks to Scott Sawin
20100209 Compatibility matrix now on website (see Tested on above)
20100416 Beta Testing for rewritten ethif check (allows more than 3 interfaces)
20100420 Corrected ethif output, finished new ethif check
20100526 Using proper order of snmpwalk command, thanks Roland Ripoll
20100531 Added performance data for diskusage and connections - thanks to Benoit Poulet.
20100622 Corrected perfdata output (+added thresholds), thanks to Christian Lauf
20100809 Fixed conn type -> Now the total number of connections of all members in a group is used
20101026 Using /bin/bash instead of /bin/sh again (Ubuntu users had problems due to /bin/sh symlink to /bin/dash)
20101026 Bugfix in snmpwalk usage (using vqe instead of vq), thanks to Fabio Panigatti
20101102 Added fan
20101202 Added volumes (checks utilization of all volumes)
20110315 Bugfix in Fan Warning check and changed output in diskusage check
20110323 Mysteriously disappeared 'temp' type added again. Thanks to Peter Wirdemo
20110328 Beta Testing for etherrors check by Martin Conzelmann
20110404 Added thresholds to etherrors check by Martin Conzelmann
20110404 Bugfix in volumes check
20110407 New temp check by Martin Conzelmann: Rewritten and more information in output
20110725 New disk check by Amir Shakoor (~6x faster). Some bugfixes then added.
20110804 New poolusage check by Chris Funderburg and Markus Becker (perfdata)
20110808 New vol check - checks single volume for utilization
20111013 Bugfix in vol check for similar vol names by Matt White
20111031 Bugfix in ethif check for int response by Francois Borlet-Hote
20120104 Bugfix in temp check if only one controller available
20120104 Bugfix in info check if only one controller available
20120123 Bugfix in volumes check
20120125 Added perfdata in volumes check, volume names now w/o quotes
20120319 Added poolconn check by Erwin Bleeker
20120330 Rewrite of poolusage (original poolusage is now called memberusage) by Erwin Bleeker
20120405 Bugfix in poolusage to show result without thresholds
20120430 Added snapshots type by Roland Penner
20120503 Rewrite of info check (Fix for multiple members, added firmware check)
20120815 Added percentage of raid rebuild when raid reconstructing
20120821 Minor bugfix in vol/volumes check (added space in perfdata)
20120911 Added percentage of raid rebuild when raid expanding
20120913 Bugfix in percentage output in raid check
20121204 Added percentage of raid rebuild when raid verifying
20121204 Changed raid percentage output when multiple members around
20121228 ps type now also checks for failed power supply fans
20130728 Added copy to spare raid status by Peter Lieven
20131024 Bugfix in temp check (Backplane_sensor_0 was not shown)
20131025 Cosmetic cleanup
20131122 Bugfix in vol check when volumes spread across members
20131219 Bugfix in poolusage check when a pool was not used (0 size)
20140626 Bugfix in etherrors check
20140711 Added snmp connection check function

Requirements
- The following shell commands must exist and be executable by your Nagios user: snmpwalk, awk, grep, wc, cut
- SNMP must be enabled on the Dell Equallogic device. If it is not already, enable it on the member.
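
Both requirements can be verified from a shell before the plugin is set up in Nagios. The following is only a sketch and not part of check_equallogic: the function names missing_commands and snmp_reachable are made up for this example, and the OID 1.3.6.1.2.1.1.1 (sysDescr) is just a convenient standard object to query. Run it as your Nagios user.

```shell
#!/bin/bash
# Hypothetical helper, not part of check_equallogic: print every command
# from the argument list that cannot be found in the current PATH.
missing_commands() {
    local cmd missing=""
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
    # Return success only if nothing is missing
    [ -z "$missing" ]
}

# Hypothetical helper: ask the device for sysDescr (OID 1.3.6.1.2.1.1.1)
# over SNMP v2c. $1 = host, $2 = community name.
snmp_reachable() {
    snmpwalk -v 2c -c "$2" "$1" 1.3.6.1.2.1.1.1 >/dev/null 2>&1
}

# The commands named in the requirements list above
missing_commands snmpwalk awk grep wc cut || echo "Install the commands listed above first"
```

Once all commands are present, a call such as snmp_reachable 192.0.2.10 public (address and community name are placeholders) should succeed if SNMP is enabled on the member.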

------------------------

Definition of the parameters:

-H Hostname or IP address of Equallogic to check
-C SNMP community name (needs at least read access)
-t Type of check you want to do (see the definition of types further down)
-v Name of single volume to check (only used with type vol)
[-w] Warning threshold (optional and only in combination with certain types)
[-c] Critical threshold (optional and only in combination with certain types)
--help Help text for correct usage of this script

------------------------

Definition of the types:

conn -> Checks number of current iSCSI connections (thresholds possible)
disk -> Checks status of all disks
diskusage -> Checks how much raid space is already used (thresholds possible)
etherrors -> Checks ethernet interfaces for ethernet packet errors (thresholds possible)
ethif -> Checks status of ethernet interfaces (thresholds possible)
fan -> Checks status of fans
health -> Checks overall health of Equallogic device
info -> Shows general information of Equallogic device and checks for same firmware version
memberusage -> Shows disk utilization of all members of the same group (thresholds possible)
poolconn -> Checks highest number of iSCSI connections per pool (thresholds possible)
poolusage -> Checks utilization of pools (thresholds possible)
ps -> Checks status of power supply(ies)
raid -> Checks RAID status
snapshots -> Checks snapshot reserve status (warning level is taken from the Equallogic volume config, critical level can be set with -c)
temp -> Checks temperature sensors
uptime -> Shows uptime of Equallogic device
vol -> Checks a single volume, must be used with -v option (thresholds possible)
volumes -> Checks utilization of all created iSCSI volumes (thresholds possible)

------------------------

Command definition in your commands.cfg:

# 'check_equallogic' command definition
define command{
command_name check_equallogic
command_line $USER1$/check_equallogic -H $HOSTADDRESS$ -C $ARG1$ -t $ARG2$ $ARG3$
}

Note: I defined the -C value (SNMP community name) as a variable ($ARG1$). If you always use the same community name (e.g. public), you can of course hardcode it in the command definition instead.
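
To see which command line Nagios ends up running, the macro expansion can be imitated in a few lines of shell. This is a simplified sketch: the function name expand_check_command is made up, $USER1$ is assumed to point to /usr/local/nagios/libexec, and everything after the third "!" is treated as $ARG3$ (real Nagios would continue with $ARG4$ and so on).

```shell
#!/bin/bash
# Assumption: $USER1$ points to the plugin folder, as is common in resource.cfg
USER1="${USER1:-/usr/local/nagios/libexec}"

# Simplified illustration of the macro expansion for the command_line above.
# $1 = host address ($HOSTADDRESS$), $2 = check_command value of a service.
expand_check_command() {
    local hostaddress="$1" _command_name arg1 arg2 arg3
    # Split on "!": command name, $ARG1$, $ARG2$ and the rest as $ARG3$
    IFS='!' read -r _command_name arg1 arg2 arg3 <<EOF
$2
EOF
    printf '%s/check_equallogic -H %s -C %s -t %s%s\n' \
        "$USER1" "$hostaddress" "$arg1" "$arg2" "${arg3:+ $arg3}"
}

expand_check_command 192.0.2.10 'check_equallogic!public!diskusage!-w 85 -c 95'
# prints: /usr/local/nagios/libexec/check_equallogic -H 192.0.2.10 -C public -t diskusage -w 85 -c 95
```

The printed line can be pasted into a shell as the Nagios user to test a check by hand before defining the service.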

------------------------

Service checks:

# Check Equallogic Information
define service{
use generic-service
host_name equallogic01
service_description General Information
check_command check_equallogic!public!info
}

The info check type shows general information about all Equallogic members in the same group. It also checks that all members run the same firmware version.

-------

# Check Equallogic Health
define service{
use generic-service
host_name equallogic01
service_description General Health
check_command check_equallogic!public!health
}

-------

# Check Equallogic Uptime
define service{
use generic-service
host_name equallogic01
service_description Uptime
check_command check_equallogic!public!uptime
}

-------

# Check Equallogic Disk Status
define service{
use generic-service
host_name equallogic01
service_description Disk Status
check_command check_equallogic!public!disk
}

-------

# Check Equallogic Disk (Raid) Usage
define service{
use generic-service
host_name equallogic01
service_description Disk Usage
check_command check_equallogic!public!diskusage!-w 85 -c 95
}

The thresholds for the type 'diskusage' are normal integers and represent the percentage of disk (raid) usage. In this example, a WARNING notification will be sent when 85% or more of the raid space is used. A CRITICAL notification will be sent when the usage is 95% or more.
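
The threshold logic described here can be sketched as follows. This is not the plugin's code: the function name diskusage_state and the output text are invented for illustration. Only the return codes (0 OK, 1 WARNING, 2 CRITICAL) and the perfdata layout after the pipe follow the general Nagios plugin conventions.

```shell
#!/bin/bash
# Hypothetical helper, not taken from the plugin: compare a usage percentage
# with the -w and -c values. $1 = used percent, $2 = warning, $3 = critical.
# Return codes follow the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL).
diskusage_state() {
    local used="$1" warn="$2" crit="$3" state=0 label="OK"
    if [ "$used" -ge "$crit" ]; then
        state=2; label="CRITICAL"
    elif [ "$used" -ge "$warn" ]; then
        state=1; label="WARNING"
    fi
    # Text before the pipe is the status line, text after it is perfdata
    echo "DISKUSAGE $label: ${used}% used|usage=${used}%;${warn};${crit}"
    return "$state"
}

diskusage_state 87 85 95 || echo "exit code: $?"
```

With 87% used and the thresholds from the example, the sketch reports WARNING and exit code 1.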

-------

# Check Equallogic Raid Status
define service{
use generic-service
host_name equallogic01
service_description Raid Status
check_command check_equallogic!public!raid
}

-------

# Check Equallogic Ethernet Interfaces
define service{
use generic-service
host_name equallogic01
service_description Ethernet Interfaces
check_command check_equallogic!public!ethif!-w 1 -c 2
}

The thresholds for the type 'ethif' are normal integers and represent the number of ethernet interfaces that are down. In this example, a WARNING notification will be sent when 1 interface is down. A CRITICAL notification will be sent when 2 interfaces are down. If no thresholds are given, the plugin only outputs information and returns status OK (no matter how many interfaces are down).
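
As a rough illustration of this behaviour (again a sketch with made-up names, not the plugin's implementation), the down interfaces can be counted and compared against optional thresholds. The "up"/"down" strings stand in for whatever the device reports over SNMP, and the sketch assumes the thresholds act as "this many or more".

```shell
#!/bin/bash
# Hypothetical helper: $1 = warning, $2 = critical (both may be empty),
# remaining arguments = one "up" or "down" per ethernet interface.
ethif_state() {
    local warn="$1" crit="$2" down=0 status
    shift 2
    for status in "$@"; do
        if [ "$status" = "down" ]; then
            down=$((down + 1))
        fi
    done
    echo "$down of $# interfaces down"
    if [ -n "$crit" ] && [ "$down" -ge "$crit" ]; then
        return 2
    elif [ -n "$warn" ] && [ "$down" -ge "$warn" ]; then
        return 1
    fi
    # Also reached when no thresholds were given
    return 0
}

ethif_state 1 2 up down up || echo "exit code: $?"
```

Calling ethif_state "" "" down down down models the case without thresholds and returns 0 (OK) although every interface is down.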

-------

# Check Equallogic iSCSI Connections
define service{
use generic-service
host_name equallogic01
service_description iSCSI Connections
check_command check_equallogic!public!conn!-w 20 -c 50
}

The thresholds for the type 'conn' are normal integers. In this example, a WARNING notification will be sent when there are 20 or more iSCSI connections. A CRITICAL notification will be sent when 50 or more iSCSI connections are open.

-------

# Check Equallogic Power Supplies
define service{
use generic-service
host_name equallogic01
service_description Power Supply
check_command check_equallogic!public!ps
}

-------

# Check Equallogic Fans
define service{
use generic-service
host_name equallogic01
service_description Fans
check_command check_equallogic!public!fan
}

-------

# Check Equallogic Temperature
define service{
use generic-service
host_name equallogic01
service_description Temperature
check_command check_equallogic!public!temp
}

-------

# Check Equallogic Volumes
define service{
use generic-service
host_name equallogic01
service_description Volumes
check_command check_equallogic!public!volumes!-w 90 -c 95
}

The thresholds for the type 'volumes' are normal integers. The volumes check measures the disk utilization of all created iSCSI volumes. In this example, a WARNING notification will be sent when at least one volume uses more than 90% of its capacity, and a CRITICAL notification when at least one volume uses more than 95% of its capacity.
When no thresholds are given, the plugin outputs the current utilization of all volumes and returns status OK.
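
The "at least one volume" rule means that the worst state of any single volume decides the result. A minimal sketch of that aggregation, assuming the utilization values are already known (the function name volumes_state and the name:percent input format are invented for this example, and only the case with thresholds is covered):

```shell
#!/bin/bash
# Hypothetical helper: $1 = warning, $2 = critical, remaining arguments are
# "name:percent" pairs, one per volume. The worst state of any volume wins.
volumes_state() {
    local warn="$1" crit="$2" state=0 pair used
    shift 2
    for pair in "$@"; do
        # Everything after the last colon is the utilization in percent
        used="${pair##*:}"
        if [ "$used" -gt "$crit" ]; then
            state=2
        elif [ "$used" -gt "$warn" ] && [ "$state" -lt 1 ]; then
            state=1
        fi
    done
    return "$state"
}

volumes_state 90 95 V1:42 V2:93 V3:60 || echo "exit code: $?"
```

Here V2 is above 90% but not above 95%, so the sketch ends with exit code 1 (WARNING) even though the other volumes are fine.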

-------

# Check Equallogic Ethernet Packet Errors
define service{
use generic-service
host_name equallogic01
service_description Eth Packet Errors
check_command check_equallogic!public!etherrors!-w 12 -c 14
}

The thresholds for the type 'etherrors' are normal integers. Thresholds are useful when ethernet packet errors were detected in the past but the counter no longer increases (e.g. errors that occurred during switch configuration and EQL installation). As soon as the ethernet packet error counter increases again, you will receive a WARNING (here at 12 errors) or a CRITICAL (here at 14 errors) notification. If no thresholds are given, the plugin returns a CRITICAL status whenever ethernet packet errors are found.
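
The difference between running with and without thresholds can be sketched like this. The function name etherrors_state is hypothetical, and the sketch assumes that the thresholds are compared as "this many errors or more", which may differ in detail from the plugin.

```shell
#!/bin/bash
# Hypothetical helper: $1 = error counter, $2 = warning, $3 = critical.
# Pass empty strings for $2 and $3 to model a check without thresholds.
etherrors_state() {
    local errors="$1" warn="$2" crit="$3"
    if [ -z "$warn" ] || [ -z "$crit" ]; then
        # Without thresholds, any packet error is treated as CRITICAL
        if [ "$errors" -gt 0 ]; then
            return 2
        fi
        return 0
    fi
    if [ "$errors" -ge "$crit" ]; then
        return 2
    elif [ "$errors" -ge "$warn" ]; then
        return 1
    fi
    return 0
}

# 11 old errors, thresholds set just above the current counter value
etherrors_state 11 12 14 && echo "OK as long as the counter stays at 11"
```

With 11 historic errors and no thresholds the same function returns 2 (CRITICAL), which is the situation the thresholds are meant to silence.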

-------

# Check Equallogic Disk Pool Usage
define service{
use generic-service
host_name equallogic01
service_description Disk Pool Usage
check_command check_equallogic!public!poolusage!-w 90 -c 95
}

The thresholds for the type 'poolusage' are normal integers. If the utilization of a disk pool is higher than the given warning threshold (here 90%), the plugin returns a WARNING status. If the utilization is above the given critical threshold (here 95%), the plugin returns a CRITICAL status. If no thresholds are set, the plugin just outputs the actual utilization of the pool(s) and returns an OK status.

-------

# Check Equallogic Single Volume
define service{
use generic-service
host_name equallogic01
service_description Volume V1
check_command check_equallogic!public!vol!-v V1 -w 90 -c 95
}

This check is a bit different from the others. If the type 'vol' is used, it is necessary to add the parameter -v followed by the name (string) of the wanted volume. In this example the selected volume is called "V1". See your Equallogic Group Manager for the available names (remember: you have set the names). The thresholds for the type 'vol' are normal integers. If the utilization of the volume is higher than the given warning threshold (here 90%), the plugin returns a WARNING status. If the utilization is above the given critical threshold (here 95%), the plugin returns a CRITICAL status. If no thresholds are set, the plugin just outputs the actual utilization of the named volume and returns an OK status.

-------

# Check Equallogic Snapshots
define service{
use generic-service
host_name equallogic01
service_description Snapshots
check_command check_equallogic!public!snapshots!-c 95
}

The snapshots check type uses the warning threshold defined in the Group Manager application. The critical threshold can be set manually.

------------------------

Nagios screenshots:

Nagios Check Equallogic Critical

Nagios Check Equallogic OK

Nagios Check Equallogic Volumes Critical

Nagios Check Equallogic Raid Warning

Nagios Check Equallogic Single Volume Screenshot

