check_equallogic

Last update: March 11, 2019

This is a plugin to monitor a Dell Equallogic with Nagios. It's written in bash, so it should run on almost all Linux/Unix-based systems. It uses SNMP (v2) to query information from the Equallogic device. Before using the script, please also check the requirements.

Download

Download check_equallogic.sh


Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards, adjust the permissions (usually chmod 755).
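The two steps above (copy, then chmod) can be sketched as a small helper. The function name install_plugin is made up for this example, and the destination path in the commented call is only the usual location mentioned above; adjust it to your distribution.

```shell
#!/bin/bash
# Copy a plugin file into a plugin directory and make it executable.
# Both paths are supplied by the caller; nothing is hard-coded.
install_plugin() {
  src=$1
  dest_dir=$2
  cp "$src" "$dest_dir/" && chmod 755 "$dest_dir/$(basename "$src")"
}

# Example call (assumes the usual plugin path and sufficient privileges):
# install_plugin ./check_equallogic.sh /usr/lib/nagios/plugins
```

Writing to the system plugin folder normally requires root, so the example call is left commented out.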

Community contributions are welcome on the GitHub repo.

Compatibility matrix

The plugin has been successfully tested on the following Dell Equallogic devices:

  • PS100E (70-0011) with firmware 3.3.x, 4.3.x
  • PS4000/E/XV (70-0120) with firmware 4.3.x, 5.0.x, 5.2.x
  • PS4100/X (70-0476) with firmware 5.1.x, 5.2.x, 7.0.x
  • PS4110 (70-0478) with firmware 7.0.x, 8.1.x
  • PS4210 (70-0485) with firmware 8.0.x, 8.1.x
  • PS5000XV (70-0111) with firmware 4.1.x, 4.3.x, 5.0.x
  • PS5000E (70-0115) with firmware 3.3.x, 4.0.x, 4.1.x, 4.2.x, 4.3.x, 5.0.x
  • PS6000E (70-0202) with firmware 4.1.x, 4.2.x, 4.3.x, 5.0.x, 5.2.x, 6.0.x
  • PS6000XVx (70-0202) with firmware 4.3.x, 5.0.x, 5.1.x, 5.2.x, 6.0.x
  • PS6010/6510X (70-0300) with firmware 4.3.x, 5.0.x, 5.1.x, 5.2.x, 7.0.x
  • PS6100/E/X (70-0400) with firmware 5.2.1, 7.0.x
  • PS6110/E/XV (70-0477) with firmware 6.0.x, 7.0.x, 7.1.x
  • PS6210/E/XV (?????) with firmware 7.1.x, 8.0.x
  • PSM4110 (70-0450, PowerEdge M1000e) with firmware 7.0.x

Please let me know if you have another Equallogic model and/or another firmware running.

Version history / Changelog

20091109 Started Script programming checks: health, disk, raid, uptime, ps, info
20091112 Added ethif, conn
20091118 Added diskusage
20091119 Bugfix on Outputs (removed Pipes)
20091121 Public Release
20091204 Bugfix (removed IP addresses)
20091206 Bugfix (removed SNMP community names)
20091222 Fixed raid, ps, health and diskusage checks when multiple member devices exist. By Mathias Sundman.
20100112 Successful tests on PS5000XV - thanks to Scott Sawin
20100209 Compatibility matrix now on website (see Tested on above)
20100416 Beta Testing for rewritten ethif check (allows more than 3 interfaces)
20100420 Corrected ethif output, finished new ethif check
20100526 Using proper order of snmpwalk command, thanks Roland Ripoll
20100531 Added performance data for diskusage and connections - thanks to Benoit Poulet.
20100622 Corrected perfdata output (+added thresholds), thanks to Christian Lauf
20100809 Fixed conn type -> Now the total number of connections of all members in a group is used
20101026 Using /bin/bash instead of /bin/sh again (Ubuntu users had problems due to /bin/sh symlink to /bin/dash)
20101026 Bugfix in snmpwalk usage (using vqe instead of vq), thanks to Fabio Panigatti
20101102 Added fan
20101202 Added volumes (checks utilization of all volumes)
20110315 Bugfix in Fan Warning check and changed output in diskusage check
20110323 Mysteriously disappeared 'temp' type added again. Thanks to Peter Wirdemo
20110328 Beta Testing for etherrors check by Martin Conzelmann
20110404 Added thresholds to etherrors check by Martin Conzelmann
20110404 Bugfix in volumes check
20110407 New temp check by Martin Conzelmann: Rewritten and more information in output
20110725 New disk check by Amir Shakoor (~6x faster). Some bugfixes then added.
20110804 New poolusage check by Chris Funderburg and Markus Becker (perfdata)
20110808 New vol check - checks single volume for utilization
20111013 Bugfix in vol check for similar vol names by Matt White
20111031 Bugfix in ethif check for int response by Francois Borlet-Hote
20120104 Bugfix in temp check if only one controller available
20120104 Bugfix in info check if only one controller available
20120123 Bugfix in volumes check
20120125 Added perfdata in volumes check, volume names now w/o quotes
20120319 Added poolconn check by Erwin Bleeker
20120330 Rewrite of poolusage (original poolusage is now called memberusage) by Erwin Bleeker
20120405 Bugfix in poolusage to show result without thresholds
20120430 Added snapshots type by Roland Penner
20120503 Rewrite of info check (Fix for multiple members, added firmware check)
20120815 Added percentage of raid rebuild when raid reconstructing
20120821 Minor bugfix in vol/volumes check (added space in perfdata)
20120911 Added percentage of raid rebuild when raid expanding
20120913 Bugfix in percentage output in raid check
20121204 Added percentage of raid rebuild when raid verifying
20121204 Changed raid percentage output when multiple members around
20121228 ps type now also checks for failed power supply fans
20130728 Added copy to spare raid status by Peter Lieven
20131024 Bugfix in temp check (Backplane_sensor_0 was not shown)
20131025 Optical cleanup
20131122 Bugfix in vol check when volumes spread across members
20131219 Bugfix in poolusage check when a pool was not used (0 size)
20140626 Bugfix in etherrors check
20140711 Added snmp connection check function
20150203 Bugfix in vol check in percentage calculation
20151006 Bugfix in vol check if volume not found by Stephane Loeuillet
20151126 Bugfix in memberusage and poolusage checks (missing newline)

Requirements

  • The following shell commands must exist and be executable by your Nagios user: snmpwalk, awk, grep, wc, cut
  • SNMP must be enabled on the Dell Equallogic device. If it is not already, enable it on the member.
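Whether the first requirement is met can be checked quickly before the plugin is used. The following is a minimal sketch, not part of the plugin itself; the function name missing_commands is made up for this example.

```shell
#!/bin/bash
# Print every command from the argument list that cannot be found in PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# Check the commands listed in the requirements above.
missing=$(missing_commands snmpwalk awk grep wc cut)
if [ -n "$missing" ]; then
  echo "Missing commands:" $missing
else
  echo "All required commands found"
fi
```

Run it as the Nagios user, since that user's PATH is the one that matters. If snmpwalk is reported missing, it usually comes with the Net-SNMP tools package of your distribution.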

Definition of the parameters

Parameter  Description
-H*        Hostname or IP address of the Equallogic member
-C*        SNMP community name (must have at least read access)
-t*        Type of check you want to do (see the definition of the check types further down)
-v         Name of a single volume to check
-w         Warning threshold (optional and only in combination with certain types)
-c         Critical threshold (optional and only in combination with certain types)
--help     Help text for correct usage of this script

* mandatory parameters
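To illustrate how the mandatory and optional parameters relate, here is a sketch of a wrapper-side argument check using getopts. This is not the plugin's actual code; the function name parse_args and the variable names are assumptions for this example, and --help is not handled because getopts only knows short options. Exit code 3 is the conventional Nagios UNKNOWN status.

```shell
#!/bin/bash
# Parse the plugin's short options into shell variables.
# Returns 3 (UNKNOWN) if a mandatory parameter is missing.
parse_args() {
  host="" community="" checktype="" volume="" warning="" critical=""
  OPTIND=1
  while getopts "H:C:t:v:w:c:" opt; do
    case "$opt" in
      H) host=$OPTARG ;;
      C) community=$OPTARG ;;
      t) checktype=$OPTARG ;;
      v) volume=$OPTARG ;;
      w) warning=$OPTARG ;;
      c) critical=$OPTARG ;;
      *) return 3 ;;
    esac
  done
  if [ -z "$host" ] || [ -z "$community" ] || [ -z "$checktype" ]; then
    echo "UNKNOWN: -H, -C and -t are mandatory" >&2
    return 3
  fi
  # The vol type only makes sense together with a volume name.
  if [ "$checktype" = "vol" ] && [ -z "$volume" ]; then
    echo "UNKNOWN: type vol requires -v" >&2
    return 3
  fi
}

if parse_args -H 10.0.0.200 -C public -t disk; then
  echo "host=$host type=$checktype"
fi
```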

Definition of the check types

Check Type   Description
conn         Checks number of current iSCSI connections (thresholds possible)
disk         Checks status of all disks
diskusage    Checks how much RAID space is already used (thresholds possible)
etherrors    Checks ethernet interfaces for ethernet packet errors
ethif        Checks status of ethernet interfaces (thresholds possible)
fan          Checks status of fans
health       Checks overall health of Equallogic device
info         Shows information about all Equallogic members of the same group
memberusage  Shows disk utilization of all members of the same group (thresholds possible)
poolconn     Checks highest number of iSCSI connections per pool (thresholds possible)
poolusage    Checks utilization of pools (thresholds possible)
ps           Checks status of power supply(ies)
raid         Checks RAID status
snapshots    Checks snapshot reserve status (warning level is taken from the Equallogic volume config, critical level can be set with -c)
temp         Checks temperature sensors
uptime       Shows uptime of Equallogic device
vol          Checks a single volume, must be used with the -v option (thresholds possible)
volumes      Checks utilization of all created iSCSI volumes (thresholds possible)
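A typo in the check type is easy to make, so a wrapper script may want to validate the type before calling the plugin. The following sketch hard-codes the type names from the list above (with the disk usage type spelled diskusage, as in the changelog); the function name is_valid_type is made up for this example.

```shell
#!/bin/bash
# Succeed only if the argument is one of the documented check types.
is_valid_type() {
  case "$1" in
    conn|disk|diskusage|etherrors|ethif|fan|health|info|memberusage) return 0 ;;
    poolconn|poolusage|ps|raid|snapshots|temp|uptime|vol|volumes) return 0 ;;
    *) return 1 ;;
  esac
}

for t in diskusage volume; do
  if is_valid_type "$t"; then
    echo "$t: known check type"
  else
    echo "$t: not a known check type"
  fi
done
```

Note that the list is a snapshot of this page; a newer plugin version may know additional types.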

Usage / running the plugin on the command line

Usage:

./check_equallogic.sh -H host -C community -t checktype [-v volume] [-w warning] [-c critical]

Examples:

./check_equallogic.sh -H 10.0.0.200 -C public -t disk
./check_equallogic.sh -H 10.0.0.200 -C public -t vol -v Volume1 -w 90 -c 95

Command definition

Command definition in Nagios, Icinga 1.x, Shinken, Naemon

# 'check_equallogic' command definition
define command{
    command_name check_equallogic
    # The plugin file is saved as check_equallogic.sh (see Download)
    command_line $USER1$/check_equallogic.sh -H $HOSTADDRESS$ -C $ARG1$ -t $ARG2$ $ARG3$
}

Note: I defined the -C value (SNMP community name) as a variable ($ARG1$). You can set it to a static value if you use the same SNMP community for all your EQL hosts.

Command definition in Icinga 2.x

object CheckCommand "check_equallogic" {
  import "plugin-check-command"

  command = [ PluginContribDir + "/check_equallogic.sh" ]

  arguments = {
    "-H" = {
      value = "$equallogic_host$"
      description = "DNS hostname or IP address of the Equallogic member"
    }
    "-C" = {
      value = "$equallogic_community$"
      description = "SNMP community"
    }
    "-t" = {
      value = "$equallogic_checktype$"
      description = "Check Type"
    }
    "-v" = {
      value = "$equallogic_volume$"
      description = "Volume name for single volume check"
    }
    "-w" = {
      value = "$equallogic_warning$"
      description = "Warning threshold"
    }
    "-c" = {
      value = "$equallogic_critical$"
      description = "Critical threshold"
    }
  }

  vars.equallogic_host = "$address$"
  vars.equallogic_community = "public"
}

Service definition

Service definition examples in Nagios, Icinga 1.x, Shinken, Naemon

Show information of all EQL members of the same group:

# Check Equallogic Information
define service{
use generic-service
host_name eql1
service_description General Information
check_command check_equallogic!public!info
}

Health check:

# Check Equallogic Health
define service{
use generic-service
host_name eql1
service_description General Health
check_command check_equallogic!public!health
}

Physical drives check:

# Check Equallogic Disk Status
define service{
use generic-service
host_name eql1
service_description Disk Status
check_command check_equallogic!public!disk
}

Volumes check:

# Check Equallogic Volumes
define service{
use generic-service
host_name eql1
service_description Volumes
check_command check_equallogic!public!volumes!-w 90 -c 95
}

The thresholds for the type 'volumes' are plain integers (percent). The 'volumes' check measures the disk utilization of all created iSCSI volumes. In this example a WARNING notification will be sent when at least one volume uses more than 90% of its capacity, and a CRITICAL notification when at least one volume uses more than 95% of its capacity. Without thresholds the plugin outputs the current utilization of all volumes and always returns status OK.
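The threshold behaviour described above can be sketched as follows. This is an illustration, not the plugin's code: the function name evaluate_usage is made up, and treating a value exactly equal to the threshold as still OK is an assumption based on the wording "more than".

```shell
#!/bin/bash
# Classify a utilization percentage against optional thresholds.
# Usage: evaluate_usage USED [WARN] [CRIT]
# Without thresholds the result is always OK.
evaluate_usage() {
  used=$1
  warn=${2:-}
  crit=${3:-}
  if [ -n "$crit" ] && [ "$used" -gt "$crit" ]; then
    echo "CRITICAL"
  elif [ -n "$warn" ] && [ "$used" -gt "$warn" ]; then
    echo "WARNING"
  else
    echo "OK"
  fi
}

evaluate_usage 93 90 95   # a volume at 93% with -w 90 -c 95
```

For the volumes type, the worst result across all volumes decides the overall status, since one volume above the threshold is enough to trigger the notification.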

Service object definition Icinga 2.x

The following example is a standard service object checking a single volume:

# Check Equallogic Volume MyVol1
object Service "Hardware" {
  import "generic-service"
  host_name = "eql1"
  check_command = "check_equallogic"
  // The check type for a single volume is "vol" (see the check types above)
  vars.equallogic_checktype = "vol"
  vars.equallogic_volume = "MyVol1"
  vars.equallogic_warning = "75"
  vars.equallogic_critical = "90"
}

Screenshots

check_equallogic disk status critical
check_equallogic checks ok
check_equallogic volumes critical
check_equallogic raid status warning disk usage critical