check_smart

Last update: March 11, 2019

This is a plugin to monitor the SMART values of hard drives and solid state drives. It is a fork of check_smart, released in 2009 by Kurt Yoder. The biggest change is that this fork can also be used for disks behind a hardware RAID controller.

Download

Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards, make it executable (usually chmod 755).
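
The two steps above could be scripted roughly as follows. This is only a sketch: install_plugin is a hypothetical helper name, and the default target directory is the one mentioned above, which may differ on your distribution.

```shell
# Hypothetical helper (not part of the plugin): copy the downloaded file
# into the plugin folder and make it executable.
install_plugin() {
  src=$1
  # Default folder taken from the text above; pass another one as $2 if needed.
  plugin_dir=${2:-/usr/lib/nagios/plugins}
  cp "$src" "$plugin_dir/check_smart.pl" &&
    chmod 755 "$plugin_dir/check_smart.pl"
}

# Usage (typically as root):
#   install_plugin ./check_smart.pl
```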

Community contributions are welcome on the GitHub repo.

Version history / Changelog

Feb 3, 2009: Kurt Yoder - initial version of script (rev 1.0)
Jul 8, 2013: Claudio Kuenzler - support hardware raids like megaraid (rev 2.0)
Jul 9, 2013: Claudio Kuenzler - update help output (rev 2.1)
Oct 11, 2013: Claudio Kuenzler - making the plugin work on FreeBSD (rev 3.0)
Oct 11, 2013: Claudio Kuenzler - allowing -i sat (SATA on FreeBSD) (rev 3.1)
Nov 4, 2013: Claudio Kuenzler - works now with CCISS on FreeBSD (rev 3.2)
Nov 4, 2013: Claudio Kuenzler - elements in grown defect list cause a warning (rev 3.3)
Nov 6, 2013: Claudio Kuenzler - add threshold option "bad" (-b) (rev 4.0)
Nov 7, 2013: Claudio Kuenzler - modified help (rev 4.0)
Nov 7, 2013: Claudio Kuenzler - bugfix in threshold logic (rev 4.1)
Mar 19, 2014: Claudio Kuenzler - bugfix in defect list perfdata (rev 4.2)
Apr 22, 2014: Jerome Lauret - implemented -g to do a global lookup (rev 5.0)
Apr 25, 2014: Claudio Kuenzler - cleanup, merge Jerome's code, perfdata output fix (rev 5.1)
May 5, 2014: Caspar Smit - Fixed output bug in global check / issue #3 (rev 5.2)
Feb 4, 2015: Caspar Smit and cguadall - Allow detection of more than 26 devices / issue #5 (rev 5.3)
Feb 5, 2015: Bastian de Groot - Different ATA vs. SCSI lookup (rev 5.4)
Feb 11, 2015: Josh Behrends - Allow script to run outside of nagios plugins dir / wiki url update (rev 5.5)
Feb 11, 2015: Claudio Kuenzler - Allow script to run outside of nagios plugins dir for FreeBSD too (rev 5.5)
Mar 12, 2015: Claudio Kuenzler - Change syntax of -g parameter (a regex is now expected as input) (rev 5.6)
Feb 6, 2017: Benedikt Heine - Fix Use of uninitialized value $device (rev 5.7)
Oct 10, 2017: Bobby Jones - Allow multiple devices for interface type megaraid, e.g. "megaraid,[1-5]" (rev 5.8)
Apr 28, 2018: Pavel Pulec (Inuits) - allow type "auto" (rev 5.9)
May 5, 2018: Claudio Kuenzler - Check selftest log for errors using new parameter -s (rev 5.10)
Dec 27, 2018: Claudio Kuenzler - Add exclude list (-e) to ignore certain attributes (rev 5.11)
Jan 8, 2019: Claudio Kuenzler - Fix 'Use of uninitialized value' warnings (rev 5.11.1)

Requirements

  • Perl
  • smartmontools package (the smartctl command is required)
  • For cciss (HP SmartArray) controllers, smartmontools >= 5.37
  • Entry in sudoers

Sudoers entry

This plugin needs root privileges; otherwise it cannot launch smartctl correctly. You have two options:

  • Launch the plugin itself as root with sudo
  • Launch the plugin itself as nagios user and the smartctl command as root with sudo

Here are some examples you can add to your sudoers with the command "visudo":

nagios ALL = NOPASSWD: /usr/local/libexec/nagios/check_smart.pl # for option 1 on FreeBSD
nagios ALL = NOPASSWD: /usr/local/sbin/smartctl # for option 2 on FreeBSD

nagios ALL = NOPASSWD: /usr/lib/nagios/plugins/check_smart.pl # for option 1 on Linux
nagios ALL = NOPASSWD: /usr/sbin/smartctl # for option 2 on Linux
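
If you prefer not to edit the main sudoers file by hand, an entry like the ones above can also be generated and syntax-checked before use. The following is a sketch under assumptions: sudoers_entry is a hypothetical helper, and the drop-in path /etc/sudoers.d/check_smart assumes that your sudoers file includes that directory.

```shell
# Hypothetical helper: print a passwordless sudoers entry.
#   $1 = user, $2 = absolute path of the command to allow
sudoers_entry() {
  printf '%s ALL = NOPASSWD: %s\n' "$1" "$2"
}

# Usage (as root; the drop-in path is an assumption):
#   sudoers_entry nagios /usr/sbin/smartctl > /etc/sudoers.d/check_smart
#   visudo -c -f /etc/sudoers.d/check_smart
```

Running visudo -c on the file checks its syntax, so a typo is caught before sudo refuses to work.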

Definition of the parameters

Short  Long         Description
-d     --device     A physical block device to be SMART monitored, e.g. /dev/sda
-g     --global     A regular expression of physical devices to be monitored, e.g. "/dev/sd[a-z]" for devices /dev/sda through /dev/sdz
-i     --interface  The device's interface type. See https://www.smartmontools.org/wiki/Supported_RAID-Controllers for interface types. If used in combination with -g/--global, the megaraid interface supports a regular expression, e.g. "-i megaraid,[8-9]"
-b*    --bad*       Threshold value (integer) at which to warn for N bad entries (ATA: Current Pending Sector, SCSI: Grown defect list)
-e*    --exclude*   Comma-separated list of SMART attributes which should be excluded (ignored). Also supports "When_failed" values, e.g. "In_the_past"
-s*    --selftest*  Additionally check SMART's selftest log for errors
-h*    --help       Show help/usage
-v*    --version*   Show plugin version
N/A    --debug*     Show debugging information
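
To get a feeling for which device names a -g pattern such as "/dev/sd[a-z]" covers, the pattern can be tried against a list of names. This sketch only illustrates how the pattern reads; match_devices is a hypothetical helper and not how the plugin itself resolves devices.

```shell
# Hypothetical helper: keep only the names (one per line on stdin)
# that match the pattern as a whole.
match_devices() {
  grep -E "^$1\$"
}

# Partitions (sda1) and other device types are not matched by /dev/sd[a-z].
printf '%s\n' /dev/sda /dev/sdb /dev/sda1 /dev/nvme0n1 |
  match_devices '/dev/sd[a-z]'
```

With the sample list, only /dev/sda and /dev/sdb are printed.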

Usage / running the plugin on the command line

Usage:

./check_smart.pl (-d device|-g regex) -i interface [-b threshold] [-e exclude] [-s]

Example: SATA Disk:

./check_smart.pl -d /dev/sda -i ata

Example: Drive attached to MegaRAID controller:

./check_smart.pl -d /dev/sda -i megaraid,8

Example: Intel RAID on FreeBSD 9.2 ("kldload mfip.ko" required):

/usr/local/libexec/nagios/check_smart.pl -d /dev/pass0 -i scsi

Example: SATA drives behind Intel RAID on FreeBSD 9.2 ("kldload mfip.ko" required):

/usr/local/libexec/nagios/check_smart.pl -d /dev/pass12 -i sat

Example: SCSI drives behind HP RAID (CCISS) on FreeBSD 6.0:

/usr/local/libexec/nagios/check_smart.pl -d /dev/ciss0 -i cciss,0
OK: no SMART errors detected|defect_list=0 sent_blocks=3093462752 temperature=24;;68

/usr/local/libexec/nagios/check_smart.pl -d /dev/ciss0 -i cciss,3
WARNING: 48 Elements in grown defect list | defect_list=48 sent_blocks=1137657348 temperature=22;;68

Example: Using threshold option (-b) to ignore 1 bad element, warning only when 2 bad elements are found:

/usr/local/libexec/nagios/check_smart.pl -d /dev/ciss0 -i cciss,1 -b 2
OK: 1 Elements in grown defect list (but less than threshold 2)|defect_list=1;2;2;; sent_blocks=2769458900762624 temperature=27;;65

Example: Check all SATA disks (sda - sdz) at the same time on Linux:

/usr/lib/nagios/plugins/check_smart.pl -g "/dev/sd[a-z]" -i ata
OK: [/dev/sda] - Device is clean --- [/dev/sdb] - Device is clean|

Example: Check all SCSI disks behind Intel RAID on FreeBSD 9.2 ("kldload mfip.ko" required):

/usr/local/libexec/nagios/check_smart.pl -g "/dev/pass[0-9]" -i scsi
OK: [/dev/pass0] - Device is clean --- [/dev/pass1] - Device is clean --- [/dev/pass2] - Device is clean --- [/dev/pass3] - Device is clean --- [/dev/pass4] - Device is clean --- [/dev/pass5] - Device is clean --- [/dev/pass6] - Device is clean --- [/dev/pass7] - Device is clean --- [/dev/pass8] - Device is clean --- [/dev/pass9] - Device is clean |

Example: Single SCSI drive on FreeBSD 10.1:

/usr/local/libexec/nagios/check_smart.pl -d /dev/da0 -i scsi
OK: no SMART errors detected. |sent_blocks=14067306 temperature=34;;60
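
In the sample outputs above, everything after the | character is performance data in the usual Nagios plugin form label=value;warn;crit. If you want to pull a single value out of such a line in a script, something like the following may do. perf_value is a hypothetical helper written for this page, and the sample line is copied from the CCISS example above.

```shell
# Hypothetical helper: print the value of one perfdata label.
#   $1 = complete plugin output line, $2 = label (e.g. temperature)
# Steps: drop the status text before the first |, put each label=value
# entry on its own line, then keep the value and drop the thresholds.
perf_value() {
  printf '%s\n' "$1" |
    sed 's/^[^|]*|//' |
    tr ' ' '\n' |
    sed -n "s/^$2=\([^;]*\).*/\1/p"
}

line='OK: no SMART errors detected|defect_list=0 sent_blocks=3093462752 temperature=24;;68'
perf_value "$line" temperature   # prints 24
perf_value "$line" defect_list   # prints 0
```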

Command definition (NRPE)

Command definition for single drive in your nrpe.cfg:

command[check_smart]=sudo /usr/lib/nagios/plugins/check_smart.pl -d $ARG1$ -i $ARG2$ -b $ARG3$

Command definition for multiple drives using -g parameter in your nrpe.cfg:

command[check_smart_all]=sudo /usr/lib/nagios/plugins/check_smart.pl -g $ARG1$ -i $ARG2$ -b $ARG3$
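
Both definitions rely on $ARG1$ to $ARG3$ being filled in by NRPE. NRPE only does this when command arguments are enabled, which means dont_blame_nrpe=1 in nrpe.cfg and an NRPE daemon built with command argument support. A quick check of the config file could look like this sketch; nrpe_args_enabled is a hypothetical helper, and the config path in the usage comment is an assumption that varies by distribution.

```shell
# Hypothetical helper: succeed if the given nrpe.cfg enables command arguments.
nrpe_args_enabled() {
  grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Usage (path is an assumption):
#   nrpe_args_enabled /etc/nagios/nrpe.cfg || echo "command arguments are disabled"
```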

Service definition

Service definition in Nagios, Icinga 1.x, Shinken, Naemon

Basic check of a single drive (or drive in software raid):

# Check SMART of a typical single disk (or used in software raid)
define service{
  use generic-service
  host_name mylinux1
  service_description Disk SMART Status SDA
  check_command check_nrpe!check_smart!-a "/dev/sda" "sat" "0"
}

Check SMART of multiple disks at same time:

# Check SMART of multiple disks with regex (looking for /dev/sda until /dev/sdf)
define service{
  use generic-service
  host_name mylinux1
  service_description Disk SMART Status
  check_command check_nrpe!check_smart_all!-a "/dev/sd[a-f]" "sat" "0"
}

Check SMART of a drive behind a cciss (HP SmartArray) controller:

# Check SMART of a drive behind a cciss (HP SmartArray) raid controller
define service{
  use generic-service
  host_name myhpproliant1
  service_description Disk SMART Status cciss2
  check_command check_nrpe!check_smart!-a "/dev/cciss/c0d0" "cciss,2" "2"
}

Here, argument 3 ($ARG3$) is "2". This means that this disk already has 1 defective sector (1 pending sector for ATA or 1 element in the grown defect list for SCSI drives) and the warning threshold has been raised to 2. As soon as the disk reaches two (or more) defect entries, a warning notification is sent. This helps to see whether a disk is really failing and the number of defective sectors is growing.
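
The effect of the threshold can be modeled in a few lines. This is a simplified sketch of the behavior described above and shown in the -b example, not the plugin's actual code; bad_status is a hypothetical name, and treating a missing threshold as 0 is an assumption.

```shell
# Simplified model: warn when there is at least one bad entry
# and the count has reached the threshold.
#   $1 = number of bad entries, $2 = threshold (assumed 0 if omitted)
bad_status() {
  count=$1
  threshold=${2:-0}
  if [ "$count" -gt 0 ] && [ "$count" -ge "$threshold" ]; then
    echo WARNING
  else
    echo OK
  fi
}

bad_status 1 2   # prints OK (1 bad entry, below threshold 2)
bad_status 2 2   # prints WARNING (threshold reached)
```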

Service object definition Icinga 2.x

Check a single SATA drive with a threshold of 0 bad sectors:

# SMART Check of drive sda
object Service "Hardware" {
  import "generic-service"
  host_name = "linuxserver1"
  check_command = "nrpe"
  vars.nrpe_command = "check_smart"
  vars.nrpe_arguments = ["/dev/sda", "sat", "0"]
}

Screenshots

check_smart multiple alerts
check_smart warning
check_smart all ok with values below threshold
check_smart self log warning