Monitoring Plugin: check_smart

Last Update: October 19, 2017

This plugin monitors the SMART values of hard drives. It is a fork of check_smart, released in 2009 by Kurt Yoder. The biggest change is that this fork can also be used for disks behind a hardware RAID controller.

Download check_smart.pl and save it in your Nagios plugin folder (e.g. /usr/local/nagios/libexec)
Contribute on https://github.com/Napsty/check_smart

Version History
Feb 3, 2009: Kurt Yoder - initial version of script (rev 1.0)
Jul 8, 2013: Claudio Kuenzler - support hardware raids like megaraid (rev 2.0)
Jul 9, 2013: Claudio Kuenzler - update help output (rev 2.1)
Oct 11, 2013: Claudio Kuenzler - making the plugin work on FreeBSD (rev 3.0)
Oct 11, 2013: Claudio Kuenzler - allowing -i sat (SATA on FreeBSD) (rev 3.1)
Nov 4, 2013: Claudio Kuenzler - works now with CCISS on FreeBSD (rev 3.2)
Nov 4, 2013: Claudio Kuenzler - elements in grown defect list cause warning (rev 3.3)
Nov 6, 2013: Claudio Kuenzler - add threshold option "bad" (-b) (rev 4.0)
Nov 7, 2013: Claudio Kuenzler - modified help (rev 4.0)
Nov 7, 2013: Claudio Kuenzler - bugfix in threshold logic (rev 4.1)
Mar 19, 2014: Claudio Kuenzler - bugfix in defect list perfdata (rev 4.2)
Apr 22, 2014: Jerome Lauret - implemented -g to do a global lookup (rev 5.0)
Apr 25, 2014: Claudio Kuenzler - cleanup, merge Jerome's code, perfdata output fix (rev 5.1)
May 5, 2014: Caspar Smit - Fixed output bug in global check / issue #3 (rev 5.2)
Feb 4, 2015: Caspar Smit and cguadall - Allow detection of more than 26 devices / issue #5 (rev 5.3)
Feb 5, 2015: Bastian de Groot - Different ATA vs. SCSI lookup (rev 5.4)
Feb 11, 2015: Josh Behrends - Allow script to run outside of nagios plugins dir / wiki url update (rev 5.5)
Feb 11, 2015: Claudio Kuenzler - Allow script to run outside of nagios plugins dir for FreeBSD too (rev 5.5)
Mar 12, 2015: Claudio Kuenzler - Change syntax of -g parameter (a regex is now expected as input) (rev 5.6)
Feb 6, 2017: Benedikt Heine - Fix Use of uninitialized value $device (rev 5.7)
Oct 10, 2017: Bobby Jones - Allow multiple devices for interface type megaraid, e.g. "megaraid,[1-5]" (rev 5.8)

Requirements
- smartmontools package (smartctl command is required)
- Perl
- For cciss (HP SmartArray) controllers, smartmontools >= 5.37
- Entry in sudoers

!Important note to FreeBSD users with cciss (HP SmartArray) servers!
FreeBSD reverses the order of the disks shown in smartctl. If you think cciss,0 is the first physical drive, you're wrong. So wrong. It's actually the last disk, so you need to think in reverse. Example: You have four drives in a ProLiant server. cciss,0 is the fourth drive while cciss,3 is the first drive. That's why you should only mention the cciss number in the service description and not something like "Drive 1" (it would be wrong and might lead to pulling the wrong disk). To be sure you identify the correct drive, please read this article.
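
The reversed numbering can be written down as a small calculation. The following is only a sketch based on the four-drive example above; the function name freebsd_cciss_index is made up for illustration and is not part of check_smart.

```shell
# Hypothetical helper (not part of check_smart): translate a physical drive
# position (1 = first drive) into the cciss index as described for FreeBSD.
# Usage: freebsd_cciss_index POSITION TOTAL_DRIVES
freebsd_cciss_index() {
    position=$1
    total=$2
    # Reversed order: the first drive gets the highest index, the last drive gets 0.
    echo "cciss,$((total - position))"
}

freebsd_cciss_index 1 4   # first of four drives, prints cciss,3
freebsd_cciss_index 4 4   # fourth of four drives, prints cciss,0
```

Even with such a calculation at hand, verify the drive (for example by its serial number) before pulling it.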

------------------------

Definition of the parameters

Short  Long          Description
-d     --device      A physical block device to be SMART monitored, e.g. /dev/sda
-g     --global      A regular expression of physical devices to be monitored, e.g. "/dev/sd[a-z]" for devices /dev/sda through /dev/sdz
-i     --interface   The device's interface type.
                     See http://sourceforge.net/apps/trac/smartmontools/wiki/Supported_RAID-Controllers for interface convention.
                     If used in combination with -g/--global, the megaraid interface supports a regular expression, e.g. "-i megaraid,[8-9]"
-b*    --bad*        Threshold value (integer) when to warn for N bad entries
-h*    --help*       Show help / usage
-v*    --version*    Show plugin's version
       --debug*      Show debugging information

*optional
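
To get a feeling for which device names an expression such as "/dev/sd[a-f]" covers, you can test it against a list of names. This is an illustration only: the helper matching_devices is hypothetical, it uses grep -E instead of the plugin's own lookup, and the anchoring to complete names is an assumption made here.

```shell
# Hypothetical helper (not part of check_smart): print those device names
# that are fully matched by the given expression.
# Usage: matching_devices EXPRESSION NAME...
matching_devices() {
    pattern=$1
    shift
    # Assumption: anchor the expression so that only complete names match.
    printf '%s\n' "$@" | grep -E "^(${pattern})\$"
}

# /dev/sdg and /dev/sda1 are outside of the expression and are not printed.
matching_devices '/dev/sd[a-f]' /dev/sda /dev/sdf /dev/sdg /dev/sda1
```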

------------------------

Command definition for single disk in your nrpe.cfg:

command[check_smart]=sudo /usr/lib/nagios/plugins/check_smart -d $ARG1$ -i $ARG2$ -b $ARG3$

Note: Although the -b (--bad) option is optional, it is already included in the NRPE command definition. The value can also be "0", which is the same as if -b was not used.

Command definition for multiple disks in your nrpe.cfg:

command[check_smart_all]=sudo /usr/lib/nagios/plugins/check_smart -g $ARG1$ -i $ARG2$ -b $ARG3$

Note that the option -g was used here (for global).
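
Keep in mind that NRPE only accepts arguments from the Nagios server when dont_blame_nrpe=1 is set in nrpe.cfg. To see what the command definitions turn into once the arguments are filled in, the substitution can be imitated in the shell. This is a rough sketch: expand_nrpe_command is a hypothetical helper and only imitates the placeholder replacement, it is not how NRPE works internally.

```shell
# Hypothetical helper: fill $ARG1$..$ARG3$ of an NRPE command template.
# Limitation of this sketch: the values must not contain "|", "&" or "\".
# Usage: expand_nrpe_command TEMPLATE ARG1 ARG2 ARG3
expand_nrpe_command() {
    template=$1
    printf '%s\n' "$template" |
        sed -e "s|\\\$ARG1\\\$|$2|" -e "s|\\\$ARG2\\\$|$3|" -e "s|\\\$ARG3\\\$|$4|"
}

# Single quotes keep the shell from touching the $ARGn$ placeholders.
expand_nrpe_command 'sudo /usr/lib/nagios/plugins/check_smart -d $ARG1$ -i $ARG2$ -b $ARG3$' /dev/sda sat 0
```

The printed line is the command that would run on the monitored host for the arguments /dev/sda, sat and 0.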

------------------------

Sudoers entry:

This plugin needs to run as root, otherwise it is not able to launch smartctl correctly. You have two options:

1) Launch the plugin itself as root with sudo
2) Launch the plugin as the Nagios user and the smartctl command as root with sudo

Here are some examples you can add to your sudoers file with the command "visudo". The path in the sudoers entry must match the path that is actually executed (for option 1, the plugin path used in nrpe.cfg):

nagios ALL = NOPASSWD: /usr/local/libexec/nagios/check_smart.pl # for option 1 on FreeBSD
nagios ALL = NOPASSWD: /usr/local/sbin/smartctl # for option 2 on FreeBSD

nagios ALL = NOPASSWD: /usr/lib/nagios/plugins/check_smart.pl # for option 1 on Linux
nagios ALL = NOPASSWD: /usr/sbin/smartctl # for option 2 on Linux

------------------------

Service check examples:

# Check SMART of a typical single disk (or used in software raid)
define service{
  use generic-service
  host_name mylinux1
  service_description Disk SMART Status SDA
  check_command check_nrpe!check_smart!-a "/dev/sda" "sat" "0"
}

-------

# Check SMART of multiple disks with regex (looking for /dev/sda until /dev/sdf)
define service{
  use generic-service
  host_name mylinux1
  service_description Disk SMART Status
  check_command check_nrpe!check_smart_all!-a "/dev/sd[a-f]" "sat" "0"
}

-------

# Check SMART of multiple disks behind a MegaRaid controller with regex
define service{
  use generic-service
  host_name mylinux1
  service_description Disk SMART Status
  check_command check_nrpe!check_smart_all!-a "/dev/sda" "megaraid,[8-9]" "0"
}

-------

# Check SMART of a drive behind a cciss (HP SmartArray) raid controller
define service{
  use generic-service
  host_name myhpproliant1
  service_description Disk SMART Status cciss2
  check_command check_nrpe!check_smart!-a "/dev/cciss/c0d0" "cciss,2" "2"
}

Here argument 3 ($ARG3$) is "2". This means that the disk already has 1 defective sector (1 pending sector for ATA or 1 element in the grown defect list for SCSI drives) and the warning threshold is raised to 2. As soon as the disk reaches two (or more) defective entries, a warning notification is triggered. This helps to see whether a disk is really failing, i.e. whether the number of defective sectors keeps growing.
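
The threshold behaviour described above can be summed up in a few lines of shell. This is a sketch of the described behaviour, not the plugin's actual code; the function name bad_entries_state is made up, and treating "0" like "warn on the first bad entry" is an assumption derived from the note about -b.

```shell
# Sketch only (not taken from check_smart): decide the state for a number of
# bad entries and a -b threshold.
# Usage: bad_entries_state BAD_ENTRIES [THRESHOLD]
bad_entries_state() {
    bad=$1
    threshold=${2:-0}
    # Assumption: a threshold of 0 (or none) means the first bad entry already warns.
    [ "$threshold" -gt 0 ] || threshold=1
    if [ "$bad" -ge "$threshold" ]; then
        echo WARNING
    else
        echo OK
    fi
}

bad_entries_state 1 2   # one known bad entry, threshold raised to 2
bad_entries_state 2 2   # a second bad entry appeared
```

The first call prints OK, the second one prints WARNING.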

-------

# Check SMART of a drive behind a LSI MegaRaid controller
define service{
  use generic-service
  host_name myintelserver1
  service_description Disk SMART Status SDA
  check_command check_nrpe!check_smart!-a "/dev/sda" "megaraid,8" "0"
}

Note that often the physical drives start at "slot" 8 on MegaRaid controllers. You should manually check with smartctl.
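
One way to do this manual check is to ask smartctl for the drive information on a range of IDs and see which ones answer. The following is a sketch under assumptions: probe_megaraid_slots is a hypothetical helper, it relies on smartctl returning a non-zero exit status when no drive answers on an ID, and it has to be run as root. The SMARTCTL variable only exists so that a different path to smartctl can be set.

```shell
# Hypothetical helper (not part of check_smart): print the megaraid IDs in a
# range for which "smartctl -i" succeeds.
# Usage: probe_megaraid_slots DEVICE FIRST_ID LAST_ID
probe_megaraid_slots() {
    dev=$1
    n=$2
    last=$3
    while [ "$n" -le "$last" ]; do
        # -i shows the drive information, -d megaraid,N selects the drive behind the controller.
        if "${SMARTCTL:-smartctl}" -i -d "megaraid,$n" "$dev" >/dev/null 2>&1; then
            echo "megaraid,$n"
        fi
        n=$((n + 1))
    done
}

# Example call (as root): probe_megaraid_slots /dev/sda 0 15
```

The IDs printed by the helper are the values to use for the -i parameter.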

-------

# Check SMART of a drive behind an Intel raid controller on FreeBSD
define service{
  use generic-service
  host_name myfreebsd1
  service_description Disk SMART Status /dev/pass2
  check_command check_nrpe!check_smart!-a "/dev/pass2" "sat" "0"
}

To be able to see the physical drives behind an Intel (and MegaRaid) raid controller on FreeBSD, the kernel module mfip must be loaded ("kldload mfip.ko"). The physical drives then appear as /dev/passN.


