check_smart with support for hardware raid controllers
Monday - Jul 8th 2013

For a couple of years I have been successfully using the Nagios plugin check_smart (https://www.monitoringexchange.org/inventory/Check-Plugins/Hardware/Storage/Check-SMART-status) by Kurt Yoder to monitor the health of hard disks through their S.M.A.R.T. values.
It has always worked like a charm, as long as the OS could see the drives directly. In most cases I used the plugin in environments with software raid (mdadm), where the disks were still visible as /dev/sda and /dev/sdb.

However, I became aware that the plugin does not work with disks behind a hardware raid controller such as MegaRAID, even though the smartctl command (part of smartmontools) is able to read the SMART values through a hardware raid controller.

This happened:

./check_smart -d /dev/sda -i megaraid,8
invalid interface megaraid,8 for /dev/sda!

check_smart uses smartctl in the background, and smartctl itself works fine with megaraid (see http://sourceforge.net/apps/trac/smartmontools/wiki/Supported_RAID-Controllers):

smartctl -d megaraid,8 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/sda [megaraid_disk_09] [SAT]: Device open changed type from 'megaraid' to 'sat'
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
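To illustrate what "using smartctl in the background" can look like, here is a minimal Python sketch. It is not the plugin's code (check_smart is written in Perl); the function names build_smartctl_command and health_passed are made up for this example, and the parser only handles the ATA-style health line shown in the output above.

```python
from typing import List

# The health line as it appears in the smartctl output shown above.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def build_smartctl_command(device: str, interface: str) -> List[str]:
    """Build the argument list for the smartctl call used in this post."""
    return ["smartctl", "-d", interface, "-H", device]


def health_passed(smartctl_output: str) -> bool:
    """Return True only if the health line is present and says PASSED."""
    for line in smartctl_output.splitlines():
        line = line.strip()
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip() == "PASSED"
    # No health line found: treat it as "not passed" (an assumption of this sketch).
    return False
```

The command list could be handed to subprocess.run and its captured stdout passed to health_passed. Returning False when the line is missing is a cautious choice for this sketch, not something taken from the plugin.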

The issue lies in the plugin itself. It checks that the interface given with -i is either ata or scsi. Any other interface type (here megaraid,8) is rejected with the "invalid interface" error shown above, even though smartctl would handle it.
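The difference between the strict check and a relaxed one can be sketched in a few lines of Python. This is only an illustration of the idea: the real plugin is written in Perl, and the pattern and function names below are assumptions made for this example, not the actual patch.

```python
import re

# Strict variant: only the two interface types the original check allows.
STRICT_INTERFACES = {"ata", "scsi"}

# Relaxed variant (hypothetical pattern): additionally accept "megaraid,N",
# where N is the numeric disk id behind the controller.
RELAXED_PATTERN = re.compile(r"ata|scsi|megaraid,\d+")


def is_valid_strict(interface: str) -> bool:
    """Mimics a check that only knows ata and scsi."""
    return interface in STRICT_INTERFACES


def is_valid_relaxed(interface: str) -> bool:
    """Accepts ata, scsi and megaraid,N; the whole string must match."""
    return RELAXED_PATTERN.fullmatch(interface) is not None
```

With these definitions, is_valid_strict("megaraid,8") is False while is_valid_relaxed("megaraid,8") is True. Other controller types supported by smartctl would need their own entries in the pattern.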

I took the liberty of patching check_smart to accept hardware raid controllers as an interface type.

The patched plugin is in my GitHub repository: https://github.com/Napsty/check_smart

I successfully tested it with megaraid. It may of course also work with other controllers:

./check_smart -d /dev/sda -i megaraid,8
OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=2958 Start_Stop_Count=13 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Power_On_Hours=603 Spin_Retry_Count=0 Calibration_Retry_Count=0 Power_Cycle_Count=13 Power-Off_Retract_Count=11 Load_Cycle_Count=1 Temperature_Celsius=32 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Multi_Zone_Error_Rate=0
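The output line above follows the usual Nagios layout: a status text, a pipe, then performance data as label=value pairs. As a rough sketch of how such a line could be taken apart (parse_plugin_output is a name invented for this example, and it only handles plain integer values like the ones shown here, not units or thresholds):

```python
from typing import Dict, Tuple


def parse_plugin_output(line: str) -> Tuple[str, Dict[str, int]]:
    """Split a plugin output line into status text and integer metrics."""
    status, _, perfdata = line.partition("|")
    metrics: Dict[str, int] = {}
    for item in perfdata.split():
        label, sep, value = item.partition("=")
        # Keep only label=value pairs whose value is a plain integer.
        if sep and value.lstrip("-").isdigit():
            metrics[label] = int(value)
    return status.strip(), metrics
```

For the line above this would yield the status text "OK: no SMART errors detected" and a dictionary in which, for example, Temperature_Celsius maps to 32.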




