check_smart with support for hardware raid controllers


Listed in: Nagios, Hardware, Linux, Perl, Monitoring

For a couple of years I have successfully used the Nagios plugin check_smart by Kurt Yoder to monitor the health of hard disks through their S.M.A.R.T. values.
It has always worked like a charm, as long as the OS could see the drives directly. In most cases I used the plugin in environments with software raid (mdadm), where the disks were still visible as /dev/sda and /dev/sdb.

However, I noticed that the plugin does not work with disks behind a hardware raid controller, for example MegaRAID, even though the smartctl command (part of smartmontools) is able to read the SMART values through a hardware raid controller.

This happened:

./check_smart -d /dev/sda -i megaraid,8
invalid interface megaraid,8 for /dev/sda!

check_smart uses smartctl in the background, and smartctl itself works fine with megaraid:

smartctl -d megaraid,8 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen,

/dev/sda [megaraid_disk_09] [SAT]: Device open changed type from 'megaraid' to 'sat'
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.

The issue lies in the plugin itself. It checks that the given interface argument is either ata or scsi. Any other interface type (here megaraid) is rejected and the plugin exits with an error.
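To illustrate the idea, here is a minimal sketch of a less strict interface check, written in shell. The real plugin is written in Perl, so this is not the plugin's code: the function name is_valid_interface is hypothetical, and the list of controller types (megaraid, 3ware, cciss, each followed by a numeric id as smartctl expects) is my assumption of what such a check might allow.

```shell
#!/bin/sh
# Hypothetical helper, NOT taken from check_smart: accept plain interface
# types as well as "controller,N" types with a numeric disk id.
is_valid_interface() {
    case "$1" in
        ata|scsi)
            return 0 ;;
        megaraid,*|3ware,*|cciss,*)
            # Everything after the first comma must be a non-empty number.
            id=${1#*,}
            case "$id" in
                ''|*[!0-9]*) return 1 ;;
                *)           return 0 ;;
            esac ;;
        *)
            return 1 ;;
    esac
}

# Try a few interface strings.
for iface in ata scsi megaraid,8 megaraid usb; do
    if is_valid_interface "$iface"; then
        echo "$iface: accepted"
    else
        echo "invalid interface $iface"
    fi
done
```

With a check like this, megaraid,8 is accepted, while megaraid without a disk id or an unknown type such as usb is still rejected.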

I took the liberty of patching check_smart so that it accepts hardware raid controllers as interface types.

The patched plugin is available in my GitHub repository.

I successfully tested it with megaraid; it may of course also work with other controllers:

./check_smart -d /dev/sda -i megaraid,8
OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=2958 Start_Stop_Count=13 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Power_On_Hours=603 Spin_Retry_Count=0 Calibration_Retry_Count=0 Power_Cycle_Count=13 Power-Off_Retract_Count=11 Load_Cycle_Count=1 Temperature_Celsius=32 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0 Multi_Zone_Error_Rate=0
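The output follows the usual Nagios plugin convention: the status text comes before the pipe and the performance data (label=value pairs) after it. As a small sketch of how to pick a single value out of such a line in shell, assuming a shortened copy of the output above and a hypothetical helper named perf_value:

```shell
#!/bin/sh
# Sample plugin output, shortened from the run above.
output='OK: no SMART errors detected|Raw_Read_Error_Rate=0 Spin_Up_Time=2958 Temperature_Celsius=32'

status=${output%%|*}    # text before the first pipe
perfdata=${output#*|}   # text after the first pipe

# Hypothetical helper: print the value of one perfdata label,
# return non-zero if the label is not present.
perf_value() {
    for pair in $perfdata; do
        case "$pair" in
            "$1"=*) echo "${pair#*=}"; return 0 ;;
        esac
    done
    return 1
}

echo "status: $status"
echo "temperature: $(perf_value Temperature_Celsius)"
```

This can be handy for a quick check on the command line; Nagios itself parses the performance data on its own.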

