Monitoring plugin check_smart 5.11 available, introducing exclude list


Published on - Listed in Monitoring Hardware Linux BSD Icinga Nagios


A new version of the monitoring plugin check_smart, which monitors the SMART attributes of hard drives and solid state drives, is out.

Version 5.11 introduces a new parameter, "-e" or "--exclude", which takes an exclude list (also known as an ignore list).

The exclude list is a comma-separated list of strings. It tells the plugin which SMART attributes to ignore, even if they are in a failing or failed state.
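To make the idea of a comma-separated exclude list concrete, here is a minimal sketch in Python. It is not the plugin's own code (check_smart is written in Perl); the function name `parse_exclude_list` is hypothetical, and stripping whitespace and dropping empty entries are assumptions made for illustration only.

```python
def parse_exclude_list(raw):
    """Split a comma-separated exclude value into a set of strings.

    Hypothetical helper: the whitespace stripping and the dropping of
    empty entries are assumptions, not documented plugin behaviour.
    """
    return {item.strip() for item in raw.split(",") if item.strip()}


exclude = parse_exclude_list("Temperature_Celsius,Current_Pending_Sector")
print(sorted(exclude))  # ['Current_Pending_Sector', 'Temperature_Celsius']
```

A set is used here because the only question asked of the list later is whether a given string is in it.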

Let's take a temperature attribute that failed in the past as an example.

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
194 Temperature_Celsius     0x0002   113   113   000    Old_age   Always   In_the_past 53 (Lifetime Min/Max 25/62)
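The columns of such a row can be pulled apart with a few lines of Python. This is only a sketch of how the table is laid out, not how the plugin parses it; the names `COLUMNS` and `parse_attribute_row` are hypothetical. It assumes that only the last column (RAW_VALUE) may contain spaces, as in the row above.

```python
# Column names taken from the header line of the attribute table.
COLUMNS = ["id", "name", "flag", "value", "worst", "thresh",
           "type", "updated", "when_failed", "raw_value"]


def parse_attribute_row(line):
    """Split one attribute row into a dict keyed by column name."""
    # Split on whitespace at most 9 times so that the raw value,
    # which may itself contain spaces, stays in one piece.
    fields = line.strip().split(None, len(COLUMNS) - 1)
    if len(fields) != len(COLUMNS):
        raise ValueError("unexpected attribute row: %r" % line)
    return dict(zip(COLUMNS, fields))


row = parse_attribute_row(
    "194 Temperature_Celsius     0x0002   113   113   000    "
    "Old_age   Always  In_the_past  53 (Lifetime Min/Max 25/62)"
)
print(row["name"], row["when_failed"], row["raw_value"])
# Temperature_Celsius In_the_past 53 (Lifetime Min/Max 25/62)
```

The two fields that matter for the exclude list are the attribute name and the WHEN_FAILED value.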

Without the exclude list, the plugin returns a WARNING because the temperature SMART attribute failed at some point in the past:

# ./check_smart.pl -d /dev/sda -i sat
WARNING: Attribute Temperature_Celsius failed at In_the_past|Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0

It is useful to know that the attribute once failed in the past. But once we know that, we want to move on and have the warning disappear. With the exclude list, the plugin can be told to ignore the attribute "Temperature_Celsius":

# ./check_smart.pl -d /dev/sda -i sat -e Temperature_Celsius
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0

And hurray, no alert anymore. 

But this could also be a bit dangerous. What if the drive raises a new (live!) temperature alert? You'd certainly want to know about it. That's why, besides excluding a SMART attribute by name, it is also possible to exclude certain values of the "WHEN_FAILED" column. In the following example, the "WHEN_FAILED" value "In_the_past" (as seen above) is used in the exclude list:

# ./check_smart.pl -d /dev/sda -i sat -e "In_the_past"
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0

As you can see, the plugin no longer alerts on "Temperature_Celsius" because it found the "In_the_past" value in the "WHEN_FAILED" column and ignored it.
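The difference between excluding an attribute name and excluding a WHEN_FAILED value can be sketched in Python as follows. This is an illustration of the matching idea only, not the plugin's Perl implementation; the names `is_excluded` and `check` are hypothetical. As a simplification, every failure that is not excluded is reported as WARNING, and "FAILING_NOW" is assumed to be the WHEN_FAILED value of an attribute that is currently failing.

```python
def is_excluded(name, when_failed, exclude):
    """An entry is ignored if its name OR its WHEN_FAILED value is listed."""
    return name in exclude or when_failed in exclude


def check(attributes, exclude):
    """Return a status line for (name, when_failed) pairs.

    Simplified: any failure that is not excluded yields WARNING.
    """
    failed = [
        "Attribute %s failed at %s" % (name, when)
        for name, when in attributes
        if when != "-" and not is_excluded(name, when, exclude)
    ]
    if failed:
        return "WARNING: " + ", ".join(failed)
    return "OK: no SMART errors detected."


past = [("Temperature_Celsius", "In_the_past"), ("Reallocated_Sector_Ct", "-")]
live = [("Temperature_Celsius", "FAILING_NOW"), ("Reallocated_Sector_Ct", "-")]

print(check(past, {"In_the_past"}))
# OK: no SMART errors detected.
print(check(live, {"In_the_past"}))
# WARNING: Attribute Temperature_Celsius failed at FAILING_NOW
print(check(live, {"Temperature_Celsius"}))
# OK: no SMART errors detected.
```

The last call shows the risk described above: excluding the attribute by name also hides a live failure, while excluding only "In_the_past" does not.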

To ignore multiple entries (attribute names or "WHEN_FAILED" values), simply separate them with a comma:

# ./check_smart.pl -d /dev/sda -i sat -e "In_the_past","Current_Pending_Sector"
OK: no SMART errors detected. |Raw_Read_Error_Rate=0 Throughput_Performance=67 Spin_Up_Time=0 Start_Stop_Count=3 Reallocated_Sector_Ct=0 Seek_Error_Rate=0 Seek_Time_Performance=34 Power_On_Hours=10617 Spin_Retry_Count=0 Power_Cycle_Count=3 Power-Off_Retract_Count=3 Load_Cycle_Count=3 Temperature_Celsius=53 Reallocated_Event_Count=0 Current_Pending_Sector=0 Offline_Uncorrectable=0 UDMA_CRC_Error_Count=0

But make sure you don't shoot yourself in the foot with this: every excluded entry is a failure the plugin will no longer report. The main reason the exclude list was created in the first place is clearly the temperature attribute.

