A new version of check_smart, a monitoring plugin for hard drives, solid state drives and NVMe drives, is available.
The newest release, 6.18.0, contains a fix and an enhancement, each reported by a different user.
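The bug most likely only occurred on FreeBSD (and maybe other BSD derivatives), presumably because sudo is installed under /usr/local/bin there, which is not necessarily part of the PATH the plugin runs with. Inside the Perl code, smartctl is executed with a path prefix taken from the @sys_path array: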
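my @sys_path = qw(/usr/bin /bin /usr/sbin /sbin /usr/local/bin /usr/local/sbin);
my $smart_command = undef;
foreach my $path (@sys_path) {
    # The first directory containing an executable smartctl wins
    if (-x "$path/smartctl") {
        # Only smartctl gets the full path; sudo is still resolved via $PATH
        $smart_command = "sudo $path/smartctl";
        last;
    }
}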
As the code shows, smartctl is prefixed with its path (once found), but sudo is not. The sudo call therefore relied on the PATH environment variable and could fail with a "command not found" error.
The new approach prefixes both commands with their full paths and merges them into a single command ($smart_command).
This bug was reported by Alexey Zonov.
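To illustrate the idea, here is a minimal Perl sketch that looks up both binaries in the same directory list. The find_binary() helper is hypothetical and this is not the actual code shipped in check_smart.pl 6.18.0; it only shows how resolving sudo and smartctl to absolute paths removes the dependency on PATH.

```perl
use strict;
use warnings;

# Directories searched by check_smart.pl, as shown in the post.
my @sys_path = qw(/usr/bin /bin /usr/sbin /sbin /usr/local/bin /usr/local/sbin);

# Hypothetical helper (not taken from check_smart.pl): return the full
# path of the first executable $name found in @dirs, or undef if none.
sub find_binary {
    my ($name, @dirs) = @_;
    foreach my $dir (@dirs) {
        return "$dir/$name" if -x "$dir/$name";
    }
    return undef;
}

my $sudo     = find_binary('sudo', @sys_path);
my $smartctl = find_binary('smartctl', @sys_path);

# Merge both absolute paths into one command string, so that neither
# binary is resolved through the caller's PATH environment variable.
my $smart_command;
$smart_command = "$sudo $smartctl" if defined $sudo && defined $smartctl;

print defined $smart_command
    ? "Using: $smart_command\n"
    : "sudo or smartctl not found in @sys_path\n";
```

On a FreeBSD host with sudo from packages, this would pick up /usr/local/bin/sudo even when /usr/local/bin is missing from the PATH of the monitoring daemon.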
In the background, the check_smart.pl plugin relies on the output of the smartctl command. This is nothing new. When smartctl is unable to communicate with the block device, some strange errors show up in the output.
The plugin checks the output for specific lines, including the "SMART overall-health self-assessment test result" line (for ATA-compatible devices). When the relevant health line is not found, the plugin exits with the status UNKNOWN (exit code 3) and the following output:
$ /usr/lib/nagios/plugins/check_smart.pl -d /dev/nvme1n1 -i nvme
UNKNOWN: Drive S/N : No health status line found, |
This does not necessarily mean that the selected drive is dead. It only means that smartctl's output did not contain a health line. This can also happen if a wrong megaraid,N number was selected, pointing to the RAID controller itself instead of a drive. Hence the decision to exit with an UNKNOWN state.
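The health line logic can be sketched in a few lines of Perl. The check_health_line() helper below is hypothetical, not the plugin's real implementation, and it assumes that any result other than PASSED is treated as CRITICAL; the actual plugin may apply more detailed rules. The exit codes are the standard Nagios/Icinga plugin return codes.

```perl
use strict;
use warnings;

# Standard Nagios/Icinga plugin exit codes.
use constant { OK => 0, WARNING => 1, CRITICAL => 2, UNKNOWN => 3 };

# Hypothetical helper, not the plugin's real code: look for the ATA
# health line in smartctl's output and map it to a plugin status.
# Assumption: any result other than PASSED is treated as CRITICAL.
sub check_health_line {
    my @lines = @_;
    foreach my $line (@lines) {
        if ($line =~ /^SMART overall-health self-assessment test result:\s*(\S+)/) {
            my $result = $1;
            return $result eq 'PASSED'
                ? (OK,       "Health status: $result")
                : (CRITICAL, "Health status: $result");
        }
    }
    # No health line at all: the drive state cannot be determined.
    return (UNKNOWN, 'No health status line found');
}

my ($status, $message) = check_health_line(
    'smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)',
);
print "$message (exit code $status)\n";
```

Because the sample input contains no health line, the sketch falls through to UNKNOWN (exit code 3), mirroring the plugin output shown above.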
NVMe drives, however, produce an additional line of output when smartctl is executed against a defective drive:
$ sudo smartctl -a /dev/nvme1n1
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error
This output is actually very helpful: it strongly suggests that the block device (/dev/nvme1n1) indeed has a major problem and is dead.
We can therefore assume with high confidence that check_smart.pl should alert with a CRITICAL status when this line is detected in smartctl's output.
This kind-of-bug-but-also-feature-request was reported by Robert Scheck in GitHub issue 110.
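A minimal Perl sketch of how such a check could slot in before the generic UNKNOWN fallback. The status_without_health_line() function is a hypothetical name, and matching on the "Read NVMe Identify Controller failed:" prefix is an assumption based on the smartctl output shown above, not necessarily the exact pattern used in check_smart.pl.

```perl
use strict;
use warnings;

use constant { CRITICAL => 2, UNKNOWN => 3 };

# Hypothetical sketch: called when no health line was found. If smartctl
# reports that it could not read the NVMe Identify Controller data, treat
# the drive as CRITICAL instead of returning the generic UNKNOWN result.
sub status_without_health_line {
    my @lines = @_;
    foreach my $line (@lines) {
        return (CRITICAL, $line)
            if $line =~ /^Read NVMe Identify Controller failed:/;
    }
    return (UNKNOWN, 'No health status line found');
}

# smartctl output from the defective drive shown in the post
my @output = (
    'smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.14.0-2-pve] (local build)',
    'Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org',
    'Read NVMe Identify Controller failed: NVME_IOCTL_ADMIN_CMD: Input/output error',
);
my ($status, $message) = status_without_health_line(@output);
print "exit code $status: $message\n";
```

Keeping the UNKNOWN fallback preserves the cautious behavior for cases like a wrong megaraid,N number, while the explicit I/O error escalates to CRITICAL.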
And with that: Enjoy the new release. As always, if you encounter bugs, please report them on the GitHub repo.