check_nwc_health: rumms - UNKNOWN no interfaces


Today I had to solve a special case where an Icinga 2 satellite server ran out of disk space in /var. After I increased the disk size, I noticed that almost all network switches checked via this satellite with check_nwc_health returned an UNKNOWN status. The service output was: rumms.

I verified this manually on the command line:

# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode interface-usage --name Ethernet1/1
rumms
UNKNOWN - no interfaces

I manually re-listed all interfaces:

# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode list-interfaces
83886080 mgmt0
151060482 Vlan2
[...]
526649088 Ethernet101/1/29
526649152 Ethernet101/1/30
526649216 Ethernet101/1/31
526649280 Ethernet101/1/32
OK - have fun

And then the check worked again:

# /usr/lib/nagios/plugins/check_nwc_health --hostname aswitch --community public --mode interface-usage --name Ethernet1/1
OK - interface Ethernet1/1 (alias UCS-FI-A) usage is in:0.82% (82014272.36bit/s) out:3.21% (320758024.71bit/s) | 'Ethernet1/1_usage_in'=0.82%;80;90;0;100 'Ethernet1/1_usage_out'=3.21%;80;90;0;100 'Ethernet1/1_traffic_in'=82014272.36;8000000000;9000000000;0;10000000000 'Ethernet1/1_traffic_out'=320758024.71;8000000000;9000000000;0;10000000000

The reason for this is that by default check_nwc_health creates a "cached" list of interfaces per checked device. This cached list is a file in /var/tmp/check_nwc_health:

# ls -l /var/tmp/check_nwc_health | grep cache
-rw-r--r-- 1 nagios nagios  8192 Jul 20 08:03 01switch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios  8577 Jul 20 08:17 02switch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios  8192 Jul 20 08:04 aswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios     0 Jul 20 07:32 bswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios  8192 Jul 20 08:06 cswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios  7017 Jul 20 08:18 dswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios  7013 Jul 20 08:19 eswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios     0 Jul 20 07:31 fswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-rw-r-- 1 nagios nagios  9291 Jul 20 08:16 gswitch_interface_cache_81b3d521b731e73215515a4f1f4a3ccf
-rw-r--r-- 1 nagios nagios  6245 Jul 20 07:44 hswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios     0 Jul 20 07:46 iswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios  4096 Jul 20 08:12 jswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
-rw-r--r-- 1 nagios nagios  4096 Jul 20 07:46 kswitch_interface_cache_d2e08e73bba4b976b8b4dcdcf66e3c7d
[...]

Note the cache files with a size of 0 bytes. Each of them is an empty interface list for that device, so a lookup for any interface name finds nothing.
Because /var was full when these interface cache files were last written, they ended up as 0-byte files, which made check_nwc_health conclude that the network device has no interfaces at all.

After removing the empty cache files, the check worked again (if there is no interface cache file, it is re-created).
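
The cleanup described above could be scripted roughly like this. It is only a sketch: the function name clean_empty_caches is made up for this example, and the directory and the "_interface_cache_" file name pattern are simply taken from the listing above, so adjust them if your installation stores the cache elsewhere.

```shell
#!/bin/sh
# Hypothetical helper (not part of check_nwc_health): remove 0-byte
# interface cache files so the plugin re-creates them.
clean_empty_caches() {
    # Default directory taken from the ls output above; pass another
    # directory as the first argument to override it.
    cache_dir="${1:-/var/tmp/check_nwc_health}"

    # -size 0 matches only empty files; -print shows each file that is
    # removed, so the run leaves a trace of what was deleted.
    find "$cache_dir" -type f -name '*_interface_cache_*' -size 0 \
        -print -exec rm -f {} +
}
```

Running clean_empty_caches without an argument works on /var/tmp/check_nwc_health. Cache files that still contain data are left alone, so only the devices affected by the full disk have their interface list rebuilt.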

