The official Nagios plugin 'check_disk' is one of the oldest and probably one of the most powerful plugins. However, in most setups disk checks with check_disk use only a fraction of what the plugin can do.
A very typical disk check looks like this:
# check free disk space on / partition
/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 516 MB (56% inode=98%);| /=393MB;791;890;0;989
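The perfdata after the pipe reports used space, with the free-space thresholds converted into used-space thresholds in MB. The sketch below reproduces the 791 and 890 MB values from the sample output. The formula (total size minus the allowed free share, rounded down) is an assumption that happens to match this output, not a statement about check_disk's internals.

```shell
#!/bin/sh
# Reproduce the thresholds in the perfdata "/=393MB;791;890;0;989".
# Assumption: threshold = total * (100 - free%) / 100, rounded down.
total_mb=989      # file system size in MB (last perfdata field)
warn_free_pct=20  # -w 20%
crit_free_pct=10  # -c 10%

# Shell arithmetic is integer-only, so the division rounds down.
warn_used_mb=$(( total_mb * (100 - warn_free_pct) / 100 ))
crit_used_mb=$(( total_mb * (100 - crit_free_pct) / 100 ))

echo "warning at ${warn_used_mb} MB used, critical at ${crit_used_mb} MB used"
```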
There's also the possibility to check all available partitions with one command:
# check all free disk space
/usr/lib/nagios/plugins/check_disk -w 20% -c 10%
DISK CRITICAL - free space: / 516 MB (56% inode=98%); /dev 0 MB (0% inode=-); /tmp 910 MB (99% inode=99%); /usr 72906 MB (66% inode=97%); /var 5210 MB (57% inode=95%); .................
But with this command the plugin returns a HUGE list of file systems. Why? Because this particular server (a shared hosting server running FreeBSD) has all its customer folders chrooted, with ZFS quotas. So each customer shows up like this in df:
df -h | grep webcustomer1
datapool/home/webcustomer1 15G 7.5G 7.5G 50% /home/webcustomer1/
/bin 989M 393M 516M 43% /home/webcustomer1/bin
/lib 989M 393M 516M 43% /home/webcustomer1/lib
/libexec 989M 393M 516M 43% /home/webcustomer1/libexec
/usr/bin 116G 36G 71G 34% /home/webcustomer1/usr/bin
/usr/lib 116G 36G 71G 34% /home/webcustomer1/usr/lib
/usr/lib32 116G 36G 71G 34% /home/webcustomer1/usr/lib32
/usr/local/bin 116G 36G 71G 34% /home/webcustomer1/usr/local/bin
/usr/local/lib 116G 36G 71G 34% /home/webcustomer1/usr/local/lib
/usr/local/share 116G 36G 71G 34% /home/webcustomer1/usr/local/share
/usr/share/locale 116G 36G 71G 34% /home/webcustomer1/usr/share/locale
/usr/share/misc 116G 36G 71G 34% /home/webcustomer1/usr/share/misc
devfs 1.0k 1.0k 0B 100% /home/webcustomer1/dev
So each customer has several mount points, and they are always the same mount points for every customer. check_disk doesn't care about that: it will measure bin, lib, dev, etc. for each customer. With 13 mount points per customer and 300 customers, check_disk has to check at least 3900 file systems. Calling that overkill would be an understatement.
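The count can be reproduced from the df output above. On the live server something like `df | grep -c '/home/webcustomer1/'` should give the per-customer number; in the sketch below the "Mounted on" column is embedded so it runs anywhere, and the figure of 300 customers is the one used in the text.

```shell
#!/bin/sh
# "Mounted on" column of the df output above for one customer.
mounts='/home/webcustomer1/
/home/webcustomer1/bin
/home/webcustomer1/lib
/home/webcustomer1/libexec
/home/webcustomer1/usr/bin
/home/webcustomer1/usr/lib
/home/webcustomer1/usr/lib32
/home/webcustomer1/usr/local/bin
/home/webcustomer1/usr/local/lib
/home/webcustomer1/usr/local/share
/home/webcustomer1/usr/share/locale
/home/webcustomer1/usr/share/misc
/home/webcustomer1/dev'

customers=300   # number of customers used in the text

# Count the mount points belonging to this customer's chroot.
per_customer=$(printf '%s\n' "$mounts" | grep -c '^/home/webcustomer1/')
total=$(( per_customer * customers ))

echo "${per_customer} mount points per customer x ${customers} customers = ${total} file systems"
```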
So I dug into the manpage of check_disk and took a look at what is actually possible. I found two ways to use check_disk to check the user quotas quickly and reliably.
1) Using a regular expression and an ignore list
# check the home file systems but ignore bin, lib, libexec, usr, dev. Do not output each file system which is OK (-e).
/usr/lib/nagios/plugins/check_disk -w 100 -c 50 -e -r home -i "(bin|lib|libexec|usr|dev)"
DISK OK| /home=0MB;329293;329343;0;329393 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 /home/webcustomer3/=897MB;5020;5070;0;5120 .......
Time to explain. I set the warning threshold to 100 (MB) and the critical threshold to 50 (MB) of free space. I told check_disk to check only the file systems matching "home" (-r home), but to ignore those matching bin, lib, libexec, usr or dev (-i).
The -e parameter suppresses the output for file systems that are OK; only file systems in a warning or critical state are listed.
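Before putting such a command into Nagios, it can help to preview which mount points the two regular expressions leave over. The sketch below approximates the -r/-i selection with grep -E on the mount point paths from the df output above (on a live system these could come from something like `df | awk 'NR > 1 {print $NF}'`). This is an assumption: check_disk's own matching may differ in details. It also shows a weakness of this method: an unanchored pattern like lib would also drop a (hypothetical) customer whose name contains "lib".

```shell
#!/bin/sh
# Mount points of one customer, taken from the df output above.
mounts='/home/webcustomer1/
/home/webcustomer1/bin
/home/webcustomer1/lib
/home/webcustomer1/libexec
/home/webcustomer1/usr/bin
/home/webcustomer1/usr/lib
/home/webcustomer1/usr/lib32
/home/webcustomer1/usr/local/bin
/home/webcustomer1/usr/local/lib
/home/webcustomer1/usr/local/share
/home/webcustomer1/usr/share/locale
/home/webcustomer1/usr/share/misc
/home/webcustomer1/dev'

# Approximate "-r home -i (bin|lib|libexec|usr|dev)":
# keep paths matching "home", then drop paths matching the ignore pattern.
selected=$(printf '%s\n' "$mounts" \
  | grep -E 'home' \
  | grep -Ev '(bin|lib|libexec|usr|dev)')

printf '%s\n' "$selected"
```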
2) Ignoring certain file system types (preferred)
As mentioned above, the bin, lib, libexec, etc. file systems under each customer's chroot all point to the same source locations on the system. They are mounted as nullfs file systems pointing to the original system mount points. For example, /home/webcustomer1/bin is mounted from the original /bin (as can be seen in the df output). This can be verified with df -T, which shows the file system type:
df -T /home/webcustomer1/bin
Filesystem Type 1K-blocks Used Avail Capacity Mounted on
/bin nullfs 1012974 402712 529226 43% /home/webcustomer1/bin
So instead of manually ignoring all the sub-mount points, it is easier to tell check_disk to ignore certain file system types, in this case nullfs and devfs.
# check the home file systems but ignore nullfs and devfs file system types. Do not output each file system which is OK (-e)
/usr/lib/nagios/plugins/check_disk -w 100 -c 50 -e -r home -X nullfs -X devfs
DISK OK| /home=0MB;329291;329341;0;329391 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 ........
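The type-based selection can be previewed in a similar way. The sketch below uses a simplified, hypothetical df -T sample (device, type, mount point); the nullfs and devfs types come from the output above, while zfs for the datapool dataset is an assumption. On a live FreeBSD system, something like `df -T | awk 'NR > 1 && $2 != "nullfs" && $2 != "devfs" && $NF ~ /home/ { print $NF }'` should give a comparable list, but that is only an approximation of what check_disk selects.

```shell
#!/bin/sh
# Simplified, hypothetical df -T sample: device, type, mount point.
dft='datapool/home/webcustomer1 zfs /home/webcustomer1/
/bin nullfs /home/webcustomer1/bin
/lib nullfs /home/webcustomer1/lib
/usr/bin nullfs /home/webcustomer1/usr/bin
devfs devfs /home/webcustomer1/dev'

# Keep mount points under "home" whose type is neither nullfs nor devfs,
# roughly what "-r home -X nullfs -X devfs" selects.
selected=$(printf '%s\n' "$dft" \
  | awk '$2 != "nullfs" && $2 != "devfs" && $3 ~ /home/ { print $3 }')

printf '%s\n' "$selected"
```

Unlike the regular expression approach, nothing here depends on the customer's name, which is why filtering by type is the more robust choice.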
If you are wondering about passing -X twice, don't worry: the option can be given multiple times (this is also documented in the check_disk manpage).
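If the list of types to skip grows, the repeated -X options can be generated from a list instead of being typed out. A minimal sketch, assuming a hypothetical IGNORED_TYPES variable; it only prints the resulting command (a dry run) rather than running check_disk.

```shell
#!/bin/sh
# Hypothetical, space-separated list of file system types to skip.
IGNORED_TYPES="nullfs devfs"

# Build repeated "-X TYPE" options in the positional parameters so each
# type stays a separate argument. $IGNORED_TYPES is left unquoted on
# purpose so the shell splits it into words.
set --
for fstype in $IGNORED_TYPES; do
  set -- "$@" -X "$fstype"
done

# Dry run: print the command instead of executing it.
echo /usr/lib/nagios/plugins/check_disk -w 100 -c 50 -e -r home "$@"
```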