User quota disk space monitoring with check_disk
Monday - Mar 18th 2013

The official Nagios plugin 'check_disk' is one of the oldest and probably one of the most powerful plugins. In most setups, however, disk checks use only a fraction of what check_disk can do.

A very typical disk check looks like this:

# check free disk space on / partition
/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /
DISK OK - free space: / 516 MB (56% inode=98%);| /=393MB;791;890;0;989

There's also the possibility to check all available partitions with one command:

# check all free disk space
/usr/lib/nagios/plugins/check_disk -w 20% -c 10%
DISK CRITICAL - free space: / 516 MB (56% inode=98%); /dev 0 MB (0% inode=-); /tmp 910 MB (99% inode=99%); /usr 72906 MB (66% inode=97%); /var 5210 MB (57% inode=95%); .................

But with this command the plugin returns a HUGE list of different file systems. Why? Because this particular server (a shared hosting server running FreeBSD) has all its customer folders chrooted with a ZFS quota. So each customer shows up like this in df:

df -h | grep webcustomer1
datapool/home/webcustomer1   15G    7.5G    7.5G    50%    /home/webcustomer1/
/bin                         989M    393M    516M    43%    /home/webcustomer1/bin
/lib                         989M    393M    516M    43%    /home/webcustomer1/lib
/libexec                     989M    393M    516M    43%    /home/webcustomer1/libexec
/usr/bin                     116G     36G     71G    34%    /home/webcustomer1/usr/bin
/usr/lib                     116G     36G     71G    34%    /home/webcustomer1/usr/lib
/usr/lib32                   116G     36G     71G    34%    /home/webcustomer1/usr/lib32
/usr/local/bin               116G     36G     71G    34%    /home/webcustomer1/usr/local/bin
/usr/local/lib               116G     36G     71G    34%    /home/webcustomer1/usr/local/lib
/usr/local/share             116G     36G     71G    34%    /home/webcustomer1/usr/local/share
/usr/share/locale            116G     36G     71G    34%    /home/webcustomer1/usr/share/locale
/usr/share/misc              116G     36G     71G    34%    /home/webcustomer1/usr/share/misc
devfs                        1.0k    1.0k      0B   100%    /home/webcustomer1/dev

So each customer has several mount points, and they are the same for every customer. check_disk doesn't care about that - it will measure bin, lib, dev, etc. for each customer. With 13 mount points per customer, as in the listing above, 300 customers make check_disk check at least 3900 file systems. Calling that overkill would be an understatement.
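
To double-check that number, here is a rough sketch that counts the mount points below one customer's home directory in df-style output and multiplies by the number of customers. The sample is the webcustomer1 listing from above; the function name count_mounts is made up for illustration and is not part of check_disk.

```shell
#!/bin/sh
# Hypothetical helper: count df lines whose mount point (last column)
# starts with the given prefix.
count_mounts() {
    prefix="$1"
    awk -v p="$prefix" 'index($NF, p) == 1 { n++ } END { print n + 0 }'
}

# Sample data: the df lines shown for webcustomer1 above.
sample='datapool/home/webcustomer1 15G 7.5G 7.5G 50% /home/webcustomer1/
/bin 989M 393M 516M 43% /home/webcustomer1/bin
/lib 989M 393M 516M 43% /home/webcustomer1/lib
/libexec 989M 393M 516M 43% /home/webcustomer1/libexec
/usr/bin 116G 36G 71G 34% /home/webcustomer1/usr/bin
/usr/lib 116G 36G 71G 34% /home/webcustomer1/usr/lib
/usr/lib32 116G 36G 71G 34% /home/webcustomer1/usr/lib32
/usr/local/bin 116G 36G 71G 34% /home/webcustomer1/usr/local/bin
/usr/local/lib 116G 36G 71G 34% /home/webcustomer1/usr/local/lib
/usr/local/share 116G 36G 71G 34% /home/webcustomer1/usr/local/share
/usr/share/locale 116G 36G 71G 34% /home/webcustomer1/usr/share/locale
/usr/share/misc 116G 36G 71G 34% /home/webcustomer1/usr/share/misc
devfs 1.0k 1.0k 0B 100% /home/webcustomer1/dev'

# The trailing slash keeps webcustomer10, webcustomer11, ... out of the count.
per_customer=$(printf '%s\n' "$sample" | count_mounts /home/webcustomer1/)
customers=300
total=$((per_customer * customers))
echo "$per_customer mount points per customer, $total for $customers customers"
```

On a live system the sample would be replaced by piping the real df output into count_mounts.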

So I dug into the manpage of check_disk to see what is actually possible. I found two ways to use check_disk to check the user quotas quickly and reliably.

1) By using a regular expression and an ignore list

# check the home file systems but ignore bin, lib, libexec, usr and dev. Do not output each file system which is OK (-e).
/usr/lib/nagios/plugins/check_disk -w 100 -c 50 -e -r home -i "(bin|lib|libexec|usr|dev)"
DISK OK| /home=0MB;329293;329343;0;329393 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 /home/webcustomer3/=897MB;5020;5070;0;5120 .......

Time to explain. I set the warning threshold to 100 (MB) and the critical threshold to 50 (MB) of free space. I told check_disk to look for "home" in the file systems (-r home) but to ignore file systems containing bin, lib, libexec, usr or dev (-i).
With the -e parameter, check_disk does not print each file system that is OK; it only lists file systems in a warning or critical state.
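
The selection can be approximated outside of Nagios with grep, which is handy for trying out a pattern before it goes into a service definition. This is only a sketch: it assumes that -r keeps the mount points matching the expression and that -i then drops matches case-insensitively, as described above. The function name select_mounts and the shortened list of mount points are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: keep lines matching the include pattern,
# then drop lines matching the ignore pattern (case-insensitive).
select_mounts() {
    include="$1"
    ignore="$2"
    grep -E -- "$include" | grep -Eiv -- "$ignore"
}

# Shortened sample of mount points, one per line.
mounts='/
/tmp
/home
/home/webcustomer1/
/home/webcustomer1/bin
/home/webcustomer1/usr/local/lib
/home/webcustomer1/dev
/home/webcustomer2/'

selected=$(printf '%s\n' "$mounts" | select_mounts 'home' '(bin|lib|libexec|usr|dev)')
printf '%s\n' "$selected"
```

Both patterns match substrings, so a customer whose name happens to contain "lib" or "dev" would be dropped as well. It is worth running the pattern against the real customer names first.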

2) By ignoring certain file system types (preferred)

As I mentioned above, the bin, lib, libexec, etc. file systems under each customer's chroot all come from the same source on the system. They are all mounted as "nullfs" file systems pointing to the original mount points of the system. As an example: /home/webcustomer1/bin is mounted from the original /bin (as can be seen in the df output). This can be verified with df -T, which shows the file system type:

df -T /home/webcustomer1/bin
Filesystem  Type   1K-blocks   Used  Avail Capacity  Mounted on
/bin        nullfs   1012974 402712 529226    43%    /home/webcustomer1/bin

So instead of manually ignoring all the sub-mountpoints, it is easier to tell check_disk to ignore certain file system types, in this case nullfs and devfs.

# check the home file systems but ignore nullfs and devfs file system types. Do not output each file system which is ok (-e)
./check_disk -w 100 -c 50 -e -r home -X nullfs -X devfs
DISK OK| /home=0MB;329291;329341;0;329391 /home/webcustomer1/=35MB;5020;5070;0;5120 /home/webcustomer2/=24MB;5020;5070;0;5120 ........

If you wonder about -X appearing twice (as seen above), do not worry. The option can be given several times, as the manpage of check_disk also states.
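
To preview which file systems survive the type exclusion, the same filter can be sketched against df -T style output. This assumes the type sits in the second column, as in the FreeBSD output shown earlier. The function name exclude_types is hypothetical, and the sample rows are illustrative: the zfs type and the block counts for the customer dataset are my assumption, not copied from the server.

```shell
#!/bin/sh
# Hypothetical helper: print the mount points from `df -T` style input,
# skipping the header line and every row whose type is in the
# comma-separated exclusion list.
exclude_types() {
    awk -v skip="$1" '
        BEGIN { n = split(skip, t, ","); for (i = 1; i <= n; i++) drop[t[i]] = 1 }
        NR > 1 && !($2 in drop) { print $NF }
    '
}

# Illustrative sample (type and sizes of the first row are assumed).
sample='Filesystem Type 1K-blocks Used Avail Capacity Mounted on
datapool/home/webcustomer1 zfs 15728640 7864320 7864320 50% /home/webcustomer1/
/bin nullfs 1012974 402712 529226 43% /home/webcustomer1/bin
devfs devfs 1 1 0 100% /home/webcustomer1/dev'

kept=$(printf '%s\n' "$sample" | exclude_types nullfs,devfs)
printf '%s\n' "$kept"
```

Only the customer's own dataset remains, which is the one file system per customer that the quota check should look at.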


