check_lxc

Last update: March 11, 2019

This is a plugin to monitor Linux Containers (LXC). It needs to run on the LXC host and can check the CPU, memory, and swap usage of a container. The plugin can also check whether a container is configured to start automatically at boot.

Important note on the cpu check: The cpu check does not report the utilization of the CPU itself. Instead, the number of jiffies (time spent on the CPU) consumed by a container is compared to the total number of jiffies of the host during a given period (default 5s, can be modified with the -s parameter). The result therefore tells you what share of the host's current CPU activity is attributable to the container. This is not bulletproof, but it helps to find the busiest containers.
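
As a rough illustration of the calculation described above (not the plugin's actual code), the share can be derived from two samples of the container's and the host's jiffie counters. The function name cpu_share_percent and the sample values are made up for this sketch; how the counters are read is left out.

```shell
#!/usr/bin/env bash
# Hypothetical helper for illustration; not taken from check_lxc.sh.
# Share of host jiffies consumed by the container between two samples,
# as an integer percentage.
# Arguments: container_before container_after host_before host_after
cpu_share_percent() {
  local c_delta=$(( $2 - $1 ))
  local h_delta=$(( $4 - $3 ))
  # Guard against a zero or negative host delta (e.g. identical samples)
  if [ "$h_delta" -le 0 ]; then
    echo 0
    return
  fi
  echo $(( c_delta * 100 / h_delta ))
}

# Made-up samples: the container used 170 of the host's 1000 jiffies
cpu_share_percent 100 270 1000 2000   # prints 17
```

A longer sampling period (-s) smooths out short spikes, at the cost of a longer-running check.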

Download

Download check_lxc.sh

Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards adjust the permissions (usually chmod 755).
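
The copy and permission steps can be combined, for example with install(1). This is only a sketch: the helper name install_plugin is made up, and the target folder is an assumption that depends on your distribution.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy the downloaded plugin into a plugin folder
# and set mode 755 in one step.
# Arguments: path to the downloaded script, target plugin folder
install_plugin() {
  local src="$1" dest_dir="$2"
  install -m 755 "$src" "$dest_dir/check_lxc.sh"
}

# Usage (run as root, adjust the folder to your distribution):
# install_plugin ./check_lxc.sh /usr/lib/nagios/plugins
```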

Community contributions are welcome on the GitHub repo.

Version history / Changelog

# 20130830 Finished first check (mem)
# 20130902 Added cgroup kernel boot parameter check (cgroup_active)
# 20130902 Fixed previous cgroup check (see issue #1)
# 20130902 Activated lxc_exists verification (finally turned to lxc_running)
# 20130902 Added new check type (auto)
# 20130912 Reorganizing code, put output calculation into function
# 20130912 Added new check type (swap)
# 20130913 Bugfix in swap check warning calculation
# 20160316 Make plugin work with LXC 1.x, too
# 20160316 In LXC 1.x, lxc-cgroup command needs sudo
# 20160318 Additional checks if swap value can be read
# 20160318 Perfdata of mem check: Only show 'max' when thresholds set
# 20160318 Adapt lxc_running function to work on 1.x, too
# 20160318 Add warn and crit values into mem check perfdata
# 20160318 Remove sudo commands within plugin, whole plugin requires sudo
# 20170710 Added cpu check type
# 20181203 Merged PR #4, #5 from BarbUk. Update GPL address. Increase version.
# 20181203 Fix issue #9 (added lxc-cgroup sanity check)
# 20181204 Merged PR #6 from BarbUk (shellcheck)

Requirements

  • Plugin must be executed with sudo (sudoers entry needed)
  • LXC commands
  • cgroups enabled
  • Bash internal commands/functions (the plugin checks for their existence)

Sudoers entry

This plugin needs to run as root; otherwise it cannot launch certain lxc commands correctly.

# User privilege specification
nagios ALL = NOPASSWD: /usr/lib/nagios/plugins/check_lxc.sh

Requirements for memory check

To be able to run the memory check (-t mem), the cgroup subsys "memory" must be enabled. You can verify this manually by running:

cat /proc/cgroups | grep memory
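
If you want to script this verification, the enabled flag is the fourth column of /proc/cgroups. The following is a minimal sketch; the helper name cgroup_enabled and its optional file argument are assumptions made for illustration and are not part of the plugin.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of check_lxc.sh.
# Prints the "enabled" column (4th field) for a cgroup subsystem.
# The second argument defaults to /proc/cgroups and exists only to ease testing.
cgroup_enabled() {
  local subsys="$1" file="${2:-/proc/cgroups}"
  awk -v s="$subsys" '$1 == s { print $4 }' "$file"
}

if [ "$(cgroup_enabled memory 2>/dev/null)" = "1" ]; then
  echo "memory cgroup is enabled"
else
  echo "memory cgroup is disabled or not listed"
fi
```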

If the value in the enabled column is zero (0), add the following options to your kernel boot parameters: "cgroup_enable=memory" and "swapaccount=1". On Debian this can be done by modifying /etc/default/grub, followed by an update of grub2 and a reboot:

cat /etc/default/grub | grep CMDLINE_LINUX_DEFAULT
GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1"

update-grub2
reboot

After a reboot, check if the cmdline contains the two additional values:

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-4.9.0-8-amd64 root=UUID=fd9dea41-c89e-41f6-9be7-96c1fc892c0e ro quiet cgroup_enable=memory swapaccount=1
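
The same check can be scripted instead of reading the output by eye. This is a sketch only; the helper name cmdline_has and its optional file argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given parameter appears as a whole
# word in the kernel command line. The second argument defaults to
# /proc/cmdline and exists only to ease testing.
cmdline_has() {
  local param="$1" file="${2:-/proc/cmdline}"
  # Split the command line into one parameter per line, then match exactly
  tr ' ' '\n' < "$file" | grep -qxF -- "$param"
}

for p in cgroup_enable=memory swapaccount=1; do
  if cmdline_has "$p" 2>/dev/null; then
    echo "$p: present"
  else
    echo "$p: missing"
  fi
done
```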

Then verify if you can get the memory statistics of a container:

lxc-cgroup -n lxctest01 memory.stat

Definition of the parameters

Parameter  Description
-n         name of container (or ALL for some types)
-t         check type; defines what kind of check you want to run
-u*        unit of output values (k|m|g)
-w*        warning threshold in percent for memory checks; only makes sense if a cgroup limit is set in the container config
-c*        critical threshold in percent for memory checks; only makes sense if a cgroup limit is set in the container config
-s*        sleep in seconds between cpu checks to calculate the jiffie difference (default: 5)
--help     show help and usage

*optional

Definition of the check types

Type  Description
mem   Check the memory usage of the given container (thresholds in percent). Thresholds only make sense if you have limited the container's memory resources in the first place.
swap  Check the swap usage (thresholds in MB)
cpu   Check cpu usage (percentage) of a container (thresholds in percent)
auto  Check autostart of a container or all containers (-n ALL)

Usage / running the plugin on the command line

Usage:

./check_lxc.sh -n container -t checktype [-u k|m|g] [-w warning] [-c critical] [-s int]

Example: check cpu usage of container irczsrvc06:

./check_lxc.sh -n irczsrvc06 -t cpu
LXC irczsrvc06 OK - CPU Usage: 17%|cpu=17%;;;0;0

Example: check memory usage of container nha1 with thresholds:

./check_lxc.sh -n nha1 -t mem -w 50 -c 90
LXC nha1 WARNING - Used Memory: 77% (50689 MB)|mem=53152133120B;34359738350;61847529030;0;68719476736
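
The percentage and state in this output can be cross-checked from the perfdata values (used bytes and the memory limit). The sketch below is illustrative only: the helper names percent_of and nagios_state are made up, and whether the plugin treats a threshold as breached at "greater than" or "greater than or equal" is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not taken from check_lxc.sh.

# Integer percentage of used relative to limit (both in bytes).
percent_of() {
  echo $(( $1 * 100 / $2 ))
}

# Map a value and warning/critical thresholds to a Nagios state name.
# Assumption: a threshold is breached when the value is >= the threshold.
nagios_state() {
  local value="$1" warn="$2" crit="$3"
  if [ "$value" -ge "$crit" ]; then
    echo CRITICAL
  elif [ "$value" -ge "$warn" ]; then
    echo WARNING
  else
    echo OK
  fi
}

# Values from the perfdata above: used bytes and the cgroup memory limit
used=53152133120
limit=68719476736
pct="$(percent_of "$used" "$limit")"
echo "Used Memory: ${pct}% -> $(nagios_state "$pct" 50 90)"
# prints: Used Memory: 77% -> WARNING
```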

Example: check autostart of container nha1 (use -n ALL to check all containers):

./check_lxc.sh -n nha1 -t auto
LXC AUTOSTART CRITICAL: nha1

Command definition (NRPE)

NRPE Command definition for simple checks without thresholds:

command[check_lxc]=sudo /usr/lib/nagios/plugins/check_lxc.sh -n $ARG1$ -t $ARG2$

NRPE Command definition for check with thresholds:

command[check_lxc_thresholds]=sudo /usr/lib/nagios/plugins/check_lxc.sh -n $ARG1$ -t $ARG2$ -w $ARG3$ -c $ARG4$

Service definition

Service definition in Nagios, Icinga 1.x, Shinken, Naemon

In this example, the cpu usage check runs on host lxchost1, executed via NRPE. No thresholds are given, so the check never alerts; it is used for graphing and information only.

# Check cpu usage of lxc mylxc01
define service{
  use generic-service
  host_name lxchost1
  service_description LXC CPU mylxc01
  check_command check_nrpe!check_lxc!mylxc01!cpu
}

Service object definition Icinga 2.x

In this example, the memory usage check runs on lxchost1, executed via NRPE. The warning threshold is set to 80% and the critical threshold to 95%. As soon as the container mylxc01 uses more than 80% of its memory limit (provided cgroup limits are defined), a warning is triggered.

# Check memory usage of lxc mylxc01
object Service "LXC Memory Usage mylxc01" {
  import "generic-service"
  host_name = "lxchost1"
  check_command = "nrpe"
  vars.nrpe_command = "check_lxc_thresholds"
  vars.nrpe_arguments = [ "mylxc01", "mem", "80", "95" ]
}

Screenshots

check_lxc all ok
check_lxc container cpu graph
check_lxc container memory graph

Presentation

The monitoring plugin check_lxc was presented at the Open Source Monitoring Conference (OSMC) 2018 in Nuremberg, Germany. You can download the presentation as PDF document or watch the recorded video online.

Its all about the containers presentation