check_rancher2

Last update: March 11, 2019

Monitoring Plugin check_rancher2

This is a monitoring plugin to check Kubernetes (Docker) container infrastructures managed with Rancher 2.x. It uses Rancher 2's API to monitor states of clusters, workloads or pods.

Note: This plugin was not created by Rancher Labs and is not officially supported by them. As with most monitoring plugins, it is maintained by contributors from the open source monitoring community.

Download

Download check_rancher2.sh

Download the plugin and save it in your Nagios/monitoring plugin folder (usually /usr/lib/nagios/plugins, depending on your distribution). Afterwards, adjust the permissions so the script is executable (usually chmod 755).
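
The installation step can be sketched as a small shell helper. This is only an illustration: the function name install_plugin is hypothetical, and the plugin folder is passed as an argument because its location depends on your distribution.

```shell
#!/bin/sh
# Hypothetical helper (not part of the plugin): copy the downloaded script
# into a monitoring plugin folder and make it executable (mode 755).
install_plugin() {
  src="$1"       # path to the downloaded check_rancher2.sh
  dest_dir="$2"  # plugin folder, e.g. /usr/lib/nagios/plugins
  [ -f "$src" ] || { echo "source file not found: $src" >&2; return 1; }
  [ -d "$dest_dir" ] || { echo "plugin folder not found: $dest_dir" >&2; return 1; }
  cp "$src" "$dest_dir/check_rancher2.sh" &&
    chmod 755 "$dest_dir/check_rancher2.sh"
}
```

A typical call, usually run as root, would be: install_plugin ./check_rancher2.sh /usr/lib/nagios/plugins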

Community contributions are welcome in the GitHub repository.

Version history / Changelog

# 20180629 alpha Started programming of script
# 20180713 beta1 Public release in repository
# 20180803 beta2 Check for "type", echo project name in "all workload" check, too
# 20180806 beta3 Fix important bug in for loop in workload check, check for 'paused'
# 20180906 beta4 Catch cluster not found and zero workloads in workload check
# 20180906 beta5 Fix paused check (type 'object' has no elements to extract (arg 5))
# 20180921 beta6 Added pod(s) check within a project
# 20180926 beta7 Handle a workload in status 'updating' as warning, not critical
# 20181107 beta8 Missing pod check type in help, documentation completed
# 20181109 1.0.0 Do not alert for succeeded pods

Requirements

  • Rancher 2 API access
  • curl package/command
  • jshon package/command
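
Whether the two command line requirements are met can be checked before the first run. The following is a minimal sketch; the function name missing_commands is hypothetical and not part of the plugin.

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every command from the argument list
# that cannot be found in PATH; return non-zero if at least one is missing.
missing_commands() {
  rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      rc=1
    fi
  done
  return "$rc"
}
```

Running missing_commands curl jshon prints nothing and returns 0 when both commands are installed.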

How to create a Rancher2 API access

This section describes how to create API access credentials in Rancher 2.x. Log in to your Rancher 2 environment, hover over your user icon in the top right corner, and click on 'API and Keys'.

Rancher 2 user api and keys

You will see an overview of the existing API access tokens. To create a new one, click on the 'Add Key' button.

Rancher 2 add api key

Rancher 2 displays the access credentials in two fields: the access key (the username, starting with token-) and the secret key (the password). Store these credentials in a safe place, as this is the only time the password is shown.

Rancher 2 API access created

You can now use these credentials with check_rancher2.
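
Before wiring the credentials into your monitoring, you may want to verify them by hand. The sketch below rests on two assumptions that are not stated on this page: that the Rancher 2 API is served under the /v3 path of the management host, and that it accepts the access key and secret key via HTTP basic authentication. The function names are hypothetical.

```shell
#!/bin/sh
# Build the URL of an API endpoint.
# Assumption: the Rancher 2 API lives under /v3 on the management host.
# $1 = hostname, $2 = endpoint path, $3 = "yes" (default) for https
rancher_api_url() {
  host="$1"; path="$2"; secure="${3:-yes}"
  if [ "$secure" = "yes" ]; then proto="https"; else proto="http"; fi
  printf '%s://%s/v3/%s\n' "$proto" "$host" "$path"
}

# Print the HTTP status code returned for the cluster list.
# Assumption: 200 means the credentials were accepted, 401 that they were rejected.
# $1 = hostname, $2 = access key (token-xxxxx), $3 = secret key
rancher_api_status() {
  curl -s -o /dev/null -w '%{http_code}\n' \
    -u "$2:$3" \
    "$(rancher_api_url "$1" clusters)"
}
```

For example, rancher_api_status rancher2.example.com token-xxxxx secret prints only the status code, so the secret key does not appear in the output.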

Definition of the parameters

Parameter  Description
-H         Hostname (DNS name) of Rancher 2 management
-U         Username for API access (will be in format 'token-xxxxx')
-P         Password for API access
-S*        Use secure connection (https) to Rancher API
-t         Check type; defines what kind of check you want to run
-c*        Cluster name (for specific cluster check)
-p*        Project name (for specific project check, needed for workload checks)
-n*        Namespace name (needed for specific pod checks)
-w*        Workload name (needed for specific workload checks)
-o*        Pod name (needed for specific pod checks; this only makes sense if you have pods with static names)
-h         Show help and usage

*optional

Definition of the check types

Type      Description
info      Lists the available clusters and projects with their API IDs. These IDs are needed for specific checks.
cluster   Checks the current status of all clusters or of a specific cluster (defined with -c clusterid).
project   Checks the current status of all projects or of a specific project (defined with -p projectid).
workload  Checks the current status of all workloads or of a specific workload (-w workloadname) within a project (-p projectid must be set!).
pod       Checks the current status of all pods or of a specific pod (-o podname) within a project (-p projectid must be set!).
          A specific pod check (-o podname) also requires the namespace (-n namespace).
          Note: A specific pod check only makes sense if you use pods with static names.
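
The option requirements listed in the table above can be expressed as a small pre-check. This is a sketch of the rules as documented here, not the plugin's own code; the function name validate_check_args and its messages are hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check mirroring the documented rules.
# Arguments: check type (-t), project (-p), namespace (-n), pod name (-o).
# Pass an empty string for every option that is not set.
validate_check_args() {
  check_type="$1"; project="$2"; namespace="$3"; pod="$4"
  case "$check_type" in
    info|cluster|project)
      return 0 ;;
    workload)
      # Workload checks always need a project.
      [ -n "$project" ] || { echo "workload check requires -p projectid"; return 1; } ;;
    pod)
      # Pod checks always need a project; a single pod also needs its namespace.
      [ -n "$project" ] || { echo "pod check requires -p projectid"; return 1; }
      if [ -n "$pod" ] && [ -z "$namespace" ]; then
        echo "single pod check (-o) also requires -n namespace"
        return 1
      fi ;;
    *)
      echo "unknown check type: $check_type"
      return 1 ;;
  esac
  return 0
}
```

For example, validate_check_args pod "c-5f7k2:p-4fdsd" "" "nginx1" fails because the namespace is missing, while adding "test" as the third argument passes.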

Usage / running the plugin on the command line

Usage:

./check_rancher2.sh -H hostname -U token -P password [-S] -t checktype [-c string] [-p string] [-o string] [-n string] [-w string]

Example: Check all pods within a project (c-5f7k2:p-4fdsd):

./check_rancher2.sh -H rancher2.example.com -S -U token-xxxxx -P aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai -t pod -p c-5f7k2:p-4fdsd
CHECK_RANCHER2 OK - All pods (85) in project c-5f7k2:p-4fdsd are running|'pods_total'=85;;;; 'pods_errors'=0;;;;

Example: Check a single pod within a project (c-5f7k2:p-4fdsd) and namespace (gamma):

./check_rancher2.sh -H rancher2.example.com -S -U token-xxxxx -P aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai -t pod -p c-5f7k2:p-4fdsd -n gamma -o nginx-85789c55b6-625tz
CHECK_RANCHER2 OK - Pod nginx-85789c55b6-625tz is running|'pod_active'=1;;;; 'pod_error'=0;;;;
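
The output lines shown above consist of a status text, a pipe character, and performance data. If you want to post-process them in your own scripts, the following sketch may help. The helper names are hypothetical, and the status parser assumes that states other than OK use the same "CHECK_RANCHER2 STATE - ..." layout.

```shell
#!/bin/sh
# Hypothetical helpers for post-processing one line of plugin output.

# Extract the status word that follows the CHECK_RANCHER2 prefix.
# $1 = full output line
plugin_status() {
  printf '%s\n' "$1" | sed -n 's/^CHECK_RANCHER2 \([A-Z][A-Z]*\) .*/\1/p'
}

# Extract the integer value of a performance data label.
# $1 = full output line, $2 = label without quotes (e.g. pods_total)
perfdata_value() {
  printf '%s\n' "$1" | sed -n "s/.*|.*'$2'=\([0-9][0-9]*\).*/\1/p"
}
```

Both helpers print an empty result when the line does not match, so a caller can treat empty output as "not found".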

Command definition

Command definition in Nagios, Icinga 1.x, Shinken, Naemon

# check_rancher2 command definition
define command{
  command_name check_rancher2
  command_line $USER1$/check_rancher2.sh -H $HOSTADDRESS$ -S -U $ARG1$ -P $ARG2$ -t $ARG3$ $ARG4$
}

Note: HTTPS is used in this case (-S). All mandatory parameters are hard-coded in the command definition. The optional parameters can be passed in $ARG4$ (e.g. -c clustername).
Using $HOSTADDRESS$ as -H value assumes that you have created a host object which uses the Rancher2 DNS name as address.
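
To see how a service's check_command value maps onto this command line, the macro substitution can be imitated in shell. This is purely an illustration of the mapping; the monitoring core performs the substitution itself, and the function name expand_check_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration of how $USER1$, $HOSTADDRESS$ and $ARG1$..$ARG4$
# are filled in for the command definition above.
# $1 = plugin folder ($USER1$), $2 = host address ($HOSTADDRESS$),
# $3 = check_command value in the form "name!arg1!arg2!arg3!arg4"
# The body runs in a subshell so the IFS change does not leak to the caller.
expand_check_command() (
  plugin_dir="$1"; address="$2"
  IFS='!'   # fields of check_command are separated by "!"
  set -f    # disable globbing while splitting
  set -- $3 # $1 = command name, $2..$5 = $ARG1$..$ARG4$
  printf '%s\n' "$plugin_dir/check_rancher2.sh -H $address -S -U $2 -P $3 -t $4 $5"
)
```

For example, expand_check_command /usr/lib/nagios/plugins rancher2.example.com 'check_rancher2!token-xxxxx!secret!pod!-p c-5f7k2:p-4fdsd' prints the full command line with -t pod followed by -p c-5f7k2:p-4fdsd.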

Command definition in Icinga 2.x

# check_rancher2 command definition
object CheckCommand "check_rancher2" {
  import "plugin-check-command"
  command = [ PluginDir + "/check_rancher2.sh" ]

  arguments = {
   "-H" = "$rancher2_address$"
   "-U" = "$rancher2_username$"
   "-P" = "$rancher2_password$"
   "-S" = { set_if = "$rancher2_ssl$" }
   "-t" = "$rancher2_type$"
   "-c" = "$rancher2_cluster$"
   "-p" = "$rancher2_project$"
   "-n" = "$rancher2_namespace$"
   "-w" = "$rancher2_workload$"
   "-o" = "$rancher2_pod$"
  }

  vars.rancher2_address = "$address$"
  # If you only run one Rancher2, you can define api access here, too:
  #vars.rancher2_username = "token-xxxxx"
  #vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  #vars.rancher2_ssl = true
}

Service definition

Service definition in Nagios, Icinga 1.x, Shinken, Naemon

Check all pods in a project:

# Check Rancher 2 Pods in project c-5f7k2:p-4fdsd
define service{
  use generic-service
  host_name my-rancher2-host
  service_description Rancher2 Project 1 Pods
  check_command check_rancher2!token-xxxxx!aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai!pod!-p c-5f7k2:p-4fdsd
}

Service object definition Icinga 2.x

Information about discovered clusters and projects:

# Just show some info about discovered clusters and projects
object Service "Rancher2 Info" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "info"
}

Check all available/found clusters for their health:

# Check all available/found clusters for their health
object Service "Rancher2 All Clusters" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "cluster"
}

Check a single cluster for its health:

# Check a single cluster for its health
object Service "Rancher2 Cluster Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "cluster"
  vars.rancher2_cluster = "c-5f7k2"
}

Check all available/found projects (across all clusters) for their health:

# Check all available/found projects (across all clusters) for their health
object Service "Rancher2 All Projects" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "project"
}

Check a single project:

# Check a single project
object Service "Rancher2 Project Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "project"
  vars.rancher2_project = "c-5f7k2:p-4fdsd"
}

Check all workloads in a certain project:

# Check all workloads in a certain project
object Service "Rancher2 Workloads in Project Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "workload"
  vars.rancher2_project = "c-5f7k2:p-4fdsd"
}

Check a single workload in a certain project:

# Check a single workload in a certain project
object Service "Rancher2 Workload Web in Project Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "workload"
  vars.rancher2_project = "c-5f7k2:p-4fdsd"
  vars.rancher2_workload = "Web"
}

Check all pods in a certain project:

# Check all pods in a certain project
object Service "Rancher2 Pods in Project Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "pod"
  vars.rancher2_project = "c-5f7k2:p-4fdsd"
}

Check a single pod in a certain project and namespace:

# Check a single pod in a certain project and namespace
object Service "Rancher2 Pod Nginx1 in Project Test Namespace Test" {
  import "generic-service"
  host_name = "my-rancher2-host"
  check_command = "check_rancher2"
  vars.rancher2_username = "token-xxxxx"
  vars.rancher2_password = "aethooFaaGohthah8aezup5wiew5aedainooG2goh9Kaeti9hurolai"
  vars.rancher2_ssl = true
  vars.rancher2_type = "pod"
  vars.rancher2_project = "c-5f7k2:p-4fdsd"
  vars.rancher2_namespace = "test"
  vars.rancher2_pod = "nginx1"
}

Screenshots

check_rancher2 pods all ok
check_rancher2 grafana pods errors
check_rancher2 grafana pods total

Presentation

The monitoring plugin check_rancher2 was introduced at the Open Source Monitoring Conference (OSMC) 2018 in Nuremberg, Germany. You can download the presentation as a PDF document or watch the recorded video online.

Its all about the containers presentation