Monitoring Plugin: check_es_system
Last Update: February 20, 2019
This monitoring plugin checks the status of an Elasticsearch cluster node. Besides the classic status check (green, yellow, red), it can also monitor the disk or memory usage of Elasticsearch. This is especially helpful when running Elasticsearch in the cloud (e.g. Elasticsearch as a service): because ES does not run on your own server, you cannot monitor disk or memory usage on the host. This is where the plugin comes in. Tell it how much capacity (disk space or memory) you have available (-d) and it will alert you when usage reaches a threshold.
Jump to...
Download
Version History
Requirements
Definition of parameters
Plugin usage
Command definition
Nagios and Icinga 1 service definition examples
Icinga 2 service definition examples
Screenshots
------------------------
Download
------------------------
Version History
20160429: Started programming plugin
20160601: Continued programming. Working now as it should =)
20160906: Added memory usage check, check types option (-t)
20160906: Renamed plugin from check_es_store to check_es_system
20160907: Change internal referenced variable name for available size
20160907: Output now contains both used and available sizes
20161017: Add missing -t in usage output
20180105: Fix if statement for authentication (@deric)
20180105: Fix authentication when wrong credentials were used
20180313: Configure max_time for Elastic to respond (@deric)
20190219: Fix alternative subject name in ssl (issue 4), direct to auth
20190220: Added status check type
------------------------
Requirements
- curl command (SUSE: zypper in curl, Debian/Ubuntu: apt-get install curl)
- jshon command (SUSE: search for jshon, Debian/Ubuntu: apt-get install jshon)
- Other common bash-related commands such as expr (the plugin checks for their existence)
------------------------
Definition of parameters
Parameter | Description
-H * | Hostname or IP address of the Elasticsearch node
-P | Port (default: 9200)
-S | Use https
-u | Username if authentication is required
-p | Password if authentication is required
-t * | Type of check to run (disk|mem|status)
-d + | Available disk or memory size (e.g. 20)
-o | Size unit (K|M|G) (default: G)
-w | Warning threshold in percent (default: 80)
-c | Critical threshold in percent (default: 95)
-m | Maximum time in seconds to wait for the server response (default: 30)
-h | Help!
*mandatory parameters!
+mandatory for check types disk and mem
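The defaults and mandatory rules listed above can be illustrated with a small option-parsing sketch. This is not the plugin's actual source: the function name parse_args and all variable names are hypothetical, and only the flags, defaults and mandatory rules come from the table. Exit code 3 is the conventional UNKNOWN state for monitoring plugins.

```shell
# Hypothetical sketch of option handling; names are assumptions, not plugin code.
parse_args() {
  # Defaults as documented in the parameter table.
  host=""; port=9200; https=0; user=""; pass=""
  checktype=""; available=""; unit="G"
  warning=80; critical=95; max_time=30

  OPTIND=1
  while getopts ":H:P:Su:p:t:d:o:w:c:m:h" opt; do
    case "$opt" in
      H) host=$OPTARG ;;
      P) port=$OPTARG ;;
      S) https=1 ;;
      u) user=$OPTARG ;;
      p) pass=$OPTARG ;;
      t) checktype=$OPTARG ;;
      d) available=$OPTARG ;;
      o) unit=$OPTARG ;;
      w) warning=$OPTARG ;;
      c) critical=$OPTARG ;;
      m) max_time=$OPTARG ;;
      h) echo "Usage: -H host -t disk|mem|status [-d size] [options]"; return 3 ;;
      *) echo "UNKNOWN - invalid or incomplete option"; return 3 ;;
    esac
  done

  # -H and -t are always mandatory.
  if [ -z "$host" ] || [ -z "$checktype" ]; then
    echo "UNKNOWN - -H and -t are mandatory"; return 3
  fi

  # -d is mandatory only for the disk and mem check types.
  case "$checktype" in
    disk|mem)
      if [ -z "$available" ]; then
        echo "UNKNOWN - -d is mandatory for check type $checktype"; return 3
      fi ;;
    status) ;;
    *) echo "UNKNOWN - unsupported check type: $checktype"; return 3 ;;
  esac
  return 0
}

# Example call: only host, port, https and type are given, the rest are defaults.
parse_args -H escluster.example.com -P 9243 -S -t status &&
  echo "host=$host port=$port type=$checktype warn=$warning crit=$critical"
```

Keeping the mandatory checks after the getopts loop means the order of the flags on the command line does not matter.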
------------------------
Plugin usage
The following examples show how the plugin is executed on the command line, with different possible outputs and statuses.
Example 1: Classic status check. Here the Elasticsearch cluster runs on escluster.example.com and is accessed over HTTPS (-S enables https) on port 9243 using the basic auth credentials user and password. The output shows the cluster status (green) and some additional information.
The performance data contains the node counts, shard information and total number of documents.
Note: When the status changes to yellow (= WARNING) or red (= CRITICAL), the output also contains information about relocating, initializing and unassigned shards. The performance data fields stay the same so as not to confuse graphing databases.
./check_es_system.sh -H escluster.example.com -P 9243 -S -u user -p password -t status
ES SYSTEM OK - Elasticsearch Cluster is green (3 nodes, 2 data nodes, 114 shards, 8426885 docs)|total_nodes=3;;;; data_nodes=2;;;; total_shards=114;;;; relocating_shards=0;;;; initializing_shards=0;;;; unassigned_shards=0;;;; docs=8426885;;;;
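The mapping from cluster status to monitoring state described in the note can be sketched as follows. This is an illustration under assumptions rather than the plugin's code: the function name status_to_result is hypothetical, and the WARNING, CRITICAL and UNKNOWN message texts are guesses modelled on the OK output shown above. The exit codes 0, 1, 2 and 3 are the usual plugin convention for OK, WARNING, CRITICAL and UNKNOWN.

```shell
# Sketch only: translate an Elasticsearch cluster status into a plugin result.
# The message wording for non-green states is an assumption.
status_to_result() {
  case "$1" in
    green)  echo "ES SYSTEM OK - Elasticsearch Cluster is green";        return 0 ;;
    yellow) echo "ES SYSTEM WARNING - Elasticsearch Cluster is yellow";  return 1 ;;
    red)    echo "ES SYSTEM CRITICAL - Elasticsearch Cluster is red";    return 2 ;;
    *)      echo "ES SYSTEM UNKNOWN - unexpected cluster status: $1";    return 3 ;;
  esac
}

status_to_result green
echo "exit code: $?"   # prints "exit code: 0"
```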
Example 2: Disk usage check. The same Elasticsearch cluster as before is accessed. Because this ES runs in the cloud, no host monitoring is available (meaning we cannot do file system monitoring). But we know that we have 128GB of disk space available, so we tell the plugin the capacity is 128GB (-d 128) and let it do the rest:
./check_es_system.sh -H escluster.example.com -P 9243 -S -u user -p password -t disk -d 128
ES SYSTEM OK - Disk usage is at 14% (18 G from 128 G)|es_disk=19637018938B;109951162777;130567005798;;
Example 3: Memory usage check. As before, ES runs in the cloud and we cannot do memory monitoring on the host itself. But we have booked our Elasticsearch service with 24GB of RAM, so we pass that capacity to the plugin (-d 24):
./check_es_system.sh -H escluster.example.com -P 9243 -S -u user -p password -t mem -d 24
ES SYSTEM OK - Memory usage is at 58% (14 G from 24 G)|es_memory=15107616304B;20615843020;24481313587;;
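The numbers in the disk and memory outputs can be reproduced with plain integer arithmetic. The sketch below is an assumption about how such values may be derived, not the plugin's actual implementation: the function names are hypothetical, and treating K and M as 1024-based is inferred from the G values in the examples, which match 1024^3 bytes.

```shell
# Sketch only: convert a size with unit K, M or G to bytes (1024-based, assumed).
unit_to_bytes() {
  case "$2" in
    K) echo $(( $1 * 1024 )) ;;
    M) echo $(( $1 * 1024 * 1024 )) ;;
    G) echo $(( $1 * 1024 * 1024 * 1024 )) ;;
    *) return 1 ;;
  esac
}

# Arguments: label, used bytes, available size, unit, warning %, critical %
usage_perfdata() {
  total=$(unit_to_bytes "$3" "$4") || return 3
  warn_bytes=$(( total * $5 / 100 ))   # integer division truncates
  crit_bytes=$(( total * $6 / 100 ))
  percent=$(( $2 * 100 / total ))
  echo "${percent}% |$1=$2B;${warn_bytes};${crit_bytes};;"
}

usage_perfdata es_disk 19637018938 128 G 80 95
# prints: 14% |es_disk=19637018938B;109951162777;130567005798;;
usage_perfdata es_memory 15107616304 24 G 80 95
# prints: 58% |es_memory=15107616304B;20615843020;24481313587;;
```

With the default thresholds, the warning and critical values in the performance data are simply 80% and 95% of the capacity given with -d, expressed in bytes.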
------------------------
Command definition in Nagios and Icinga 1
# 'check_es_system' command definition
define command{
    command_name    check_es_system
    command_line    $USER1$/check_es_system.sh -H $ARG1$ -d $ARG2$ -t $ARG3$ $ARG4$
}
Command definition in Icinga 2
object CheckCommand "check_es_system" {
  import "plugin-check-command"
  command = [ PluginContribDir + "/check_es_system.sh" ]

  arguments = {
    "-H" = {
      value = "$es_address$"
      description = "Hostname or IP Address of ElasticSearch node"
    }
    "-P" = {
      value = "$es_port$"
      description = "Port number (default: 9200)"
    }
    "-S" = {
      set_if = "$es_ssl$"
      description = "Use https"
    }
    "-u" = {
      value = "$es_user$"
      description = "Username if authentication is required"
    }
    "-p" = {
      value = "$es_password$"
      description = "Password if authentication is required"
    }
    "-d" = {
      value = "$es_available$"
      description = "Define the available disk or memory size of your ES cluster"
    }
    "-t" = {
      value = "$es_checktype$"
      description = "Define the type of check (disk|mem|status)"
    }
    "-o" = {
      value = "$es_unit$"
      description = "Choose sizing unit (K|M|G) - defaults to G for GigaByte"
    }
    "-w" = {
      value = "$es_warn$"
      description = "Warning threshold in percent (default: 80)"
    }
    "-c" = {
      value = "$es_crit$"
      description = "Critical threshold in percent (default: 95)"
    }
    "-m" = {
      value = "$es_max_time$"
      description = "Maximum time in seconds (timeout) for Elasticsearch to respond (default: 30)"
    }
  }

  vars.es_address = "$address$"
  vars.es_ssl = false
}
------------------------
Nagios and Icinga 1 service check examples
# Check ElasticSearch Store
define service{
    use                    generic-service
    host_name              myesnode
    service_description    ElasticSearch Store Usage
    check_command          check_es_system!myescluster.in.the.cloud!50!disk!-u read -p only
}
In this example, the disk check runs against myescluster.in.the.cloud and assumes
50GB of available disk space. Authentication is required to access the
cluster statistics, so the plugin logs in with user "read" and
password "only".
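To make the mapping between the "!"-separated check_command and the $ARG1$ to $ARG4$ macros of the command definition explicit, the following sketch mimics the substitution in shell. It only imitates what Nagios does internally; the function name expand_check_command is hypothetical, and $USER1$ is printed literally because its value depends on the local installation.

```shell
# Sketch only: imitate how "name!arg1!arg2!arg3!arg4" fills $ARG1$..$ARG4$.
expand_check_command() {
  old_ifs=$IFS
  IFS='!'
  set -f            # avoid glob expansion while splitting
  set -- $1         # split on "!": $1 is the command name, $2..$5 are the args
  set +f
  IFS=$old_ifs
  # $2 = $ARG1$ (host), $3 = $ARG2$ (size), $4 = $ARG3$ (type), $5 = $ARG4$ (extra options)
  echo "\$USER1\$/check_es_system.sh -H ${2:-} -d ${3:-} -t ${4:-} ${5:-}"
}

expand_check_command 'check_es_system!myescluster.in.the.cloud!50!disk!-u read -p only'
# prints: $USER1$/check_es_system.sh -H myescluster.in.the.cloud -d 50 -t disk -u read -p only
```

The fourth argument is passed through unchanged, which is why several options (here -u and -p) can share the single $ARG4$ slot.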
------------------------
Icinga 2 service check examples
# Check Elasticsearch Disk Usage
object Service "ElasticSearch Disk Usage" {
  import "generic-service"
  host_name = "myesnode"
  check_command = "check_es_system"
  vars.es_address = "myescluster.in.the.cloud"
  vars.es_user = "read"
  vars.es_password = "only"
  vars.es_checktype = "disk"
  vars.es_available = "50"
}
In this example, the disk check runs against myescluster.in.the.cloud and assumes
50GB of available disk space. Authentication is required to access the
cluster statistics, so the plugin logs in with user "read" and
password "only".
# Check Elasticsearch Status
object Service "ElasticSearch Status" {
  import "generic-service"
  host_name = "myesnode"
  check_command = "check_es_system"
  vars.es_address = "myescluster.in.the.cloud"
  vars.es_user = "read"
  vars.es_password = "only"
  vars.es_checktype = "status"
}
In this example, the status of Elasticsearch running on myescluster.in.the.cloud is checked.
Authentication is required to access the cluster statistics, so the plugin logs in with user "read" and
password "only".
------------------------
Screenshots

