Major update on Elasticsearch monitoring plugin check_es_system!

Published on - Listed in Monitoring Elasticsearch Icinga Nagios


It's been quite a while since the last update on the monitoring plugin check_es_system, a plugin to monitor Elasticsearch nodes.

That's why this post describes the latest changes in more detail.

Let's start with a change that slipped under my radar: Tom Barton (@deric) created a pull request back in March 2018. I had my GitHub notifications turned off, so I never saw it - sorry!
He added a helpful new parameter "-m", which stands for "max time" (aka timeout). It adds a check that Elasticsearch responds fast enough. This change is listed as 20180313 in the plugin's change history.
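To illustrate the idea behind a "max time" check, here is a minimal Python sketch. It is not the plugin's code (the plugin itself relies on curl); the function name check_with_max_time, the messages and the URL in the usage note are made up for this example. The exit codes 0 and 2 follow the usual Nagios/Icinga plugin convention.

```python
import time
import urllib.request

OK, CRITICAL = 0, 2  # conventional Nagios/Icinga plugin exit codes


def check_with_max_time(url, max_time, opener=urllib.request.urlopen):
    """Return (exit_code, message) for one request limited by max_time seconds.

    Hypothetical helper for illustration, not part of check_es_system.
    """
    started = time.monotonic()
    try:
        # urllib applies the timeout to each blocking socket operation,
        # which is close to, but not the same as, a total time limit.
        with opener(url, timeout=max_time) as response:
            response.read()
    except OSError as exc:
        # URLError, socket timeouts and refused connections are OSError subclasses.
        return CRITICAL, f"CRITICAL - request failed (limit {max_time}s): {exc}"
    elapsed = time.monotonic() - started
    if elapsed > max_time:
        return CRITICAL, f"CRITICAL - answer took {elapsed:.2f}s (limit {max_time}s)"
    return OK, f"OK - answered in {elapsed:.2f}s"
```

Calling check_with_max_time("http://localhost:9200/", 5) against a local test node (an assumed address) would return OK only if the node answers within five seconds.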

Yesterday I came across a strange bug after I made a configuration error in the Icinga2 service definition. This led to issue #4, which was then fixed in version 20190219. Basically, this bug hits you when the plugin tries to access Elasticsearch on an HTTPS port but you didn't set the -S parameter. In the background, the plugin then launches curl to talk HTTP to an HTTPS listener port. Got the idea?
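The mismatch can be reproduced outside the plugin. The following Python sketch sends a plain HTTP request and treats a dropped connection or an unparsable reply as a hint that the port actually expects HTTPS. This is an assumption-laden illustration, not how version 20190219 fixes the bug: probe_plain_http and its messages are hypothetical, and a TLS listener may also react differently (for example with a plain-text error page).

```python
import http.client

OK, CRITICAL, UNKNOWN = 0, 2, 3  # conventional Nagios/Icinga plugin exit codes


def probe_plain_http(host, port, timeout=5.0,
                     connection_cls=http.client.HTTPConnection):
    """Send GET / over plain HTTP and return (exit_code, message).

    Hypothetical helper for illustration, not part of check_es_system.
    """
    conn = connection_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        status = response.status
    except (http.client.BadStatusLine, ConnectionResetError) as exc:
        # The peer closed the connection or sent something that is not HTTP,
        # which is typical when plain HTTP hits a TLS listener.
        return UNKNOWN, f"UNKNOWN - no valid HTTP reply, is this an HTTPS port? ({exc!r})"
    except OSError as exc:
        # Refused connections, timeouts and similar network errors.
        return CRITICAL, f"CRITICAL - connection failed: {exc}"
    finally:
        conn.close()
    return OK, f"OK - got HTTP status {status}"
```

Returning UNKNOWN with a hint, rather than a generic failure, points the operator at the configuration error instead of at Elasticsearch itself.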

Also yesterday, I started to work on a new check type: status. Yes, pretty standard, I know. I never added a status check in the first place because I was successfully using a different plugin (check_elasticsearch.sh by Andrew Lyon) for the status checks. But in recent weeks we have grown our Elasticsearch fleet (internal and in the cloud), which led to many different credentials, ports, ES with and without HTTPS, etc. So I needed a plugin that can be as dynamic as our environment.

The new "status" check type does not only output green, yellow or red. No, it also adds some helpful information about the cluster structure. How many nodes are there, and how many of them are data nodes? How many shards are there? And if the cluster is in yellow or red state, are there shards that are relocating, initializing or unassigned? And finally something which is not often thought of: the number of documents. This seems irrelevant for a status check, but when you create graphs from these numbers, you can see the growth rate of your Elasticsearch cluster.
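As a rough sketch of how such a status line can be assembled, the Python function below takes a dictionary shaped like the response of Elasticsearch's _cluster/health API plus a document count (available, for example, from the _cluster/stats API) and builds a plugin-style message with performance data. The function name, the output wording and the mapping of green/yellow/red to OK/WARNING/CRITICAL are assumptions made for this example, not the plugin's actual output.

```python
# Assumed mapping of cluster status to conventional plugin exit codes.
STATUS_TO_EXIT = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN = 3


def summarize_cluster(health, doc_count):
    """Build (exit_code, message) from a _cluster/health style dict.

    Hypothetical helper for illustration, not part of check_es_system.
    """
    status = health.get("status", "unknown")
    exit_code = STATUS_TO_EXIT.get(status, UNKNOWN)
    text = (
        f"Elasticsearch cluster is {status} - "
        f"{health['number_of_nodes']} nodes "
        f"({health['number_of_data_nodes']} data nodes), "
        f"{health['active_shards']} active shards, {doc_count} docs"
    )
    if status in ("yellow", "red"):
        # Shard details only matter when the cluster is not healthy.
        text += (
            f" - relocating: {health['relocating_shards']}, "
            f"initializing: {health['initializing_shards']}, "
            f"unassigned: {health['unassigned_shards']}"
        )
    # Performance data after the pipe lets the monitoring system graph growth.
    perfdata = (
        f"nodes={health['number_of_nodes']} "
        f"data_nodes={health['number_of_data_nodes']} "
        f"shards={health['active_shards']} docs={doc_count}"
    )
    return exit_code, f"{text} | {perfdata}"
```

Keeping the document count in the performance data is what makes the growth graphs mentioned above possible, even though it plays no role in the status decision.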

To round this off, all of this is now released as version 1.1, which is a bit easier to remember than the history dates used as release numbers.

The documentation page has been updated and greatly enhanced with additional examples.

Enjoy!

