A new version of check_es_system, an open source monitoring plugin to monitor Elasticsearch nodes and clusters, is available!
Version 1.11.1 is a bugfix release and fixes two important bugs which went under the radar.
The first bug was reported in #38 and involves improper authentication handling. Without going into too much detail, the check types readonly, master and tps did not detect invalid authentication (typically an HTTP 401 Unauthorized response).
The other check types used the internal function getstatus which made sure that authentication was correct.
To solve this, all check types now use the internal function getstatus.
Thanks to Dan Johansson for reporting this and even providing a code fix!
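The idea of routing every check type through one shared status function can be sketched as follows. This is a minimal illustration, not the plugin's actual getstatus code: the function name check_http_code, the handled status codes and the message texts are all assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the plugin's real getstatus implementation.
# Maps an HTTP status code to a Nagios-style exit code so that every
# check type can share the same authentication handling.

STATE_OK=0
STATE_CRITICAL=2

check_http_code() {
  local code="$1"
  case "$code" in
    200)
      # Request was accepted, the caller may continue with its check.
      return "$STATE_OK"
      ;;
    401)
      # Illustrative message text, not the plugin's real output.
      echo "ES SYSTEM CRITICAL - Authentication failed (HTTP 401)"
      return "$STATE_CRITICAL"
      ;;
    *)
      echo "ES SYSTEM CRITICAL - Unexpected HTTP status ${code}"
      return "$STATE_CRITICAL"
      ;;
  esac
}
```

In a real check, the status code could be obtained with something like curl -s -o /dev/null -w '%{http_code}' and passed to the function before any check-specific parsing happens. A check type that skips this step has no way to notice a rejected login.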
The second bug has facepalm potential. When the Elasticsearch address (given by the -H parameter) points to an "archived" Elasticsearch instance at Elastic.co, the domain still responds to requests with an HTTP 200 status, but with a response body of {"ok":false,"message":"Unknown resource."}.
The plugin interpreted this as a healthy response and exited with return code 0 (OK) without any output instead of failing.
$ ./check_es_system.sh -H outdated-cluster.cloud.es.io -P 9243 -S -u user -p pass -t status
$ echo $?
0
Something similar happened when the address pointed to an ordinary domain where no Elasticsearch is running at all: the plugin printed only parse errors from the JSON parser, no status line, and still exited with return code 0 (OK):
$ ./check_es_system.sh -H www.claudiokuenzler.com -P 443 -S -u user -p secret -t status
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
parse error: Invalid numeric literal at line 1, column 10
$ echo $?
0
Both bugs are now fixed in version 1.11.1.
$ ./check_es_system.sh -H www.claudiokuenzler.com -P 443 -S -u user -p secret -t status
ES SYSTEM CRITICAL - Elasticsearch not available at this address www.claudiokuenzler.com:443
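The second fix boils down to not trusting an HTTP 200 on its own and looking at the response body as well. A minimal sketch of such a validation is shown below. It is not the plugin's actual code: the function names are hypothetical, and the test for a "cluster_name" key is an assumption chosen because Elasticsearch includes that field in its cluster health response.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the plugin's real implementation.
# Assumption: a genuine Elasticsearch answer contains a "cluster_name" key.

looks_like_elasticsearch() {
  local body="$1"
  case "$body" in
    *'"cluster_name"'*) return 0 ;;  # plausible Elasticsearch response
    *) return 1 ;;                   # HTML page, error JSON, empty body, ...
  esac
}

report_availability() {
  local host="$1" port="$2" body="$3"
  if looks_like_elasticsearch "$body"; then
    # Body looks valid, the caller can go on and parse it.
    return 0
  fi
  echo "ES SYSTEM CRITICAL - Elasticsearch not available at this address ${host}:${port}"
  return 2
}
```

With this kind of check, both the {"ok":false,"message":"Unknown resource."} body of an archived instance and the HTML of an ordinary website lead to a CRITICAL result instead of a silent exit code 0. A plain substring match is deliberately crude; a stricter variant could parse the JSON properly.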