New version of check_couchdb_replication: Rewrite and improvements


Published on - Listed in Monitoring CouchDB Databases


Over the past few years it has been very quiet around check_couchdb_replication, a monitoring plugin for CouchDB (synchronous) replications.

But now the plugin has received an update and a new version is available! Several recent changes made it into the latest release (20220223), including:

  • Correctly handle CouchDB 3 output (additional "missing" strings in the JSON output)
  • Rewrite the plugin to work better with the JSON output, replacing jshon with jq
  • Add performance data when checking all replications (-r ALL)
  • Handle non-continuous (= one time) replications, which are now excluded from the replication check by default. To include one time replications in the check, use the new -i parameter.
  • Improve the output of all detected replications (using -d) by showing the replication source alongside the doc_id (this helps to find a replication more quickly)
  • Improve the HTTP request and response handling by using a dedicated function
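The post does not show the dedicated HTTP function itself. The following is a minimal sketch of what such a helper could look like, assuming the requests are made with curl: the function name couchdb_get, the 10 second timeout, the decision to pass HTTP 404 through to the caller and the output messages are all hypothetical, not taken from the plugin.

```sh
#!/bin/sh
# Hypothetical helper (not the plugin's actual code): GET a CouchDB URL, print
# the response body and map failures to Nagios exit codes.
couchdb_get() {
  local url="$1" user="$2" pass="$3" response rc http_code body
  # -sS: no progress meter, but still report errors on stderr
  # -w: append the HTTP status code as an extra last line
  # --max-time: assumed timeout so a hanging server cannot block the check
  response=$(curl -sS --max-time 10 -u "${user}:${pass}" -w '\n%{http_code}' "$url")
  rc=$?
  if [ "$rc" -ne 0 ]; then
    echo "COUCHDB REPLICATION UNKNOWN - Request to ${url} failed (curl exit code ${rc})"
    return 3
  fi
  http_code=$(printf '%s\n' "$response" | tail -n 1)   # last line: status code
  body=$(printf '%s\n' "$response" | sed '$d')         # all other lines: body
  case "$http_code" in
    # 404 is passed on because CouchDB then answers with {"error":"not_found",...}
    200|404) printf '%s\n' "$body" ;;
    *) echo "COUCHDB REPLICATION CRITICAL - HTTP ${http_code} returned by ${url}"
       return 2 ;;
  esac
}

# Example call (needs a reachable CouchDB):
#   docs=$(couchdb_get "http://couchdb:5984/_scheduler/docs" user secret)
#   rc=$?; [ "$rc" -eq 0 ] || { echo "$docs"; exit "$rc"; }
```

Keeping the transport error (UNKNOWN) separate from an unexpected HTTP status (CRITICAL) means the caller only has to parse JSON when a usable response actually arrived.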
Below are some of the changes in more detail.

Handle CouchDB 3 output

CouchDB 3 adds more information to the JSON output of a replication. One of these additions is the "missing_revisions_found" counter. Because the plugin was written for CouchDB 2 back in 2018, any "missing" string in the output was interpreted as "replication not found". This resulted in the following error, even for existing replications:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION CRITICAL - Replication for e9046b127bd99afc9cd208b94d6ea136 not found

The new plugin release fixes this:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION OK - Replication e9046b127bd99afc9cd208b94d6ea136 is running

Thanks to Guillaume Subiron for creating the PR. The newest version uses a slightly different approach: it checks for "error: not_found" instead of "reason: missing".
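The difference between both checks can be illustrated with a short shell sketch (requires jq). The function name replication_state and the sample JSON documents are illustrative assumptions, modelled on CouchDB's error and _scheduler/docs responses; this is not the plugin's code.

```sh
#!/bin/sh
# Hypothetical sketch (requires jq): classify a replication response by its
# top-level "error" field instead of searching for the string "missing".
replication_state() {
  printf '%s\n' "$1" |
    jq -r 'if .error == "not_found" then "not_found" else (.state // "unknown") end'
}

# Illustrative responses: a missing document and a running CouchDB 3 replication
missing='{"error":"not_found","reason":"missing"}'
running='{"doc_id":"e9046b127bd99afc9cd208b94d6ea136","state":"running","info":{"missing_revisions_found":0}}'

for json in "$missing" "$running"; do
  # The old string match fires for both documents ...
  if printf '%s\n' "$json" | grep -q 'missing'; then naive="not found"; else naive="found"; fi
  # ... while the jq check only reports not_found for the real error response
  echo "naive: ${naive}, jq: $(replication_state "$json")"
done
```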

Better replication detection

In the previous version from 2018, the replication detection only showed the doc_id of the replication:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: "e9046b127bd99afc9cd208b94d1ca1b6" "e9046b127bd99afc9cd208b94d6ea136" "e9046b127bd99afc9cd208b94d1cc8e2" "e9046b127bd99afc9cd208b94d1c081a" "e9046b127bd99afc9cd208b94d6e9b59"

This information is enough to run a check on a single replication, but it does not tell a human reader which replication is behind a doc_id.

In the latest release the output also shows the replication source (which includes the database) in parentheses after each doc_id:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: e9046b127bd99afc9cd208b94d1ca1b6 (http://couchdb1.example.com:5984/_users/) e9046b127bd99afc9cd208b94d6ea136 (http://couchdb1.example.com:5984/marketingtools/) e9046b127bd99afc9cd208b94d1cc8e2 (http://couchdb1.example.com:5984/councillors-ch-dev/) e9046b127bd99afc9cd208b94d1c081a (http://localhost:5984/q-items-dev/) e9046b127bd99afc9cd208b94d6e9b59 (http://localhost:5984/marketingtools/)
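Assuming the replications are read from CouchDB's _scheduler/docs endpoint, whose entries contain doc_id and source fields, a listing in this format can be produced with a single jq expression. The function name list_replications and the sample data are hypothetical; the plugin's implementation may differ.

```sh
#!/bin/sh
# Hypothetical sketch (requires jq): turn a _scheduler/docs response into
# "doc_id (source)" pairs separated by spaces.
list_replications() {
  jq -r '[.docs[] | "\(.doc_id) (\(.source))"] | join(" ")'
}

# Illustrative input with two replications
sample='{"docs":[
  {"doc_id":"e9046b127bd99afc9cd208b94d1ca1b6","source":"http://couchdb1.example.com:5984/_users/"},
  {"doc_id":"e9046b127bd99afc9cd208b94d1c081a","source":"http://localhost:5984/q-items-dev/"}]}'

echo "COUCHDB AVAILABLE REPLICATIONS: $(printf '%s\n' "$sample" | list_replications)"
```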

Performance data on -r ALL

Although this is only a minor change to the output itself, it is a helpful new feature for users who create graphs from performance data: such graphs can show the number of replications over a long period of time.

The old version did not show the number of replications:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All replications running

The new version shows the number of successfully running replications in the output and in performance data:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;
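The performance data follows the Nagios plugin format 'label'=value[UOM];[warn];[crit];[min];[max], with the threshold, minimum and maximum fields left empty. A minimal sketch of how such a status line could be built with jq, assuming a _scheduler/docs response that has already been reduced to the replications being checked; check_all is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical sketch (requires jq): count running vs. not running replications
# and print a Nagios status line with performance data after the "|".
check_all() {
  local counts ok fail perfdata
  # One jq call emits both counters, e.g. "38 0"; a parse error means UNKNOWN
  if ! counts=$(jq -r '[.docs[].state] | "\(map(select(. == "running")) | length) \(map(select(. != "running")) | length)"'); then
    echo "COUCHDB REPLICATION UNKNOWN - Could not parse replication data"
    return 3
  fi
  ok=${counts% *} fail=${counts#* }
  perfdata="replok=${ok};;;; replfail=${fail};;;;"
  if [ "$fail" -eq 0 ]; then
    echo "COUCHDB REPLICATION OK - All ${ok} continuous replications running | ${perfdata}"
    return 0
  fi
  echo "COUCHDB REPLICATION CRITICAL: ${fail} continuous replications not running | ${perfdata}"
  return 2
}

# Illustrative input: two running replications and one that is not
printf '%s\n' '{"docs":[{"state":"running"},{"state":"running"},{"state":"crashing"}]}' | check_all
echo "exit code: $?"
```

Emitting replfail even when it is 0 keeps the set of perfdata labels constant, which graphing tools generally handle better than labels that appear and disappear.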

Exclude one time replications by default

Another problem, described in issue #5, was that the plugin returned a CRITICAL alert for non-continuous (one time) replications. By design, these replications only run once and are then marked with the state "completed", yet the plugin expected the state "running":

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION CRITICAL - 2 replications not running ("doc_id":"e9046b127bd99afc9cd208b94d5e4fcb" "error_count":0 "info":{"revisions_checked":899,"doc_id":"e9046b127bd99afc9cd208b94d5f4538" "error_count":0 "info":{"revisions_checked":899,)

To handle this properly, all non-continuous (= one time) replications are now excluded from the "ALL" check by default:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;
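A possible way to implement this filter, sketched under an assumption: CouchDB appends "+continuous" to the replication ID (the id field in _scheduler/docs) of continuous replications, so entries without that suffix can be dropped with jq before counting. This is not necessarily how the plugin decides, and entries without an id would be dropped as well. The function filter_replications and its include flag are hypothetical.

```sh
#!/bin/sh
# Hypothetical sketch (requires jq): drop non-continuous replications from a
# _scheduler/docs response unless one time replications should be included.
filter_replications() {
  if [ "${1:-0}" = "1" ]; then
    jq -c '.'   # keep everything (include one time replications)
  else
    # Assumption: continuous replications have "+continuous" in their replication id;
    # a missing or null id is treated as non-continuous
    jq -c '.docs |= map(select((.id // "") | contains("+continuous")))'
  fi
}

# Illustrative input: one continuous and one completed one time replication
sample='{"docs":[
  {"doc_id":"a","id":"c0ebe9256695ff083347cbf95f93e280+continuous","state":"running"},
  {"doc_id":"b","id":"5f4c1e2e3a6c7d8e9f00112233445566","state":"completed"}]}'

printf '%s\n' "$sample" | filter_replications 0 | jq -r '[.docs[].doc_id] | join(",")'
printf '%s\n' "$sample" | filter_replications 1 | jq -r '[.docs[].doc_id] | join(",")'
```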

If the one time replications should be part of the check, they can be included again using the new -i parameter:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL -i
COUCHDB REPLICATION CRITICAL: 2 continuous replications not running - Details: e9046b127bd99afc9cd208b94d5e4fcb (state: completed, error count: 0)  e9046b127bd99afc9cd208b94d5f4538 (state: completed, error count: 0)  | replok=38;;;; replfail=2;;;;
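Such a switch is commonly implemented with the POSIX getopts builtin. A minimal sketch using the option letters shown in this post (-H, -u, -p, -r, -d, -i); the variable names and the usage message are hypothetical, and the plugin's own option parsing may differ.

```sh
#!/bin/sh
# Hypothetical sketch: parse the plugin options used in this post with getopts.
# -i sets include_onetime=1, which can then decide whether one time
# replications are filtered out before the check.
parse_args() {
  local opt
  OPTIND=1   # reset getopts so the function can be called repeatedly
  include_onetime=0 detect=0
  while getopts ":H:u:p:r:di" opt; do
    case "$opt" in
      H) host="$OPTARG" ;;
      u) user="$OPTARG" ;;
      p) pass="$OPTARG" ;;
      r) replication="$OPTARG" ;;
      d) detect=1 ;;
      i) include_onetime=1 ;;
      *) echo "Usage: ${0##*/} -H host -u user -p password [-r doc_id|ALL] [-d] [-i]"
         return 3 ;;   # UNKNOWN for invalid usage
    esac
  done
}

parse_args -H couchdb -u user -p secret -r ALL -i
echo "host=${host} replication=${replication} include_onetime=${include_onetime}"
```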


