New version of check_couchdb_replication: Rewrite and improvements



It has been very quiet in the past few years around check_couchdb_replication, a monitoring plugin for CouchDB (synchronous) replications.

But now a new version of the plugin is available! The latest release (20220223) includes several changes:

  • Correctly handle CouchDB 3 output (handle the additional "missing" output)
  • Rewrite the plugin to work better with the JSON output, replacing jshon with jq
  • Add performance data when checking all replications (-r ALL)
  • Handle non-continuous (= one time) replications, which are now excluded from the replication check by default. To include one time replications in the check, use the new -i parameter.
  • Improve the output of all detected replications (using -d) by showing the replication source alongside the doc_id (helps to find a replication more quickly)
  • Improve the HTTP request and response handling by using a dedicated function

Some of these changes are described in more detail below.

Handle CouchDB 3 output

CouchDB 3 added additional information to the JSON output of a replication. One of these additions is the "missing_revisions_found" counter. Because the 2018 version of the plugin was written for CouchDB 2, it interpreted any "missing" string in the output as "replication not found". This resulted in the following error, even for existing replications:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION CRITICAL - Replication for e9046b127bd99afc9cd208b94d6ea136 not found

The new plugin release fixes this:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r e9046b127bd99afc9cd208b94d6ea136
COUCHDB REPLICATION OK - Replication e9046b127bd99afc9cd208b94d6ea136 is running

Thanks to Guillaume Subiron for creating the PR. The newest version now uses a slightly different approach: it checks for "error: not_found" instead of "reason: missing".
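To illustrate why matching on the bare string "missing" breaks with CouchDB 3, here is a minimal sketch. It is not the plugin's actual code: the function names old_check and new_check are made up for this example, and the two JSON strings are abridged, illustrative responses rather than captured output.

```shell
#!/bin/sh
# Illustrative, abridged responses (assumptions, not captured output).
existing='{"doc_id":"e9046b127bd99afc9cd208b94d6ea136","state":"running","info":{"revisions_checked":899,"missing_revisions_found":0}}'
notfound='{"error":"not_found","reason":"missing"}'

# Naive approach: any occurrence of "missing" counts as "not found".
# This also matches the "missing_revisions_found" counter.
old_check() {
  case "$1" in
    *missing*) echo "not found" ;;
    *)         echo "found" ;;
  esac
}

# Stricter approach: only an explicit "error":"not_found" counts.
new_check() {
  if printf '%s' "$1" | grep -Eq '"error"[[:space:]]*:[[:space:]]*"not_found"'; then
    echo "not found"
  else
    echo "found"
  fi
}

echo "old, existing replication: $(old_check "$existing")"
echo "new, existing replication: $(new_check "$existing")"
echo "new, unknown replication:  $(new_check "$notfound")"
```

With these sample strings, the naive check reports the existing replication as "not found", while the stricter check only does so for the real error response.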

Better replication detection

In the previous version from 2018, the replication detection only showed the doc_id of the replication:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: "e9046b127bd99afc9cd208b94d1ca1b6" "e9046b127bd99afc9cd208b94d6ea136" "e9046b127bd99afc9cd208b94d1cc8e2" "e9046b127bd99afc9cd208b94d1c081a" "e9046b127bd99afc9cd208b94d6e9b59"

This information is enough to run a check on a single replication, but it does not tell a human which replication is behind a doc_id.

In the latest release the output also shows the replication source (which includes the database) in parentheses following the doc_id:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -d
COUCHDB AVAILABLE REPLICATIONS: e9046b127bd99afc9cd208b94d1ca1b6 (http://couchdb1.example.com:5984/_users/) e9046b127bd99afc9cd208b94d6ea136 (http://couchdb1.example.com:5984/marketingtools/) e9046b127bd99afc9cd208b94d1cc8e2 (http://couchdb1.example.com:5984/councillors-ch-dev/) e9046b127bd99afc9cd208b94d1c081a (http://localhost:5984/q-items-dev/) e9046b127bd99afc9cd208b94d6e9b59 (http://localhost:5984/marketingtools/)
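The new format is also easy to post-process, for example to build one service check per replication. The following is a rough sketch, assuming the output keeps the "doc_id (source)" layout shown above and that sources contain no spaces; list_replications is a hypothetical helper name, not part of the plugin.

```shell
#!/bin/sh
# Hypothetical helper: turn the -d output line into "doc_id<TAB>source" rows.
# Assumes the layout "PREFIX: id (source) id (source) ..." with no spaces
# inside a source URL.
list_replications() {
  line="${1#*: }"          # drop everything up to the first ": "
  set -- $line             # split the rest on whitespace
  while [ "$#" -ge 2 ]; do
    src="${2#\(}"          # strip the leading "("
    src="${src%\)}"        # strip the trailing ")"
    printf '%s\t%s\n' "$1" "$src"
    shift 2
  done
}

list_replications 'COUCHDB AVAILABLE REPLICATIONS: e9046b127bd99afc9cd208b94d1ca1b6 (http://couchdb1.example.com:5984/_users/) e9046b127bd99afc9cd208b94d1c081a (http://localhost:5984/q-items-dev/)'
```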

Performance data on -r ALL

Although this is only a minor change to the output itself, it is a helpful new feature for users creating graphs from performance data: the number of replications can now be tracked over a long period.

The old version did not show the number of replications:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All replications running

The new version shows the number of successfully running replications in the output and in performance data:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;
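Monitoring cores normally parse the performance data themselves, but for a quick look at what a graphing tool will receive, the values can be extracted by hand. This is a small sketch under the assumption that the performance data follows the usual "label=value;warn;crit;min;max" layout after the pipe character; perf_value is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one performance data label.
# Usage: perf_value LABEL "full plugin output line"
perf_value() {
  label="$1"
  perf="${2#*'|'}"                 # everything after the first "|"
  [ "$perf" = "$2" ] && return 1   # no performance data present
  for token in $perf; do
    case "$token" in
      "$label"=*)
        value="${token#*=}"        # drop "label="
        printf '%s\n' "${value%%;*}"   # drop the thresholds
        return 0
        ;;
    esac
  done
  return 1
}

line='COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;'
perf_value replok "$line"     # prints 38
perf_value replfail "$line"   # prints 0
```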

Exclude one time replications by default

Another problem, described in issue #5, was that the plugin returned a CRITICAL alert on non-continuous (one time) replications. This kind of replication runs only once and is then marked as "completed" in its state, yet the plugin expected a "running" state:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION CRITICAL - 2 replications not running ("doc_id":"e9046b127bd99afc9cd208b94d5e4fcb" "error_count":0 "info":{"revisions_checked":899,"doc_id":"e9046b127bd99afc9cd208b94d5f4538" "error_count":0 "info":{"revisions_checked":899,)

To handle this properly, all the non-continuous (= one time) replications are now excluded from the "ALL" check by default:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL
COUCHDB REPLICATION OK - All 38 continuous replications running | replok=38;;;; replfail=0;;;;

If the one time replications should be part of the check, they can be included again using the new -i parameter:

$ /usr/lib/nagios/plugins/check_couchdb_replication.sh -H couchdb -u user -p secret -r ALL -i
COUCHDB REPLICATION CRITICAL: 2 continuous replications not running - Details: e9046b127bd99afc9cd208b94d5e4fcb (state: completed, error count: 0)  e9046b127bd99afc9cd208b94d5f4538 (state: completed, error count: 0)  | replok=38;;;; replfail=2;;;;
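The effect of the -i parameter can be pictured with a simplified model. The sketch below is not the plugin's implementation (which works on the JSON output with jq); it assumes a made-up input format of one replication per line as "doc_id continuous state", and summarize is a hypothetical function name.

```shell
#!/bin/sh
# Simplified model of the "ALL" check.
# Argument: "yes" to include one time replications (like -i), anything else to skip them.
# Input on stdin: one replication per line as "doc_id continuous state" (made-up format).
summarize() {
  include_onetime="$1"
  ok=0
  fail=0
  while read -r doc_id continuous state; do
    [ -z "$doc_id" ] && continue
    # Skip one time replications unless they were explicitly requested.
    if [ "$continuous" != "true" ] && [ "$include_onetime" != "yes" ]; then
      continue
    fi
    if [ "$state" = "running" ]; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
  done
  printf 'replok=%s;;;; replfail=%s;;;;\n' "$ok" "$fail"
}

sample='e9046b127bd99afc9cd208b94d6ea136 true running
e9046b127bd99afc9cd208b94d1ca1b6 true running
e9046b127bd99afc9cd208b94d5e4fcb false completed'

printf '%s\n' "$sample" | summarize no    # prints replok=2;;;; replfail=0;;;;
printf '%s\n' "$sample" | summarize yes   # prints replok=2;;;; replfail=1;;;;
```

In this model a completed one time replication only counts as a failure when it is explicitly included, which mirrors the difference between the two plugin outputs above.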


