check_rancher2 1.12.0 released: Monitoring of Rancher internal certificates in the local cluster




A new version of check_rancher2, a monitoring plugin for Kubernetes clusters managed by SUSE Rancher, is now available! Version 1.12.0 introduces a new check type "local-certs" to monitor the (internal) certificates used and deployed by Rancher in the "local" cluster.

Rancher internal certificates

When installed, Rancher deploys certificates into the "local" (Rancher management) cluster under the "System" project. More precisely, these certificates are stored as Kubernetes secrets in the "cattle-system" namespace and can be seen using kubectl:

$ kubectl -n cattle-system get secret  | grep tls
cattle-webhook-ca                       kubernetes.io/tls                     2      464d
cattle-webhook-tls                      kubernetes.io/tls                     2      3h49m
serving-cert                            kubernetes.io/tls                     2      3h41m
tls-rancher                             kubernetes.io/tls                     2      4y88d
tls-rancher-internal                    kubernetes.io/tls                     2      161m
tls-rancher-internal-ca                 kubernetes.io/tls                     2      464d
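The listing above shows the age of each secret but not when the certificate inside it expires. As a rough sketch, the expiry date can be read with kubectl and openssl. The helper names `tls_secret_names` and `tls_secret_enddate` are made up for this example, and the sketch assumes kubectl access to the "local" cluster and openssl on the PATH. Filtering on the TYPE column is slightly stricter than `grep tls`, which would also match a secret that merely has "tls" in its name.

```shell
#!/usr/bin/env bash

# Print the names of all secrets whose TYPE column is exactly kubernetes.io/tls.
# Expects `kubectl get secret` style output on stdin.
tls_secret_names() {
  awk '$2 == "kubernetes.io/tls" { print $1 }'
}

# Print the expiry date (notAfter) of the certificate stored in one TLS secret.
# Assumption: kubectl is configured for the local cluster and openssl is installed.
tls_secret_enddate() {
  local namespace="$1" name="$2"
  kubectl -n "$namespace" get secret "$name" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -enddate
}

# Possible usage (not executed here):
#   kubectl -n cattle-system get secret --no-headers | tls_secret_names |
#     while read -r name; do
#       printf '%s: ' "$name"
#       tls_secret_enddate cattle-system "$name"
#     done
```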

When Rancher is installed, these certificates are created with a validity of one year; the CA certificates are the exception and expire further in the future. They are usually only renewed when Rancher is updated. If one or more of them expire, your Rancher cluster will likely run into problems. Because these certificates are only used internally by Rancher (unlike the Kubernetes certificates), the problem is not immediately visible. You will only notice it when performing specific management tasks, such as changing RBAC settings or users.

In the classic Rancher 2 UI (Cluster Manager) you can see these certificates in the "local" cluster, under the "System" project: under "Resources" select "Secrets", then switch to the "Certificates" tab.

[Screenshot: Expired Rancher internal certificates]

In the newer Cluster Explorer UI, select "Secrets" in the left navigation, then select "cattle-system" in the namespace selector (at the top) and sort the list by "Kind". The certificates should show up as "TLS Certificate".

Newer Rancher releases include a fix that automatically renews these internal certificates once a certificate is within 30 days of its expiration date.

The Rancher 2.5 documentation says:

In Rancher v2.5.12 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date

Similar for Rancher 2.6:

In Rancher v2.6.3 and up, rancher-webhook deployments will automatically renew their TLS certificate when it is within 30 or fewer days of its expiration date.

In the documentation only the rancher-webhook certificate is mentioned. The other certificates (serving-cert, tls-rancher and tls-rancher-internal) are unfortunately not documented - yet they will expire and may cause problems, too.

Monitoring Rancher internal certificates

As mentioned, check_rancher2 version 1.12.0 can now monitor these internal certificates with the "local-certs" check type:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
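Everything after the pipe character in this output is performance data, which a monitoring system normally parses on its own. If you want to use the counters in a script, a minimal sketch could look like this. The function name `perfdata_value` is hypothetical and not part of the plugin, and the sketch assumes the `'label'=value;;;;` format shown above.

```shell
#!/usr/bin/env bash

# Extract the numeric value of one performance data label from plugin output.
# Arguments: label name, full plugin output line.
# Assumption: perfdata follows the 'label'=value;;;; format shown in this post.
perfdata_value() {
  local label="$1" output="$2"
  # Keep only the part after the first pipe character (the performance data).
  local perfdata="${output#*|}"
  printf '%s\n' "$perfdata" | sed -n "s/.*'${label}'=\([0-9][0-9]*\).*/\1/p"
}

# Possible usage (not executed here):
#   output="$(./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs)"
#   perfdata_value expired_certs "$output"
```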

By default the plugin only checks for certificates that have already expired. To be alerted before certificates expire, add the --cert-warn parameter with the number of days in advance (here 14 days):

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14
CHECK_RANCHER2 CRITICAL - 3 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago - tls-rancher-internal expired 98 days ago -)|'total_certs'=6;;;; 'expired_certs'=3;;;; 'warning_certs'=0;;;; 'ignored_certs'=0;;;;
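The idea behind such a warning threshold can be sketched in a few lines of shell. This is an illustration only, not the plugin's actual implementation; the function name `classify_cert` and its arguments are assumptions made for this example.

```shell
#!/usr/bin/env bash

# Classify a certificate by its expiry time.
# Arguments: expiry (epoch seconds), now (epoch seconds), warning threshold in days.
# Prints "expired", "warning" or "ok". Without a threshold (default 0),
# only already expired certificates are reported.
classify_cert() {
  local expiry="$1" now="$2" warn_days="${3:-0}"
  local seconds_left=$(( expiry - now ))
  if [ "$seconds_left" -le 0 ]; then
    echo "expired"
  elif [ "$seconds_left" -le $(( warn_days * 86400 )) ]; then
    echo "warning"
  else
    echo "ok"
  fi
}

# Possible usage with GNU date (not executed here), where the date string is
# taken from the notAfter= line printed by `openssl x509 -noout -enddate`:
#   classify_cert "$(date -d "<notAfter date>" +%s)" "$(date +%s)" 14
```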

The plugin can also exclude one or more certificates from the check:

$ ./check_rancher2.sh -H rancher2.example.com -U token-xxxxx -P "secret" -S -t local-certs --cert-warn 14 -i tls-rancher-internal
CHECK_RANCHER2 CRITICAL - 2 certificate(s) expired (cattle-webhook-tls expired 98 days ago - serving-cert expired 82 days ago -) - 1 certificate(s) ignored: tls-rancher-internal|'total_certs'=6;;;; 'expired_certs'=2;;;; 'warning_certs'=0;;;; 'ignored_certs'=1;;;;

This new check should help to catch expiring certificates in Rancher-managed Kubernetes clusters before they cause problems. Expired Kubernetes or Rancher internal certificates are among the most widely reported issues.

