What to do when Rancher (Kubernetes) internal Ingress controller certificates have expired


Expired certificates inside Kubernetes can be a major pain - not only for the cluster itself (micro-services inside the cluster could stop communicating, for example) but also for the Kubernetes administrator, who has to force a certificate renewal.

Kubernetes certificate renewal using rke

In Rancher-managed Kubernetes (both RKE and single Docker installations) this problem has been known for a long time. In early 2.0 and 2.1 releases the initial Kubernetes certificates were created with a one-year expiry date - leading to a broken cluster one year later. This was eventually fixed in later Rancher versions. To manually renew the certificates of an HA cluster, RKE can be run against the local (management) cluster to rotate (renew) them:

ck@config:~/rancher$ ./rke cert rotate --config 3-node-rancher.yml
INFO[0000] Running RKE version: v1.3.1                  
INFO[0000] Initiating Kubernetes cluster                
INFO[0000] Rotating Kubernetes cluster certificates  
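After the rotation, the new expiry dates can be verified without logging in to a node. A minimal sketch, assuming RKE wrote its generated kubeconfig next to the cluster config (the file name kube_config_3-node-rancher.yml is an assumption derived from RKE's usual naming): decode the admin client certificate embedded in the kubeconfig and inspect it with openssl:

```shell
# Sketch: check the expiry date of the admin client certificate that RKE
# embedded in its generated kubeconfig (file name is an assumption).
grep client-certificate-data kube_config_3-node-rancher.yml \
  | awk '{print $2}' \
  | base64 -d \
  | openssl x509 -noout -enddate
```

The output should show a notAfter= date roughly one year in the future after a successful rotation.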

Ingress certificates on Rancher local cluster expired

But not all Kubernetes services automatically reload themselves with new certificates. Today I came across an expired TLS certificate on the Rancher Ingress services, running on the Rancher management (local cluster) nodes:

root@rancher01:~# /usr/lib/nagios/plugins/check_http -I -p 443 -C 30,14
CRITICAL - Certificate 'Kubernetes Ingress Controller Fake Certificate' expired on Fri 11 Nov 2022 01:24:14 PM GMT +0000.

Luckily this had no impact on the cluster or on the end users accessing the Rancher API or UI, as the Rancher nodes are never directly exposed to the Internet and all traffic needs to pass through a reverse proxy with a valid certificate (at least in my setups).

Even though the internal certificates in the background had (most likely) already been renewed automatically, the running Ingress pods still had the original certificates loaded:
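To see exactly which certificate an Ingress pod is currently serving, openssl can be used directly instead of the Nagios plugin. A minimal sketch, assuming rancher01 resolves to one of the Rancher nodes:

```shell
# Sketch: show subject and expiry of the certificate served on port 443.
# rancher01 stands in for the address of a Rancher node (assumption).
echo | openssl s_client -connect rancher01:443 2>/dev/null \
  | openssl x509 -noout -subject -enddate
```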

ck@linux ~ $ kubectl -n ingress-nginx get pod
NAME                                    READY   STATUS    RESTARTS   AGE
default-http-backend-6977475d9b-l8dcz   1/1     Running   0          441d
nginx-ingress-controller-66pr2          1/1     Running   0          441d
nginx-ingress-controller-lkg28          1/1     Running   0          441d
nginx-ingress-controller-zzsvg          1/1     Running   0          441d

We can see all Ingress pods have been running for 441 days (without a restart). Let's remove one pod after another:

ck@linux ~ $ kubectl -n ingress-nginx delete pod nginx-ingress-controller-66pr2
pod "nginx-ingress-controller-66pr2" deleted

As "nginx-ingress-controller" is a DaemonSet, which deploys one pod per node, Kubernetes should detect the "missing" pod and create a new one.

This happened a few seconds after the pod was deleted. Shortly afterwards, I verified the certificate on port 443 again:

root@rancher01:~# /usr/lib/nagios/plugins/check_http -I -p 443 -C 30,14
OK - Certificate 'Kubernetes Ingress Controller Fake Certificate' will expire on Sat 27 Jan 2024 09:50:14 AM GMT +0000.

A new certificate (with a one year expiry) is now in place.

The same was done for each pod on each node, and all the Ingress certificates were valid again.
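Instead of deleting each pod by hand, a small loop can do the work. A sketch, assuming kubectl access to the local cluster; the sleep between deletions (an arbitrary value) gives the DaemonSet time to recreate each pod before the next one goes away:

```shell
# Sketch: delete the Ingress controller pods one after another so the
# DaemonSet recreates them with the renewed certificates loaded.
for pod in $(kubectl -n ingress-nginx get pod -o name | grep nginx-ingress-controller); do
  kubectl -n ingress-nginx delete "$pod"
  sleep 30  # arbitrary pause: let the DaemonSet recreate the pod first
done
```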

Besides checking the Rancher Ingress controller certificates (served on port 443 by default, sometimes on port 8443 on a single Docker installation), it is also worth checking the Kubernetes API server certificate on port 6443.
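For port 6443 the same check_http pattern applies. A sketch, with rancher01 standing in for a node address (check_http's -I option expects one), plus an openssl alternative whose exit code is handy in scripts:

```shell
# Monitor the kube-apiserver certificate on port 6443 with the same plugin
# (rancher01 is a placeholder for a Rancher node address):
/usr/lib/nagios/plugins/check_http -I rancher01 -p 6443 -C 30,14

# Or with openssl: exit code 0 means the certificate is still valid
# for at least another 14 days (1209600 seconds).
echo | openssl s_client -connect rancher01:6443 2>/dev/null \
  | openssl x509 -noout -checkend 1209600
```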
