Reset (clean up) a Rancher Docker Host / Kubernetes Node


Published on - last updated on July 10th 2023


These days I'm testing Rancher as a potential candidate for a new Docker infrastructure. So far it is appealing: Rancher has a nice and intuitive user interface and, more importantly, a nice API to automatically trigger container creation (for example from Travis CI).

During a failover test, I rebooted one of the Rancher hosts, and when it came back up, the connectivity to Rancher was lost. Why? Because I forgot to add the separate file system for /var/lib/docker, which I had prepared as a logical volume, into /etc/fstab. Therefore all previous Docker data was gone, including the rancher-agent container.

Unfortunately I did not spot the error right away and simply decided to remove the host in Rancher and re-add it manually. Of course, once I fixed the file system mount problem and rebooted, Rancher would no longer connect, because by then a new rancher-agent with a new ID had been installed.

Clean up a Rancher 1.x host

To force a reset or cleanup of the Rancher host, one can do the following:

1. Deactivate the affected host in Rancher, then remove the host

2. Stop Docker service

service docker stop

3. Remove Docker and Rancher data:

rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*

4. Start Docker service

service docker start

5. Add the host in Rancher
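Scripted, steps 2 to 4 look like the sketch below. The run wrapper and the APPLY switch are my own additions, not part of any Rancher tooling: since the rm -rf lines destroy all local Docker data, the script only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch of the Rancher 1.x host reset (steps 2-4 above).
# APPLY is my own safety switch: without APPLY=1 the commands are
# only echoed, because the rm -rf lines wipe all local Docker data.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "DRY RUN: $*"
    fi
}

reset_rancher1_host() {
    run service docker stop
    run rm -rf /var/lib/docker/*    # all images, containers and volumes are lost
    run rm -rf /var/lib/rancher/*   # rancher-agent state, including the host ID
    run service docker start
}

reset_rancher1_host
```

Run it once without APPLY to review what would happen, then again with APPLY=1 on the host to be reset.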

Clean up a Rancher 2.x Kubernetes node (2.0 - 2.5)

The above commands apply to a Rancher 1.x environment. In Rancher 2.x more directories must be cleaned up:

1. Deactivate (drain) the affected host in Rancher, then remove it, either in the Rancher UI or, for the "local" cluster, in RKE's YAML config

2. Stop Docker service 

service docker stop

3. Remove Docker, Rancher, RKE and Kubernetes related data:

mount | grep kubelet | awk '{print $3}' | while read -r mount; do umount "$mount"; done
rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/cni
rm -rf /var/run/calico
rm -rf /run/secrets/kubernetes.io
test -d /opt/rancher && rm -rf /opt/rancher # For Single Rancher installs
test -d /opt/containerd && rm -rf /opt/containerd
test -d /opt/rke && rm -rf /opt/rke
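A note on the first line of step 3: the kubelet keeps pod volumes (secret tmpfs mounts, for instance) mounted below /var/lib/kubelet, and rm -rf on a directory with live mounts can fail or reach into mounted data, which is why they are unmounted first. Below is a slightly more defensive variant of that loop; the quoting and the lazy-unmount fallback are my own additions.

```shell
# Unmount everything below /var/lib/kubelet before deleting it.
# "umount -l" (lazy) detaches mounts that are still busy.
mount | awk '/kubelet/ {print $3}' | while read -r m; do
    umount "$m" 2>/dev/null || umount -l "$m"
done
```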

4. Restart Docker service

service docker restart

Yes, although the Docker service was previously stopped, a simple "start" does not re-create the directories within /var/lib/docker (since Docker 20.10.x; see the article Docker unable to pull images after clean up for more information):

root@node:~# service docker start
root@node:~# ll /var/lib/docker/
total 0

A service restart however re-creates the missing directories:

root@node:~# service docker restart
root@node:~# ll /var/lib/docker/
total 44
drwx--x--x 4 root root 4096 Nov 11 14:06 buildkit
drwx--x--- 2 root root 4096 Nov 11 14:06 containers
drwx------ 3 root root 4096 Nov 11 14:06 image
drwxr-x--- 3 root root 4096 Nov 11 14:06 network
drwx--x--- 3 root root 4096 Nov 11 14:06 overlay2
drwx------ 4 root root 4096 Nov 11 14:06 plugins
drwx------ 2 root root 4096 Nov 11 14:06 runtimes
drwx------ 2 root root 4096 Nov 11 14:06 swarm
drwx------ 2 root root 4096 Nov 11 14:06 tmp
drwx------ 2 root root 4096 Nov 11 14:06 trust
drwx-----x 2 root root 4096 Nov 11 14:06 volumes

5. Add the host into a cluster using the sudo docker... command (shown in Rancher UI) or in RKE YAML
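Before re-adding the node, it can be worth double-checking that nothing was left behind. The check_leftovers function below is a hypothetical helper of my own, not a Rancher or RKE tool; it simply reports the directories from step 3 that still exist and are not empty.

```shell
#!/bin/sh
# Print every given directory that still exists and is not empty.
# check_leftovers is my own helper, not part of Rancher or RKE.
check_leftovers() {
    for d in "$@"; do
        if [ -d "$d" ] && [ -n "$(ls -A "$d" 2>/dev/null)" ]; then
            echo "leftover data: $d"
        fi
    done
}

check_leftovers /var/lib/docker /var/lib/rancher /var/lib/etcd \
    /var/lib/kubelet /etc/kubernetes /etc/cni /opt/cni /var/lib/cni \
    /var/run/calico
```

No output means the node is clean as far as these directories are concerned.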

Clean up a Rancher 2.7 Kubernetes node

[... in progress, to be verified ... ]

Kubernetes nodes in Rancher-managed downstream clusters run containers with their own deployment of containerd. The binaries are located in /var/lib/rancher/rke2/bin and are not installed through the system package repositories.

To reset a Rancher 2.7 downstream cluster node, use the following steps.

1. Deactivate (drain) the affected host in Rancher, then delete the node, either in the Rancher UI or, for the "local" cluster, in RKE's YAML config

2. Stop the RKE2 and rancher-system-agent services, then delete the related systemd service units

systemctl stop rke2-server.service
systemctl stop rancher-system-agent.service
rm -f /etc/systemd/system/rancher-system*
rm -f /usr/local/lib/systemd/system/rke2-server.service
systemctl daemon-reload

This should (hopefully) stop all the containers (TO BE VERIFIED).
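One way to tackle the "TO BE VERIFIED" part is to confirm that both units are down and that no containerd shim processes survive. This check is my own sketch, not from the Rancher documentation:

```shell
# Check that the stopped units are no longer active and that no
# containerd shim processes remain on the node.
for unit in rke2-server.service rancher-system-agent.service; do
    state=$(systemctl is-active "$unit" 2>/dev/null || true)
    echo "$unit: ${state:-unknown}"
done
if command -v pgrep >/dev/null 2>&1 && pgrep -f containerd-shim >/dev/null 2>&1; then
    echo "WARNING: container shim processes still running"
else
    echo "no container shim processes found"
fi
```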

3. Remove Rancher, RKE and Kubernetes related data:

mount | grep kubelet | awk '{print $3}' | while read -r mount; do umount "$mount"; done
test -d /var/lib/docker && rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/cni
rm -rf /var/run/calico
rm -rf /run/secrets/kubernetes.io
test -d /opt/rancher && rm -rf /opt/rancher # For Single Rancher installs
test -d /opt/containerd && rm -rf /opt/containerd
test -d /opt/rke && rm -rf /opt/rke

4. Reboot

reboot

Reboot the node and verify no containerd-shim-runc-v2 processes are running.
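The post-reboot verification can be done with the two counters below (my own one-liner, counting leftover shim processes and kubelet mounts); on a clean node both report 0.

```shell
# On a clean node both counters should report 0.
shims=$(pgrep -c -f containerd-shim-runc-v2 2>/dev/null || true)
kubelet_mounts=$(mount | grep -c kubelet || true)
echo "containerd shims: ${shims:-0}, kubelet mounts: ${kubelet_mounts:-0}"
```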

Looking for a managed dedicated Kubernetes environment?

If you are looking for a managed and dedicated Kubernetes environment, managed by Rancher 2, with server location Switzerland, check out our Private Kubernetes Container Cloud Infrastructure service.

