These days I'm testing Rancher as a potential candidate for a new Docker infrastructure. It looks appealing so far: Rancher has a clean, intuitive user interface and, more importantly, an API to trigger container creation automatically (for example from Travis).
During a failover test I rebooted one of the Rancher hosts, and when it came back up, connectivity to Rancher was lost. Why? Because I had forgotten to add the separate file system for /var/lib/docker, which I had prepared as a logical volume, to /etc/fstab. All previous Docker data was therefore gone, including the rancher-agent container.
Unfortunately I didn't spot the mistake right away and simply decided to remove the host in Rancher and re-add it manually. Of course, when I later fixed the file system mount problem and rebooted, Rancher would not connect anymore, because in the meantime a new rancher-agent with a new ID had been installed.
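For reference, the missing piece was a single /etc/fstab entry for the dedicated Docker file system. A sketch of such an entry (the volume group/logical volume name vg0/docker and the ext4 file system type are assumptions for illustration; adjust to your setup):

```
# /etc/fstab: mount the dedicated LV for Docker data at boot
# (vg0/docker and ext4 are hypothetical; adapt to your environment)
/dev/mapper/vg0-docker  /var/lib/docker  ext4  defaults  0  2
```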
To force a reset or cleanup of the Rancher host, one can do the following:
1. Deactivate the affected host in Rancher, then remove the host
2. Stop Docker service
service docker stop
3. Remove Docker and Rancher data:
rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
4. Start Docker service
service docker start
5. Add the host in Rancher
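The steps above can be combined into a small reset script. This is a minimal sketch; the run helper and its DO_IT switch are my additions so the script only prints the commands by default, since running it for real wipes all local Docker data:

```shell
#!/bin/sh
# Reset a Rancher 1.x host (run only after removing the host in Rancher).
# By default this only prints the commands; set DO_IT=1 to execute them,
# which destroys ALL local Docker and Rancher data on the host.
set -eu

run() {
    if [ "${DO_IT:-0}" = "1" ]; then
        "$@"                      # really execute (requires root)
    else
        echo "would run: $*"      # dry run: only show the command
    fi
}

run service docker stop
# Wipe Docker and Rancher state so the host registers as a fresh agent
run sh -c "rm -rf /var/lib/docker/* /var/lib/rancher/*"
run service docker start
```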
The above commands apply to a Rancher 1.x environment. In Rancher 2.x more directories must be cleaned up:
1. Deactivate (drain) the affected host in Rancher, then remove it, either in the Rancher UI or, for the "local" cluster, in RKE's YAML config.
2. Stop Docker service
service docker stop
3. Remove Docker, Rancher, RKE and Kubernetes related data:
mount | grep kubelet | awk '{print $3}' | while read -r mountpoint; do umount "$mountpoint"; done
rm -rf /var/lib/docker/*
rm -rf /var/lib/rancher/*
rm -rf /var/lib/etcd
rm -rf /var/lib/kubelet/*
rm -rf /etc/kubernetes
rm -rf /etc/cni
rm -rf /opt/cni
rm -rf /var/lib/cni
rm -rf /var/run/calico
rm -rf /run/secrets/kubernetes.io
test -d /opt/rancher && rm -rf /opt/rancher # For Single Rancher installs
test -d /opt/containerd && rm -rf /opt/containerd
test -d /opt/rke && rm -rf /opt/rke
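The removal steps can also be expressed as a loop over the directory list. A minimal sketch; the cleanup_rancher_state function name and its root-prefix parameter are my additions so the logic can be exercised safely (on a real, drained host you would pass "/", after unmounting the kubelet mounts as shown above):

```shell
#!/bin/sh
# Sketch: remove Rancher 2.x / RKE / Kubernetes state below a root prefix.
# WARNING: with root "/" this destroys all local cluster and Docker data.
cleanup_rancher_state() {
    root="${1:?usage: cleanup_rancher_state <root>}"
    # Same directory list as the manual rm commands above; the /opt
    # entries may not exist on every host, hence the existence check.
    for dir in var/lib/docker var/lib/rancher var/lib/etcd var/lib/kubelet \
               etc/kubernetes etc/cni opt/cni var/lib/cni var/run/calico \
               run/secrets/kubernetes.io opt/rancher opt/containerd opt/rke; do
        if [ -d "$root/$dir" ]; then
            rm -rf "$root/$dir"
        fi
    done
}
```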
4. Restart Docker service
service docker restart
Yes, although the Docker service was previously stopped, a simple "start" does not re-create the directories within /var/lib/docker (since Docker 20.10.x; see the article "Docker unable to pull images after clean up" for more information):
root@node:~# service docker start
root@node:~# ll /var/lib/docker/
total 0
A service restart however re-creates the missing directories:
root@node:~# service docker restart
root@node:~# ll /var/lib/docker/
total 44
drwx--x--x 4 root root 4096 Nov 11 14:06 buildkit
drwx--x--- 2 root root 4096 Nov 11 14:06 containers
drwx------ 3 root root 4096 Nov 11 14:06 image
drwxr-x--- 3 root root 4096 Nov 11 14:06 network
drwx--x--- 3 root root 4096 Nov 11 14:06 overlay2
drwx------ 4 root root 4096 Nov 11 14:06 plugins
drwx------ 2 root root 4096 Nov 11 14:06 runtimes
drwx------ 2 root root 4096 Nov 11 14:06 swarm
drwx------ 2 root root 4096 Nov 11 14:06 tmp
drwx------ 2 root root 4096 Nov 11 14:06 trust
drwx-----x 2 root root 4096 Nov 11 14:06 volumes
5. Add the host to a cluster using the sudo docker... command (shown in the Rancher UI) or via the RKE YAML config
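To guard against the start-vs-restart gotcha described above, one can check whether the Docker data root was actually repopulated before re-adding the host. A minimal sketch; dir_is_empty is a hypothetical helper name:

```shell
#!/bin/sh
# Returns success (0) when a directory exists but contains no entries,
# as /var/lib/docker does after "service docker start" on Docker >= 20.10
# following a cleanup.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Intended use on the host (requires root):
#   service docker start
#   if dir_is_empty /var/lib/docker; then
#       service docker restart
#   fi
```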