Using Ansible to deploy a full HA k3s cluster
About this project
I am an avid follower of TechnoTim, and he recently published his Ansible playbook for deploying an HA k3s cluster with embedded etcd. That playbook intentionally doesn't automate deploying Traefik or Rancher, which better suits people who just want k3s. I wanted to expand on it for my own use and decided to share the result. The repo is on my GitHub.
Requirements
You need the following items already in place:
- A Linux desktop, VM, or WSL on Windows where you can clone the repo; it will need Ansible and Git installed
- Several Linux server VMs or bare-metal servers to act as nodes (Raspberry Pis work too, but I don't recommend SD cards for storage)
- For an HA cluster you need at least 3 master nodes so etcd can keep a leader if one node goes down
- As many worker (agent) nodes as you think you will need
- Nodes should ideally run the same Linux distro (Debian, Ubuntu, or CentOS)
- The same user account and password on all of them (better with SSH key-based auth only), and the ability to SSH into each one
- That user needs sudo access
- A list of the nodes' IP addresses
- 1 IP address on your local LAN, outside your DHCP scope, to use as the k3s API endpoint
- A range of IPs, also on your local network and excluded from the DHCP range, for MetalLB. It doesn't have to be large; just a few will do.
Steps to get going
- Open a console or SSH session on the host where you'll run Ansible
- Clone the GitHub repo and cd into it
```shell
git clone https://github.com/ChrisThePCGeek/k3s-ansible-traefik-rancher
cd k3s-ansible-traefik-rancher
```
Make a copy of the sample inventory and vars files:

```shell
cp -R inventory/sample inventory/my-cluster
```
Edit the hosts.ini file, filling in your IP addresses:

```shell
nano inventory/my-cluster/hosts.ini
```
List the IP addresses, one per line, under each section. Master nodes run etcd and the control plane as well as workloads (unless you apply taints), whereas agent nodes only run workloads, i.e. your containers:
```ini
[master]
192.168.15.1

[node]
192.168.15.10

[k3s_cluster:children]
master
node
```
Remember: unless you are running a single-node cluster with only a solitary master, you need at least 3 master nodes. In this setup, agents can't join a cluster built around a single master.
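Before running anything, you can sanity-check the inventory with a quick shell sketch. The `count_masters` helper below is not part of the repo, just an illustration: it counts the lines in the `[master]` section of the INI file (the path is the one used in this guide; adjust if yours differs):

```shell
# count_masters: print the number of hosts under the [master] section
# of an Ansible INI inventory (the section ends at the next [header]).
count_masters() {
  awk '/^\[master\]$/{f=1; next} /^\[/{f=0} f && NF' "$1" | wc -l
}

# Example: warn if the cluster can't keep etcd quorum after one failure.
inv="inventory/my-cluster/hosts.ini"
if [ -f "$inv" ]; then
  n=$(count_masters "$inv")
  echo "master nodes: $n"
  [ "$n" -eq 1 ] || [ "$n" -ge 3 ] || echo "warning: HA etcd needs at least 3 masters"
fi
```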
Next, edit the all.yml file under inventory/my-cluster/group_vars. Be sure to pay attention to the comments; they explain the different options.
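As an illustration only, a trimmed all.yml might look like the fragment below. The variable names come from the upstream sample file and may differ in your checkout, and every value shown is a placeholder — take the real names and defaults from the comments in your copy:

```yaml
# Illustrative fragment only — check the comments in your own
# inventory/my-cluster/group_vars/all.yml for the real options.
ansible_user: serveradmin            # the common user on every node
k3s_version: v1.24.10+k3s1           # placeholder; pin the version you tested
apiserver_endpoint: 192.168.15.50    # the spare LAN IP for the k3s API
metal_lb_ip_range: 192.168.15.60-192.168.15.70  # small range outside DHCP
```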
This playbook will also deploy the latest stable versions of Traefik and Rancher. Make sure you have added the necessary entries to your local DNS or your OS's hosts file. If you want to customize Traefik, the template files are in roles/traefik/templates.
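For example, if Traefik's load-balancer service gets the first address from the MetalLB range, the hosts-file entries could look like this — the IP and hostnames here are placeholders, so use whichever names you actually configured:

```
# /etc/hosts — illustrative entries; IP and names are placeholders
192.168.15.60  traefik.home.lab
192.168.15.60  rancher.home.lab
```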
With all that set up, you can run the playbook. I have simplified it into a bash script:

```shell
./deploy.sh
```
After the playbook finishes without errors, give it a few minutes and you should be able to reach Traefik's dashboard at the DNS name you picked and put into all.yml. The same goes for Rancher once it has finished spinning up.
Copy the kubeconfig from a master to your local system if you want to run kubectl commands from the CLI:

```shell
mkdir -p ~/.kube
scp serveradmin@master-ip:~/.kube/config ~/.kube/config
```
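Depending on how the playbook writes that file, the copied config's `server:` line may still point at 127.0.0.1 or the master's own IP rather than the fixed API endpoint. A small sketch for fixing it up, assuming 192.168.15.50 is the endpoint VIP from your all.yml:

```shell
# rewrite_server: point the kubeconfig's "server:" line at the k3s API
# endpoint VIP instead of whatever address the master wrote into it.
rewrite_server() {
  local cfg="$1" vip="$2"
  # Replace only the host part of "server: https://<host>:6443";
  # keeps a .bak copy of the original alongside it.
  sed -i.bak -E "s|(server: https://)[^:]+(:6443)|\1${vip}\2|" "$cfg"
}

# Example usage (192.168.15.50 is the illustrative API endpoint VIP):
# rewrite_server ~/.kube/config 192.168.15.50
```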
In case of errors
Reset the nodes by tearing everything down, troubleshoot, then try running the deploy again:

```shell
./kill-all.sh
```
Additional thoughts
This repo, followed as-is, doesn't include pre-configured support for serving Let's Encrypt certificates from Traefik; that needs additional configuration. I tailored it for my homelab, where no services in the cluster are directly exposed to the web. I currently use my own local CA with a root cert trusted by each of my systems, which requires some manual setup.