Project Repo: https://github.com/christensenjairus/ClusterCreator
When you request a Kubernetes cluster from your cloud provider (AWS, GCP, Azure, Linode, etc.), the provider quickly provisions virtual machines and then bootstraps them into a K8s cluster. In January of 2024, I was searching for an open-source project that I could use to provision and bootstrap Kubernetes clusters on Proxmox infrastructure, much like a cloud provider does. Surprisingly, I didn’t find anything on GitHub that could easily provision and then bootstrap a K8s cluster for me!
So I built my own.
The final result is incredibly useful for my environment. I can create K8s clusters from scratch in minutes with as few as two commands! As a user, it’s almost as easy as requesting a K8s cluster from a cloud provider, but all on Proxmox!
Setup
View the README to know which files to edit before creating a cluster. Among other things, you must:
- Configure a user and token for Terraform so it can interact with the Proxmox API (secrets.tf)
- Configure a user and password for Terraform so it can interact with the (optional) Unifi API (secrets.tf)
- Set your SSH key (secrets.tf)
- Configure the Proxmox IP address (vars.tf and k8s.env)
- Configure the username and password of your VMs (.env and secrets.tf)
- Set the Proxmox datastore name (k8s.env)
- Set the timezone (k8s.env)
- Set the network settings for your template VM (k8s.env)
- Adjust package versions, if necessary (k8s.env)
- Adjust the Proxmox node name (vars.tf)
- Adjust the (optional) Unifi router URL (vars.tf)
- Adjust your K8s cluster settings, like the cluster name, VLAN and networking, node class types, specs, and quantities (clusters.tf)
How it works
Step 1: Create a VM Template
We must first create a virtual machine template that Terraform can clone and modify to create the various Kubernetes nodes. The create_template.sh script will…
- SSH into the Proxmox host
- Download the latest Ubuntu or Debian cloud image
- Install qemu-guest-agent and cloud-init onto the image and place a few files on it (two of which are ‘firstboot’ scripts that cloud-init runs when the VM boots for the first time)
- Generate a VM with cloud-init functionality
- Attach and resize the disk
- Run the VM and wait until the firstboot scripts have finished installing packages
- Shut down the VM
- Mark the VM as a template
./create_template.sh
Although it took me ~7.5 mins to run the script, this only needs to be performed once. You can use the same template over and over. Just run the script again when your Ubuntu or Debian version is out of date.
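As a rough illustration of the host-side part of the template steps, the sketch below builds an ordered list of commands rather than running anything. It is not taken from create_template.sh: the use of virt-customize, the template name, the memory size, the vmbr0 bridge, the disk naming (which fits LVM or ZFS style storage), and the function name `template_commands` are all assumptions made for the example.

```python
def template_commands(vmid, image, storage, disk_size="20G"):
    """Return an ordered, dry-run list of commands for building a cloud-init template.

    Every value here is illustrative; the real script may use different tools and flags.
    """
    vm = str(vmid)
    # Assumed volume name for the imported disk (LVM/ZFS style storage).
    disk = f"{storage}:vm-{vmid}-disk-0"
    return [
        # Bake the guest agent and cloud-init into the downloaded cloud image.
        ["virt-customize", "-a", image, "--install", "qemu-guest-agent,cloud-init"],
        # Create an empty VM shell (name, memory and bridge are placeholders).
        ["qm", "create", vm, "--name", "k8s-template", "--memory", "2048",
         "--net0", "virtio,bridge=vmbr0"],
        # Import the image as a disk, attach it, and add a cloud-init drive.
        ["qm", "importdisk", vm, image, storage],
        ["qm", "set", vm, "--scsihw", "virtio-scsi-pci", "--scsi0", disk],
        ["qm", "set", vm, "--ide2", f"{storage}:cloudinit",
         "--boot", "order=scsi0", "--agent", "enabled=1"],
        ["qm", "resize", vm, "scsi0", disk_size],
        # Boot once so the firstboot scripts can run; a real script would wait
        # for them to finish before shutting the VM down.
        ["qm", "start", vm],
        ["qm", "shutdown", vm],
        # Finally, convert the stopped VM into a template.
        ["qm", "template", vm],
    ]


if __name__ == "__main__":
    for cmd in template_commands(9000, "debian-12-genericcloud-amd64.qcow2", "local-lvm"):
        print(" ".join(cmd))
```

Printing the plan first makes it easy to review each step before running anything against a real Proxmox host.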
Step 2: Provision Virtual Machines
Now that we have a template we can clone, let’s use Terraform (or OpenTofu) to provision the VMs, the Pool, and the optional Unifi VLAN.
Create a workspace for the cluster you define in clusters.tf. Then apply the configuration. In this case, the cluster name is “beta”.
tofu workspace new beta
tofu apply
Review the plan output to ensure it is correct, then type yes to provision the cluster VMs, Pool, and VLAN.
If any resources aren’t provisioned correctly, reapplying should reprovision the failed resources.
Terraform provisions one more resource worth mentioning: a file called cluster_config.json in the ansible/tmp/beta folder that contains all the info from your clusters.tf file concerning the beta cluster. Ansible uses this file to determine variables for the cluster, including the IPs it should use for the ansible-hosts file.
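To make the hand-off from Terraform to Ansible more concrete, here is a minimal sketch of turning a JSON cluster description into an INI-style inventory. The JSON layout (a "groups" mapping of group name to hosts), the IP addresses, the kube_api_servers group name, and the function name `render_inventory` are hypothetical; the real cluster_config.json schema may look quite different.

```python
import json


def render_inventory(config_json: str) -> str:
    """Render an INI-style Ansible inventory from a JSON cluster description.

    Assumes a hypothetical schema: {"groups": {group: [{"hostname": ..., "ip": ...}]}}.
    """
    config = json.loads(config_json)
    lines = []
    for group, hosts in config["groups"].items():
        lines.append(f"[{group}]")
        for host in hosts:
            # ansible_host tells Ansible which address to connect to.
            lines.append(f"{host['hostname']} ansible_host={host['ip']}")
        lines.append("")  # blank line between groups
    return "\n".join(lines).rstrip() + "\n"


# Illustrative input only; these hosts and IPs are made up.
example = json.dumps({
    "cluster_name": "beta",
    "groups": {
        "kube_api_servers": [
            {"hostname": "beta-apiserver-0", "ip": "10.0.2.110"},
        ],
        "kube_general_servers": [
            {"hostname": "beta-general-0", "ip": "10.0.2.130"},
            {"hostname": "beta-general-1", "ip": "10.0.2.131"},
        ],
    },
})

if __name__ == "__main__":
    print(render_inventory(example), end="")
```

Because the inventory is derived from the same data Terraform wrote, it cannot drift from what was actually provisioned.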
Step 3: Bootstrap the Cluster with Ansible
Your VMs, pool, and optional VLAN should now be configured. Ensure that your VMs are reachable over the network from the computer you’ll run the bootstrapping on (in my case, my laptop). You must specify the cluster with the --cluster-name / -n flag.
./install_k8s.sh --cluster-name beta
The bootstrapping process dynamically creates the ansible-hosts file so that it is up to date with what Terraform has created. Many things are taken into account during the process. The dynamic nature of the Ansible tasks allows for more complex setups, like…
- External ETCD cluster
- Any number of nodes in each node class, including apiservers and etcd servers
- Custom node classes for workers with custom labels
- Dual-stack configurations
- A highly available kube-vip apiserver endpoint to load balance between the apiserver nodes
Cluster Maintenance
Add a node
If you need to add a node, update the node class count in clusters.tf and apply the Terraform configuration.
tofu apply
Once the new VM is created, run the same script as earlier but with the --add-nodes or -a flag. This runs a subset of the initial bootstrapping process, with some playbooks to generate join commands and join the new (or missing) nodes.
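The idea of joining only the new or missing nodes boils down to a set difference between the hosts in the inventory and the nodes the cluster already knows about. The sketch below shows that comparison only. The function name `nodes_to_join` is hypothetical, and where the list of joined nodes comes from (for example, the output of `kubectl get nodes`) is an assumption, not a description of the project’s playbooks.

```python
def nodes_to_join(desired, joined):
    """Return inventory hosts that are not yet cluster members, in inventory order."""
    joined_set = set(joined)
    return [name for name in desired if name not in joined_set]


# Illustrative data: the inventory lists three hosts, two have already joined.
desired = ["beta-apiserver-0", "beta-general-0", "beta-general-1"]
joined = ["beta-apiserver-0", "beta-general-0"]

if __name__ == "__main__":
    print(nodes_to_join(desired, joined))
```

Running the comparison on every invocation also makes the operation idempotent: a second run with nothing missing yields an empty list and nothing to do.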
Cordon or Remove Node
You can also drain or altogether remove a node from the cluster. Similar to the bootstrapping script, you must provide the --cluster-name / -n, but also the --hostname / -h of the node you’d like to drain.
If you want to drain and remove the node from the cluster, and reset everything on the node so it forgets its k8s configuration, add the --delete / -d flag.
./remove_node.sh --cluster-name beta --hostname "beta-general-1"
I’ll quickly drain the node that was added earlier, then run the script again with the -d flag added.
NOTE: I changed the flag after recording this video. -c should be -n.
Change Cluster Power State
You can use the Proxmox pool name and the powerctl script to perform various actions on the VMs in Proxmox.
- Start
- Shutdown
- Stop
- Pause
- Resume
- Hibernate
Let’s say you’re done testing something out with your beta cluster, and there’s no need for it to be running. You could suspend all the VMs like so…
./powerctl_pool.sh --pause BETA
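One plausible way to think about these actions is as a mapping from each action name to a Proxmox `qm` subcommand, applied to every VM in the pool. The sketch below is an assumption about how such a mapping could look, not a description of powerctl_pool.sh; the VM IDs are made up, the names `POWER_ACTIONS` and `power_commands` are hypothetical, and looking up which VMs belong to a pool is left out.

```python
# Assumed mapping from action name to `qm` subcommand and options.
POWER_ACTIONS = {
    "start": ["start"],
    "shutdown": ["shutdown"],            # graceful, via ACPI or the guest agent
    "stop": ["stop"],                    # hard power-off
    "pause": ["suspend"],                # suspend, state kept in RAM
    "resume": ["resume"],
    "hibernate": ["suspend", "--todisk", "1"],  # suspend, state written to disk
}


def power_commands(action, vmids):
    """Return one `qm` command per VM for the requested power action."""
    if action not in POWER_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    subcommand, *options = POWER_ACTIONS[action]
    return [["qm", subcommand, str(vmid), *options] for vmid in vmids]


if __name__ == "__main__":
    # Hypothetical VM IDs belonging to the BETA pool.
    for cmd in power_commands("pause", [2110, 2130, 2131]):
        print(" ".join(cmd))
```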
Run Commands on Hosts
You can run commands on all the hosts at once, or only on a subset of the hosts, using Ansible and the ansible-hosts file’s groups. You must provide the --group / -g from the ansible-hosts file, the --cluster-name / -n, and the --command / -c to run.
To run on all hosts from a cluster, you can either omit the --group or set it to all.
./run_command_on_host_group.sh -g "kube_general_servers" -n beta -c "echo 'Hello World!'"
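Under the hood, a wrapper like this can be as thin as an Ansible ad-hoc command. The sketch below assembles such a command as an argument list, defaulting the group to all when none is given. The function name `adhoc_command` and the inventory path are placeholders, and whether the project’s script uses the shell module in this way is an assumption.

```python
import shlex


def adhoc_command(command, inventory, group="all"):
    """Build an `ansible` ad-hoc invocation that runs a shell command on a host group."""
    return ["ansible", group, "-i", inventory, "-m", "shell", "-a", command]


if __name__ == "__main__":
    argv = adhoc_command(
        "echo 'Hello World!'",
        inventory="ansible-hosts",      # placeholder path to the generated inventory
        group="kube_general_servers",
    )
    # shlex.join quotes each argument so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Keeping the command as a list (and only quoting it for display) avoids the nested-quoting problems that come with building one long shell string.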
Reset the cluster (or a single node)
You may wish to uninstall k8s to troubleshoot the initial bootstrapping process. Or maybe reset a node that isn’t correctly joined to the cluster.
To target every host in the cluster, pass --cluster-name / -n and omit the --single-hostname / -h flag.
./uninstall_k8s.sh --cluster-name beta
You may have guessed that the remove_node script with --delete and --single-hostname flags runs the uninstall_k8s script after deleting the node from the cluster.
More Info
This project is located on GitHub here: https://github.com/christensenjairus/ClusterCreator
Pull requests and issue tickets are welcome! Let’s keep K8s a viable option for K8s-at-home enthusiasts (like me!).