Introducing ClusterCreator: K8s on Proxmox using Terraform and Ansible

Automate the creation of fully functional K8s clusters of any size on Proxmox

Project Repo: https://github.com/christensenjairus/ClusterCreator

When you request a Kubernetes cluster from your cloud provider (AWS, GCP, Azure, Linode, etc.), the provider quickly provisions virtual machines and then bootstraps them into a K8s cluster. In January of 2024, I was searching for an open-source project that I could use to provision and bootstrap Kubernetes clusters on Proxmox infrastructure, much like a cloud provider does. Surprisingly, I didn’t find anything on GitHub that could easily provision and then bootstrap a K8s cluster for me!

So I built my own.

The final result is incredibly useful for my environment. I can create K8s clusters from scratch in minutes with as little as two commands! As a user, it’s almost as easy as requesting a K8s cluster from a cloud provider, but all on Proxmox!

Setup

See the README to learn which files to edit before creating a cluster. Among other things, you must:

  • Configure a user and token for Terraform so it can interact with the Proxmox API. (secrets.tf)
  • Configure a user and password for Terraform so it can interact with the (optional) Unifi API. (secrets.tf)
  • Set your SSH Key (secrets.tf)
  • Configure the Proxmox IP address (vars.tf and k8s.env)
  • Configure the username and password of your VMs (.env and secrets.tf)
  • Set the Proxmox datastore name (k8s.env)
  • Set the timezone (k8s.env)
  • Set the network settings for your template VM (k8s.env)
  • Adjust package versions (if necessary) (k8s.env)
  • Adjust the Proxmox node name (vars.tf)
  • Adjust the (optional) Unifi Router URL (vars.tf)
  • Adjust your k8s cluster settings, such as the cluster name, VLAN and networking, node class types, specs, and quantities. (clusters.tf)

How it works

Step 1: Create a VM Template

We must first create a virtual machine template that Terraform can clone and modify to create the various Kubernetes nodes. The create_template.sh script will…

  1. SSH into the Proxmox host
  2. Download the latest Ubuntu or Debian cloud image
  3. Install qemu-guest-agent and cloud-init onto the image and place a few files on it (two of which are ‘firstboot’ scripts that are run by cloud-init when the VM boots for the first time)
  4. Generate a VM with cloud-init functionality
  5. Attach and resize the disk
  6. Run the VM and wait until the firstboot scripts have finished installing packages
  7. Shut down the VM
  8. Mark the VM as a template
./create_template.sh
Create a template for Terraform to use. Video is 3x speed.

Although it took me ~7.5 mins to run the script, this only needs to be performed once. You can use the same template over and over. Just run the script again when your Ubuntu or Debian version is out of date.
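
To make the steps above more concrete, here is a rough sketch of the kind of Proxmox commands such a script could run on the host. This is not the contents of create_template.sh. The VM ID, template name, storage name, disk size, and image file are placeholder assumptions, the SSH and download steps are left out, and the commands are only printed unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Illustrative only: NOT the contents of create_template.sh.
# VMID, STORAGE, and IMAGE are placeholder assumptions.
VMID="${VMID:-9000}"
STORAGE="${STORAGE:-local-lvm}"
IMAGE="${IMAGE:-cloud-image.img}"   # an already downloaded Ubuntu/Debian cloud image

# Print each command by default; set DRY_RUN=0 to execute on the Proxmox host.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

build_template() {
  # Step 3: bake the guest agent and cloud-init into the image.
  run virt-customize -a "$IMAGE" --install qemu-guest-agent,cloud-init
  # Step 4: create an empty VM.
  run qm create "$VMID" --name k8s-template --memory 2048 --net0 virtio,bridge=vmbr0
  # Step 5: import the image, attach it as the boot disk, add a cloud-init drive, resize.
  # The volume name below matches LVM-style storage; directory storage names it differently.
  run qm importdisk "$VMID" "$IMAGE" "$STORAGE"
  run qm set "$VMID" --scsihw virtio-scsi-pci --scsi0 "$STORAGE:vm-$VMID-disk-0"
  run qm set "$VMID" --ide2 "$STORAGE:cloudinit"
  run qm set "$VMID" --boot order=scsi0
  run qm resize "$VMID" scsi0 20G
  # Step 6: boot once so the firstboot scripts can run.
  run qm start "$VMID"
  # (A real run must wait here until the firstboot scripts finish; omitted in this sketch.)
  # Steps 7-8: shut down and convert to a template.
  run qm shutdown "$VMID"
  run qm template "$VMID"
}

build_template
```

Printing the plan first is a cheap way to check IDs and storage names before anything touches the host.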

Step 2: Provision Virtual Machines

Now that we have a template we can clone, let’s use Terraform (or OpenTofu) to provision the VMs, the Pool, and the optional Unifi VLAN.

Create a workspace for the cluster you define in clusters.tf. Then apply the configuration. In this case, the cluster name is “beta”.

tofu workspace new beta
tofu apply

Review the plan output to ensure it is correct, then type yes to provision the cluster VMs, Pool, and VLAN.

Provision the Cluster VMs with Terraform/Tofu. Video is 3x speed.

If any resources aren’t provisioned correctly, reapplying should reprovision the failed resources.
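
If you find yourself reapplying often, a small wrapper can make the workspace step repeatable. The sketch below is my own convenience helper, not part of the project. TF_BIN is a hypothetical variable so the same function works with either tofu or terraform.

```shell
#!/bin/sh
# Select the workspace for a cluster, creating it on first use, then apply.
# TF_BIN is an assumption for illustration; set it to "terraform" if you do not use OpenTofu.
TF_BIN="${TF_BIN:-tofu}"

provision_cluster() {
  cluster="$1"
  if [ -z "$cluster" ]; then
    echo "usage: provision_cluster <cluster-name>" >&2
    return 1
  fi
  # "workspace select" fails when the workspace does not exist yet, so fall back to "new".
  "$TF_BIN" workspace select "$cluster" 2>/dev/null \
    || "$TF_BIN" workspace new "$cluster" \
    || return 1
  # apply still prompts for "yes", so the plan can be reviewed first.
  "$TF_BIN" apply
}
```

Calling provision_cluster beta a second time simply selects the existing workspace and applies again.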

There’s another resource that is provisioned by Terraform that is worth mentioning. There’s a file called cluster_config.json in the ansible/tmp/beta folder that contains all the info from your clusters.tf file concerning the beta cluster. This is used by Ansible to determine variables for the cluster, including the IPs it should use for the ansible-hosts file.
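
Since Ansible depends on that file, it can be worth checking that it exists before bootstrapping. Here is a minimal sketch of such a check. The path layout comes from the paragraph above, while REPO_ROOT and the function names are hypothetical.

```shell
#!/bin/sh
# Confirm that Terraform wrote the cluster config before bootstrapping.
# REPO_ROOT is a hypothetical variable pointing at your ClusterCreator checkout.
REPO_ROOT="${REPO_ROOT:-.}"

cluster_config_path() {
  echo "$REPO_ROOT/ansible/tmp/$1/cluster_config.json"
}

check_cluster_config() {
  path="$(cluster_config_path "$1")"
  # -s is true only when the file exists and is not empty.
  if [ ! -s "$path" ]; then
    echo "missing or empty: $path (apply the $1 workspace first)" >&2
    return 1
  fi
  echo "found: $path"
}
```

For the example cluster, that would be check_cluster_config beta.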

Step 3: Bootstrap the Cluster with Ansible

Your VMs, pool, and optional VLAN should now be configured. Ensure that your VMs are reachable over the network from the computer you’ll run the bootstrapping on – in my case, my laptop. You must specify the --cluster-name / -n.
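
One quick way to check reachability is to probe the SSH port on each node. The sketch below assumes nc (netcat) is installed and that SSH listens on port 22. The PROBE override is a hypothetical hook for swapping in another check, and the addresses in the usage note are placeholders.

```shell
#!/bin/sh
# Check that each node answers on the SSH port before running the bootstrap.
# PROBE is a hypothetical override; by default netcat tests TCP port 22.
probe() {
  if [ -n "${PROBE:-}" ]; then
    $PROBE "$1"
  else
    nc -z -w 3 "$1" 22
  fi
}

check_reachable() {
  failed=0
  for host in "$@"; do
    if probe "$host" >/dev/null 2>&1; then
      echo "ok          $host"
    else
      echo "unreachable $host"
      failed=1
    fi
  done
  # Non-zero exit status if any host failed the probe.
  return "$failed"
}
```

For example, check_reachable 10.0.1.10 10.0.1.11 prints one line per address and returns non-zero if any of them did not answer.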

./install_k8s.sh --cluster-name beta
Bootstrap the K8s cluster with Ansible. Video is 3x speed.

The bootstrapping process dynamically creates the ansible-hosts file so that it is up to date with what Terraform has created. Many things are taken into account during the process. The dynamic nature of the Ansible tasks allows for more complex setups, like…

  • External ETCD cluster
  • Any number of nodes in each node class, including apiservers and etcd servers
  • Custom node classes for workers with custom labels
  • Dual-stack configurations
  • A highly available kube-vip apiserver endpoint to load balance between the apiserver nodes

Cluster Maintenance

Add a node

If you need to add a node, update the node class count in clusters.tf, and apply the Terraform configuration.

tofu apply

Once the new VM is created, run the same script as earlier but with the --add-nodes or -a flag. This runs a subset of the initial bootstrapping process, with some playbooks to generate join commands and join the new (or missing) nodes.
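
Put together, the two steps for the beta cluster look roughly like this. The commands and flags are the ones described above. The add_nodes function and the DRY_RUN switch are my own illustration, not project features.

```shell
#!/bin/sh
# Grow a cluster after raising the node count in clusters.tf.
# DRY_RUN is an illustrative switch, not a flag of the project scripts.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

add_nodes() {
  cluster="$1"
  # Make sure the apply targets the right cluster's workspace.
  run tofu workspace select "$cluster" || return 1
  # Creates the new VM(s); tofu still asks for confirmation.
  run tofu apply || return 1
  # Joins the new (or missing) nodes to the existing cluster.
  run ./install_k8s.sh --cluster-name "$cluster" --add-nodes
}
```

Running DRY_RUN=1 add_nodes beta prints the three commands without executing them.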

Add new nodes to the cluster with Terraform apply, then bootstrap with the --add-nodes flag. Video is 3x speed.
Cordon or Remove Node

You can also drain or altogether remove a node from the cluster. Similar to the bootstrapping script, you must provide the --cluster-name / -n, but also the --hostname / -h of the node you’d like to drain.

If you want to drain + remove the node from the cluster and reset everything on the node to forget the k8s configurations, add the --delete / -d flag.

./remove_node.sh --cluster-name beta --hostname "beta-general-1"

I’ll quickly drain the node that was added earlier, then run the script again with the -d flag.
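
Between the two runs, you may want to confirm that the drain took effect. The sketch below assumes kubectl is configured for the cluster and that the script’s drain behaves like kubectl drain, which cordons the node first. I have not confirmed how remove_node.sh does it, and the function names are hypothetical.

```shell
#!/bin/sh
# Confirm a drain took effect before running remove_node.sh again with --delete.
# Assumes kubectl points at the cluster that owns the node.
is_cordoned() {
  # .spec.unschedulable is "true" on a cordoned node and empty otherwise.
  [ "$(kubectl get node "$1" -o jsonpath='{.spec.unschedulable}')" = "true" ]
}

pods_on_node() {
  # Pods still listed after a drain are usually DaemonSet-managed or static pods.
  kubectl get pods --all-namespaces --field-selector "spec.nodeName=$1" -o name
}
```

For example: is_cordoned beta-general-1 && pods_on_node beta-general-1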

NOTE: I changed the flag after recording this video. -c should be -n.

Remove nodes from the cluster using the remove_node.sh script. Video is 3x speed.
Change Cluster Power State

You can use the Proxmox pool name and the powerctl_pool.sh script to perform various actions on the VMs in Proxmox.

  • Start
  • Shutdown
  • Stop
  • Pause
  • Resume
  • Hibernate

Let’s say you’re done testing something out with your beta cluster, and there’s no need for it to be running. You could pause all the VMs like so…

./powerctl_pool.sh --pause BETA
Control the power state of the cluster via the Proxmox pool. Video is at 1x speed.
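
For reference, each of these actions has a counterpart in Proxmox’s qm command-line tool. The mapping below is only an illustration of what the actions mean on a single VM. powerctl_pool.sh may use different commands or the API, and the VM IDs are placeholders.

```shell
#!/bin/sh
# Map each power action to a Proxmox "qm" subcommand.
# Illustration only: powerctl_pool.sh may be implemented differently.
qm_command_for() {
  action="$1"
  vmid="$2"
  case "$action" in
    start)     echo "qm start $vmid" ;;
    shutdown)  echo "qm shutdown $vmid" ;;            # asks the guest OS to power off cleanly
    stop)      echo "qm stop $vmid" ;;                # hard stop, like pulling the plug
    pause)     echo "qm suspend $vmid" ;;             # freezes the VM, state stays in RAM
    resume)    echo "qm resume $vmid" ;;
    hibernate) echo "qm suspend $vmid --todisk 1" ;;  # saves VM state to disk
    *)         echo "unknown action: $action" >&2; return 1 ;;
  esac
}

# Print the pause command for a few placeholder VM IDs.
for vmid in 101 102 103; do
  qm_command_for pause "$vmid"
done
```

The difference between shutdown and stop matters for K8s nodes: a clean shutdown gives the kubelet and etcd a chance to exit gracefully.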
Run Commands on Hosts

You can run commands on all the hosts at once – or only a subset of the hosts – using Ansible and the ansible-hosts file’s groups. You must provide the --group / -g from the ansible-hosts file, the --cluster-name / -n, and the --command / -c to run.

To run on all hosts from a cluster, you can either omit the --group or set it to all.

./run_command_on_host_group.sh -g "kube_general_servers" -n beta -c "echo 'Hello World!'"
Run a command on either all nodes or a subset of nodes using Ansible.
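
If you are curious what this looks like in plain Ansible, an ad-hoc call is roughly equivalent. This is a sketch, not the wrapper script’s actual contents. INVENTORY is a hypothetical path, since the project generates its ansible-hosts file itself, and the wrapper may pass extra options.

```shell
#!/bin/sh
# Roughly equivalent ad-hoc Ansible call.
# INVENTORY is a hypothetical path to the generated ansible-hosts file.
INVENTORY="${INVENTORY:-ansible-hosts}"

run_on_group() {
  group="${1:-all}"   # "all" matches every host in the inventory
  cmd="$2"
  # The shell module runs the command string through a shell on each host.
  ansible "$group" -i "$INVENTORY" -m shell -a "$cmd"
}
```

For example: run_on_group kube_general_servers "echo 'Hello World!'"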
Reset the cluster (or a single node)

You may wish to uninstall k8s to troubleshoot the initial bootstrapping process, or to reset a node that isn’t correctly joined to the cluster.

To target every host in the cluster, provide the --cluster-name / -n flag and omit the --single-hostname / -h flag.

./uninstall_k8s.sh --cluster-name beta

You may have guessed that the remove_node script with the --delete and --hostname flags runs the uninstall_k8s script after deleting the node from the cluster.

Reset all nodes in a cluster to remove K8s completely. Video is 3x speed.

More Info

This project is located on GitHub here: https://github.com/christensenjairus/ClusterCreator

Pull requests and issue tickets are welcome! Let’s keep K8s a viable option for K8s-at-home enthusiasts (like me!).

Jairus Christensen