Setting up a Kubernetes Cluster

By Paulus, 19 January, 2023

A while ago I wanted to learn how to set up my own Kubernetes cluster. I quickly found that there wasn’t a single “right” way to do it. I was looking for a guide that would walk me through the steps of standing up a generic cluster with the most commonly used components. However, each guide I found was different, tended to rely on tools to bootstrap the cluster, and stopped there.

The goal of this post is to manually set up a high availability k8s cluster, which will give you a starting point to add other things such as persistent storage. In addition to guiding you through the process, I will include technical details beyond the typical “run x, y, z commands, now you have a cluster!” format. This post will not cover anything beyond setting up a cluster, basic troubleshooting, and background information. There are a lot of projects you can add and ways to configure the cluster; creating a simple cluster and building on it will help both you and me understand Kubernetes better.

Before I get into this, I want to address the difference between k8s and k3s. Most of the tutorials and guides I have found online use k3s and don't explain why. The short of it is that k3s is smaller, requires fewer resources, can run on a lot more hardware, and is faster. K3s only requires 1 CPU and 512MB of RAM because it leaves out extra storage providers, alpha features, and legacy components.

The advantages of going with k3s over k8s are:

  • It’s lightweight, requiring fewer resources while still providing 100% compatibility and functionality
  • There’s less to learn because the excess stuff has been removed
  • It can work with k8s clusters
  • Faster and easier to deploy
  • Can run on the ARM architecture
  • Supports a single node
  • Flexible
  • Easily turn it off and on

There aren’t many disadvantages to using k3s:

  • Does not come with a distributed database by default, which limits the control plane’s high availability
  • The default embedded datastore is not etcd and is limited to a single server node
  • Will need an external database (or the embedded etcd option) for high availability

So what’s the difference between k3s and k8s?

  • k3s is faster and lighter. k8s runs its components as separate processes, while k3s packs all of them into a single 40-100MB binary
  • k8s’s embedded components cannot be switched off to make it lightweight; k3s strips them out up front

When should you use k3s?

  • Need something lightweight
  • Only have a single node cluster running locally or on edge devices
  • Need to support multiple CPU architectures
  • Running an on-premises cluster with no need for cloud provider extensions
  • Need to spin up jobs for cloud bursting, CI testing, etc.
  • Need to frequently scale

There isn’t much of a reason to choose k8s over k3s. If you need the extra features, including alpha features, or providers that require high availability with data spread across multiple clusters and cloud providers, then go with k8s.

Set up

There are several ways of setting up a Kubernetes cluster, but we will be doing this manually to get a deeper understanding of how everything comes together. We will bootstrap the cluster with the kubeadm toolbox. To make things easier, we’ll be doing this with virtual machines. The following is required for installing k8s on either a virtual machine or a physical machine:

  • Compatible Linux host
  • 2 GB or more of RAM (the more the better in a production environment)
  • 2 or more CPUs
  • Network connectivity between all machines in the cluster
    • If the system has multiple network cards, make sure routes are set up properly.
    • Port 6443 is open
  • Unique host name, MAC address, and product_uuid for every node
    • cat /sys/class/dmi/id/product_uuid (see the quick checks after this list)
  • Container runtime
  • Kubernetes tools:
    • kubeadm: the toolbox for bootstrapping a cluster
    • kubelet: the component that runs on every machine in the cluster and manages pods and containers
    • kubectl: the command-line utility for managing the cluster
  • cgroup driver
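
If you want to check a few of these by hand before provisioning, something like the following works on most Linux hosts. The control plane address below is a placeholder for your own environment.

# Every node needs a unique hostname, MAC address, and product_uuid
hostname
ip link show                          # compare MAC addresses across nodes
cat /sys/class/dmi/id/product_uuid

# Swap should be off and port 6443 reachable between nodes
swapon --show
nc -zv <control-plane-ip> 6443        # placeholder: use your control plane's address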

For our virtual machine cluster we need the following (an install sketch follows the list):

  • Enough system resources for 4 virtual machines (2 CPUs and 2GB of RAM each).
  • Vagrant
  • VirtualBox or Libvirt
  • Ansible
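
On a Fedora or RHEL-compatible host, installing the tooling looks roughly like this. Package names and repositories vary by distribution, and the vagrant-libvirt plugin is only needed if you use Libvirt rather than VirtualBox.

sudo dnf install -y vagrant ansible
vagrant plugin install vagrant-libvirt    # only needed for the Libvirt provider
vagrant --version
ansible --version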

Container Runtime Interface (CRI)

A container runtime is required to be installed on each node for Kubernetes to work properly. This is also known as a container engine and is the software that runs the containers on the host operating system. Runtimes or engines are responsible for loading container images from a repository, monitoring resources, resource isolation, and managing the life cycle of the containers. 

There are three types of container runtimes:

  1. Low-level: All engines implementing the Open Container Initiative (OCI) runtime specification, which standardizes how runtimes are implemented, are considered low-level container runtimes. Low-level container runtimes are an abstraction layer that only provides the facilities to create and run containers. The de-facto standard low-level container runtime is runc, originally developed by Docker and now maintained as an OCI project under the Linux Foundation. crun is an alternative developed by Red Hat to be lightweight and fast. containerd sits somewhere in between: it drives a low-level runtime but exposes a higher-level API.
  2. High-level: High-level container runtimes, such as containerd (originally built by Docker, now a CNCF project), add features for managing containers beyond creating and running them. containerd is the leading option, and CRI-O is an open source, lightweight alternative to it.
  3. Windows Containers & Hyper-V Containers can be thought of as Microsoft’s take on Docker. Windows Containers use kernel process and namespace isolation to create the environment for each container. Hyper-V containers are more secure because each container is deployed into its own lightweight VM. That VM can run a different operating system, allowing for greater flexibility.

In version 1.20 it was announced that direct integration with Docker Engine (dockershim) would be removed in a later version, and in version 1.24 it was finally removed. If you're running a RHEL- or Debian-based distribution, installing the containerd CRI is probably the easiest option.
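
If you want to do this step by hand on a Rocky Linux node, the commands look roughly like the following; the playbook later in this post automates the same steps.

# containerd lives in Docker's repository, not in a repo of its own
sudo dnf config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo dnf install -y containerd.io

# Generate a default configuration and start the service
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl enable --now containerd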

Cgroup Driver

The systemd cgroup driver is recommended for kubeadm setups. As of v1.22, it is the default.
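
For containerd, switching to the systemd cgroup driver is a one-line change to the generated config followed by a restart; the playbook below does the equivalent with lineinfile.

sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd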

Container Network Interface

The CNI project consists of specifications and libraries for writing network plugins that configure network interfaces in Linux containers. In order to do any sort of networking with Pods you must have a plugin installed. There are a number of available network plugins, each with a different set of features, and some of them work together to allow more advanced configurations. To get started, visit the CNI GitHub project page.

In this example we're going to use Calico.
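
The playbook below applies Calico automatically, but the manual equivalent is two kubectl commands against the manifests Calico publishes, run after the cluster has been initialized.

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml

# Watch the Calico pods come up
kubectl get pods -n calico-system --watch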

Limitations

When setting up, running, and managing clusters, kubeadm, kubelet, and kubectl should be the same version. However, it is possible to mix and match to a certain degree. If you aren't able to maintain consistent versions for one reason or another, you can run different versions within the limits of the version skew policy.
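
A quick way to confirm what is actually installed on a node:

kubeadm version -o short
kubelet --version
kubectl version --client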

For our setup, we won't need to worry about the limits of k8s. As a reference those limits are:

  • 110 pods/node
  • 5,000 nodes/cluster
  • 150,000 pods/cluster
  • 300,000 containers/cluster

Setup and Provisioning

The first thing we need to do is create our virtual machines. Below is the Vagrantfile and provisioning script that will get the VMs set up. For this I will be using Rocky Linux, a successor to CentOS that is compatible with Red Hat Enterprise Linux.

During provisioning we need to add Docker's repository even though we will not be installing Docker. This is because containerd lives in that repository and not in one of its own.

Setup the Kubernetes Nodes

The following Vagrantfile is a bit more complex than a simple setup and sample needs, but this format allows us to easily add different types of nodes, such as storage nodes, in the future.


# -*- mode: ruby -*-
# vi: set ft=ruby :

ENV['VAGRANT_NO_PARALLEL']              = 'yes'
ENV['VAGRANT_BOX_UPDATE_CHECK_DISABLE'] = 'yes'
ENV['VAGRANT_DEFAULT_PROVIDER']         = 'libvirt'

VAGRANT_BOX                             = "generic/rocky9"
VAGRANT_BOX_VERSION                     = "4.2.14"
VIRTUAL_CPUS                            = 2
VIRTUAL_MEMORY                          = 2048
VIRTUAL_NETWORK                         = "172.16.16"
VIRTUAL_DOMAIN                          = "example.com"

vms = {
  "nodes" => {
    "control-plane" => {
      "01" => {
        cpus: VIRTUAL_CPUS,
        memory: VIRTUAL_MEMORY,
        ip: "#{VIRTUAL_NETWORK}.2",
      }
    },
    "worker" => {
      "01" => {
        cpus: VIRTUAL_CPUS,
        memory: VIRTUAL_MEMORY,
        ip: "#{VIRTUAL_NETWORK}.12",
      },
      "02" => {
        cpus: VIRTUAL_CPUS,
        memory: VIRTUAL_MEMORY,
        ip: "#{VIRTUAL_NETWORK}.13",
      },
      "03" => {
        cpus: VIRTUAL_CPUS,
        memory: VIRTUAL_MEMORY,
        ip: "#{VIRTUAL_NETWORK}.14",
      },
    }
  }
}

inventory_groups = {
  "control_plane" => [
    "control-plane01"
  ],
  "worker" => [
    "worker01",
    "worker02",
    "worker03"
  ]
}

Vagrant.configure("2") do |config|

  config.vm.provision "ansible" do |ansible|
    ansible.groups = inventory_groups
    ansible.playbook = "setup.yml"
    ansible.become = true
    ansible.become_user = "root"
  end

  vms.each_pair do |vm_group_name, vm_group|

    if vm_group_name == "nodes"

      vm_group.each_pair do |node_type_name, node_type_group|

        node_type_group.each_pair do |node_name, node_config|

          config.vm.define node_type_name+node_name do |node|

            # Generic, global VM configuration
            node.vm.box = VAGRANT_BOX
            node.vm.box_version = VAGRANT_BOX_VERSION
            node.vm.hostname = "#{node_type_name}#{node_name}.#{VIRTUAL_DOMAIN}"
            node.vm.network "private_network", ip: node_config[:ip]

            if node_type_name == "control-plane"

              # Generic control-plane node specific configuration

              node.vm.provider :libvirt do |provider|
                # Apply the resources defined for this node and allow nested virtualization
                provider.cpus = node_config[:cpus]
                provider.memory = node_config[:memory]
                provider.nested = true
              end

              node.vm.provider :virtualbox do |provider|
                # Apply the resources defined for this node
                provider.cpus = node_config[:cpus]
                provider.memory = node_config[:memory]
              end

            end

            if node_type_name == "worker"

              # Generic worker node specific configuration

              node.vm.provider :libvirt do |provider|
                # Apply the resources defined for this node
                provider.cpus = node_config[:cpus]
                provider.memory = node_config[:memory]
              end

              node.vm.provider :virtualbox do |provider|
                # Apply the resources defined for this node
                provider.cpus = node_config[:cpus]
                provider.memory = node_config[:memory]
              end
            end
          end
        end
      end
    end
  end
end

Before we get to the playbook there are two templates that we need to create first.

kubernetes.repo.j2 is a file that will be copied to all nodes so the necessary packages can be installed.

[kubernetes]
name=Kubernetes
baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl

kernel_modules.conf.j2 is used to ensure that the proper kernel modules are loaded each time.

{% for module in kernel_modules %}
{{ module }}
{% endfor %}

Finally we have the actual playbook.


- hosts: all
  become: yes
  become_method: sudo
  vars:
    kernel_modules:
      - br_netfilter
      - overlay
      - ip_vs
      - ip_vs_rr
      - ip_vs_wrr
      - ip_vs_sh
      - nf_conntrack
    kubernetes_version: "1.26.0"
    main_control_plane: "control-plane01"
    main_control_plane_nic: "eth0"
    pod_network_cidr: "192.168.0.0/16"
    required_packages:
      - vim
      - wget
      - curl
    timezone: "America/Chicago"
  pre_tasks:
    - name: Set API Server Advertise IP
      set_fact:
        main_control_plane_ip: "{{ hostvars[inventory_hostname]['ansible_%s' | format(item)].ipv4.address }}"
      loop: "{{ ansible_interfaces }}"
      when: inventory_hostname == main_control_plane and (main_control_plane_nic is defined and item == main_control_plane_nic)

    - name: "Add API Server Advertise to kubeadm init parameters"
      set_fact:
        kubeadm_init_params: "--apiserver-advertise-address={{ main_control_plane_ip }}"
      when: inventory_hostname == main_control_plane and main_control_plane_ip is defined

    - name: "Add Pod network CIDR to kubeadm init parameters"
      set_fact:
        kubeadm_init_params: "{{ kubeadm_init_params }} --pod-network-cidr={{ pod_network_cidr }}"
      when: inventory_hostname == main_control_plane and pod_network_cidr is defined

  tasks:
    - name: "Disable SELinux completely"
      ansible.builtin.lineinfile:
        path: "/etc/sysconfig/selinux"
        regexp: "^SELINUX=.*"
        line: "SELINUX=disabled"

    - name: "Reboot system"
      ansible.builtin.reboot:
        reboot_timeout: 120

    - name: "Set timezone"
      ansible.builtin.shell: "timedatectl set-timezone {{ timezone }}"

    - name: "Enable NTP"
      ansible.builtin.shell: "timedatectl set-ntp 1"

    - name: "Turn off SWAP"
      ansible.builtin.shell: "swapoff -a"

    - name: "Disable SWAP in fstab"
      ansible.builtin.replace:
        path: "/etc/fstab"
        regexp: '^([^#].*?\sswap\s+.*)$'
        replace: '# \1'

    - name: "Stop and Disable firewall (firewalld)"
      ansible.builtin.service:
        name: "firewalld"
        state: stopped
        enabled: no

    - name: "Load required modules"
      community.general.modprobe:
        name: "{{ item }}"
        state: present
      with_items: "{{ kernel_modules }}"

    - name: "Enable kernel modules"
      ansible.builtin.template:
        src: "kernel_modules.conf.j2"
        dest: "/etc/modules-load.d/kubernetes.conf"

    - name: "Update kernel settings"
      ansible.posix.sysctl:
        name: "{{ item.name }}"
        value: "{{ item.value }}"
        sysctl_set: yes
        state: present
        reload: yes
      ignore_errors: yes
      with_items:
        - { name: net.bridge.bridge-nf-call-ip6tables, value: 1 }
        - { name: net.bridge.bridge-nf-call-iptables, value: 1 }
        - { name: net.ipv4.ip_forward, value: 1 }

    - name: "Update system packages"
      ansible.builtin.package:
       name: "*"
       state: latest

    - name: "Install required package"
      ansible.builtin.package:
        name: "{{ item }}"
      with_items:
        - vim
        - wget
        - curl
        - gnupg

    - name: "Add Docker repository"
      ansible.builtin.get_url:
        url: "https://download.docker.com/linux/centos/docker-ce.repo"
        dest: "/etc/yum.repos.d/docker-ce.repo"

    - name: "Install containerd"
      ansible.builtin.package:
        name: ['containerd.io']
        state: present

    - name: "Create containerd directories"
      ansible.builtin.file:
        path: "/etc/containerd"
        state: directory

    - name: "Configure containerd"
      ansible.builtin.shell: "containerd config default > /etc/containerd/config.toml"

    - name: "Enable cgroup driver as systemd"
      ansible.builtin.lineinfile:
        path: "/etc/containerd/config.toml"
        regexp: 'SystemdCgroup \= false'
        line: 'SystemdCgroup = true'

    - name: "Start and enable containerd service"
      ansible.builtin.systemd:
        name: "containerd"
        state: restarted
        enabled: yes
        daemon_reload: yes

    - name: "Add kubernetes repository"
      ansible.builtin.template:
        src: "kubernetes.repo.j2"
        dest: "/etc/yum.repos.d/kubernetes.repo"

    - name: "Install Kubernetes packages"
      ansible.builtin.yum:
        name: "{{ item }}-{{ kubernetes_version }}"
        disable_excludes: kubernetes
      with_items: ['kubelet', 'kubeadm', 'kubectl']

    - name: "Enable kubelet service"
      ansible.builtin.service:
        name: kubelet
        enabled: yes

    # Control Plane Tasks
    - name: Pull required containers
      ansible.builtin.shell: "kubeadm config images pull >/dev/null 2>&1"
      when: ansible_hostname == main_control_plane

    - name: Initialize Kubernetes Cluster
      ansible.builtin.shell: "kubeadm init {{ kubeadm_init_params }} >> /root/kubeinit.log 2> /dev/null"
      when: ansible_hostname == main_control_plane

    - name: Deploy Calico network
      ansible.builtin.shell: "kubectl --kubeconfig=/etc/kubernetes/admin.conf create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml >/dev/null 2>&1"
      ignore_errors: yes
      when: ansible_hostname == main_control_plane

    - name: Install Calico by creating necessary custom resources
      ansible.builtin.shell: "kubectl --kubeconfig=/etc/kubernetes/admin.conf create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml >/dev/null 2>&1"
      ignore_errors: yes
      when: ansible_hostname == main_control_plane

    - name: Generate and save cluster join command
      ansible.builtin.shell: "kubeadm token create --print-join-command > /joincluster.sh 2>/dev/null"
      when: ansible_hostname == main_control_plane

    - name: Download join command
      ansible.builtin.fetch:
        dest: './'
        flat: yes
        src: '/joincluster.sh'
      when: ansible_hostname == main_control_plane

    - name: Download admin.conf
      ansible.builtin.fetch:
        dest: "./"
        flat: yes
        src: "/etc/kubernetes/admin.conf"
      when: ansible_hostname == main_control_plane


    # Worker Tasks
    - name: Upload join command
      ansible.builtin.copy:
        src: joincluster.sh
        dest: /joincluster.sh
        owner: root
        group: root
        mode: "0777"
      when: ansible_hostname != main_control_plane

    - name: Reset
      ansible.builtin.shell: "kubeadm reset -f"
      when: ansible_hostname != main_control_plane

    - name: Join node to cluster
      ansible.builtin.shell: "/joincluster.sh > /dev/null 2&>1"
      when: ansible_hostname != main_control_plane
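
With the Vagrantfile, the two templates, and the playbook (saved as setup.yml) in the same directory, bringing everything up is a single command:

vagrant up

# Re-run the playbook later without recreating the VMs
vagrant provision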

Note: In many of the online tutorials that use Ubuntu, the apiserver-advertise-address was set to the private IP address, 172.16.16.12 for example. This worked fine for Ubuntu clusters, but did not work for RHEL-based clusters. This may simply be a Vagrant and routing issue and not something you would run into in a real environment.
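
For reference, the command the playbook assembles and runs on control-plane01 boils down to the following, with the advertise address taken from eth0 as described above (shown here as a placeholder):

kubeadm init --apiserver-advertise-address=<control-plane01 eth0 address> --pod-network-cidr=192.168.0.0/16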

Testing it Out

Now that everything is up and running, you can play with the cluster in one of two ways. The first is logging into control-plane01 and issuing kubectl commands.


vagrant ssh control-plane01
export KUBECONFIG=/etc/kubernetes/admin.conf
sudo -E kubectl get nodes
NAME                          STATUS   ROLES           AGE     VERSION
control-plane01.example.com   Ready    control-plane   2d20h   v1.26.3
control-plane02.example.com   Ready    control-plane   2d20h   v1.26.3
control-plane03.example.com   Ready    control-plane   2d20h   v1.26.3
worker01.example.com          Ready    worker          2d20h   v1.26.3
worker02.example.com          Ready    worker          2d20h   v1.26.3
worker03.example.com          Ready    worker          2d20h   v1.26.3
worker04.example.com          Ready    worker          2d19h   v1.26.3

The second way is running it from the host machine. When the playbook ran, it downloaded the /etc/kubernetes/admin.conf from the main control plane node and placed it into the playbook's directory.


export KUBECONFIG=$PWD/admin.conf
kubectl create deployment nginx-web --image=nginx
deployment.apps/nginx-web created
kubectl get deployments -o wide

NAME        READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES   SELECTOR
nginx-web   1/1     1            1           89s   nginx        nginx    app=nginx-web
kubectl get pods
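
To reach the deployment from outside the cluster, one quick option is to expose it as a NodePort service and hit a worker node's IP on the port kubectl reports. The node port below is a placeholder.

kubectl expose deployment nginx-web --port=80 --type=NodePort
kubectl get service nginx-web

# 172.16.16.12 is worker01 in our Vagrantfile; substitute the node port reported above
curl http://172.16.16.12:<node-port>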

Troubleshooting

Connection Refused

  • Verify the correct config is being used. The KUBECONFIG environment variable may be set to the wrong config file, or $HOME/.kube/config may be incorrect.
  • Check that the config file has the correct permissions: chown $(id -u):$(id -g) $HOME/.kube/config
  • Verify that the server element within the config file matches the control plane’s DNS name, IP, and/or port.
  • Ensure that the firewall is turned off or the necessary ports are open. For the control plane nodes: 6443, 2379-2380, 10250, 10259, and 10257. For worker nodes: 10250 and 30000-32767. See the firewall-cmd sketch after this list.
  • Check that SELinux is set to permissive (or disabled) and not misconfigured.
  • Verify that the Kubernetes API server is running on the control plane nodes.
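
If you would rather keep firewalld running instead of disabling it like the playbook does, opening the control plane ports looks like this (repeat with the worker ports on the worker nodes):

sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --permanent --add-port=2379-2380/tcp
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=10259/tcp
sudo firewall-cmd --permanent --add-port=10257/tcp
sudo firewall-cmd --reload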

Control Plane Ports

Protocol   Direction   Port/Range    Purpose                   Used By
TCP        Inbound     6443          Kubernetes API server     All
TCP        Inbound     2379-2380     etcd server client API    kube-apiserver, etcd
TCP        Inbound     10250         Kubelet API               Self, control plane
TCP        Inbound     10259         kube-scheduler            Self
TCP        Inbound     10257         kube-controller-manager   Self

Worker Ports

Protocol   Direction   Port/Range    Purpose             Used By
TCP        Inbound     10250         Kubelet API         Self, control plane
TCP        Inbound     30000-32767   NodePort Services   All

ImagePullBackOff

The ImagePullBackOff status means that the container cannot start because its image is unavailable for one reason or another. Often it is something as simple as a typo or an incorrect tag.

Another thing to check is that the CNI plugin is installed and configured correctly. Also verify that the API server advertise address and the pod network CIDR are correct.
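
The pod's events usually spell out the exact reason, so start there. The pod name is a placeholder.

kubectl describe pod <pod-name>                          # check the Events section at the bottom
kubectl get events --sort-by=.metadata.creationTimestamp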

What's Next

We just set up a Kubernetes cluster, and although it might look like a highly available cluster, it isn't. We need to add more control plane nodes and a load balancer for it to be an actual HA cluster.

The next step would be to add another control plane node and a load balancer. That will be in another post.

Resources and Links