Sunday, June 11, 2023

Configuring Calico CNI with VPP Dataplane for Kubernetes

This is a quick run-down of how to configure Calico for pod networking in a Kubernetes cluster. Calico comes in several flavors, and here we look at Calico with the VPP dataplane, as opposed to classic Calico. One reason for looking at this option is to be able to encrypt node-to-node traffic using IPsec, which is supported by VPP but not by classic Calico.

Installing Kubernetes

This article doesn't cover how to install Kubernetes - there are several guides for doing that, including this one. Once you have installed Kubernetes on your cluster nodes, and the nodes have all joined the cluster, it is time to install the CNI plugin to allow pods to communicate across nodes. However, there are a few things that need to be taken care of while setting up Kubernetes itself, before we actually get to installing the CNI plugin.

Calico by default uses the subnet 192.168.0.0/16 for the pod network. It's good to use this default if you can; if you cannot, choose an alternative subnet and set Kubernetes up with that CIDR. If you use kubeadm to set up Kubernetes, specify it with the --pod-network-cidr command-line option. Here is an example command-line to do this on the (first) master node:

    kubeadm init --apiserver-advertise-address=10.20.0.7 --control-plane-endpoint=10.20.0.7:6443 --pod-network-cidr=172.31.0.0/16

The output of this command would contain the kubeadm command-line to run on the other nodes to register them with the master. At this point, running kubectl get nodes would list the cluster nodes but they would be shown in the NotReady state. To change that, we would need to install Calico.
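
For illustration, checking the nodes at this stage would show something like this (the node names and versions here are made up):

    kubectl get nodes
    # NAME      STATUS     ROLES           AGE   VERSION
    # master1   NotReady   control-plane   5m    v1.26.5
    # worker1   NotReady   <none>          2m    v1.26.5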

Installing Calico

The page here already summarizes the process of installing Calico with VPP quite well, but there are a few things that need to be called out.

Hugepages and vfio-pci

This step is required only if you want to explicitly choose the VPP driver used to drive the physical interface, namely virtio, dpdk, rdma, vmxnet3 (VMware), or avf (certain Intel NICs). Even if you leave the driver selection at its default, these settings can improve performance, but the memory requirements per node would typically be higher.

On each cluster node, create a file called /etc/sysctl.d/calico.conf and add the following content.

vm.nr_hugepages = 512

Then run:

    sudo sysctl -p /etc/sysctl.d/calico.conf   # or: sudo sysctl --system
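
If you prefer to script this across your nodes, the same two steps can be done in one go; this is just a sketch of the commands above:

    # create the sysctl drop-in and load it immediately
    echo "vm.nr_hugepages = 512" | sudo tee /etc/sysctl.d/calico.conf
    sudo sysctl -p /etc/sysctl.d/calico.conf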

Similarly, on each cluster node create a file called /etc/modules-load.d/calico-vfio-pci.conf, and put the following content in it.

vfio-pci

On CentOS / RedHat, this should be vfio_pci instead. Then run:

    modprobe vfio-pci   # or vfio_pci on CentOS / RedHat

Finally, reboot the node.
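
After the reboot, it's worth sanity-checking that both settings took effect; something like the following should show 512 hugepages reserved and the vfio-pci module loaded:

    grep HugePages_Total /proc/meminfo    # expect: HugePages_Total:    512
    lsmod | grep vfio_pci                 # lsmod shows the module name with underscores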

Customizing the installation config

Create the Tigera operator:

    kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.24.6/manifests/tigera-operator.yaml

Download the installation-default.yaml file and modify it as suggested below:

    wget https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/calico/installation-default.yaml

In this file, there would be two objects listed: an Installation object and an APIServer object. We only need to edit the first. Under the spec.calicoNetwork sub-object of the Installation, add the ipPools attribute as shown below:

spec:
  calicoNetwork:
    linuxDataplane: VPP
    ipPools:
    - cidr: 172.31.0.0/16    # or whatever you chose for your pod network CIDR
      encapsulation: VXLAN   # or IPIP
      natOutgoing: Enabled

While not commonly required, there is an option to override the image prefix used, in order to download images from non-default image registries / mirrors. This came in handy for me because I could use a corporate mirror instead of the default docker.io, which imposes strict rate limits. Use it thus:

spec:
  imagePrefix: some-prefix-ending-in-fwd-slash
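
For example, with a (hypothetical) corporate mirror at registry.example.com, the setting would look like this - note the trailing slash:

spec:
  imagePrefix: registry.example.com/dockerhub-mirror/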

Then apply the edited manifest (saved here as installation-default-edited.yaml):

    kubectl create -f installation-default-edited.yaml

This would create a number of pods, including the calico API server, the calico-node daemonset pods, the calico-typha pods, etc. The calico-node daemonset pods would not come up, though, until the VPP dataplane is installed.
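
You can watch these pods come up with something like the following (the operator places most of them in the calico-system namespace):

    kubectl get pods -n calico-system -w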

Installing VPP

To install VPP, use one of two manifests, depending on whether you configured hugepages (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp.yaml) or not (use https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/generated/calico-vpp-nohuge.yaml). Download the appropriate YAML, and then make the following edit if needed.

The vpp_dataplane_interface attribute should be set to the name of the NIC that would be used for the node-to-node communication. By default it is set to eth1, but if that's not the interface that would be used on your node (e.g. on my cluster, I am using eth0), then set this appropriately.
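
As a sketch, after the edit the relevant line in the downloaded YAML would read something like this (eth0 being the uplink NIC on my nodes):

vpp_dataplane_interface: eth0    # NIC carrying node-to-node traffic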

Then apply:

    kubectl apply -f calico-vpp.yaml   # or calico-vpp-nohuge.yaml

This would install the calico-vpp-node daemonset (in the calico-vpp-dataplane namespace) on the cluster nodes. If all went well, then all the Calico-related pods and the CoreDNS pods should be up and running in a few minutes.
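
A quick way to verify is to list the relevant pods and check that the nodes have moved to Ready:

    kubectl get pods -A | grep -E 'calico|coredns'
    kubectl get nodes    # the nodes should now report Ready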

Enabling IPsec

For this, the instructions here are already adequate. You need to create a secret and put a pre-shared key in it:

    kubectl -n calico-vpp-dataplane create secret generic \
    calicovpp-ipsec-secret \
    --from-literal=psk="$(dd if=/dev/urandom bs=1 count=36 2>/dev/null | base64)"

Then patch the calico-vpp-node daemonset with the ipsec configuration:

    kubectl -n calico-vpp-dataplane patch daemonset calico-vpp-node \
    --patch "$(curl https://raw.githubusercontent.com/projectcalico/vpp-dataplane/v3.24.0/yaml/components/ipsec/ipsec.yaml)"


Packet Tracing and Troubleshooting

WIP, but this is where the docs are sketchy.
