Documenting a possible approach to debugging, fixing, and testing Linkerd's CNI plugin. The issue we're dealing with is around CNI chaining (#9343).

Problem statement: when a CNI plugin that runs in interface/standalone mode runs first (e.g. Cilium), it creates a `*.conf` file. When Linkerd's CNI plugin is installed right after, it appends itself to the existing configuration, and a new `*.conflist` file is created. The old `*.conf` file is not cleaned up. The config file is elected in alphabetical order, so our current installation process will not work: the `*.conflist` file will never be considered (`*.conf` sorts before `*.conflist`, so the stale file gets picked first).
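To see the election problem concretely, here is the ordering we'll run into later in this repro (`ls` sorts names the same way the runtime elects them):

$ ls /etc/cni/net.d
05-cilium.conf  05-cilium.conflist
# '05-cilium.conf' < '05-cilium.conflist' lexicographically, so the runtime
# elects the stale standalone config and the merged conflist is ignored.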
Cilium is a bit awkward to install in k3d. To make it work, we have to mount some eBPF-specific volumes. But first, cluster creation:

$ k3d cluster create cilium-repro \
    --k3s-arg "--disable-network-policy@server:*" \
    --k3s-arg "--flannel-backend=none@server:*"

- `--disable-network-policy`: not super important here, but we are saying "don't use any NetworkPolicy" at the CNI level.
- `--flannel-backend=none`: don't use the default overlay implementation (flannel). We will instead install our CNI plugin of choice, which will set up the overlay and networking in the cluster.
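With no CNI installed yet, the node should report NotReady (the kubelet can't set up pod networking). A quick sanity check; output will look roughly like:

$ kubectl get nodes
NAME                        STATUS     ROLES                  AGE   VERSION
k3d-cilium-repro-server-0   NotReady   control-plane,master   1m    ...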
Now we need to mount the relevant volumes on our "host". k3d nodes run as Docker containers, so we can exec into the node directly; in the real world, we could just ssh into the node.
# To get the name of your node, run `kubectl get nodes` for your cluster, OR `docker ps`
# (you should see the name of the docker container).
#
# In my case it's a 1-node cluster; this would have to be repeated for every node in the cluster.
# Mount the bpf filesystem, then make the mount shared
$ docker exec -it k3d-cilium-repro-server-0 mount bpffs /sys/fs/bpf -t bpf
$ docker exec -it k3d-cilium-repro-server-0 mount --make-shared /sys/fs/bpf
# Finally, we have to make a second mount shared (otherwise the init container will fail)
$ docker exec -it k3d-cilium-repro-server-0 mount --make-shared /run/cilium/cgroupv2/
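To double-check that the propagation flags took, you can grep the node's mountinfo; shared mounts carry a `shared:<id>` tag in the optional fields (the line below is illustrative, your IDs will differ):

$ docker exec -it k3d-cilium-repro-server-0 sh -c 'grep -E "bpf|cgroupv2" /proc/self/mountinfo'
# e.g.: 1537 1521 0:160 / /sys/fs/bpf rw,relatime shared:731 - bpf bpffs rw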
I'm not sure why this is a prerequisite. From what I've seen online, Cilium should do this on its own, but its init containers use `bash`, and since k3s/k3d nodes use busybox, they only have an `sh` interpreter. See this issue if the steps above do not work.
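A quick way to confirm the node image really has no `bash` (busybox `sh` supports `command -v`):

$ docker exec -it k3d-cilium-repro-server-0 sh -c 'command -v bash || echo "no bash here"'
no bash here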
We can install Cilium by creating/applying this manifest (taken from their quickstart):
$ kubectl create -f https://raw.githubusercontent.com/cilium/cilium/v1.9/install/kubernetes/quick-install.yaml
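Before moving on, wait for Cilium to come up; in the quick-install manifest the DaemonSet should be named `cilium` and live in `kube-system`:

$ kubectl -n kube-system rollout status ds/cilium
daemon set "cilium" successfully rolled out
# The node should flip to Ready shortly after:
$ kubectl get nodes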
Next, we spin up a DaemonSet that creates a pod on every node we have (in my case, just one). We will use it to get a live view of the CNI config directory on the host.
kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: linkerd-cni-dir
  labels:
    k8s-app: linkerd-cni-dir
spec:
  selector:
    matchLabels:
      k8s-app: linkerd-cni-dir
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
  template:
    metadata:
      labels:
        k8s-app: linkerd-cni-dir
    spec:
      nodeSelector:
        kubernetes.io/os: linux
      containers:
        - name: debian
          image: debian:bullseye-slim
          command: ["/bin/sh"]
          args: ["-c", "sleep infinity"]
          volumeMounts:
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
      volumes:
        - name: cni-net-dir
          hostPath:
            path: "/etc/cni/net.d"
# I have the above file on my local machine as `cni-ds.yaml`
$ kubectl apply -f cni-ds.yaml
daemonset.apps/linkerd-cni-dir created
# default doesn't need to be specified
$ kubectl get pods [-n default]
$ kubectl exec linkerd-cni-dir-hdw62 -it -- bash
root@linkerd-cni-dir-hdw62:/# cd /host/etc/cni/net.d/
root@linkerd-cni-dir-hdw62:/host/etc/cni/net.d# ls
05-cilium.conf
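The ls above is a one-shot; for a live view, a simple shell loop does the job (`watch` may not be available in debian:bullseye-slim):

root@linkerd-cni-dir-hdw62:/host/etc/cni/net.d# while true; do ls; sleep 1; echo ---; done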
In a separate window, we'll install the CNI plugin. Assuming you have the `linkerd` CLI installed, it's as easy as `linkerd install-cni | kubectl apply -f -`.
root@linkerd-cni-dir-hdw62:/host/etc/cni/net.d# ls
05-cilium.conf  05-cilium.conflist  ZZZ-linkerd-cni-kubeconfig
Both config files are there; that's not good. The stale `05-cilium.conf` wins the alphabetical election, so the merged `05-cilium.conflist` is never used.
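For reference, the merged `05-cilium.conflist` should contain Cilium's original configuration with a `linkerd-cni` entry appended to the `plugins` array. Heavily trimmed (the `...` mark fields elided here, not literal output), the shape is roughly:

root@linkerd-cni-dir-hdw62:/host/etc/cni/net.d# cat 05-cilium.conflist
{
  "name": "cilium",
  "plugins": [
    { "type": "cilium-cni", ... },
    { "type": "linkerd-cni", ... }
  ]
}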
The problem seems to be with our config installation. As a quick fix, I'll keep track of the old filepath and remove it at the end of the function, guarded so we don't delete the config we just wrote when no rename happened. E.g.:
$ git diff
diff --git a/cni-plugin/deployment/scripts/install-cni.sh b/cni-plugin/deployment/scripts/install-cni.sh
index c36ac902c..1ed2e0040 100755
--- a/cni-plugin/deployment/scripts/install-cni.sh
+++ b/cni-plugin/deployment/scripts/install-cni.sh
@@ -234,6 +234,7 @@ install_cni_conf() {
   # If the old config filename ends with .conf, rename it to .conflist, because it has changed to be a list
   filename=${cni_conf_path##*/}
   extension=${filename##*.}
+  old_file_path=${cni_conf_path}
   if [ "${filename}" != '01-linkerd-cni.conf' ] && [ "${extension}" = 'conf' ]; then
     echo "Renaming ${cni_conf_path} extension to .conflist"
     cni_conf_path="${cni_conf_path}list"
@@ -246,6 +247,10 @@ install_cni_conf() {
   # Move the temporary CNI config into place.
   mv "${TMP_CONF}" "${cni_conf_path}" || exit_with_error 'Failed to mv files.'
+  if [ "${old_file_path}" != "${cni_conf_path}" ]; then
+    echo "Removing old config file: ${old_file_path}"
+    rm -f "${old_file_path}"
+  fi
   echo "Created CNI config ${cni_conf_path}"
 }
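You can sanity-check the rename-and-cleanup logic locally, without a cluster. A minimal sketch using throwaway temp files; it only exercises the string handling and the guard from the diff above:

#!/bin/sh
# Simulate: Cilium wrote 05-cilium.conf; we merge, rename to .conflist, clean up.
tmpdir=$(mktemp -d)
cni_conf_path="${tmpdir}/05-cilium.conf"
touch "${cni_conf_path}"

filename=${cni_conf_path##*/}
extension=${filename##*.}
old_file_path=${cni_conf_path}
if [ "${filename}" != '01-linkerd-cni.conf' ] && [ "${extension}" = 'conf' ]; then
  cni_conf_path="${cni_conf_path}list"
fi
echo '{"merged":"config"}' > "${cni_conf_path}" # stand-in for the real mv
if [ "${old_file_path}" != "${cni_conf_path}" ]; then
  rm -f "${old_file_path}"
fi

ls "${tmpdir}" # expect only: 05-cilium.conflist
rm -rf "${tmpdir}"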
All that's left is to:
- Build
- Deploy
- Check
In the Linkerd repo:
# We will need to re-build the CLI (or run it through `go run`, but building doesn't take long).
# This will make all rendered manifests use an image version that depends on the local git revision.
# e.g. if you commit your changes, the tag will look like 'git-123551-aui2';
# if you don't, the tag will be dev-<sha-of-head-rev>-<hostname>.
#
$ bin/build-cli-bin
# Build the cni plugin
#
$ bin/docker-build-cni-plugin
# Import image in your k3d cluster
$ bin/image-load --k3d --cluster <name> cni-plugin
# Our image-load script wraps 'k3d image import'; you can use that directly if you'd like:
# $ k3d image import -c cilium-repro cr.l5d.io/linkerd/cni-plugin:dev-0810fde8-matei
#
# NOTE: both 'image import' and our 'image-load' script expect the cluster name to be without the k3d prefix.
#
# Delete the old cni plugin, then restart Cilium to have it re-create its conf file
$ kubectl delete ns linkerd-cni
$ kubectl rollout restart ds cilium -n kube-system
$ bin/linkerd install-cni | kubectl apply -f -
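Before checking the watcher, it's worth confirming the DaemonSet actually picked up the freshly built image rather than a cached one (the tag below is from my build; yours will differ):

$ kubectl -n linkerd-cni get pods -o jsonpath='{.items[*].spec.containers[*].image}'
cr.l5d.io/linkerd/cni-plugin:dev-0810fde8-matei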
In the file watcher, we should now see only a `*.conflist` file:
root@linkerd-cni-dir-hdw62:/host/etc/cni/net.d# ls
05-cilium.conflist ZZZ-linkerd-cni-kubeconfig
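The plugin's install logs should tell the same story. These lines come from the echo statements in the diff above; I'm assuming `install-cni` created a `linkerd-cni` DaemonSet in the `linkerd-cni` namespace, and paths are as seen from inside the container:

$ kubectl -n linkerd-cni logs ds/linkerd-cni | tail -n 3
Renaming /host/etc/cni/net.d/05-cilium.conf extension to .conflist
Removing old config file: /host/etc/cni/net.d/05-cilium.conf
Created CNI config /host/etc/cni/net.d/05-cilium.conflist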