I have a cluster of 3 Minisforum MS-01 Mini PCs running Talos Linux for my k8s cluster. Previously, I had set these machines up with Debian & K3s and configured Thunderbolt networking by following this video by Jim's Garage, which is based on this gist series from scyto
This is how I have mine connected: each of my 3 nodes has a cable going to the other two nodes
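In other words, the three nodes form a small full Thunderbolt mesh, roughly like this (node names match the stanton-0x examples used later):

stanton-01 ────── stanton-02
      \              /
       \            /
        stanton-03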
- A bare-metal cluster of 3 machines
- Each of your machines needs 2x Thunderbolt ports
- Visit https://factory.talos.dev/ and start the process of setting up an image. Choose the following options:
- Hardware Type: Bare-metal Machine
- Talos Linux Version: 1.7.5 (at the time of writing)
- Machine Architecture: amd64
- Secure Boot: Off (unless you hate life)
- System Extensions:
- siderolabs/i915-ucode
- siderolabs/intel-ice-firmware
- siderolabs/intel-ucode
- siderolabs/thunderbolt
- siderolabs/util-linux-tools
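If you prefer to drive this from the CLI instead of the web UI, the same choices can be expressed as an Image Factory schematic. This is a sketch under the assumption that you're using the standard factory schematic format; the extension list simply mirrors the choices above:

customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915-ucode
      - siderolabs/intel-ice-firmware
      - siderolabs/intel-ucode
      - siderolabs/thunderbolt
      - siderolabs/util-linux-tools

Posting that file to the factory (e.g. curl -X POST --data-binary @schematic.yaml https://factory.talos.dev/schematics) should return the same schematic ID the UI shows you.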
- Add the following extra args to your global patches:
  - intel_iommu=on
  - iommu=pt
  - mitigations=off
  - net.ifnames=0
- Note: pcie_ports=native might be needed for you. It caused my nodes to crash but was required for Buroa's to work on his Mac Minis. I would exclude it for now and only add it if you run into issues
- Now you can prepare to install
- You should see the generated image options. Note down the initial install and upgrade links
- Build a bootable USB stick with a tool like Rufus, using the "First Boot" ISO shown above
- Plug the USB into each machine and boot them one at a time
- Once booted you can remove the USB stick and move on to the next machine
- Ensure you have talosctl installed on your local machine/dev environment, etc.
- Follow the incredible Cluster Template from Onedr0p
- NOTE: Use the upgrade schematic ID in your config.yaml when you set it up.
- After you have your cluster up and running, you can check that your extraArgs are in place using:
talosctl -n <node name> edit mc
You should see a section that looks like this:
install:
  diskSelector:
    serial: S73VNU0X303413H
  extraKernelArgs:
    - intel_iommu=on
    - iommu=pt
    - mitigations=off
    - net.ifnames=0
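If you'd rather confirm what the kernel actually booted with (not just what the machine config says), talosctl can read files straight off the node; for example:

talosctl -n <node name> read /proc/cmdline

Your extra args should appear at the end of the command line.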
How the heck do you identify your PCI devices on Talos? Turns out the key is kubectl-node-shell combined with TalosCTL
- In your home-ops repo you should have a .taskfiles directory; if not, create one
- Create a folder called Kubernetes and a subdirectory called resources
- Under the Kubernetes folder, create a file called Taskfile.yaml
- In the resources subdirectory, create a file called privileged-pod.tmpl.yaml
- You should end up with a structure roughly like this:
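.taskfiles
└── Kubernetes
    ├── Taskfile.yaml
    └── resources
        └── privileged-pod.tmpl.yaml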
- Add the following to the privileged-pod.tmpl.yaml file and save it:
---
apiVersion: v1
spec:
  containers:
    - name: debug
      image: docker.io/library/alpine:latest
      # image: docker.io/library/ubuntu:latest
      command: ["/bin/sh"]
      stdin: true
      stdinOnce: true
      tty: true
      securityContext:
        allowPrivilegeEscalation: true
        privileged: true
      volumeMounts:
        - mountPath: /rootfs
          name: rootfs
        - mountPath: /sys/firmware/efi/efivars
          name: efivars
        - mountPath: /run/containerd
          name: containerd
  dnsPolicy: ClusterFirstWithHostNet
  hostIPC: true
  hostNetwork: true
  hostPID: true
  nodeName: ${node}
  restartPolicy: Never
  volumes:
    - name: rootfs
      hostPath:
        path: /
    - name: efivars
      hostPath:
        path: /sys/firmware/efi/efivars
    - name: containerd
      hostPath:
        path: /run/containerd
- In your Taskfile.yaml add the following:
---
# yaml-language-server: $schema=https://taskfile.dev/schema.json
version: "3"

vars:
  KUBERNETES_RESOURCES_DIR: "{{.ROOT_DIR}}/.taskfiles/Kubernetes/resources"

tasks:
  privileged:
    desc: Run a privileged pod
    cmd: |
      kubectl run privileged-{{.node}} -i --rm --image=null \
        --overrides="$(yq {{.KUBERNETES_RESOURCES_DIR}}/privileged-pod.tmpl.yaml -o=json | envsubst)"
    env:
      node: "{{.node}}"
    preconditions:
      - test -f {{.KUBERNETES_RESOURCES_DIR}}/privileged-pod.tmpl.yaml
- Start up the container using the following command:
task kubernetes:privileged node={your node name}
- Since the Taskfile spins up a container using Alpine Linux, we will need to use apk to install a couple of utilities:
apk update && apk add pciutils
- These will allow us to get the information we need.
- Next run:
lspci | grep -i thunderbolt
- You should get a response that looks something like this:
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:07.2 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:0d.3 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 (rev 02)
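Optionally, since the privileged pod shares the host's sysfs, you can also peek at the Thunderbolt bus directly to see the domains and any connected peers:

ls /sys/bus/thunderbolt/devices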
- If you do, proceed.
- First let's take a look at our links:
talosctl -n stanton-01 get links
- You should be able to see something like this:
NODE         NAMESPACE   TYPE         ID             VERSION   TYPE    KIND   HW ADDR             OPER STATE   LINK STATE
stanton-01   network     LinkStatus   thunderbolt0   3         ether          02:9f:19:a1:73:94   up           false
stanton-01   network     LinkStatus   thunderbolt1   3         ether          02:35:20:8c:5f:42   up           false
- Next let's get a more specific look at the Thunderbolt links:
talosctl get links -oyaml | more
- Once this shows up, press / then type thunderbolt and hit Enter
- You should see something like this:
---
node: 10.90.3.101
metadata:
    namespace: network
    type: LinkStatuses.net.talos.dev
    id: cilium_host
/thunderbolt
...skipping
    id: thunderbolt0
    version: 3
    owner: network.LinkStatusController
    phase: running
    created: 2024-07-17T02:11:30Z
    updated: 2024-07-17T02:11:35Z
spec:
    index: 44
    type: ether
    linkIndex: 0
    flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
    hardwareAddr: 02:9f:19:a1:73:94
    permanentAddr: 02:9f:19:a1:73:94
    broadcastAddr: ff:ff:ff:ff:ff:ff
    mtu: 65520
    queueDisc: pfifo_fast
    operationalState: up
    kind: ""
    slaveKind: ""
    busPath: 1-1.0
    driver: thunderbolt-net
    driverVersion: 6.6.33-talos
    linkState: false
    port: Other
    duplex: Unknown
---
node: 10.90.3.101
metadata:
    namespace: network
    type: LinkStatuses.net.talos.dev
    id: thunderbolt1
    version: 3
    owner: network.LinkStatusController
    phase: running
    created: 2024-07-17T02:01:13Z
    updated: 2024-07-17T02:01:14Z
spec:
    index: 41
    type: ether
    linkIndex: 0
    flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
    hardwareAddr: 02:35:20:8c:5f:42
    permanentAddr: 02:35:20:8c:5f:42
    broadcastAddr: ff:ff:ff:ff:ff:ff
    mtu: 65520
    queueDisc: pfifo_fast
    operationalState: up
    kind: ""
    slaveKind: ""
    busPath: 0-1.0
    driver: thunderbolt-net
    driverVersion: 6.6.33-talos
    linkState: false
    port: Other
    duplex: Unknown
---
- Note down the two bus paths:
  id: thunderbolt0
  busPath: 1-1.0
  id: thunderbolt1
  busPath: 0-1.0
- Next we need to know HOW things are connected.
- Go to your nodes and unplug, then replug each cable one at a time:
  Node01 - Cable One
  Node01 - Cable Two
  Node02 - Cable One
  Node02 - Cable Two
  Node03 - Cable One
  Node03 - Cable Two
- Now let's see how the machines are connected. Run the following command:
talosctl -n stanton-01 dmesg | grep thunderbolt
- You are looking for lines like this:
stanton-01: kern: info: [2024-07-17T00:24:12.071257675Z]: thunderbolt 0-1: Intel Corp. stanton-02
stanton-01: kern: info: [2024-07-17T00:25:41.370465675Z]: thunderbolt 1-1: Intel Corp. stanton-03
- This tells you which bus-path (and thus which thunderbolt network interface) is connected to which machine
- Repeat this process for each node
stanton-02: kern: info: [2024-07-17T00:25:44.855454779Z]: thunderbolt 0-1: Intel Corp. stanton-01
stanton-02: kern: info: [2024-07-17T00:26:11.798178779Z]: thunderbolt 1-1: Intel Corp. stanton-03
stanton-03: kern: info: [2024-07-17T00:25:41.495885192Z]: thunderbolt 0-1: Intel Corp. stanton-01
stanton-03: kern: info: [2024-07-17T00:25:57.912867192Z]: thunderbolt 1-1: Intel Corp. stanton-02
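If you'd rather collect all three at once, a small loop over the nodes does the same thing (adjust the node names to match yours):

for n in stanton-01 stanton-02 stanton-03; do
  talosctl -n "$n" dmesg | grep "thunderbolt .-1"
done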
- Now that we know that, we are ready to configure Talos
- Open up your talconfig.yaml in your editor of choice
- Look for the nodes: section; you should have an entry under here for each of your nodes
- Look for a section under each node called networkInterfaces:
- Add the following deviceSelector entries under any others that may be there
- deviceSelector:
    busPath: 0-1.0 # stanton-02
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.101/32
  routes:
    - network: 169.254.255.102/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-03
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.101/32
  routes:
    - network: 169.254.255.103/32
      metric: 2048
- Scroll down and add the following to the second node
- deviceSelector:
    busPath: 0-1.0 # stanton-01
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.102/32
  routes:
    - network: 169.254.255.101/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-03
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.102/32
  routes:
    - network: 169.254.255.103/32
      metric: 2048
- Then the third node:
- deviceSelector:
    busPath: 0-1.0 # stanton-01
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.103/32
  routes:
    - network: 169.254.255.101/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-02
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.103/32
  routes:
    - network: 169.254.255.102/32
      metric: 2048
- Note: Ensure you adjust the following:
  - busPath
  - The comments, to match your node names based on the queries from earlier
  - The IP addresses used. You can use whatever you like here; I set the last octet to match that of the node's primary network IP for easier identification.
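To recap the addressing scheme used in the snippets above (your bus paths will differ; map them using the dmesg output from earlier):

stanton-01: 169.254.255.101  (busPath 0-1.0 -> stanton-02, 1-1.0 -> stanton-03)
stanton-02: 169.254.255.102  (busPath 0-1.0 -> stanton-01, 1-1.0 -> stanton-03)
stanton-03: 169.254.255.103  (busPath 0-1.0 -> stanton-01, 1-1.0 -> stanton-02)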
- Once that is done, let's set up a global patch for your extraArgs
- Navigate to your talos/patches/global folder and create a new file called kernel.yaml
- Add the following and save:
machine:
  install:
    extraKernelArgs:
      - intel_iommu=on
      - iommu=pt
      - mitigations=off
      - net.ifnames=0
      # - pcie_ports=native
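Depending on how your talconfig.yaml is set up, you may also need to reference the new patch file explicitly. With talhelper that usually looks something like the following (treat the exact path as an assumption; the cluster template may already pull in everything under patches/global):

patches:
  - "@./patches/global/kernel.yaml"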
- Note that pcie_ports=native is commented out here; your nodes may need it
- Apply your config and get your nodes rebooted
- If you haven't already, shut down your privileged container and swap the image used in your privileged-pod.tmpl.yaml to the Ubuntu one
- Run the task again to spin up the Ubuntu container:
task kubernetes:privileged node={your node name}
- Check which of your cores are Performance and which are Efficiency using:
lscpu --all --extended
- You should see output like this:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 4900.0000 400.0000 737.4020
1 0 0 0 0:0:0:0 yes 4900.0000 400.0000 678.6480
2 0 0 1 4:4:1:0 yes 4900.0000 400.0000 645.2680
3 0 0 1 4:4:1:0 yes 4900.0000 400.0000 754.7570
4 0 0 2 8:8:2:0 yes 5000.0000 400.0000 1336.6210
5 0 0 2 8:8:2:0 yes 5000.0000 400.0000 923.7330
6 0 0 3 12:12:3:0 yes 5000.0000 400.0000 407.3450
7 0 0 3 12:12:3:0 yes 5000.0000 400.0000 413.3750
8 0 0 4 16:16:4:0 yes 4900.0000 400.0000 426.1340
9 0 0 4 16:16:4:0 yes 4900.0000 400.0000 676.8810
10 0 0 5 20:20:5:0 yes 4900.0000 400.0000 615.2590
11 0 0 5 20:20:5:0 yes 4900.0000 400.0000 400.0000
12 0 0 6 24:24:6:0 yes 3800.0000 400.0000 883.2140
13 0 0 7 25:25:6:0 yes 3800.0000 400.0000 671.7470
14 0 0 8 26:26:6:0 yes 3800.0000 400.0000 1210.9910
15 0 0 9 27:27:6:0 yes 3800.0000 400.0000 1240.5000
16 0 0 10 28:28:7:0 yes 3800.0000 400.0000 995.9070
17 0 0 11 29:29:7:0 yes 3800.0000 400.0000 474.6940
18 0 0 12 30:30:7:0 yes 3800.0000 400.0000 400.1950
19 0 0 13 31:31:7:0 yes 3800.0000 400.0000 1114.0560
- Take note of which CPU IDs are performance cores (your efficiency cores are the ones with the lower MAXMHZ).
- In this example CPUs 0-11 are Performance cores, whilst 12-19 are Efficiency cores
- Run the following command to pin the Thunderbolt interrupts to the performance cores (update the echoed value to match your performance-core CPU IDs)
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-11 | tee "/proc/irq/{}/smp_affinity_list"'
- You should get a response like:
0-11
0-11
0-11
0-11
0-11
0-11
0-11
0-11
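To double-check that the affinity stuck, read it back with the same pipeline, using cat instead of tee:

grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} cat "/proc/irq/{}/smp_affinity_list"

Each Thunderbolt IRQ should now report 0-11 (or whatever range you set).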
- Exit out of your shell container
- Open up 3x Terminal windows or tabs
- In each, stand up an Ubuntu privileged container using the same command as before:
task kubernetes:privileged node={your node name}
- Once in the shell, run the following commands on each:
apt update
apt install iperf3 pciutils
- Once done, run the following command on your first node (this will start the iperf server)
iperf3 -s -B 169.254.255.101
- This binds the server to the IP address you set for Thunderbolt
- Now go over to the shell for your second node and run:
iperf3 -c 169.254.255.101 -B 169.254.255.102 -R
- Once complete, run it again from your third node's shell:
iperf3 -c 169.254.255.101 -B 169.254.255.103 -R
- Pop back to the first node's shell and see the combined tests. You should see something like this:
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 169.254.255.102, port 58349
[ 5] local 169.254.255.101 port 5201 connected to 169.254.255.102 port 39333
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.03 GBytes 26.0 Gbits/sec 29 3.06 MBytes
[ 5] 1.00-2.00 sec 3.05 GBytes 26.2 Gbits/sec 1 3.06 MBytes
[ 5] 2.00-3.00 sec 3.07 GBytes 26.4 Gbits/sec 33 2.68 MBytes
[ 5] 3.00-4.00 sec 2.89 GBytes 24.8 Gbits/sec 112 3.31 MBytes
[ 5] 4.00-5.00 sec 3.08 GBytes 26.4 Gbits/sec 29 2.68 MBytes
[ 5] 5.00-6.00 sec 3.08 GBytes 26.4 Gbits/sec 0 3.00 MBytes
[ 5] 6.00-7.00 sec 3.09 GBytes 26.5 Gbits/sec 0 3.00 MBytes
[ 5] 7.00-8.00 sec 3.10 GBytes 26.6 Gbits/sec 0 3.00 MBytes
[ 5] 8.00-9.00 sec 3.05 GBytes 26.2 Gbits/sec 30 2.75 MBytes
[ 5] 9.00-10.00 sec 3.07 GBytes 26.4 Gbits/sec 2 2.75 MBytes
[ 5] 10.00-10.00 sec 128 KBytes 1.61 Gbits/sec 0 2.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 30.5 GBytes 26.2 Gbits/sec 236 sender
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
Accepted connection from 169.254.255.103, port 49163
[ 5] local 169.254.255.101 port 5201 connected to 169.254.255.103 port 44701
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 3.01 GBytes 25.8 Gbits/sec
[ 5] 1.00-2.00 sec 3.01 GBytes 25.9 Gbits/sec
[ 5] 2.00-3.00 sec 3.03 GBytes 26.0 Gbits/sec
[ 5] 3.00-4.00 sec 3.05 GBytes 26.2 Gbits/sec
[ 5] 4.00-5.00 sec 3.07 GBytes 26.4 Gbits/sec
[ 5] 5.00-6.00 sec 3.05 GBytes 26.1 Gbits/sec
[ 5] 6.00-7.00 sec 2.40 GBytes 20.6 Gbits/sec
[ 5] 7.00-8.00 sec 3.03 GBytes 26.0 Gbits/sec
[ 5] 8.00-9.00 sec 2.93 GBytes 25.2 Gbits/sec
[ 5] 9.00-10.00 sec 3.02 GBytes 26.0 Gbits/sec
[ 5] 10.00-10.00 sec 1.88 MBytes 18.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 29.6 GBytes 25.4 Gbits/sec receiver
- You can ignore the last line of each test as that's just the remnants of the test winding down. You now have nice, fast TB networking running on your nodes!
- To persist your Performance-core changes, you will need something that reapplies them every time your nodes boot/reboot.
- This can be achieved using irqbalance.
- Here is an example container that does just that (a rough sketch follows below). NOTE: You will need to update the ban list to match your CPU's E-cores
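For illustration, a DaemonSet doing this might look roughly like the sketch below. The image name is hypothetical (use buroa's container, or any image that ships irqbalance), and IRQBALANCE_BANNED_CPULIST assumes an irqbalance version recent enough to honour that variable; the value bans the E-cores (12-19 in the lscpu example above) so interrupts land on the P-cores:

---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: irqbalance
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: irqbalance
  template:
    metadata:
      labels:
        app: irqbalance
    spec:
      hostPID: true
      tolerations:
        - operator: Exists # run on control-plane nodes too
      containers:
        - name: irqbalance
          image: example.com/irqbalance:latest # hypothetical; swap for a real image that ships irqbalance
          env:
            - name: IRQBALANCE_BANNED_CPULIST
              value: "12-19" # your E-core IDs, per lscpu
          securityContext:
            privileged: true # required to write /proc/irq/*/smp_affinity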
- Special thanks to buroa for all his help getting this set up (and the aforementioned container)