I have a cluster of 3 Minisforum MS-01 Mini PCs running Talos Linux for my k8s cluster. Previously, I had set these machines up using Debian & K3s and configured Thunderbolt networking by following this video by Jim's Garage, which is based on this gist series from scyto.
This is how I have mine connected: each of my 3 nodes has a cable going to the other two nodes.
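Roughly, the ring looks like this (using the node names that appear later in this post):

```
stanton-01 ─────── stanton-02
      \               /
       \             /
        stanton-03
```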
- A bare-metal cluster of 3 machines
- Each of your machines needs 2x Thunderbolt ports
- Visit https://factory.talos.dev/ and start the process of setting up an image. Choose the following options:
- Hardware Type: Bare-metal Machine
- Talos Linux Version: 1.7.5 (at the time of writing)
- Machine Architecture: amd64
- Secure Boot: Off (unless you hate life)
- System Extensions:
- siderolabs/i915-ucode
- siderolabs/intel-ucode
- siderolabs/thunderbolt
- siderolabs/util-linux-tools
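For reference, the Image Factory schematic behind those choices should look roughly like this (standard schematic format; double-check the extension names against what the factory shows you):

```yaml
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915-ucode
      - siderolabs/intel-ucode
      - siderolabs/thunderbolt
      - siderolabs/util-linux-tools
```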
- Add the following extra kernel args to your global patches:
- intel_iommu=on
- iommu=pt
- mitigations=off
- net.ifnames=0
- Note: `pcie_ports=native` might be needed for you. It caused my nodes to crash but was required for Buroa's to work on his Mac Minis. I would exclude it for now and only add it if you run into issues
- Now you can prepare to install
- You should see options like this:
Note down the initial install and upgrade links
- Build a bootable USB stick with a tool like Rufus, using the "First Boot" ISO shown above
- Plug the USB into each machine and boot them one at a time.
- Once booted, you can remove the USB stick and move on to the next machine.
- Ensure you have talosctl installed on your local machine/dev environment, etc.
- Follow the incredible Cluster Template from Onedr0p
- NOTE: Use the upgrade schematic ID in your config.yaml when you set it up.
- After you have your cluster up and running, you can check that your extraArgs are in place using:
talosctl -n <node name> edit mc
You should see a section that looks like this:
install:
  diskSelector:
    serial: S73VNU0X303413H
  extraKernelArgs:
    - intel_iommu=on
    - iommu=pt
    - mitigations=off
    - net.ifnames=0
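You can also confirm that the system extensions from your factory image actually loaded; the list should match what you selected at factory.talos.dev:

```sh
talosctl -n <node name> get extensions
```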
How the heck do you identify your PCI devices on Talos? Turns out the key is kubectl-node-shell combined with talosctl.
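If you already have the kubectl node-shell plugin installed (e.g. via krew), the quick version is roughly a one-liner; the steps below build an equivalent privileged pod via a Taskfile so it's repeatable:

```sh
kubectl node-shell <node name>
```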
- In your home-ops repo you should have a `.taskfiles` directory; if not, create one
- Inside it, create a folder called `Kubernetes` and a subdirectory called `resources`
- Under the `Kubernetes` folder, create a file called `Taskfile.yaml`
- In the `resources` subdirectory, create a file called `privileged-pod.tmpl.yaml`
- You should have a structure like this
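Sketched out from the files created above, that's:

```
.taskfiles/
└── Kubernetes/
    ├── Taskfile.yaml
    └── resources/
        └── privileged-pod.tmpl.yaml
```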
- Add the following to the `privileged-pod.tmpl.yaml` file and save it
---
apiVersion: v1
spec:
  containers:
    - name: debug
      image: docker.io/library/alpine:latest
      # image: docker.io/library/ubuntu:latest
      command: ["/bin/sh"]
      stdin: true
      stdinOnce: true
      tty: true
      securityContext:
        allowPrivilegeEscalation: true
        privileged: true
      volumeMounts:
        - mountPath: /rootfs
          name: rootfs
        - mountPath: /sys/firmware/efi/efivars
          name: efivars
        - mountPath: /run/containerd
          name: containerd
  dnsPolicy: ClusterFirstWithHostNet
  hostIPC: true
  hostNetwork: true
  hostPID: true
  nodeName: ${node}
  restartPolicy: Never
  volumes:
    - name: rootfs
      hostPath:
        path: /
    - name: efivars
      hostPath:
        path: /sys/firmware/efi/efivars
    - name: containerd
      hostPath:
        path: /run/containerd
- In your `Taskfile.yaml` add the following
---
# yaml-language-server: $schema=https://taskfile.dev/schema.json
version: "3"

vars:
  KUBERNETES_RESOURCES_DIR: "{{.ROOT_DIR}}/.taskfiles/Kubernetes/resources"

tasks:
  privileged:
    desc: Run a privileged pod
    cmd: |
      kubectl run privileged-{{.node}} -i --rm --image=null \
        --overrides="$(yq {{.KUBERNETES_RESOURCES_DIR}}/privileged-pod.tmpl.yaml -o=json | envsubst)"
    env:
      node: "{{.node}}"
    preconditions:
      - test -f {{.KUBERNETES_RESOURCES_DIR}}/privileged-pod.tmpl.yaml
- Start up the container using the following command
task kubernetes:privileged node={your node's name}
- Since the task file created a container using Alpine Linux, we will need to use apk to install a couple of utilities
apk update && apk add pciutils
- These will allow us to get the information we need.
- Next run:
lspci | grep -i thunderbolt
- You should get a response that looks something like this:
00:07.0 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 (rev 02)
00:07.2 PCI bridge: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #2 (rev 02)
00:0d.0 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller (rev 02)
00:0d.2 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 (rev 02)
00:0d.3 USB controller: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #1 (rev 02)
- If you do, proceed.
- First, let's take a look at our links
talosctl -n stanton-01 get links
- You should be able to see something like this:
NODE NAMESPACE TYPE ID VERSION TYPE KIND HW ADDR OPER STATE LINK STATE
stanton-01 network LinkStatus thunderbolt0 3 ether 02:9f:19:a1:73:94 up false
stanton-01 network LinkStatus thunderbolt1 3 ether 02:35:20:8c:5f:42 up false
- Next, let's get a more specific look at the Thunderbolt links
talosctl get links -oyaml | more
- Once this shows up, press `/`, type `thunderbolt`, and press Enter
- You should see something like this:
---
node: 10.90.3.101
metadata:
    namespace: network
    type: LinkStatuses.net.talos.dev
    id: cilium_host
/thunderbolt
...skipping
    id: thunderbolt0
    version: 3
    owner: network.LinkStatusController
    phase: running
    created: 2024-07-17T02:11:30Z
    updated: 2024-07-17T02:11:35Z
spec:
    index: 44
    type: ether
    linkIndex: 0
    flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
    hardwareAddr: 02:9f:19:a1:73:94
    permanentAddr: 02:9f:19:a1:73:94
    broadcastAddr: ff:ff:ff:ff:ff:ff
    mtu: 65520
    queueDisc: pfifo_fast
    operationalState: up
    kind: ""
    slaveKind: ""
    busPath: 1-1.0
    driver: thunderbolt-net
    driverVersion: 6.6.33-talos
    linkState: false
    port: Other
    duplex: Unknown
---
node: 10.90.3.101
metadata:
    namespace: network
    type: LinkStatuses.net.talos.dev
    id: thunderbolt1
    version: 3
    owner: network.LinkStatusController
    phase: running
    created: 2024-07-17T02:01:13Z
    updated: 2024-07-17T02:01:14Z
spec:
    index: 41
    type: ether
    linkIndex: 0
    flags: UP,BROADCAST,RUNNING,MULTICAST,LOWER_UP
    hardwareAddr: 02:35:20:8c:5f:42
    permanentAddr: 02:35:20:8c:5f:42
    broadcastAddr: ff:ff:ff:ff:ff:ff
    mtu: 65520
    queueDisc: pfifo_fast
    operationalState: up
    kind: ""
    slaveKind: ""
    busPath: 0-1.0
    driver: thunderbolt-net
    driverVersion: 6.6.33-talos
    linkState: false
    port: Other
    duplex: Unknown
---
- Note down the two bus paths:
id: thunderbolt0
busPath: 1-1.0
id: thunderbolt1
busPath: 0-1.0
- Next we need to know HOW things are connected.
- Go to your nodes and unplug and replug each cable, one at a time:
Node01 - Cable One
Node01 - Cable Two
Node02 - Cable One
Node02 - Cable Two
Node03 - Cable One
Node03 - Cable Two
- Now let's see how the machines are connected. Run the following command:
talosctl -n stanton-01 dmesg | grep thunderbolt
- You are looking for lines like this:
stanton-01: kern: info: [2024-07-17T00:24:12.071257675Z]: thunderbolt 0-1: Intel Corp. stanton-02
stanton-01: kern: info: [2024-07-17T00:25:41.370465675Z]: thunderbolt 1-1: Intel Corp. stanton-03
- This tells you which bus-path (and thus which thunderbolt network interface) is connected to which machine
- Repeat this process for each node
stanton-02: kern: info: [2024-07-17T00:25:44.855454779Z]: thunderbolt 0-1: Intel Corp. stanton-01
stanton-02: kern: info: [2024-07-17T00:26:11.798178779Z]: thunderbolt 1-1: Intel Corp. stanton-03
stanton-03: kern: info: [2024-07-17T00:25:41.495885192Z]: thunderbolt 0-1: Intel Corp. stanton-01
stanton-03: kern: info: [2024-07-17T00:25:57.912867192Z]: thunderbolt 1-1: Intel Corp. stanton-02
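- Putting the dmesg output above together, the mapping for my nodes works out to:
  - stanton-01: busPath 0-1.0 → stanton-02, busPath 1-1.0 → stanton-03
  - stanton-02: busPath 0-1.0 → stanton-01, busPath 1-1.0 → stanton-03
  - stanton-03: busPath 0-1.0 → stanton-01, busPath 1-1.0 → stanton-02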
- Now that we know that, we are ready to configure Talos
- Open up your `talconfig.yaml` in your editor of choice
- Look for the `nodes:` section; you should have a section under here for each of your nodes
- Look for a section under each node called `networkInterfaces:`
- Add the following `deviceSelector` entries under the others that may be there
- deviceSelector:
    busPath: 0-1.0 # stanton-02
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.101/32
  routes:
    - network: 169.254.255.102/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-03
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.101/32
  routes:
    - network: 169.254.255.103/32
      metric: 2048
- Scroll down and add the following to the second node
- deviceSelector:
    busPath: 0-1.0 # stanton-01
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.102/32
  routes:
    - network: 169.254.255.101/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-03
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.102/32
  routes:
    - network: 169.254.255.103/32
      metric: 2048
- Then the third node:
- deviceSelector:
    busPath: 0-1.0 # stanton-01
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.103/32
  routes:
    - network: 169.254.255.101/32
      metric: 2048
- deviceSelector:
    busPath: 1-1.0 # stanton-02
  dhcp: false
  mtu: 65520
  addresses:
    - 169.254.255.103/32
  routes:
    - network: 169.254.255.102/32
      metric: 2048
- Note: Ensure you are adjusting the following:
  - busPath
  - Comments, to match your node names based on the queries from earlier
  - IP addresses used. You can use whatever you like here; I set the last octet to match that of the node's primary network IP for easier identification.
- Once that is done, let's set up a global patch for your extraArgs
- Navigate to your `talos/patches/global` folder and create a new file called `kernel.yaml`
- Add the following and save:
machine:
  install:
    extraKernelArgs:
      - intel_iommu=on
      - iommu=pt
      - mitigations=off
      - net.ifnames=0
      # - pcie_ports=native
- Note that `pcie_ports=native` is commented out here; your nodes may need it.
- Apply your config and get your nodes rebooted
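If you are using the cluster template's talhelper workflow, applying the updated config looks roughly like this (the generated file name is an example; use whatever talhelper emits for your cluster and node):

```sh
talhelper genconfig
talosctl apply-config -n stanton-01 -f clusterconfig/<cluster-name>-stanton-01.yaml
```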
- If you haven't already, close down your privileged container and swap the image used in your `privileged-pod.tmpl.yaml` to the Ubuntu one
- Run the task file again to spin up the Ubuntu container
task kubernetes:privileged node={your node's name}
- Check which of your cores are Performance and which are Efficiency using:
lscpu --all --extended
- You should see an output like this:
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 4900.0000 400.0000 737.4020
1 0 0 0 0:0:0:0 yes 4900.0000 400.0000 678.6480
2 0 0 1 4:4:1:0 yes 4900.0000 400.0000 645.2680
3 0 0 1 4:4:1:0 yes 4900.0000 400.0000 754.7570
4 0 0 2 8:8:2:0 yes 5000.0000 400.0000 1336.6210
5 0 0 2 8:8:2:0 yes 5000.0000 400.0000 923.7330
6 0 0 3 12:12:3:0 yes 5000.0000 400.0000 407.3450
7 0 0 3 12:12:3:0 yes 5000.0000 400.0000 413.3750
8 0 0 4 16:16:4:0 yes 4900.0000 400.0000 426.1340
9 0 0 4 16:16:4:0 yes 4900.0000 400.0000 676.8810
10 0 0 5 20:20:5:0 yes 4900.0000 400.0000 615.2590
11 0 0 5 20:20:5:0 yes 4900.0000 400.0000 400.0000
12 0 0 6 24:24:6:0 yes 3800.0000 400.0000 883.2140
13 0 0 7 25:25:6:0 yes 3800.0000 400.0000 671.7470
14 0 0 8 26:26:6:0 yes 3800.0000 400.0000 1210.9910
15 0 0 9 27:27:6:0 yes 3800.0000 400.0000 1240.5000
16 0 0 10 28:28:7:0 yes 3800.0000 400.0000 995.9070
17 0 0 11 29:29:7:0 yes 3800.0000 400.0000 474.6940
18 0 0 12 30:30:7:0 yes 3800.0000 400.0000 400.1950
19 0 0 13 31:31:7:0 yes 3800.0000 400.0000 1114.0560
- Take note of which CPU IDs are performance cores (your efficiency cores are the ones with the lower MAXMHZ).
- In this example, CPUs 0-11 are performance cores, whilst 12-19 are efficiency cores
- Run the following command to pin the Thunderbolt interrupts to your performance cores (update the echo value to match your CPU IDs). Some people report one or the other variant working; see which one works for you
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c 'echo 0-11 | tee "/proc/irq/{}/smp_affinity_list"'
OR
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} sh -c "echo 0-11 > /proc/irq/{}/smp_affinity_list"
- You should get a response like:
0-11
0-11
0-11
0-11
0-11
0-11
0-11
0-11
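If you want to confirm the affinity stuck, you can read the values back using the same IRQ lookup:

```sh
grep thunderbolt /proc/interrupts | cut -d ":" -f1 | xargs -I {} cat /proc/irq/{}/smp_affinity_list
```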
- Exit out of your shell container
- Open up 3x Terminal windows or tabs
- In each, stand up an Ubuntu privileged container using the same command as before
task kubernetes:privileged node={your node's name}
- Once in the shell, run the following commands on each
apt update
apt install iperf3 pciutils
- Once done, run the following command on your first node (this will start the iperf server)
iperf3 -s -B 169.254.255.101
- This binds the server to the IP address you set for Thunderbolt
- Now go over to the shell for your second node and run:
iperf3 -c 169.254.255.101 -B 169.254.255.102 -R
- Once complete, run it again from your third node's shell:
iperf3 -c 169.254.255.101 -B 169.254.255.103 -R
- Pop back to the first node's shell and see the combined tests. You should see something like this:
-----------------------------------------------------------
Server listening on 5201 (test #1)
-----------------------------------------------------------
Accepted connection from 169.254.255.102, port 58349
[ 5] local 169.254.255.101 port 5201 connected to 169.254.255.102 port 39333
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 3.03 GBytes 26.0 Gbits/sec 29 3.06 MBytes
[ 5] 1.00-2.00 sec 3.05 GBytes 26.2 Gbits/sec 1 3.06 MBytes
[ 5] 2.00-3.00 sec 3.07 GBytes 26.4 Gbits/sec 33 2.68 MBytes
[ 5] 3.00-4.00 sec 2.89 GBytes 24.8 Gbits/sec 112 3.31 MBytes
[ 5] 4.00-5.00 sec 3.08 GBytes 26.4 Gbits/sec 29 2.68 MBytes
[ 5] 5.00-6.00 sec 3.08 GBytes 26.4 Gbits/sec 0 3.00 MBytes
[ 5] 6.00-7.00 sec 3.09 GBytes 26.5 Gbits/sec 0 3.00 MBytes
[ 5] 7.00-8.00 sec 3.10 GBytes 26.6 Gbits/sec 0 3.00 MBytes
[ 5] 8.00-9.00 sec 3.05 GBytes 26.2 Gbits/sec 30 2.75 MBytes
[ 5] 9.00-10.00 sec 3.07 GBytes 26.4 Gbits/sec 2 2.75 MBytes
[ 5] 10.00-10.00 sec 128 KBytes 1.61 Gbits/sec 0 2.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 30.5 GBytes 26.2 Gbits/sec 236 sender
-----------------------------------------------------------
Server listening on 5201 (test #2)
-----------------------------------------------------------
Accepted connection from 169.254.255.103, port 49163
[ 5] local 169.254.255.101 port 5201 connected to 169.254.255.103 port 44701
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 3.01 GBytes 25.8 Gbits/sec
[ 5] 1.00-2.00 sec 3.01 GBytes 25.9 Gbits/sec
[ 5] 2.00-3.00 sec 3.03 GBytes 26.0 Gbits/sec
[ 5] 3.00-4.00 sec 3.05 GBytes 26.2 Gbits/sec
[ 5] 4.00-5.00 sec 3.07 GBytes 26.4 Gbits/sec
[ 5] 5.00-6.00 sec 3.05 GBytes 26.1 Gbits/sec
[ 5] 6.00-7.00 sec 2.40 GBytes 20.6 Gbits/sec
[ 5] 7.00-8.00 sec 3.03 GBytes 26.0 Gbits/sec
[ 5] 8.00-9.00 sec 2.93 GBytes 25.2 Gbits/sec
[ 5] 9.00-10.00 sec 3.02 GBytes 26.0 Gbits/sec
[ 5] 10.00-10.00 sec 1.88 MBytes 18.2 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 29.6 GBytes 25.4 Gbits/sec receiver
- You can ignore the last line of each test as that's just the remnant of the test. You now have nice, fast Thunderbolt networking running on your nodes!
- In order to persist your changes for performance cores, you will need something to make these changes every time your nodes boot/reboot.
- This can be achieved using irqbalance.
- Here is an example container that does just that. NOTE: You will need to update the ban list to match your CPU's E-cores
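Not buroa's actual container, but as a rough sketch of the idea: a privileged DaemonSet running irqbalance with the E-cores banned might look something like this (the image name and namespace are placeholders, and `IRQBALANCE_BANNED_CPULIST` needs a reasonably recent irqbalance):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: irqbalance
  namespace: kube-system # placeholder namespace
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: irqbalance
  template:
    metadata:
      labels:
        app.kubernetes.io/name: irqbalance
    spec:
      hostPID: true
      containers:
        - name: irqbalance
          image: ghcr.io/example/irqbalance:latest # placeholder image
          securityContext:
            privileged: true
          env:
            # Ban the efficiency cores so interrupts land on the P-cores
            # (12-19 per the lscpu output above; adjust to your CPU IDs)
            - name: IRQBALANCE_BANNED_CPULIST
              value: "12-19"
```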
- Special thanks to buroa for all his help getting this set up (and for the aforementioned container)