superseb/cluster.yml

patan32 · 2024-01-10T22:18:46Z

Here is a Rancher RKE2 example

spec:
  rkeConfig:
    machineGlobalConfig:
      kube-apiserver-arg:
        - '--default-not-ready-toleration-seconds=30'
        - '--default-unreachable-toleration-seconds=30'
      kube-controller-manager-arg:
        - '--node-monitor-period=2s'
        - '--node-monitor-grace-period=16s'
        - '--pod-eviction-timeout=30s'
    machineSelectorConfig:
      - config:
          kubelet-arg:
            - '--node-status-update-frequency=4s'
            - '--max-pods=200'

Hello,

I am wondering how I can apply this to my RKE2 cluster. When I open the cluster in Rancher, I can't see an "Edit YAML" button. Any help is appreciated.

superseb · 2024-01-22T10:07:44Z

@patan32 You probably want to check rancher/rancher#43918. Depending on which versions you are using, it could be old or new intended behavior, or a new bug.

Zappelphilipp · 2024-02-06T15:50:15Z

Here is a Rancher RKE2 example

spec:
  rkeConfig:
    machineGlobalConfig:
      kube-apiserver-arg:
        - '--default-not-ready-toleration-seconds=30'
        - '--default-unreachable-toleration-seconds=30'
      kube-controller-manager-arg:
        - '--node-monitor-period=2s'
        - '--node-monitor-grace-period=16s'
        - '--pod-eviction-timeout=30s'
    machineSelectorConfig:
      - config:
          kubelet-arg:
            - '--node-status-update-frequency=4s'
            - '--max-pods=200'

I tried this on my Rancher RKE2-based cluster and cannot recommend it: it crashed my master nodes, or at least the settings would not apply. The master nodes got stuck on "waiting for kube-controller". The failed nodes reported:

journalctl -xeu rke2-server.service
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Reconciling ETCDSnapshotFile resources"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Tunnel server egress proxy mode: agent"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Starting managed etcd node metadata controller"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Reconciliation of ETCDSnapshotFile resources complete"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Starting k3s.cattle.io/v1, Kind=Addon controller"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Creating deploy event broadcaster"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Starting /v1, Kind=Node controller"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Cluster dns configmap already exists"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Labels and annotations have been set successfully on node: rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Starting /v1, Kind=Secret controller"
Feb 06 16:22:33 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:33+01:00" level=info msg="Updating TLS secret for kube-system/rke2-serving (count: 16): map[listener.cattle.io/cn-10.11.55.170:10.11.55.170 listener.cattle.io/cn->
Feb 06 16:22:36 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: time="2024-02-06T16:22:36+01:00" level=info msg="Running kube-proxy --cluster-cidr=10.42.0.0/16 --conntrack-max-per-core=0 --conntrack-tcp-timeout-close-wait=0s --conntrack-tcp-timeout->
Feb 06 16:25:52 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: 2024/02/06 16:25:52 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".
Feb 06 16:28:52 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: 2024/02/06 16:28:52 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".
Feb 06 16:32:12 rocky-v10-pool3-rocky-prod-v1-feb2ae08-6h8wb rke2[905139]: 2024/02/06 16:32:12 ERROR: [transport] Client received GoAway with error code ENHANCE_YOUR_CALM and debug data equal to ASCII "too_many_pings".

EDIT: found the problem: pod-eviction-timeout was deprecated in 1.25 (kubernetes/website#39681).
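
Going by the edit above, a variant of the quoted RKE2 snippet with the `pod-eviction-timeout` line dropped might look like the following. This is a sketch, not a tested configuration: all remaining values are copied unchanged from the quoted example, and it assumes the eviction delay is then governed by the two kube-apiserver toleration settings (taint-based eviction).

```yaml
# Sketch only: the quoted example minus the flag identified as the problem.
spec:
  rkeConfig:
    machineGlobalConfig:
      kube-apiserver-arg:
        # Assumed to control how long pods stay bound to a
        # not-ready/unreachable node before they are evicted.
        - '--default-not-ready-toleration-seconds=30'
        - '--default-unreachable-toleration-seconds=30'
      kube-controller-manager-arg:
        - '--node-monitor-period=2s'
        - '--node-monitor-grace-period=16s'
        # '--pod-eviction-timeout=30s' intentionally omitted (see EDIT above).
    machineSelectorConfig:
      - config:
          kubelet-arg:
            - '--node-status-update-frequency=4s'
            - '--max-pods=200'
```

If the control plane is already stuck, removing the flag and letting the nodes reconcile is the obvious first thing to try, but whether that alone recovers the cluster is not confirmed in this thread.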

services:
  kubelet:
    extra_args:
      node-status-update-frequency: 4s
  kube-api:
    extra_args:
      default-not-ready-toleration-seconds: 30
      default-unreachable-toleration-seconds: 30
  kube-controller:
    extra_args:
      node-monitor-period: 2s
      node-monitor-grace-period: 16s
      pod-eviction-timeout: 30s
