# Calico
#### Pull the images
```shell
nerdctl image pull --namespace=k8s.io quay.io/tigera/operator:v1.30.3
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/cni:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/node:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/kube-controllers:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/csi:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/apiserver:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/pod2daemon-flexvol:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/typha:v3.26.0
nerdctl image pull --namespace=k8s.io m.daocloud.io/docker.io/calico/node-driver-registrar:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/cni:v3.26.0 docker.io/calico/cni:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/node:v3.26.0 docker.io/calico/node:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/kube-controllers:v3.26.0 docker.io/calico/kube-controllers:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/csi:v3.26.0 docker.io/calico/csi:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/apiserver:v3.26.0 docker.io/calico/apiserver:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/pod2daemon-flexvol:v3.26.0 docker.io/calico/pod2daemon-flexvol:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/typha:v3.26.0 docker.io/calico/typha:v3.26.0
nerdctl image tag --namespace=k8s.io m.daocloud.io/docker.io/calico/node-driver-registrar:v3.26.0 docker.io/calico/node-driver-registrar:v3.26.0
```

```shell
nerdctl save --namespace=k8s.io quay.io/tigera/operator:v1.30.3 >images/tigera-operator-v1.30.3.tar
nerdctl save --namespace=k8s.io docker.io/calico/cni:v3.26.0 >images/calico-cni-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/node:v3.26.0 >images/calico-node-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/kube-controllers:v3.26.0 >images/calico-kube-controllers-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/csi:v3.26.0 >images/calico-csi-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/apiserver:v3.26.0 >images/calico-apiserver-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/pod2daemon-flexvol:v3.26.0 >images/calico-pod2daemon-flexvol-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/typha:v3.26.0 >images/calico-typha-v3.26.0.tar
nerdctl save --namespace=k8s.io docker.io/calico/node-driver-registrar:v3.26.0 >images/calico-node-driver-registrar-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/tigera-operator-v1.30.3.tar
nerdctl load --namespace=k8s.io <images/calico-cni-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-node-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-kube-controllers-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-csi-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-apiserver-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-pod2daemon-flexvol-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-typha-v3.26.0.tar
nerdctl load --namespace=k8s.io <images/calico-node-driver-registrar-v3.26.0.tar
```
#### Install calicoctl
```shell
# node-0
curl -LO https://github.com/projectcalico/calico/releases/download/v3.26.0/calicoctl-linux-amd64
chmod +x ./calicoctl-linux-amd64
cp ./calicoctl-linux-amd64 /usr/local/bin/calicoctl
```
#### Install Calico
```shell
kubectl create ns calico-system
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/tigera-operator.yaml
kubectl -n tigera-operator set image deployments/tigera-operator tigera-operator=quay.io/tigera/operator:v1.30.3
```
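Before applying the custom resources, it may help to confirm the operator Deployment actually rolled out with the pinned image (an optional check, standard kubectl only):
```shell
# Wait until the tigera-operator Deployment becomes available
kubectl -n tigera-operator rollout status deployment/tigera-operator
kubectl -n tigera-operator get pods -o wide
```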
#### Use VXLAN mode
```shell
curl https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/custom-resources.yaml -O
kubectl apply -f custom-resources.yaml
```
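For reference, a minimal sketch of what the Installation resource in custom-resources.yaml looks like when switched to pure VXLAN encapsulation; the pool CIDR below assumes the 192.168.0.0/16 --pod-network-cidr used by kubeadm init later in these notes:
```shell
# Sketch only: an Installation with VXLAN encapsulation (adjust cidr to your cluster)
cat <<EOF | kubectl apply -f -
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
    - blockSize: 26
      cidr: 192.168.0.0/16
      encapsulation: VXLAN
      natOutgoing: Enabled
      nodeSelector: all()
EOF
```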
#### Adjust the Calico configuration
```shell
# Wait for Calico to become ready
watch kubectl get pods -A -o wide
# Disable IPIP mode
# VXLAN or BGP without encapsulation is supported if using Calico CNI. IPIP (Calico's default encapsulation mode) is not supported. Use the following command to turn off IPIP on the default IP pool.
# https://docs.tigera.io/calico/3.26/getting-started/kubernetes/windows-calico/kubernetes/requirements
kubectl patch felixconfiguration default --type=merge --patch='{"spec":{"ipipEnabled":false}}'
# Set strict affinity to true in the IPAM configuration
# For Linux control nodes using Calico networking, strict affinity must be set to true. This is required to prevent Linux nodes from borrowing IP addresses from Windows nodes:
# https://docs.tigera.io/calico/3.26/getting-started/kubernetes/windows-calico/kubernetes/standard
kubectl patch ipamconfigurations default --type=merge --patch='{"spec": {"strictAffinity": true}}'
# Disable BGP
# Ensure that BGP is disabled since you're using VXLAN. If you installed Calico using operator, you can do this by:
# https://docs.tigera.io/calico/3.26/getting-started/kubernetes/windows-calico/quickstart
kubectl patch installation default --type=merge --patch='{"spec": {"calicoNetwork": {"bgp": "Disabled"}}}'
```
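A few optional checks to confirm the three patches took effect (the resource names are the operator defaults and may differ):
```shell
# Felix should report ipipEnabled: false
calicoctl get felixconfiguration default -o yaml | grep ipip
# The IPAM configuration should show StrictAffinity: true
calicoctl ipam show --show-configuration
# The Installation should report bgp: Disabled
kubectl get installation default -o jsonpath='{.spec.calicoNetwork.bgp}{"\n"}'
```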
https://docs.tigera.io/calico/latest/getting-started/kubernetes/self-managed-onprem/onpremises

Building a heterogeneous GPU cluster

GPU resource management

Manual management

Minimal installation:

Automatic management with the Operator

In a ready Kubernetes cluster, the NVIDIA GPU Operator can also be installed via Helm to manage GPU resources automatically:

kubectl create ns gpu-operator
kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
    && helm repo update
helm install --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --version=v24.9.2 \
    --set driver.enabled=false

With driver.enabled=true, the host would crash, which is why driver.enabled=false is set above.
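
To confirm the operator has exposed GPUs to the scheduler, a small smoke test can be run (a sketch; the CUDA image tag is only an example):

```shell
# Nodes should now advertise nvidia.com/gpu as an allocatable resource
kubectl describe nodes | grep -i "nvidia.com/gpu"

# Run nvidia-smi in a pod that requests a single GPU
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# Once the pod has completed, the logs should list the assigned GPU
kubectl logs gpu-smoke-test
```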

References

Deploying Meta-Llama-3.1-405B-Instruct (test)

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: vllm-app
  name: vllm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-app
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: vllm-app
    spec:
      containers:
      - command:
        - python3
        - -m
        - vllm.entrypoints.openai.api_server
        - --model
        - TheBloke/Mistral-7B-Instruct-v0.2-AWQ
        - --quantization=awq
        - --trust-remote-code
        image: vllm/vllm-openai:latest
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 240
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        name: vllm-openai
        ports:
        - containerPort: 8000
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: 8000
            scheme: HTTP
          initialDelaySeconds: 240
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            nvidia.com/gpu: "4"
          requests:
            nvidia.com/gpu: "4"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
      volumes:
      - emptyDir: {}
        name: cache-volume
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: vllm-app
  name: vllm-openai-svc
spec:
  ports:
  - port: 8000
    protocol: TCP
    targetPort: 8000
  selector:
    app: vllm-app
  type: ClusterIP
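
After the pods become ready, the OpenAI-compatible endpoint can be exercised through the Service, for example (the model name must match the --model flag in the Deployment above):

```shell
kubectl port-forward svc/vllm-openai-svc 8000:8000 &
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "TheBloke/Mistral-7B-Instruct-v0.2-AWQ", "prompt": "San Francisco is a", "max_tokens": 16}'
```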

Deploying Meta-Llama-3.1-405B-Instruct (production)

Deployment type

Because the Meta-Llama-3.1-405B-Instruct model is too large to run on a single GPU, it has to be served across multiple GPUs:

  • LeaderWorkerSet: an API for deploying a group of Pods as a single unit. It is designed for common AI/ML inference deployment patterns, in particular multi-node inference workloads where an LLM is sharded and served across multiple GPU devices on multiple nodes.
VERSION=v0.5.1
kubectl apply --server-side -f https://github.com/kubernetes-sigs/lws/releases/download/$VERSION/manifests.yaml
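
Before applying the manifests below, a quick check that the LWS controller is running (the namespace and deployment names are those created by the release manifests and may differ):

```shell
kubectl -n lws-system rollout status deployment/lws-controller-manager
```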

Deployment tools

  • SGLang
  • vLLM
  • SkyPilot

The LeaderWorkerSet manifest below serves the model with vLLM, using a Ray cluster that spans the leader and worker pods:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: vllm
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      metadata:
        labels:
          role: leader
      spec:
        containers:
          - name: vllm-leader
            image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240821_1034_RC00
            env:
              - name: RAY_CLUSTER_SIZE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.annotations['leaderworkerset.sigs.k8s.io/size']
              - name: HUGGING_FACE_HUB_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-secret
                    key: hf_api_token
            command:
              - sh
              - -c
              - "/workspace/vllm/examples/ray_init.sh leader --ray_cluster_size=$RAY_CLUSTER_SIZE; 
                python3 -m vllm.entrypoints.openai.api_server --port 8080 --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2"
            resources:
              limits:
                nvidia.com/gpu: "8"
                memory: 1124Gi
                ephemeral-storage: 800Gi
              requests:
                ephemeral-storage: 800Gi
                cpu: 125
            ports:
              - containerPort: 8080
            readinessProbe:
              tcpSocket:
                port: 8080
              initialDelaySeconds: 15
              periodSeconds: 10
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 15Gi
    workerTemplate:
      spec:
        containers:
          - name: vllm-worker
            image: us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20240821_1034_RC00
            command:
              - sh
              - -c
              - "/workspace/vllm/examples/ray_init.sh worker --ray_address=$(LWS_LEADER_ADDRESS)"
            resources:
              limits:
                nvidia.com/gpu: "8"
                memory: 1124Gi
                ephemeral-storage: 800Gi
              requests:
                ephemeral-storage: 800Gi
                cpu: 125
            env:
              - name: HUGGING_FACE_HUB_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-secret
                    key: hf_api_token
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm   
        volumes:
        - name: dshm
          emptyDir:
            medium: Memory
            sizeLimit: 15Gi
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-leader
spec:
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector:
    leaderworkerset.sigs.k8s.io/name: vllm
    role: leader
  type: ClusterIP
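
After applying the manifests, each replica should bring up one leader and one worker pod, and the leader Service fronts the OpenAI-compatible server on port 8080:

```shell
kubectl get pods -l leaderworkerset.sigs.k8s.io/name=vllm -o wide
kubectl get svc vllm-leader
```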

References

Deploying DeepSeek-V3 (production)

As the LLM engine officially recommended by DeepSeek, SGLang includes a number of DeepSeek-specific optimizations to speed up inference.

The following examples deploy DeepSeek-V3 with SGLang as Docker containers:

8 x NVIDIA H200 GPUs:

# Pull latest image
# https://hub.docker.com/r/lmsysorg/sglang/tags
docker pull lmsysorg/sglang:latest

# Launch
docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --port 30000

2 x 8 x NVIDIA H20 GPUs:

# node 1
docker run --gpus all \
    --shm-size 32g \
    --network=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --name sglang_multinode1 \
    -it \
    --rm \
    --env "HF_TOKEN=$HF_TOKEN" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 192.168.114.10:20000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 40000
# node 2
docker run --gpus all \
    --shm-size 32g \
    --network=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --name sglang_multinode2 \
    -it \
    --rm \
    --env "HF_TOKEN=$HF_TOKEN" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr 192.168.114.10:20000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0 --port 40000

The same two-node setup can be expressed on Kubernetes as a LeaderWorkerSet:

apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: deepseek-v3
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2
    restartPolicy: RecreateGroupOnPodRestart
    leaderTemplate:
      metadata:
        labels:
          role: leader
      spec:
        hostNetwork: true
        hostIPC: true
        containers:
          - name: deepseek-v3-leader
            image: lmsysorg/sglang:latest
            env:
              - name: HUGGING_FACE_HUB_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-secret
                    key: hf_api_token
            command:
              - sh
              - -c
              - "python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr ${LWS_LEADER_ADDRESS}:20000 --nnodes 2 --node-rank 0 --trust-remote-code --host 0.0.0.0 --port 40000"
            resources:
              limits:
                nvidia.com/gpu: "8"
                memory: 1124Gi
                ephemeral-storage: 800Gi
              requests:
                ephemeral-storage: 800Gi
                cpu: 125
            ports:
              - containerPort: 40000
            readinessProbe:
              tcpSocket:
                port: 40000
              initialDelaySeconds: 15
              periodSeconds: 10
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 32Gi
    workerTemplate:
      spec:
        hostNetwork: true
        hostIPC: true
        containers:
          - name: deepseek-v3-worker
            image: lmsysorg/sglang:latest
            command:
              - sh
              - -c
              - "python3 -m sglang.launch_server --model-path deepseek-ai/DeepSeek-V3 --tp 16 --dist-init-addr ${LWS_LEADER_ADDRESS}:20000 --nnodes 2 --node-rank 1 --trust-remote-code --host 0.0.0.0 --port 40000"
            resources:
              limits:
                nvidia.com/gpu: "8"
                memory: 1124Gi
                ephemeral-storage: 800Gi
              requests:
                ephemeral-storage: 800Gi
                cpu: 125
            env:
              - name: HUGGING_FACE_HUB_TOKEN
                valueFrom:
                  secretKeyRef:
                    name: hf-secret
                    key: hf_api_token
            volumeMounts:
              - mountPath: /dev/shm
                name: dshm
        volumes:
          - name: dshm
            emptyDir:
              medium: Memory
              sizeLimit: 32Gi
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-v3-leader
spec:
  ports:
    - name: http
      port: 8080
      protocol: TCP
      targetPort: 40000
  selector:
    leaderworkerset.sigs.k8s.io/name: deepseek-v3
    role: leader
  type: ClusterIP
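
A minimal smoke test against the leader Service (SGLang exposes an OpenAI-compatible API; the request fields below are standard):

```shell
kubectl port-forward svc/deepseek-v3-leader 8080:8080 &
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 16}'
```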

References

https://github.com/sgl-project/sglang/blob/main/benchmark/deepseek_v3/README.md
https://docs.sglang.ai/references/deepseek.html


Linux

Log in to the virtual machines

vagrant ssh node-0
vagrant ssh node-1

Set a root password and switch to the root user

sudo passwd root
su root
cd /vagrant

Disable swap and the firewall

  • Disable swap. Swap must be disabled for the kubelet to work properly.
  • Disable the firewall.
sudo swapoff -a
sudo sed -ri 's/.*swap.*/#&/' /etc/fstab
sudo ufw disable
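
A quick check that swap and the firewall are actually off:

```shell
# Expect no output from swapon and "Status: inactive" from ufw
swapon --show
sudo ufw status
```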

Configure the network

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

sudo modprobe overlay
sudo modprobe br_netfilter

# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF

# Apply the sysctl parameters without rebooting
sudo sysctl --system

Install a container runtime

Use the recommended containerd as the container runtime; see:

https://github.com/containerd/containerd/blob/main/docs/getting-started.md

if [ ! -f containerd-1.7.2-linux-amd64.tar.gz ]; then
  curl -LO https://github.com/containerd/containerd/releases/download/v1.7.2/containerd-1.7.2-linux-amd64.tar.gz
fi
tar Cxzvf /usr/local/ containerd-1.7.2-linux-amd64.tar.gz

mkdir -p /usr/local/lib/systemd/system/
curl https://raw.githubusercontent.com/containerd/containerd/main/containerd.service > /usr/local/lib/systemd/system/containerd.service
systemctl daemon-reload
systemctl enable --now containerd

if [ ! -f runc.amd64 ]; then
  curl -LO https://github.com/opencontainers/runc/releases/download/v1.1.7/runc.amd64
fi
install -m 755 runc.amd64 /usr/local/sbin/runc

if [ ! -f cni-plugins-linux-amd64-v1.3.0.tgz ]; then
  curl -LO https://github.com/containernetworking/plugins/releases/download/v1.3.0/cni-plugins-linux-amd64-v1.3.0.tgz
fi
mkdir -p /opt/cni/bin
tar Cxzvf /opt/cni/bin/ cni-plugins-linux-amd64-v1.3.0.tgz


# https://kubernetes.io/zh-cn/docs/setup/production-environment/container-runtimes/#containerd
mkdir -p /etc/containerd/
containerd config default > /etc/containerd/config.toml
# Change the value of SystemdCgroup to true
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
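
A brief check that containerd, runc, and the CNI plugins are in place before moving on to kubeadm:

```shell
systemctl is-active containerd
ctr version
runc --version
ls /opt/cni/bin
```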

Install kubeadm, kubelet, and kubectl

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl
curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-archive-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# If the network is unreachable, the Alibaba Cloud mirror can be used instead
# sudo curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
# echo "deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list

Install nerdctl and pre-pull the images

curl -LO https://github.com/containerd/nerdctl/releases/download/v1.4.0/nerdctl-1.4.0-linux-amd64.tar.gz
tar Cxzvvf /usr/local/bin nerdctl-1.4.0-linux-amd64.tar.gz

nerdctl image pull --namespace=k8s.io registry.k8s.io/kube-apiserver:v1.28.2
nerdctl image pull --namespace=k8s.io registry.k8s.io/kube-controller-manager:v1.28.2
nerdctl image pull --namespace=k8s.io registry.k8s.io/kube-scheduler:v1.28.2
nerdctl image pull --namespace=k8s.io registry.k8s.io/kube-proxy:v1.28.2
nerdctl image pull --namespace=k8s.io registry.k8s.io/pause:3.9
nerdctl image pull --namespace=k8s.io registry.k8s.io/etcd:3.5.7-0
nerdctl image pull --namespace=k8s.io registry.k8s.io/coredns/coredns:v1.10.1

nerdctl save --namespace=k8s.io registry.k8s.io/kube-apiserver:v1.28.2 >images/kube-apiserver-v1.28.2.tar
nerdctl save --namespace=k8s.io registry.k8s.io/kube-controller-manager:v1.28.2 >images/kube-controller-manager-v1.28.2.tar
nerdctl save --namespace=k8s.io registry.k8s.io/kube-scheduler:v1.28.2 >images/kube-scheduler-v1.28.2.tar
nerdctl save --namespace=k8s.io registry.k8s.io/kube-proxy:v1.28.2 >images/kube-proxy-v1.28.2.tar
nerdctl save --namespace=k8s.io registry.k8s.io/pause:3.9 >images/pause-3.9.tar
nerdctl save --namespace=k8s.io registry.k8s.io/etcd:3.5.7-0 >images/etcd-3.5.7-0.tar
nerdctl save --namespace=k8s.io registry.k8s.io/coredns/coredns:v1.10.1 >images/coredns-v1.10.1.tar

nerdctl load --namespace=k8s.io <images/kube-apiserver-v1.28.2.tar
nerdctl load --namespace=k8s.io <images/kube-controller-manager-v1.28.2.tar
nerdctl load --namespace=k8s.io <images/kube-proxy-v1.28.2.tar
nerdctl load --namespace=k8s.io <images/kube-scheduler-v1.28.2.tar
nerdctl load --namespace=k8s.io <images/pause-3.9.tar
nerdctl load --namespace=k8s.io <images/coredns-v1.10.1.tar
nerdctl load --namespace=k8s.io <images/etcd-3.5.7-0.tar

kubeadm config images pull --kubernetes-version=1.28.2 # --image-repository=registry.aliyuncs.com/google_containers

Initialize the control-plane node

Run this only on the control-plane node:

sudo kubeadm init \
--apiserver-advertise-address=192.168.205.10 \
--apiserver-cert-extra-sans=192.168.205.10 \
--pod-network-cidr=192.168.0.0/16 \
--service-cidr=10.96.0.0/12 \
--kubernetes-version=1.28.2
# --image-repository=registry.aliyuncs.com/google_containers

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

kubectl get nodes
kubectl get pods -A

Join the worker nodes

Run this only on the worker nodes:

# kubeadm join 192.168.205.10:6443 --token col3a3.t1tj94nt38f7ixyp --discovery-token-ca-cert-hash sha256:056b35bc8838de9d6899800d7178eea4ce2813dbbbea9fb53f8ebde13d5b7741

# kubeadm token create --print-join-command

Modify the kubelet configuration on each node

# node-0
cat <<EOF | sudo tee /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=192.168.205.10"
EOF
sudo systemctl restart kubelet

# node-1
cat <<EOF | sudo tee /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=192.168.205.11"
EOF
sudo systemctl restart kubelet
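
After restarting the kubelet on both nodes, the INTERNAL-IP column should show the 192.168.205.x addresses:

```shell
kubectl get nodes -o wide
```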

The steps above apply only to the local Vagrant environment; on-prem/production environments do not need this change.
