@blues-man · Created March 7, 2025
vLLM on OpenShift
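
A single file of manifests that runs a CPU-only vLLM server on OpenShift: a PersistentVolumeClaim for the model weights, a Secret for the Hugging Face token, a Deployment that downloads meta-llama/Llama-3.2-1B-Instruct and serves it on port 8000, a ClusterIP Service, and an edge-terminated Route. A minimal way to apply it, assuming the file is saved as vllm.yaml (the filename is illustrative) and the oc client is logged in to the target project:

  oc apply -f vllm.yaml
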
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: vllm-models
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: hf-token-secret
type: Opaque
stringData:
  # stringData accepts the plain-text token; use data: instead only if
  # supplying a base64-encoded value
  token: "CHANGE_ME"
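# The token can also be set from the CLI instead of editing this manifest;
# a sketch, assuming a real Hugging Face token in place of hf_xxx:
#   oc create secret generic hf-token-secret --from-literal=token=hf_xxx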
---
kind: Deployment
apiVersion: apps/v1
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm
  template:
    metadata:
      labels:
        app: vllm
    spec:
      volumes:
        - name: llama-storage
          persistentVolumeClaim:
            claimName: vllm-models
      containers:
        - name: llama-stack
          command:
            - bash
            - '-c'
            - |
              # download the model into the PVC, then serve it with the
              # OpenAI-compatible API server on port 8000
              MODEL="meta-llama/Llama-3.2-1B-Instruct"
              MODEL_PATH=/app/model/$(basename $MODEL)
              huggingface-cli login --token $HUGGING_FACE_HUB_TOKEN
              huggingface-cli download $MODEL --local-dir $MODEL_PATH --cache-dir $MODEL_PATH
              python3 -m vllm.entrypoints.openai.api_server --model $MODEL_PATH --served-model-name $MODEL --port 8000
          env:
            - name: HUGGING_FACE_HUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: hf-token-secret
                  key: token
          ports:
            - containerPort: 8000
              protocol: TCP
          imagePullPolicy: Always
          volumeMounts:
            - name: llama-storage
              mountPath: /app/model
          image: 'quay.io/bluesman/vllm-cpu-env:latest'
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  revisionHistoryLimit: 10
  progressDeadlineSeconds: 600
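# The initial model download can take a while; a sketch for following it,
# assuming the Deployment name above:
#   oc logs -f deployment/vllm-server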
---
apiVersion: v1
kind: Service
metadata:
  name: vllm-server
spec:
  selector:
    app: vllm
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
---
kind: Route
apiVersion: route.openshift.io/v1
metadata:
  name: vllm
spec:
  to:
    kind: Service
    name: vllm-server
    weight: 100
  port:
    targetPort: 8000
  tls:
    termination: edge
  wildcardPolicy: None
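# Smoke test through the Route once it is admitted; a sketch, assuming the
# Route name above (the hostname is assigned by the cluster):
#   HOST=$(oc get route vllm -o jsonpath='{.spec.host}')
#   curl -sk https://$HOST/v1/models
#   curl -sk https://$HOST/v1/completions \
#     -H 'Content-Type: application/json' \
#     -d '{"model": "meta-llama/Llama-3.2-1B-Instruct", "prompt": "Hello", "max_tokens": 16}'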