Skip to content

Instantly share code, notes, and snippets.

@iakashpaul
Created September 11, 2024 18:40
Show Gist options
  • Save iakashpaul/86a3fe9abc726661bebaca927d9f5794 to your computer and use it in GitHub Desktop.
Save iakashpaul/86a3fe9abc726661bebaca927d9f5794 to your computer and use it in GitHub Desktop.
sglang_launch_server

Run SGLang with model weights being stored on persistent storage on /data at FP8 Quantization level, roughly 128tok/s for BS_1

HF_HOME=/data python3 -m sglang.launch_server --model NousResearch/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0 --random-seed 1337 --dtype bfloat16 --quantization fp8 
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment