Run SGLang with the model weights cached on persistent storage at /data, using FP8 quantization; expect roughly 128 tok/s at batch size 1.
HF_HOME=/data python3 -m sglang.launch_server --model-path NousResearch/Meta-Llama-3.1-8B-Instruct --host 0.0.0.0 --random-seed 1337 --dtype bfloat16 --quantization fp8
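Once the server is up, it exposes an OpenAI-compatible API. A minimal sketch of a request against it, assuming the default port 30000 (no `--port` was passed above); the prompt text is illustrative:

```shell
# Query the running SGLang server via its OpenAI-compatible
# chat completions endpoint (default port is 30000).
curl -s http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "NousResearch/Meta-Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}],
        "max_tokens": 64
      }'
```

This requires the launch_server process from the command above to be running; adjust the host/port if you bind the server elsewhere.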