- [Apple to Apple][INC Reference] Gaudi perf issue
- [Expert Consultation][No reference] Sage attention acc issue
- https://arxiv.org/pdf/2505.11594 delta_s
- [Human Steer] task.md cross-project workspace:
- ar - vllm - omni
- ct -> llmc -> vllm: quant primitive -> quant model -> inference model
- setup driver (root) -> user_install_cmd.sh https://github.com/yiliu30/torch-xpu-setup
- Tools (Agent): web, VS Code, Copilot CLI, VS Code (Claude agent), ...
- Skills: how to create skills and examples
- Others
- English Coach: https://github.com/tw93/Waza/blob/main/rules/english.md
- ssh remote-node "Hi," https://github.com/BBuf/SGLang-Auto-Driven-SKILLS/blob/main/skills/h100-sglang-diffusion/SKILL.md
Boundary, Steer
mxfp4-decompress task file
Address: vllm-project/compressed-tensors#680
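As a reference for what the decompressor should produce (e.g. when writing the unit test), here is a minimal pure-Python sketch of MXFP4 dequantization. It assumes the OCP Microscaling layout: FP4 E2M1 elements, one E8M0 shared scale per block of 32 elements, and two 4-bit codes packed per byte with the low nibble first. The nibble order and every function name below are assumptions for illustration, not the compressed-tensors API or the exact layout used in #680.

```python
# Hypothetical reference decoder for MXFP4 (OCP Microscaling): FP4 E2M1 elements
# sharing one E8M0 scale per block of 32. Not the compressed-tensors API.
import math
from typing import List

BLOCK_SIZE = 32  # OCP MX block size

# E2M1 magnitudes indexed by the 3 low bits (2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code; bit 3 is the sign bit."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(scale: int) -> float:
    """Decode an E8M0 shared scale as 2**(scale - 127); 0xFF encodes NaN."""
    if scale == 0xFF:
        return math.nan
    return 2.0 ** (scale - 127)


def unpack_fp4(packed: bytes) -> List[int]:
    """Split each byte into two 4-bit codes, low nibble first (assumed layout)."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes


def mxfp4_decompress(packed: bytes, scales: bytes) -> List[float]:
    """Dequantize a flat MXFP4 tensor: each block of 32 codes uses one scale."""
    codes = unpack_fp4(packed)
    expected_blocks = math.ceil(len(codes) / BLOCK_SIZE)
    if len(scales) != expected_blocks:
        raise ValueError(f"expected {expected_blocks} scales, got {len(scales)}")
    return [
        decode_e2m1(code) * decode_e8m0(scales[i // BLOCK_SIZE])
        for i, code in enumerate(codes)
    ]
```

For example, 16 packed bytes hold one block of 32 codes; with the first byte `0x21` (codes 0.5 and 1.0) and scale byte 128 (2.0), the first two outputs are 1.0 and 2.0. A real test would compare the library's decompressed tensor against a reference like this for the same packed weights and scales.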
In addition to the unit test, we need an end-to-end verification on the llm-compressor side: /home/yiliu7/workspace/llm-compressor/experimental/mxfp4/qwen3_mxfp4.py. Then use transformers to run a short generation example. The output must be reasonable, even though it is a small model.
Local Dev
compressed-tensors: /home/yiliu7/workspace/venvs/ct/bin/
vllm: /home/yiliu7/workspace/venvs/vllm/bin/vllm
Note:
compressed-tensors is a standalone repository that provides quantization primitives for llm-compressor.
Ref:
llm-compressor: /home/yiliu7/workspace/llm-compressor
compressed-tensors: /home/yiliu7/workspace/compressed-tensors
vllm: /home/yiliu7/workspace/