- Go to https://huggingface.co/papers and click through each of the top 3 upvoted papers.
- For each paper:
  - Record the title, URL, and upvotes
  - Summarise the abstract section
- Finally, compile a summary of all 3 papers, ranked by upvotes
Install llama.cpp: go to https://github.com/ggml-org/llama.cpp and follow the build instructions for your platform. Alternatively, you can use another inference engine of your choice.
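As a rough reference, a typical build with CUDA support looks like the following. This is a sketch based on the upstream build docs; drop `-DGGML_CUDA=ON` for a CPU-only build, or swap in the flag for your GPU backend:

```sh
# Clone and build llama.cpp (CMake flags per the upstream build docs).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
# The llama-server binary ends up in build/bin/.
```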
Install llama-swap: go to https://github.com/mostlygeek/llama-swap and follow the installation instructions (pre-built binaries are available on the releases page).
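Once installed, llama-swap runs against a YAML config file (written in the next step). The flag shown here follows the project README; verify it with `llama-swap --help`:

```sh
# Start the proxy; it exposes an OpenAI-compatible HTTP endpoint
# (port :8080 by default, as an assumption -- check --help).
./llama-swap --config config.yaml
```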
The best model that fits in 12GB VRAM to date is https://huggingface.co/prithivMLmods/Ophiuchi-Qwen3-14B-Instruct. Choose a quantization that fits in VRAM while still leaving enough room for the context: Nanobrowser uses a lot of tokens (>10K). The config sketch below reflects one such setup.
"qwen3":