- Go to https://huggingface.co/papers and click through each of the top 3 upvoted papers.
- For each paper:
  - Record the title, URL, and upvotes
  - Summarise the abstract section
- Finally, compile a summary of all 3 papers, ranked by upvotes
Install llama.cpp: go to https://github.com/ggml-org/llama.cpp and follow the build instructions for your platform. Alternatively, you can use another inference engine of your choice.
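As a rough reference, a typical build with CUDA support looks like the following. This is a sketch based on the upstream build docs; drop `-DGGML_CUDA=ON` for a CPU-only build, or swap in the flag for your GPU backend:

```sh
# Clone and build llama.cpp (CMake flags per the upstream build docs).
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
# The llama-server binary ends up in build/bin/.
```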
Install llama-swap: go to https://github.com/mostlygeek/llama-swap and follow the installation instructions (pre-built binaries are available on the releases page).
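Once installed, llama-swap runs against a YAML config file (written in the next step). The flag shown here follows the project README; verify it with `llama-swap --help`:

```sh
# Start the proxy; it exposes an OpenAI-compatible HTTP endpoint
# (port :8080 by default, as an assumption -- check --help).
./llama-swap --config config.yaml
```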
The best model that fits in 12GB VRAM to date is https://huggingface.co/prithivMLmods/Ophiuchi-Qwen3-14B-Instruct. Choose a quantization that fits in VRAM while still leaving enough room for the context: Nanobrowser uses a lot of tokens (>10K). The config sketch below reflects one such setup.
"qwen3":