Goals: add links that are reasonable, clear explanations of how things work. No hype, and no vendor content if possible. Practical first-hand accounts and experience preferred (still super rare at this point).
My own notes from a few months back.
- Survey of LLMs
- Self-attention and transformer networks
- What are embeddings
- The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning (YouTube)
- Catching up on the weird world of LLMs
- Attention Is All You Need
- Scaling Laws for Neural Language Models
- BERT
- Language Models are Unsupervised Multitask Learners
- Training Language Models to Follow Instructions
- Language Models are Few-Shot Learners
- Why host your own LLM?
- How to train your own LLMs
- Training Compute-Optimal Large Language Models
- OPT-175B Logbook
- The case for GZIP classifiers, and more on nearest-neighbor algorithms (a minimal sketch of the idea follows this list)
- Meta RecSys: using and extending Word2Vec
- The State of GPT (YouTube)
- What is ChatGPT doing and why does it work
- How is LLaMa.cpp possible?
- On Prompt Engineering
- Transformers from Scratch
- Building LLM Applications for Production
- Challenges and Applications of Large Language Models
- All the Hard Stuff Nobody talks about when building products with LLMs
- Scaling Kubernetes to run ChatGPT
- Numbers every LLM Developer should know
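
As a rough illustration of the gzip-classifier idea linked above: the trick is that gzip compresses two texts about the same topic better together than apart, so compressed size works as a crude similarity measure. Below is a minimal sketch of that approach (normalized compression distance plus a k-nearest-neighbor vote); the function names and toy data are my own, not taken from the linked post.

```python
import gzip
import numpy as np

def ncd(a: str, b: str) -> float:
    """Normalized compression distance between two strings, using gzip as the compressor."""
    ca = len(gzip.compress(a.encode()))
    cb = len(gzip.compress(b.encode()))
    cab = len(gzip.compress((a + " " + b).encode()))
    return (cab - min(ca, cb)) / max(ca, cb)

def classify(query: str, train_texts: list[str], train_labels: list[str], k: int = 3) -> str:
    """Label a query by majority vote among its k nearest training texts under NCD."""
    distances = [ncd(query, t) for t in train_texts]
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy usage (hypothetical data, just to show the shape of the API)
train_texts = ["the cat sat on the mat", "dogs love to fetch sticks",
               "stock prices fell sharply today", "the market rallied after earnings"]
train_labels = ["pets", "pets", "finance", "finance"]
print(classify("the kitten chased a ball of yarn", train_texts, train_labels, k=1))  # -> "pets"
```

No training, no parameters: the whole "model" is the training set plus a compressor, which is what makes it a nice baseline to compare against neural classifiers.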
Thanks to everyone who added suggestions on Twitter, Mastodon, and Bluesky.
The only 5 links you need to understand the Transformer (a minimal self-attention sketch follows the list):
(1) https://youtube.com/watch?v=kCc8FmEb1nY Let's build GPT: from scratch, in code, spelled out by @karpathy
(2) https://youtube.com/watch?v=iDulhoQ2pro Attention Is All You Need explained by @ykilcher
(3) https://jalammar.github.io/illustrated-transformer/ Illustrated Transformer by @jayalammar
(4) https://jaykmody.com/blog/gpt-from-scratch/ GPT in 60 Lines of NumPy by @jaykmody
(5) https://ig.ft.com/generative-ai/ - Generative AI Visualization by the Financial Times
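
All five of those links build up to the same core operation: scaled dot-product self-attention. Here is a minimal single-head NumPy sketch (random toy weights, no masking, no multi-head logic, names are my own) just to pin down the shapes:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v      # project tokens into queries/keys/values
    scores = q @ k.T / np.sqrt(k.shape[-1])  # similarity between every pair of positions
    weights = softmax(scores, axis=-1)       # each row is a distribution over positions
    return weights @ v                       # weighted mix of values per position

# Toy usage: 4 tokens, model width 8, head width 4
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 4)
```

Everything else in a Transformer block (multiple heads, causal masking, residuals, layer norm, the MLP) is layered around this one matrix recipe, which is why the links above keep returning to it.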