Build intuition on a topic through an interactive Jupyter notebook lesson.
Usage: /notebook-learning then specify:
- Notebook: @{notebook}.ipynb (existing or new)
- Topic: The concept to teach
- Source: An excerpt from a research paper, a link to docs, the paper itself, or just a topic plus the scope you'd like covered
In the specified notebook, help the user build intuition on the topic. Make the concepts intuitively obvious by building from first principles.
Create a lesson format notebook: start from basic intuitions and build up to complete functions. Do NOT work with big functions. Rather, single cells should build up to a more complete solution. For example, start with intuitions behind each of the different principles.
Introduce the concept and why it matters. For example, if teaching einops, show the most valuable version of an einops call in a real-world situation and explain the value it provides at a high level (see the sketch below). Then from there, start with the very basic intuition.
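A minimal sketch of such an opener, assuming einops is available (the shapes here are made up for illustration):

```python
import torch
from einops import rearrange

x = torch.randn(2, 10, 512)  # (batch, seq, hidden) from a fused projection
# One readable einops call replaces an error-prone chain of view/permute:
# split 512 into 8 attention heads of dim 64 and give heads their own axis
q = rearrange(x, 'b s (h d) -> b h s d', h=8)
print(q.shape)  # torch.Size([2, 8, 10, 64])
```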
The basic intuition doesn't need to be obviously linked to the topic. For example, if demonstrating nonlinearity and the universal approximation theorem (UAT):
- Start with a parabola visualization
- Random noise on the parabola visualization
- Show MSE on the random noise we generated from the parabola
- Give sliders for adjusting the parabola's a, b, c coefficients and seeing how the curve changes
- But in practice we don't know it's a parabola, so try doing this with a line
- A line sucks at approximating a parabola, and adding other lines to it just gives another line with a different slope
- Show a ReLU
- Explain nonlinearity in several intuitive and unexpected ways from first principles, because it's an important topic
- Show how with two ReLUs we can reduce the loss greatly (see the sketch after this list)
- With multiple ReLUs, the loss goes down further
- Introduce the UAT
- Address questions that arise from this (for example, why do we train deep neural nets if a one-layer network is theoretically enough to approximate anything)
- Other cool intuitions behind the UAT
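A minimal sketch of where that buildup could land, in a single runnable cell (the architecture and hyperparameters here are illustrative, not prescribed):

```python
import torch
import torch.nn as nn

# Toy data: a noisy parabola, the "unknown" function we try to approximate
x = torch.linspace(-2, 2, 200).unsqueeze(1)
y = x**2 + 0.1 * torch.randn_like(x)

# A plain line vs. a tiny net with two ReLU units
line = nn.Linear(1, 1)
relu_net = nn.Sequential(nn.Linear(1, 2), nn.ReLU(), nn.Linear(2, 1))

for model in (line, relu_net):
    opt = torch.optim.Adam(model.parameters(), lr=0.05)
    for _ in range(500):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    print(type(model).__name__, "final MSE:", loss.item())
# Expect the ReLU net to end with a noticeably lower MSE than the line
```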
An intuitive way to often present concepts is problem based. For example when trying to understand RMSNorm (industry standard in LLMs):
- First talk about the problems we have when training neural nets (unstable activation and gradient scales)
- Initialization schemes attempt to fix this
- BatchNorm, introduced to normalize mean and std across the batch
- LayerNorm, which is batch-invariant and works better for autoregressive models
- RMSNorm (see the sketch below)
- Pre-norm and other norm concepts beyond exactly what was asked for (more intuitions or things that help understand the main concept, not random subjects)
- etc.
This is just an example of how a lesson can build up; do not overindex on this example. Often it's even worth briefly mentioning SOTA or experimental approaches for the concept (for the example above, DeepNorm).
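For the RMSNorm case, the endpoint of that buildup might be a single cell like this minimal sketch (shapes and eps are illustrative):

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    # Unlike LayerNorm, RMSNorm skips mean subtraction: it only rescales by
    # the root-mean-square of the activations, then applies a learned gain
    rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    return x / rms * weight

x = torch.randn(2, 4, 8)  # (batch, seq, hidden)
w = torch.ones(8)         # learned gain, initialized to 1
print(rms_norm(x, w).pow(2).mean(-1))  # per-position mean square is ~1
```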
If there are any cool math intuitions or valuable derivations, include those and break them down to the log-rules level. For example:
- With RoPE, show the breakdown of q^T * R(theta2 - theta1) * k and why only the relative position survives
- For log_softmax, show how it's just shifting every logit by the same constant (subtracting the log-sum-exp)
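The log_softmax claim, for instance, can be verified in one tiny cell:

```python
import torch

logits = torch.randn(5)
# log_softmax(x) = x - logsumexp(x): the same constant is subtracted from
# every logit, which is why rankings and differences are preserved
manual = logits - torch.logsumexp(logits, dim=0)
print(torch.allclose(manual, torch.log_softmax(logits, dim=0)))  # True
```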
Use markdown cells to explain things in words or show math breakdowns or intuitive summaries. But for any buildup of intuition, always create cells with adequate comments and print statements where useful. Simulations, interactive elements, and visualizations are always super helpful.
Especially when focusing on an intuition, bias towards cells with a single line of code that can be executed and shown (sometimes we don't even need print statements), with markdown in between. For certain things like visualizations, don't be afraid to write longer cells with comments; that's a different kind of intuition. Balance the two, since you're an expert teacher.
For any toy examples, take the extra step to make them more realistic. Simple example: if working with embeddings, instead of just saying "here are our embeddings [[0.5, 1], [2, -1]]", it's much more valuable throughout the lesson to refer to something more real, like:

# these are 3 fake embeddings representing the words "The cat on"
embs = torch.randn(3, 2)
print(embs)

Don't be afraid to make cells that small or even a single line. We don't need many lines of code in each cell.
Throughout the notebook, reuse variables where possible.
If you create a model or architecture demo, make it actually do something:
- Train it on toy data (even 100-200 steps on repetitive text)
- Show the loss decreasing
- Generate output so readers see the model "working"
A CharLSTM that just defines architecture is less valuable than one that trains for 200 steps and generates "hel" → "lo " because readers need to see learning happen, not just believe it could.
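A minimal sketch of that shape of demo, with a bigram model standing in for the LSTM purely for brevity (all names and numbers here are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "hello world " * 50  # repetitive toy data is enough to show learning
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = torch.tensor([stoi[c] for c in text])

# Tiny bigram model: predict the next char from the current one
model = nn.Embedding(len(chars), len(chars))
opt = torch.optim.Adam(model.parameters(), lr=0.1)
for step in range(200):
    logits = model(ids[:-1])               # (N, vocab)
    loss = F.cross_entropy(logits, ids[1:])
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(step, loss.item())           # loss should visibly decrease

# Greedy roll-out from 'h' so readers see the model "working"
idx = torch.tensor(stoi["h"])
out = "h"
for _ in range(11):
    idx = model(idx).argmax()
    out += chars[idx.item()]
print(out)  # imperfect (bigrams are ambiguous) but clearly text-like
```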
For some sections, create an unfinished cell with a test problem. Prefix all variables inside it with test_ so they don't interfere with the main notebook variables. Include:
a) Context: Why this exercise matters (1-2 sentences connecting to the lesson)
b) Some scaffolding with a # fill in code here
c) An assert statement or a comment describing how you know you got it right
Example:
# EXERCISE: Implement a GRU cell
#
# GRU is a simplified LSTM with only 2 gates (reset, update) instead of 3.
# The key insight: by removing the output gate, we reduce parameters while
# keeping most of the gradient flow benefits.
#
# Your task: Fill in the forward pass
class GRUCell:
    def __init__(self, input_size, hidden_size):
        # ... initialization provided ...
        pass

    def forward(self, x, h_prev):
        # Fill in: compute reset gate, update gate, candidate hidden state
        # Hint: reset gate controls how much of h_prev to forget
        pass

# Test:
# test_gru = GRUCell(10, 8)
# test_out = test_gru.forward(np.random.randn(1, 10), np.zeros((1, 8)))
# assert test_out.shape == (1, 8), "Output shape should be (1, hidden_size)"

Remember that building intuitions isn't always about just showing code and explaining things purely related to the concept. For example, if learning about Boltzmann machines, it's valuable to build up to the energy minimization function not just through markdown but through runnable code cells and visualizations that aren't obviously related to the thing we're learning about.
Mention shortcuts after learning the topic. For example, using F.cross_entropy instead of having to rebuild the function from scratch every time.
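That reveal can itself be a one-cell check, e.g. (a sketch; both reductions default to mean):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)
targets = torch.randint(0, 10, (4,))
# F.cross_entropy fuses the log_softmax + NLL we built by hand
manual = F.nll_loss(F.log_softmax(logits, dim=-1), targets)
print(torch.allclose(manual, F.cross_entropy(logits, targets)))  # True
```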
When mentioning things from adjacent fields that the reader may not know about (but that aren't worth diving into because they're too different), provide an intuition and a why, and mention what to search for to learn more.
It helps to build intuition when you can tie into a fundamental concept from somewhere else that they probably already know. For example, for math-related concepts, tie into normal distributions, complex numbers, RREF, the unit circle, and the Pythagorean theorem.
Assume that a beginner will question everything you show. So the "why" down to first principles needs to be intuitive. Not just the what.
Use utils.set_seed(42) at the top with the imports to set the seed for the entire notebook (covers numpy, pytorch, etc.). Don't set seeds anywhere outside this one call.
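utils.set_seed is project-specific; as an assumption about its behavior, a helper like it presumably does something along these lines:

```python
import random
import numpy as np
import torch

def set_seed(seed: int):
    # Hypothetical sketch: the real utils.set_seed may cover more libraries
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
```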
Prefer torch over numpy wherever it's relevant.
Assume we may want to make edits to the notebook in the future, so don't add any positional markers like "Part x" titles.
Add # | export comments to the top of cells that would be valuable to export into their own library file (for example a sampler Class for training a transformer that puts together all the necessary code).
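For example, a cell flagged for export might look like this (the sampler here is just an illustrative stub):

```python
# | export
class GreedySampler:
    """Reusable sampling helper worth pulling into a library file."""
    def sample(self, logits):
        return logits.argmax(-1)
```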
For complex physical or hardware concepts, images are essential. Don't just describe things - show them. Examples where images help enormously:
- Transistor implementations of logic gates
- CPU architecture diagrams
- Memory hierarchy pyramids
- Pipeline stages
- Neural network architectures
- Attention mechanism visualizations
Image workflow:
- Create an images/ subfolder in the notebook's directory
- Reference images as ![description](images/filename.png)
- Add fallback text for when images aren't available:
  *(If image not found: search "CMOS inverter circuit" for a visual)*
Finding images:
- Search the web for diagrams from educational resources
- Check Wikipedia Commons for public domain images
- Reference paper figures when discussing research
- Create ASCII diagrams as fallback for simple concepts
ASCII diagrams work well for simple structures:
Input A ──┬──[AND]──┬── Output
          │         │
Input B ──┘         │
                    │
          (carry chain continues...)
The goal: a reader should never encounter a complex concept without a visual aid. If you can't find an image, at minimum create an ASCII representation.
Feel free to suggest splitting the notebook into multiple notebooks if:
- The concept is too dense
- A prerequisite topic is both dense enough and different enough from the topic at hand to warrant another notebook
Base this on what you sense the user's current knowledge level to be, or split if (a) explicitly asked, or (b) you offer it as an option and they say yes.
End with a summary that synthesizes the journey, not just recaps topics covered:
Bad (recap):
## Summary
We covered:
- RNNs and how they work
- Backpropagation through time
- The vanishing gradient problem
- LSTMs

Good (synthesis):
## Summary: The Journey from RNN to LSTM
We started with a simple question: **How can neural networks remember across time?**
### The RNN Answer
Use the previous hidden state as additional input. Same weights at every timestep.
### The Problem
Training via BPTT multiplies gradients through each timestep. These products either:
- **Vanish** (factor < 1): Early tokens get no gradient signal
- **Explode** (factor > 1): Gradients blow up
### The LSTM Solution
Add a **cell state** that flows through time via addition (not multiplication!).
### Key Takeaways
1. **Vanishing gradients are exponential** — even 0.9^100 is basically zero
2. **Addition > Multiplication** for gradient flow
3. **Initialize forget bias to 1** — helps at the start of training

The synthesis should:
- Restate the core question the notebook answered
- Show the progression: problem → attempt → failure → better solution
- List 3-5 key takeaways with concrete examples
- Connect forward to what comes next
Before generating the notebook:
- Repeat back your high-level plan for what content will be in this notebook.
- Ask diagnostic questions to tease out the user's current knowledge level in this topic (and include some super basic tests for them to respond to).