@imbgar
Created June 4, 2026 17:12
peaR video-explainer engine

video-explainer

Turn any technical topic into a narrated, animated MP4 — synchronized neural TTS narration over dark-themed animated diagrams, in an educational presentation style.

You write a small JSON config describing a handful of scenes; the engine generates the narration, renders each frame with matplotlib, and assembles a video with ffmpeg.

Prerequisites

  • Python 3.12
  • ffmpeg (for video assembly; without it the engine falls back to an animated GIF)
    • macOS: brew install ffmpeg
    • Debian/Ubuntu: apt-get install ffmpeg
  • Python packages:
    python3.12 -m pip install -r requirements.txt
    Kokoro's English G2P also needs the spaCy small English model:
    python3.12 -m pip install \
      "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"

TTS backends, in order of preference:

  1. Kokoro — local neural TTS (no network needed)
  2. edge-tts — cloud fallback (needs ffmpeg + network)
  3. silent — if neither is available, the video is rendered without audio
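
The preference order above can be sketched as a small selection function. This is a minimal sketch, not the engine's own code: `pick_tts_backend` and its two probe parameters are hypothetical names, it does not test network access, and it only checks whether a package is installed, whereas generate.py also falls back when a backend fails to initialize.

```python
import importlib.util
import shutil


def pick_tts_backend(has_module=None, has_ffmpeg=None):
    """Return 'kokoro', 'edge-tts', or 'silent', in that order of preference.

    Both probes are injectable (hypothetical parameters) so the ordering
    can be exercised without installing anything.
    """
    if has_module is None:
        has_module = lambda name: importlib.util.find_spec(name) is not None
    if has_ffmpeg is None:
        has_ffmpeg = lambda: shutil.which("ffmpeg") is not None

    if has_module("kokoro"):
        return "kokoro"        # local neural TTS, no network needed
    if has_module("edge_tts") and has_ffmpeg():
        return "edge-tts"      # cloud fallback, needs ffmpeg to convert audio
    return "silent"            # video is rendered without narration


print(pick_tts_backend())
```

Calling it with no arguments probes the current environment; passing fake probes shows how each missing dependency moves the choice down the list.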

Usage

Run from the directory containing generate.py:

python3.12 generate.py example.json

This reads the config, generates narration, renders frames, and writes the MP4 to ./output/<output_name>.mp4 (the output_name comes from the config's meta block).

Options:

# Explicit output path
python3.12 generate.py example.json --out ./output/dns.mp4

# Skip TTS — render a silent video quickly (useful for iterating on visuals)
python3.12 generate.py example.json --no-tts
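
A command-line interface with these three inputs could be declared with `argparse` roughly as follows. This is a sketch under assumptions: the help strings and the `build_parser` name are illustrative, and the real parser in generate.py may differ.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments."""
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("config", help="path to the scene config JSON")
    parser.add_argument("--out", default=None,
                        help="explicit output path for the MP4")
    parser.add_argument("--no-tts", action="store_true",
                        help="skip TTS and render a silent video")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["example.json", "--no-tts"])
print(args.config, args.out, args.no_tts)  # example.json None True
```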

Output

  • Final video: ./output/<output_name>.mp4
  • Work files (per-scene frames and audio segments): ./output/<output_name>/
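
The two locations above can be derived from the config like this. It is a sketch only: `output_paths` is a hypothetical helper, and it assumes the work directory always follows `output_name` even when an explicit output path is given, which the text above does not state.

```python
from pathlib import Path


def output_paths(config, out=None, root=Path("output")):
    """Return (video_path, work_dir) for a loaded config dict."""
    name = config["meta"]["output_name"]
    # An explicit path wins; otherwise fall back to ./output/<output_name>.mp4
    video = Path(out) if out else root / f"{name}.mp4"
    # Assumption: per-scene frames and audio segments live under ./output/<output_name>/
    work_dir = root / name
    return video, work_dir


video, work_dir = output_paths({"meta": {"output_name": "dns_resolution"}})
print(video, work_dir)
```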

Writing a config

See example.json for a complete, runnable example ("How DNS resolves a domain") and SKILL.md for the full scene schema, the seven scene types (title, stack, flow_table, cards, route_table, trace, insights), color keys, and narration/TTS guidelines.
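
Before rendering, a config's basic shape can be checked with a few lines of Python. This is a rough sketch, not the schema itself: `check_config` is a hypothetical helper, treating `meta.output_name` as required is an assumption, and SKILL.md remains the authority on per-scene fields.

```python
# The seven scene types listed above.
SCENE_TYPES = {"title", "stack", "flow_table", "cards",
               "route_table", "trace", "insights"}


def check_config(config):
    """Return a list of problems; an empty list means the basic shape looks fine."""
    problems = []
    if not config.get("meta", {}).get("output_name"):
        problems.append("meta.output_name is missing")
    scenes = config.get("scenes", [])
    if not scenes:
        problems.append("scenes is empty")
    for k, scene in enumerate(scenes):
        if scene.get("type") not in SCENE_TYPES:
            problems.append(f"scene {k}: unknown type {scene.get('type')!r}")
    return problems


# Smallest config this check accepts: one title scene.
minimal = {
    "meta": {"title": "Demo", "output_name": "demo"},
    "scenes": [
        {"type": "title", "title": "Demo", "narration": "A one scene demo."},
    ],
}
print(check_config(minimal))
```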

Files

  • generate.py — CLI entry point: TTS, timing, rendering loop, ffmpeg assembly
  • engine.py — rendering primitives and the seven scene renderers
  • example.json — a generic example config
  • SKILL.md — full schema and authoring guide
  • requirements.txt — Python dependencies
"""
video-explainer engine.py
Reusable rendering primitives + scene renderers for 7 scene types.
All renderers share the signature: render_*(ax, i, n, scene_dict)
"""
import math
import numpy as np
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
from matplotlib.patches import FancyBboxPatch
# ── Palette ───────────────────────────────────────────────────────
BG = "#0d1117"; CARD = "#161b22"; BRD = "#30363d"
P = "#7c3aed"; PL = "#a78bfa"
B = "#1d4ed8"; BL = "#60a5fa"
GL = "#34d399"; YL = "#fbbf24"
OL = "#fb923c"; RL = "#f87171"
TX = "#e6edf3"; DM = "#8b949e"
# Color key → (background, foreground)
CMAP = {
"purple": (P, PL),
"blue": (B, BL),
"green": ("#065f46", GL),
"yellow": ("#92400e", YL),
"orange": ("#7c2d12", OL),
"red": ("#7f1d1d", RL),
"deep_purple": ("#6b21a8", "#d8b4fe"),
"teal": ("#0f4c5c", "#67e8f9"),
"pink": ("#831843", "#f9a8d4"),
"gray": (CARD, DM),
}
_CYCLE = ["purple","blue","green","yellow","orange","red","deep_purple","teal"]
def color(key, idx=None):
    """Return (bg, fg) for a named key or auto-cycle index."""
    if key and key in CMAP:
        return CMAP[key]
    if idx is not None:
        return CMAP[_CYCLE[idx % len(_CYCLE)]]
    return CMAP["gray"]

FPS = 24
DPI = 96
FW, FH = 1280, 720
HOLD_FRM = 60
ANIM_FRAC = 0.60
# ── Primitives ────────────────────────────────────────────────────
def make_fig():
    fig = plt.figure(figsize=(FW/DPI, FH/DPI), dpi=DPI)
    fig.patch.set_facecolor(BG)
    ax = fig.add_axes([0,0,1,1])
    ax.set_facecolor(BG); ax.set_xlim(0,16); ax.set_ylim(0,9); ax.axis("off")
    return fig, ax

def box(ax, x, y, w, h, fc="none", ec="none", alpha=1.0, r=0.22, lw=2.0, z=2):
    ax.add_patch(FancyBboxPatch((x,y), w, h,
        boxstyle=f"round,pad=0,rounding_size={r}",
        facecolor=fc, edgecolor=ec, alpha=alpha, linewidth=lw, zorder=z))

def txt(ax, x, y, s, sz=11, c=None, ha="center", va="center",
        w="normal", z=5, a=1.0, mono=True):
    return ax.text(x, y, s, fontsize=sz, color=c or TX,
                   ha=ha, va=va, fontweight=w, zorder=z, alpha=a,
                   fontfamily="monospace" if mono else "sans-serif",
                   wrap=False)

def arr(ax, x1, y1, x2, y2, c=None, lw=2.0, z=4):
    ax.annotate("", xy=(x2,y2), xytext=(x1,y1),
                arrowprops=dict(arrowstyle="->,head_width=0.18,head_length=0.25",
                                color=c or DM, lw=lw, connectionstyle="arc3,rad=0",
                                shrinkA=4, shrinkB=4), zorder=z)

def hline(ax, x1, x2, y, c=BRD, lw=1.0):
    ax.plot([x1,x2],[y,y], color=c, lw=lw, zorder=1)

def circ(ax, x, y, r=0.12, c=YL, z=10, a=1.0):
    ax.add_patch(plt.Circle((x,y), r, color=c, zorder=z, alpha=a))

def ease_out(t): return 1-(1-max(0,min(1,t)))**3
def ease_io(t): t=max(0,min(1,t)); return t*t*(3-2*t)
def lerp(a,b,t): return a+(b-a)*t

def av(i, n, start, dur=0.16):
    """Alpha for an element. start/dur are fractions of the animation phase."""
    tv = min(1.0, (i/n) / ANIM_FRAC)
    return ease_out(min(1.0, max(0.0, (tv-start)/dur)))

def scene_header(ax, title, i, n):
    a = av(i, n, 0.01, 0.14)
    txt(ax, 8, 8.56, title, sz=15, c=TX, w="bold", mono=False, a=a)
    hline(ax, 0.5, 15.5, 8.2, c=BRD if a > 0 else BG)

def save_frame(fig, path):
    fig.savefig(path, dpi=DPI, facecolor=BG, edgecolor="none")
    plt.close(fig)

# ── Scene Renderers ───────────────────────────────────────────────
def render_title(ax, i, n, s):
    """Scene type: title"""
    theme_bg, theme_fg = color(s.get("theme_color", "purple"))
    for row in range(9):
        box(ax, 0, row, 16, 1, fc=theme_bg, alpha=0.02+row*0.004, r=0, lw=0, z=0)
    box(ax, 0, 8.55, 16, 0.45, fc=theme_bg, alpha=0.18, r=0, lw=0, z=1)
    a = av(i,n, 0.05, 0.20)
    subtitle = s.get("subtitle","")
    if subtitle and a>0:
        bw = max(4.0, len(subtitle)*0.13 + 1.0)
        box(ax, 8-bw/2, 6.4, bw, 0.58, fc=theme_bg, alpha=0.22*a, r=0.25, lw=0)
        box(ax, 8-bw/2, 6.4, bw, 0.58, fc="none", ec=theme_fg, alpha=a, r=0.25, lw=1.5)
        txt(ax, 8, 6.69, subtitle, sz=9.5, c=theme_fg, a=a)
    a = av(i,n, 0.20, 0.38)
    if a>0:
        t = s.get("title","")
        sz = max(18, 42-max(0,len(t)-20)*0.9)
        txt(ax, 8, 5.35, t, sz=sz, c=TX, w="bold", a=a)
    a = av(i,n, 0.54, 0.28)
    tagline = s.get("tagline","")
    if tagline and a>0:
        txt(ax, 8, 4.52, tagline, sz=13, c=DM, a=a, mono=False)
    tags = s.get("tags", [])
    tag_colors = s.get("tag_colors", [])
    a = av(i,n, 0.70, 0.26)
    if tags and a>0:
        pw = 3.0; sx = 8-(len(tags)*pw+(len(tags)-1)*0.25)/2
        for j, tag in enumerate(tags):
            c_key = tag_colors[j] if j < len(tag_colors) else _CYCLE[j % len(_CYCLE)]
            bg_, fg_ = color(c_key)
            px = sx + j*(pw+0.25)
            box(ax, px, 3.08, pw, 0.52, fc=bg_, alpha=0.3*a, r=0.2, lw=0)
            box(ax, px, 3.08, pw, 0.52, fc="none", ec=fg_, alpha=a, r=0.2, lw=1.5)
            txt(ax, px+pw/2, 3.34, tag, sz=9, c=fg_, a=a)
    note = s.get("source_note","")
    if note:
        txt(ax, 8, 0.28, note, sz=7.5, c=DM, a=0.45)

def render_stack(ax, i, n, s):
    """Scene type: stack. Vertical list of labeled layers."""
    scene_header(ax, s.get("heading",""), i, n)
    layers = s.get("layers", [])
    ch = min(1.28, 6.8 / max(len(layers),1))
    gap = 0.14
    total_h = len(layers)*ch + (len(layers)-1)*gap
    start_y = (8.1 - total_h) / 2 + total_h - ch + 0.1
    for j, layer in enumerate(layers):
        bg_, fg_ = color(layer.get("color"), idx=j)
        a = av(i, n, j*0.16, 0.20)
        if a <= 0: continue
        ly = start_y - j*(ch+gap)
        box(ax, 0.55, ly, 14.9, ch, fc=bg_, alpha=0.15*a, r=0.28, lw=0)
        box(ax, 0.55, ly, 14.9, ch, fc="none", ec=fg_, alpha=a, r=0.28, lw=2)
        box(ax, 0.55, ly, 0.74, ch, fc=bg_, alpha=0.55*a, r=0.2, lw=0)
        num = layer.get("num", str(j+1))
        txt(ax, 0.92, ly+ch/2, num, sz=14, c=TX, w="bold", a=a)
        title_ = layer.get("title","")
        body_ = layer.get("body","")
        meta_ = layer.get("meta","")
        if ch > 1.0:
            txt(ax, 8.3, ly+ch*0.73, title_, sz=11, c=fg_, w="bold", a=a)
            txt(ax, 8.3, ly+ch*0.45, body_, sz=8.5, c=DM, a=a)
        else:
            txt(ax, 8.3, ly+ch*0.6, title_, sz=10, c=fg_, w="bold", a=a)
            txt(ax, 8.3, ly+ch*0.28, body_, sz=8, c=DM, a=a)
        if meta_:
            txt(ax, 15.0, ly+ch/2, meta_, sz=8.5, c=fg_, a=a*0.85, ha="right", mono=False)
        if j < len(layers)-1:
            a2 = av(i, n, j*0.16+0.13, 0.10)
            if a2>0: arr(ax, 8, ly, 8, ly-gap-0.05, c=BRD, lw=1.5)

def render_flow_table(ax, i, n, s):
    """Scene type: flow_table. Horizontal flow nodes + rules table below."""
    scene_header(ax, s.get("heading",""), i, n)
    nodes = s.get("flow", [])
    rows = s.get("rows", [])
    cols_h = s.get("table_header", ["Pattern","Target"])
    # Flow diagram
    n_nodes = len(nodes)
    xs = [1.2 + k*(13.6/(max(n_nodes-1,1))) for k in range(n_nodes)]
    node_y = 7.3 - (0.25 if rows else 0)
    for k, node in enumerate(nodes):
        bg_, fg_ = color(node.get("color"), idx=k)
        a = av(i, n, 0.04+k*0.12, 0.16)
        if a>0:
            box(ax, xs[k]-1.15, node_y-0.44, 2.3, 0.88, fc=CARD, ec=fg_, alpha=a, r=0.2, lw=1.5)
            txt(ax, xs[k], node_y+0.17, node.get("label",""), sz=9, c=fg_, w="bold", a=a, mono=False)
            txt(ax, xs[k], node_y-0.16, node.get("sub",""), sz=7, c=DM, a=a*0.85)
        if k>0:
            a2 = av(i, n, 0.04+k*0.12, 0.10)
            if a2>0: arr(ax, xs[k-1]+1.15, node_y, xs[k]-1.15, node_y, c=fg_, lw=2)
    metaphor = s.get("metaphor","")
    if metaphor:
        a = av(i, n, 0.38, 0.18)
        if a>0: txt(ax, 8, node_y-0.8, metaphor, sz=8.5, c=DM, a=a, mono=False)
    # Table
    if rows:
        tbl_start_y = node_y - (1.1 if metaphor else 0.7)
        a_h = av(i, n, 0.44, 0.16)
        if a_h>0:
            hline(ax, 0.4, 15.6, tbl_start_y, c=BRD)
            ncols = len(cols_h)
            col_xs = [0.4 + (15.2/(ncols+0.5))*(k+0.75) for k in range(ncols)]
            for k, ch in enumerate(cols_h):
                txt(ax, col_xs[k], tbl_start_y-0.18, ch, sz=8, c=DM, w="bold", a=a_h)
        row_h = min(0.70, (tbl_start_y-0.3) / max(len(rows),1))
        for j, row in enumerate(rows):
            a = av(i, n, 0.50+j*0.08, 0.12)
            if a<=0: continue
            ry = tbl_start_y - 0.45 - j*row_h
            highlight = row.get("highlight", False)
            bg_, fg_ = color(row.get("color"), idx=j)
            box(ax, 0.4, ry-row_h*0.42, 15.2, row_h*0.84,
                fc=bg_, alpha=(0.28 if highlight else 0.09)*a, r=0.13, lw=0)
            vals = [row.get(f"col{k+1}","") for k in range(len(cols_h))]
            for k, val in enumerate(vals):
                col_x = 0.4 + (15.2/(len(cols_h)+0.5))*(k+0.75)
                txt(ax, col_x, ry, val, sz=8 if not highlight else 8.5,
                    c=fg_ if highlight else (DM if k>0 else TX), a=a,
                    ha="center", w="bold" if highlight else "normal")
            if j < len(rows)-1:
                ax.plot([9.0],[ry],"->",color=fg_,alpha=a,markersize=7,zorder=5)

def render_cards(ax, i, n, s):
    """Scene type: cards. One featured card + side cards, or grid."""
    scene_header(ax, s.get("heading",""), i, n)
    code_note = s.get("code_note","")
    if code_note:
        a = av(i,n,0.02,0.14)
        txt(ax, 8, 7.88, code_note, sz=7.8, c=DM, a=a)
    cards = s.get("cards", [])
    featured = [c for c in cards if c.get("featured")]
    others = [c for c in cards if not c.get("featured")]
    if featured:
        # Layout: big card left, stacked cards right
        fc_ = featured[0]
        bg_, fg_ = color(fc_.get("color"), idx=0)
        a = av(i,n, 0.0, 0.20)
        if a>0:
            box(ax, 0.30, 0.40, 4.75, 6.55, fc=bg_, alpha=0.17*a, r=0.28, lw=0)
            box(ax, 0.30, 0.40, 4.75, 6.55, fc="none", ec=fg_, alpha=a, r=0.28, lw=2)
            box(ax, 0.30, 6.47, 4.75, 0.48, fc=bg_, alpha=0.55*a, r=0.2, lw=0)
            txt(ax, 2.68, 6.71, fc_.get("title",""), sz=9.5, c=fg_, w="bold", a=a)
            mid=3.50
            txt(ax, 2.68, mid+0.72, fc_.get("file",""), sz=8, c=DM, a=a)
            txt(ax, 2.68, mid+0.18, fc_.get("desc",""), sz=9, c=TX, a=a, mono=False)
            txt(ax, 2.68, mid-0.38, fc_.get("specs",""), sz=7.8, c=DM, a=a)
        row_h = 6.55 / max(len(others),1)
        for j, card in enumerate(others):
            bg_, fg_ = color(card.get("color"), idx=j+1)
            a = av(i, n, j*0.12, 0.20)
            if a<=0: continue
            cy = 0.40 + 6.55 - (j+1)*row_h
            box(ax, 5.35, cy, 10.25, row_h-0.12, fc=bg_, alpha=0.17*a, r=0.28, lw=0)
            box(ax, 5.35, cy, 10.25, row_h-0.12, fc="none", ec=fg_, alpha=a, r=0.28, lw=2)
            box(ax, 5.35, cy, 10.25, 0.45, fc=bg_, alpha=0.55*a, r=0.2, lw=0)
            txt(ax, 10.5, cy+row_h-0.38, card.get("title",""), sz=9, c=fg_, w="bold", a=a)
            m=cy+row_h*0.45
            txt(ax, 10.5, m+0.22, card.get("file",""), sz=7.5, c=DM, a=a)
            txt(ax, 10.5, m-0.08, card.get("desc",""), sz=8.5, c=TX, a=a, mono=False)
            txt(ax, 10.5, m-0.38, card.get("specs",""), sz=7.5, c=DM, a=a)
    else:
        # Grid layout (max(1, ...) avoids a ZeroDivisionError when "cards" is empty)
        nc = max(1, min(3, len(cards)))
        nr = max(1, math.ceil(len(cards)/nc))
        cw = 14.8/nc - 0.15
        ch_ = 6.8/nr - 0.15
        for j, card in enumerate(cards):
            r_,c_ = divmod(j,nc)
            bg_, fg_ = color(card.get("color"), idx=j)
            a = av(i, n, j*0.12, 0.20)
            if a<=0: continue
            cx = 0.55 + c_*(cw+0.15)
            cy = 0.55 + (nr-1-r_)*(ch_+0.15)
            box(ax, cx, cy, cw, ch_, fc=bg_, alpha=0.17*a, r=0.25, lw=0)
            box(ax, cx, cy, cw, ch_, fc="none", ec=fg_, alpha=a, r=0.25, lw=2)
            box(ax, cx, cy+ch_-0.44, cw, 0.44, fc=bg_, alpha=0.55*a, r=0.18, lw=0)
            txt(ax, cx+cw/2, cy+ch_-0.22, card.get("title",""), sz=9, c=fg_, w="bold", a=a)
            m=cy+ch_/2
            txt(ax, cx+cw/2, m+0.22, card.get("file",""), sz=7.5, c=DM, a=a)
            txt(ax, cx+cw/2, m-0.08, card.get("desc",""), sz=8.5, c=TX, a=a, mono=False)
            txt(ax, cx+cw/2, m-0.38, card.get("specs",""), sz=7.5, c=DM, a=a)

def render_route_table(ax, i, n, s):
    """Scene type: route_table. Sidebar + method/path/handler table."""
    scene_header(ax, s.get("heading",""), i, n)
    sidebar = s.get("sidebar", {})
    routes = s.get("routes", [])
    cols = s.get("columns", ["METHOD","PATH","HANDLER"])
    footer = s.get("footer","")
    sb_label = sidebar.get("label","")
    sb_subs = sidebar.get("sub",[])
    a_side = av(i,n, 0.04, 0.18)
    if a_side>0 and sb_label:
        box(ax, 0.22, 0.72, 2.12, 7.14, fc=P, alpha=0.13*a_side, r=0.28, lw=0)
        box(ax, 0.22, 0.72, 2.12, 7.14, fc="none", ec=PL, alpha=a_side, r=0.28, lw=2)
        txt(ax, 1.28, 7.50, sb_label, sz=11, c=PL, w="bold", a=a_side)
        for k, sub in enumerate(sb_subs):
            txt(ax, 1.28, 7.10-k*0.36, sub, sz=8.5, c=DM, a=a_side)
    a_hdr = av(i,n, 0.15, 0.16)
    if a_hdr>0:
        box(ax, 2.55, 7.62, 13.1, 0.46, fc=CARD, ec=BRD, alpha=a_hdr, r=0.15, lw=1)
        col_xs = [2.55 + 13.1*(k+0.5)/len(cols) for k in range(len(cols))]
        for k, c_lbl in enumerate(cols):
            txt(ax, col_xs[k], 7.85, c_lbl, sz=8, c=DM, w="bold", a=a_hdr)
    row_h = min(0.70, 6.8/max(len(routes),1))
    for j, route in enumerate(routes):
        bg_, fg_ = color(route.get("color"), idx=j)
        a = av(i, n, 0.22+j*0.07, 0.10)
        if a<=0: continue
        ry = 7.05 - j*row_h
        highlight = route.get("highlight", False)
        if sb_label:
            ax.plot([2.32, 2.57],[ry+0.06,ry+0.06], color=fg_, lw=1.5, alpha=a*0.5, zorder=3)
        box(ax, 2.55, ry-0.20, 13.1, row_h*0.84,
            fc=fg_, alpha=(0.24 if highlight else 0.08)*a, r=0.12, lw=0)
        vals = [route.get(f"col{k+1}","") for k in range(len(cols))]
        # First column gets a badge treatment
        box(ax, 2.60, ry-0.16, min(1.6,len(vals[0])*0.09+0.4), 0.35,
            fc=fg_, alpha=0.32*a, r=0.09, lw=0)
        col_xs = [2.55 + 13.1*(k+0.5)/len(cols) for k in range(len(cols))]
        for k, val in enumerate(vals):
            txt(ax, col_xs[k], ry+0.02, val, sz=8,
                c=fg_ if highlight or k==0 else (TX if k==1 else DM),
                a=a, ha="center",
                w="bold" if highlight or k==0 else "normal")
    if footer:
        a_ft = av(i,n, 0.90, 0.10)
        txt(ax, 8, 0.36, footer, sz=8.5, c=DM, a=a_ft, mono=False)

def render_trace(ax, i, n, s):
    """Scene type: trace. Animated request/data flow through waypoints."""
    scene_header(ax, s.get("heading",""), i, n)
    wps = s.get("waypoints", [])
    bands = s.get("bands", [])
    success_label = s.get("success_label","")
    chunked = s.get("_wp_chunked", False)
    # Compute grid layout
    rows_set = sorted(set(w.get("row",0) for w in wps))
    cols_set = sorted(set(w.get("col",0) for w in wps))
    n_rows = max(rows_set)+1 if rows_set else 1
    n_cols = max(cols_set)+1 if cols_set else 1

    def wp_xy(w):
        r = w.get("row",0); c = w.get("col",0)
        y = 7.85 - r*(6.5/max(n_rows-1,1)) if n_rows > 1 else 5.0
        # Keep nodes within [1.4, 14.6] so ±1.2 boxes never clip the frame edge
        x = 1.4 + c*(13.2/max(n_cols-1,1)) if n_cols > 1 else 8.0
        return x, y

    # Band stripes
    if bands:
        band_h = 7.4 / max(len(bands),1)
        for k, band in enumerate(bands):
            bg_, fg_ = color(band.get("color"), idx=k)
            by = 0.55 + (len(bands)-1-k)*band_h
            box(ax, 0.2, by, 15.6, band_h, fc=bg_, alpha=0.07, r=0.1, lw=0, z=1)
        a_bl = av(i,n, 0.02, 0.10) if chunked else av(i,n, 0.04, 0.18)
        if a_bl>0:
            for k, band in enumerate(bands):
                bg_, fg_ = color(band.get("color"), idx=k)
                by = 0.55 + (len(bands)-1-k)*band_h
                txt(ax, 15.55, by+band_h/2, band.get("label",""),
                    sz=7.5, c=fg_, a=a_bl*0.8, ha="right")
    frac = i / n
    # Static nodes. Chunked mode: all visible within first 1.5s (before pre-roll ends)
    positions = []
    for k, wp in enumerate(wps):
        x, y = wp_xy(wp)
        positions.append((x,y))
        bg_, fg_ = color(wp.get("color"), idx=k)
        if chunked:
            node_start = 0.005 + k * 0.002
            a = ease_out(min(1.0, max(0.0, (frac - node_start) / 0.015)))
        else:
            node_start = 0.01 + k * 0.006
            a = ease_out(min(1.0, max(0.0, (frac - node_start) / 0.05)))
        if a>0:
            box(ax, x-1.2, y-0.40, 2.4, 0.80, fc=CARD, ec=fg_, alpha=a, r=0.2, lw=1.5)
            txt(ax, x, y+0.14, wp.get("label",""), sz=8.5, c=fg_, w="bold", a=a, mono=False)
            txt(ax, x, y-0.14, wp.get("sub",""), sz=6.5, c=DM, a=a*0.85)
    # Static edges. Chunked: fully drawn well before pre-roll ends
    if chunked:
        a_e = ease_out(min(1.0, max(0.0, (frac - 0.025) / 0.015)))
    else:
        a_e = ease_out(min(1.0, max(0.0, (frac - 0.07) / 0.04)))
    if a_e>0 and len(positions)>1:
        for k in range(len(positions)-1):
            x1,y1 = positions[k]; x2,y2 = positions[k+1]
            bg_,fg_ = color(wps[k].get("color"), idx=k)
            dx = x2-x1; dy = y2-y1
            shrink = 1.25
            arr(ax,
                x1+(dx/abs(dx) if dx else 0)*shrink if dx else x1,
                y1+(dy/abs(dy) if dy else 0)*shrink if dy else y1,
                x2-(dx/abs(dx) if dx else 0)*shrink if dx else x2,
                y2-(dy/abs(dy) if dy else 0)*shrink if dy else y2,
                c=fg_, lw=1.5)
    # Animated dot
    wp_fracs = s.get("_wp_fracs")
    chunked = s.get("_wp_chunked", False)
    if wp_fracs and len(wp_fracs) >= len(positions):
        if chunked:
            # Dot appears at frac 0.04 (~1.9s): diagram is fully drawn, pre-roll still playing.
            # Narrator speaks from wp_fracs[0] onward, dot is already at Browser when she starts
            dot_start = 0.04
        else:
            dot_start = max(0.11, wp_fracs[0])
        dot_end = min(0.97, wp_fracs[-1] + (0.04 if chunked else 0.02))
    else:
        dot_start = 0.12
        dot_end = 0.95
        chunked = False
    if dot_start < frac < dot_end and len(positions) > 1:
        edges = len(positions) - 1
        if chunked:
            # Dwell-and-snap: dot sits at waypoint k for ~82% of its chunk,
            # then smoothly snaps to k+1 in the last 18%.
            cur = 0
            for j in range(len(wp_fracs)):
                if wp_fracs[j] <= frac:
                    cur = j
            cur = min(cur, len(positions) - 1)
            next_f = wp_fracs[cur+1] if cur+1 < len(wp_fracs) else dot_end
            span = max(next_f - wp_fracs[cur], 1e-4)
            local = min(1.0, (frac - wp_fracs[cur]) / span)
            SNAP = 0.18
            if local < (1.0 - SNAP) or cur >= len(positions) - 1:
                dx_, dy_ = positions[cur]
            else:
                t_snap = ease_io((local - (1.0 - SNAP)) / SNAP)
                x1,y1 = positions[cur]; x2,y2 = positions[cur+1]
                dx_ = lerp(x1,x2,t_snap); dy_ = lerp(y1,y2,t_snap)
            for rr,aa in [(0.32,0.08),(0.22,0.15),(0.13,1.0)]:
                circ(ax, dx_,dy_, r=rr, c=YL, z=12, a=aa)
            circ(ax, positions[cur][0], positions[cur][1], r=0.44, c=YL, z=3, a=0.07)
        elif wp_fracs and len(wp_fracs) >= len(positions):
            # Continuous sweep with word-position timing
            seg = edges - 1
            seg_t = 1.0
            for k in range(edges):
                if frac <= wp_fracs[k+1] or k == edges - 1:
                    seg = k
                    span = max(wp_fracs[k+1] - wp_fracs[k], 1e-4)
                    seg_t = ease_io(min(1.0, max(0.0, (frac - wp_fracs[k]) / span)))
                    break
            x1,y1 = positions[seg]; x2,y2 = positions[seg+1]
            dx_ = lerp(x1,x2,seg_t); dy_ = lerp(y1,y2,seg_t)
            for rr,aa in [(0.32,0.08),(0.22,0.15),(0.13,1.0)]:
                circ(ax, dx_,dy_, r=rr, c=YL, z=12, a=aa)
            x2_,y2_ = positions[min(seg+1,len(positions)-1)]
            circ(ax, x2_,y2_, r=0.44, c=YL, z=3, a=0.09*seg_t)
        else:
            dp = (frac - dot_start) / max(dot_end - dot_start, 1e-4)
            seg = min(edges - 1, int(dp * edges))
            seg_t = ease_io((dp * edges) - seg)
            x1,y1 = positions[seg]; x2,y2 = positions[seg+1]
            dx_ = lerp(x1,x2,seg_t); dy_ = lerp(y1,y2,seg_t)
            for rr,aa in [(0.32,0.08),(0.22,0.15),(0.13,1.0)]:
                circ(ax, dx_,dy_, r=rr, c=YL, z=12, a=aa)
            x2_,y2_ = positions[min(seg+1,len(positions)-1)]
            circ(ax, x2_,y2_, r=0.44, c=YL, z=3, a=0.09*seg_t)
    # Success flash: fires after last chunk's narration
    success_start = min(0.97, wp_fracs[-1] + (0.03 if chunked else 0.01)) if wp_fracs else 0.93
    a_done = ease_out(min(1.0, max(0.0, (frac - success_start) / 0.04)))
    if a_done>0 and success_label and positions:
        lx,ly = positions[-1]
        sw = max(2.8, len(success_label)*0.09+0.6)
        box(ax, lx-sw/2, ly-0.56, sw, 0.88, fc=GL, alpha=0.28*a_done, r=0.25, lw=0)
        txt(ax, lx, ly, success_label, sz=10, c=GL, w="bold", a=a_done, mono=False)

def render_insights(ax, i, n, s):
    """Scene type: insights. Numbered key-takeaway cards."""
    scene_header(ax, s.get("heading",""), i, n)
    insights = s.get("insights", [])
    ch = min(1.26, 6.8/max(len(insights),1))
    gap = 0.12
    total = len(insights)*ch + (len(insights)-1)*gap
    start_y = (8.15-total)/2 + total - ch
    for j, ins in enumerate(insights):
        bg_, fg_ = color(ins.get("color"), idx=j)
        a = av(i, n, j*0.13, 0.18)
        if a<=0: continue
        cy = start_y - j*(ch+gap)
        box(ax, 0.32, cy, 15.36, ch, fc=CARD, ec=fg_, alpha=a, r=0.25, lw=1.5)
        box(ax, 0.32, cy, 0.84, ch, fc=fg_, alpha=0.28*a, r=0.2, lw=0)
        txt(ax, 0.74, cy+ch/2, ins.get("badge",f"[{j+1}]"), sz=11, c=fg_, w="bold", a=a)
        txt(ax, 9.0, cy+ch*0.73, ins.get("title",""), sz=10.5, c=fg_, w="bold",
            a=a, ha="center", mono=False)
        txt(ax, 9.0, cy+ch*0.30, ins.get("body",""), sz=8.5, c=DM, a=a*0.9,
            ha="center", mono=True)

RENDERERS = {
"title": render_title,
"stack": render_stack,
"flow_table": render_flow_table,
"cards": render_cards,
"route_table": render_route_table,
"trace": render_trace,
"insights": render_insights,
}
{
"meta": {
"title": "How DNS Resolves a Domain",
"output_name": "dns_resolution",
"theme": "blue"
},
"scenes": [
{
"type": "title",
"title": "How DNS Resolves a Domain",
"subtitle": "NETWORKING BASICS",
"tagline": "Turning a name into an address",
"theme_color": "blue",
"tags": ["Resolver", "Root", "TLD", "Authoritative"],
"tag_colors": ["blue", "purple", "green", "yellow"],
"source_note": "example config",
"narration": "Every time you visit a website, your computer must first turn a human friendly name, like example dot com, into a numeric address. That translation is the job of the Domain Name System, or D N S. Let us walk through how a single lookup travels across the internet and comes back with an answer."
},
{
"type": "stack",
"heading": "The Four Players in a Lookup",
"layers": [
{
"num": "1",
"title": "Stub Resolver",
"body": "On your device. Asks one question, waits for one answer.",
"meta": "the customer",
"color": "blue"
},
{
"num": "2",
"title": "Recursive Resolver",
"body": "Does the legwork. Caches answers for everyone.",
"meta": "the concierge",
"color": "purple"
},
{
"num": "3",
"title": "Root & TLD Servers",
"body": "Point to who is responsible for .com, .org, and friends.",
"meta": "the directory",
"color": "green"
},
{
"num": "4",
"title": "Authoritative Server",
"body": "Holds the real record for the domain.",
"meta": "the source of truth",
"color": "yellow"
}
],
"narration": "A lookup involves four players. Your device runs a stub resolver, which simply asks a question and waits. It hands that question to a recursive resolver, which does the real legwork and remembers answers for next time. The recursive resolver consults root and top level domain servers to find who is responsible, and finally reaches the authoritative server that holds the true record."
},
{
"type": "flow_table",
"heading": "Which Server Knows What",
"flow": [
{"label": "Root", "sub": "knows the\nTLD servers", "color": "blue"},
{"label": "TLD", "sub": "knows the\nauthoritative", "color": "purple"},
{"label": "Authoritative", "sub": "knows the\naddress", "color": "green"}
],
"metaphor": "Each server does not know the answer, only who to ask next.",
"table_header": ["You ask about", "Server that answers"],
"rows": [
{"col1": ".com, .org, .net", "col2": "Root name servers", "color": "blue"},
{"col1": "example.com", "col2": "TLD name servers", "color": "purple"},
{"col1": "www.example.com", "col2": "Authoritative server", "color": "green"},
{"col1": "anything cached", "col2": "Recursive resolver", "color": "yellow", "highlight": true}
],
"narration": "The clever part is that no single server knows everything. A root server only knows which servers handle dot com. Those top level domain servers only know which server is authoritative for example dot com. And that authoritative server finally knows the real address. If the recursive resolver has seen the answer recently, it skips all of this and replies from its cache."
},
{
"type": "trace",
"heading": "Following One Lookup End to End",
"waypoints": [
{"label": "Your Device", "sub": "stub resolver", "color": "blue", "row": 0, "col": 0, "narration": "It starts on your device. You type example dot com, and the stub resolver sends the question to your configured recursive resolver."},
{"label": "Recursive Resolver", "sub": "cache miss", "color": "purple", "row": 0, "col": 1, "narration": "The recursive resolver checks its cache, finds nothing, and begins asking around on your behalf."},
{"label": "Root Server", "sub": "go ask .com", "color": "green", "row": 0, "col": 2, "narration": "It asks a root server, which replies, I do not know the address, but here are the servers that handle dot com."},
{"label": "TLD Server", "sub": "go ask authoritative", "color": "yellow", "row": 1, "col": 1, "narration": "Next it asks the dot com server, which points it to the authoritative server for example dot com."},
{"label": "Authoritative", "sub": "here is the address", "color": "orange", "row": 1, "col": 2, "narration": "Finally the authoritative server returns the real numeric address, and the recursive resolver caches it and hands it back to you."}
],
"bands": [
{"label": "Row 1: your side", "color": "blue"},
{"label": "Row 2: the source", "color": "orange"}
],
"success_label": "Address resolved",
"narration": "Let us trace one real lookup. It starts on your device. The stub resolver sends the question to your recursive resolver. The resolver checks its cache, finds nothing, and asks a root server, which points it to the dot com servers. Those point it to the authoritative server, which finally returns the real address. The resolver caches the answer and hands it back to you."
},
{
"type": "insights",
"heading": "Key Insights",
"insights": [
{"badge": "[1]", "title": "Names are not addresses", "body": "DNS exists to translate human names\ninto machine routable addresses.", "color": "blue"},
{"badge": "[2]", "title": "Delegation, not omniscience", "body": "Each server only knows who to ask next,\nnot the final answer.", "color": "purple"},
{"badge": "[3]", "title": "Caching makes it fast", "body": "Most lookups never leave the resolver\nbecause answers are remembered.", "color": "green"},
{"badge": "[4]", "title": "Recursion is the workhorse", "body": "The recursive resolver does all the\nlegwork on your behalf.", "color": "yellow"},
{"badge": "[5]", "title": "Authoritative is the truth", "body": "Only the authoritative server holds\nthe real record for a domain.", "color": "orange"}
],
"narration": "Five things to remember. Names are not addresses, and D N S exists to bridge that gap. Servers work by delegation, each one only knowing who to ask next. Caching means most lookups finish instantly. The recursive resolver does the heavy lifting for you. And only the authoritative server holds the real truth about a domain."
}
]
}
#!/usr/bin/env python3
"""
generate.py: Video Explainer CLI
Usage: python3.12 generate.py <config.json> [--out /path/to/output.mp4] [--no-tts]
Reads a scene config JSON, generates TTS narration, renders frames, and
assembles a narrated MP4. See SKILL.md for the config schema.
"""
import sys, os, shutil, subprocess, math, json, argparse
from pathlib import Path
import numpy as np
HERE = Path(__file__).parent
# ── Deps ──────────────────────────────────────────────────────────
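def _pip(*pkgs):
    """Best-effort install of missing packages into the current interpreter."""
    subprocess.run([sys.executable, "-m", "pip", "install", *pkgs,
                    "--break-system-packages", "-q"], check=False)

# Catch ImportError only, so unrelated failures are not silently swallowed.
try: import matplotlib
except ImportError: _pip("matplotlib"); import matplotlib
try: import soundfile as sf
except ImportError: _pip("soundfile"); import soundfile as sf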
from engine import (
make_fig, save_frame, RENDERERS,
FPS, DPI, FW, FH, HOLD_FRM, ANIM_FRAC, BG
)
# ── TTS ───────────────────────────────────────────────────────────
_tts_fn = None
_tts_sr = 24_000
def _try_kokoro():
    global _tts_fn, _tts_sr
    try:
        try: from kokoro import KPipeline
        except ImportError: _pip("kokoro"); from kokoro import KPipeline
        pipeline = KPipeline(lang_code='a')
        def fn(text, speed=0.84):
            chunks = []
            for _, _, audio in pipeline(text, voice='af_heart', speed=speed):
                chunks.append(np.asarray(audio, dtype=np.float32))
            return (np.concatenate(chunks) if chunks else np.zeros(2400, np.float32)), 24_000
        _tts_fn, _tts_sr = fn, 24_000
        print(" TTS: kokoro (local, af_heart voice)")
        return True
    except Exception as e:
        print(f" kokoro: {e}"); return False

def _try_edge():
    global _tts_fn, _tts_sr
    try:
        try: import edge_tts
        except ImportError: _pip("edge-tts"); import edge_tts
        import asyncio, tempfile
        if not shutil.which("ffmpeg"):
            print(" edge-tts needs ffmpeg; skipping"); return False
        async def _gen(text, rate):
            comm = edge_tts.Communicate(text, voice="en-US-GuyNeural", rate=rate)
            # mkstemp creates the file atomically; tempfile.mktemp is deprecated and racy
            fd, tmp = tempfile.mkstemp(suffix=".mp3"); os.close(fd)
            await comm.save(tmp); return tmp
        def fn(text, speed=0.84):
            pct = int((speed-1.0)*100)
            mp3 = asyncio.run(_gen(text, f"{pct:+d}%"))
            wav = os.path.splitext(mp3)[0] + ".wav"
            subprocess.run(["ffmpeg","-i",mp3,"-ar","24000","-ac","1",
                            wav,"-y","-loglevel","quiet"], check=True)
            os.unlink(mp3)
            data,sr = sf.read(wav); os.unlink(wav)
            if data.ndim>1: data=data.mean(axis=1)
            return data.astype(np.float32), sr
        _tts_fn, _tts_sr = fn, 24_000
        print(" TTS: edge-tts (cloud, en-US-GuyNeural)")
        return True
    except Exception as e:
        print(f" edge-tts: {e}"); return False

def init_tts():
    print("Initializing TTS…")
    if not _try_kokoro(): _try_edge()
    if _tts_fn is None: print(" TTS: none (video will be silent)")

def speak(text, speed=0.84):
    if _tts_fn is None:
        # No backend: return silence sized to roughly 2.8 words per second
        secs = max(2.0, len(text.split())/2.8)
        return np.zeros(int(_tts_sr*secs), np.float32), _tts_sr
    try: return _tts_fn(text, speed=speed)
    except Exception as e:
        print(f" warn: speak() error: {e}")
        return np.zeros(int(_tts_sr*2), np.float32), _tts_sr

# ── Waypoint timing heuristic ─────────────────────────────────────
def _fill_nones(fracs, f_start, f_end):
    """Linear interpolation over None entries using surrounding known values."""
    n = len(fracs)
    anchors = [(-1, f_start)] + [(i,v) for i,v in enumerate(fracs) if v is not None] + [(n, f_end)]
    for ai in range(len(anchors)-1):
        i0, v0 = anchors[ai]; i1, v1 = anchors[ai+1]
        for j in range(max(0, i0+1), min(n, i1)):
            if fracs[j] is None:
                t = (j-i0) / max(i1-i0, 1)
                fracs[j] = v0 + t*(v1-v0)

def _compute_wp_fracs(narration, waypoints, spoken_dur, scene_dur):
    """
    Estimate each waypoint's activation time as a fraction of scene_dur.
    Uses word position (not character position) as a proxy for time. Words
    are spoken at a roughly constant rate regardless of length, so word position
    tracks audio time much more accurately than character position for technical
    narration that has many long words.
    Waypoints can optionally include a "cue" field (phrase to search for).
    """
    text = narration.lower()
    words = text.split()
    total_words = max(len(words), 1)
    fracs = []
    for wp in waypoints:
        cue = wp.get("cue","").lower().strip()
        label = wp.get("label","").lower()
        terms = [cue] if cue else [w for w in label.split() if len(w) > 2]
        best_frac = None
        for term in terms:
            idx = text.find(term)
            if idx >= 0:
                word_pos = len(text[:idx].split())
                best_frac = word_pos / total_words
                break
        if best_frac is not None:
            scene_frac = best_frac * (spoken_dur / scene_dur)
        else:
            scene_frac = None
        fracs.append(scene_frac)
    _fill_nones(fracs, 0.0, spoken_dur / scene_dur)
    return fracs

# ── Audio ──────────────────────────────────────────────────────────
_CHUNKED_PREROLL_S = 2.0 # silence before chunk 0 so diagram can draw first
def _gen_audio_trace_chunked(wps, audio_dir, scene_idx):
    """
    Chunked TTS for trace scenes: one TTS call per waypoint.
    A pre-roll silence lets the diagram appear before the narrator speaks.
    wp_fracs[k] = exact scene fraction when chunk k starts (dot arrives at waypoint k).
    """
    gap = np.zeros(int(_tts_sr * 0.22), np.float32)  # brief pause between waypoints
    preroll = np.zeros(int(_tts_sr * _CHUNKED_PREROLL_S), np.float32)
    chunks = [preroll]
    c_durs = []
    for j, wp in enumerate(wps):
        narr = wp.get("narration", "")
        print(f" [{j}] {wp.get('label','?')!r}…", flush=True)
        audio, _ = speak(narr) if narr else (np.zeros(int(_tts_sr * 0.2), np.float32), _tts_sr)
        seg = audio.astype(np.float32)
        if j < len(wps) - 1:
            seg = np.concatenate([seg, gap])
        chunks.append(seg)
        c_durs.append(len(seg) / _tts_sr)
    tail = np.zeros(int(_tts_sr * 0.55), np.float32)
    all_audio = np.concatenate(chunks + [tail])
    peak = np.abs(all_audio).max()
    if peak > 0.01:
        all_audio = all_audio / peak * 0.88
    wav_path = audio_dir / f"s{scene_idx:02d}.wav"
    sf.write(str(wav_path), all_audio, _tts_sr)
    total_dur = len(all_audio) / _tts_sr
    scene_dur = frames_for(total_dur) / FPS
    # wp_fracs[k] = exact scene frac when chunk k starts (offset by pre-roll)
    wp_fracs, t = [], _CHUNKED_PREROLL_S
    for d in c_durs:
        wp_fracs.append(t / scene_dur)
        t += d
    print(f" chunked wp_fracs: {[f'{f:.2f}' for f in wp_fracs]}", flush=True)
    return wav_path, total_dur, wp_fracs

def gen_audio(scenes, audio_dir):
    audio_dir.mkdir(parents=True, exist_ok=True)
    results = []
    for k, scene in enumerate(scenes):
        wav_path = audio_dir / f"s{k:02d}.wav"
        print(f" narrating scene {k+1}/{len(scenes)}: {scene.get('type','?')}…", flush=True)
        # Chunked trace: per-waypoint narration fields → exact timing
        if (scene.get("type") == "trace" and
                any(wp.get("narration") for wp in scene.get("waypoints", []))):
            wps = scene.get("waypoints", [])
            wav_path, dur, wp_fracs = _gen_audio_trace_chunked(wps, audio_dir, k)
            scene["_wp_fracs"] = wp_fracs
            scene["_wp_chunked"] = True
            results.append((wav_path, dur))
            print(f" {dur:.1f}s", flush=True)
            continue
        # Standard single-narration path (2 s of float32 silence if no narration)
        narration = scene.get("narration", "")
        spoken, sr = speak(narration) if narration else (np.zeros(int(_tts_sr*2), np.float32), _tts_sr)
        spoken_dur = len(spoken) / sr
        tail = np.zeros(int(sr*0.55), np.float32)
        samples = np.concatenate([spoken.astype(np.float32), tail])
        peak = np.abs(samples).max()
        if peak > 0.01:
            samples = samples/peak*0.88
        sf.write(str(wav_path), samples, sr)
        dur = len(samples)/sr
        if scene.get("type") == "trace" and narration:
            scene_dur = frames_for(dur) / FPS
            scene["_wp_fracs"] = _compute_wp_fracs(
                narration, scene.get("waypoints", []), spoken_dur, scene_dur
            )
            print(f" wp_fracs: {[f'{f:.2f}' for f in scene['_wp_fracs']]}", flush=True)
        results.append((wav_path, dur))
        print(f" {dur:.1f}s", flush=True)
    return results
def concat_audio(scene_audio, out_path):
    all_s = []
    for wav_path,_ in scene_audio:
        data,sr = sf.read(str(wav_path))
        if data.ndim>1:
            data = data.mean(axis=1)  # downmix to mono
        all_s.append(data.astype(np.float32))
    sf.write(str(out_path), np.concatenate(all_s), _tts_sr)

def frames_for(dur_s, min_f=96):
    return max(min_f, math.ceil(dur_s*FPS) + HOLD_FRM)
# ── Rendering ──────────────────────────────────────────────────────
def render_scene(scene, n, frame_dir):
    scene_type = scene.get("type","title")
    renderer = RENDERERS.get(scene_type)
    if renderer is None:
        print(f" warn: unknown scene type '{scene_type}', skipping")
        return []
    frame_dir.mkdir(exist_ok=True)
    paths = []
    for i in range(n):
        fig, ax = make_fig()
        renderer(ax, i, n, scene)
        p = frame_dir / f"f{i:04d}.png"
        save_frame(fig, p)
        paths.append(p)
    return paths
# ── Assembly ───────────────────────────────────────────────────────
def assemble(all_frame_paths, audio_path, out_path):
    ffmpeg = shutil.which("ffmpeg")
    flat = [str(p.absolute()) for scene_paths in all_frame_paths for p in scene_paths]
    print(f" frames: {len(flat)} ({len(flat)/FPS:.1f}s)", flush=True)
    # --out may point at a directory that does not exist yet
    out_path.parent.mkdir(parents=True, exist_ok=True)
    if not ffmpeg:
        from PIL import Image
        print(" ffmpeg not found — saving GIF…")
        # Every second frame, shown twice as long, keeps the GIF size down
        imgs = [Image.open(f) for f in flat[::2]]
        gif = out_path.with_suffix(".gif")
        imgs[0].save(str(gif), save_all=True, append_images=imgs[1:],
                     duration=int(1000/FPS*2), loop=0)
        print(f" saved: {gif}")
        return gif
    # Frame list for ffmpeg's concat demuxer
    concat = out_path.parent / "concat.txt"
    with open(concat,"w") as fh:
        for f in flat:
            fh.write(f"file '{f}'\n")
            fh.write(f"duration {1/FPS:.6f}\n")
    silent = out_path.with_name("_silent.mp4")
    print(" encoding video…", flush=True)
    r1 = subprocess.run([
        ffmpeg, "-y", "-f","concat", "-safe","0", "-i", str(concat),
        "-vf", f"scale={FW}:{FH}:force_original_aspect_ratio=decrease,"
               f"pad={FW}:{FH}:(ow-iw)/2:(oh-ih)/2:color=0d1117",
        "-c:v","libx264", "-preset","fast", "-crf","18",
        "-pix_fmt","yuv420p", str(silent)
    ], capture_output=True, text=True)
    if r1.returncode != 0:
        print("video encode failed:", r1.stderr[-1000:])
        return None
    if audio_path and Path(audio_path).exists():
        print(" muxing audio…", flush=True)
        r2 = subprocess.run([
            ffmpeg, "-y",
            "-i", str(silent), "-i", str(audio_path),
            "-c:v","copy", "-c:a","aac", "-b:a","192k",
            "-shortest", "-movflags","+faststart", str(out_path)
        ], capture_output=True, text=True)
        if r2.returncode != 0:
            print("audio mux failed:", r2.stderr[-500:])
            out_path = silent  # keep the silent video as the result
    else:
        out_path = silent
    concat.unlink(missing_ok=True)
    # Delete the intermediate file only when it is not the final output
    if out_path != silent:
        silent.unlink(missing_ok=True)
    print(f" saved: {out_path}", flush=True)
    return out_path
# ── Main ───────────────────────────────────────────────────────────
def main():
    parser = argparse.ArgumentParser(description="Generate a narrated explainer video from JSON config.")
    parser.add_argument("config", help="Path to scene config JSON")
    parser.add_argument("--out", help="Output MP4 path (default: output/<name>.mp4)")
    parser.add_argument("--no-tts", action="store_true", help="Skip TTS, silent video")
    args = parser.parse_args()
    config_path = Path(args.config).resolve()
    if not config_path.exists():
        print(f"Config not found: {config_path}")
        sys.exit(1)
    with open(config_path) as f:
        config = json.load(f)
    meta = config.get("meta", {})
    scenes = config.get("scenes", [])
    name = meta.get("output_name", config_path.stem)
    work_dir = HERE / "output" / name
    work_dir.mkdir(parents=True, exist_ok=True)
    frames_root = work_dir / "frames"
    frames_root.mkdir(exist_ok=True)
    audio_dir = work_dir / "audio"
    out_path = Path(args.out) if args.out else HERE / "output" / f"{name}.mp4"
    print(f"\nVideo Explainer: {meta.get('title', name)}")
    print(f"Scenes: {len(scenes)}\n")
    # TTS
    if not args.no_tts:
        init_tts()
        print(f"\nGenerating narration ({len(scenes)} segments)…")
        scene_audio = gen_audio(scenes, audio_dir)
        frame_counts = [frames_for(dur) for _,dur in scene_audio]
        audio_track = work_dir / "narration.wav"
        concat_audio(scene_audio, audio_track)
    else:
        # Silent mode: every scene gets a fixed 10-second slot
        frame_counts = [frames_for(10.0)] * len(scenes)
        audio_track = None
    total = sum(frame_counts)
    print(f"\nFrame counts: {frame_counts}")
    print(f"Total: {total} frames ({total/FPS:.1f}s)\n")
    # Render
    all_paths = []
    for k, (scene, n) in enumerate(zip(scenes, frame_counts)):
        sid = f"s{k:02d}_{scene.get('type','?')}"
        print(f" rendering {k+1}/{len(scenes)}: {sid} ({n} frames, {n/FPS:.1f}s)…", flush=True)
        fdir = frames_root / f"s{k:02d}"
        all_paths.append(render_scene(scene, n, fdir))
    # Assemble
    print("\nAssembling…")
    out = assemble(all_paths, audio_track, out_path)
    if out:
        # "open" is a macOS command; calling it where it is missing would raise
        # FileNotFoundError, so only auto-open the result when it is available.
        if shutil.which("open"):
            subprocess.run(["open", str(out)], check=False)
        print(f"\nDone: {out}")

if __name__ == "__main__":
    main()
# Python dependencies for video-explainer
# Install with: python3.12 -m pip install -r requirements.txt
#
# System dependency (NOT pip-installable): ffmpeg
# macOS: brew install ffmpeg
# Debian/Ubuntu: apt-get install ffmpeg
# Without ffmpeg the engine falls back to producing an animated GIF.
numpy>=1.24
matplotlib>=3.7
soundfile>=0.12
# TTS (primary): Kokoro local neural TTS
kokoro>=0.3
misaki>=0.6
misaki[en]>=0.6
# Kokoro/misaki English G2P also needs the spaCy small English model:
# python3.12 -m pip install \
# "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"
# TTS (fallback): cloud TTS via Microsoft Edge voices
edge-tts>=6.1

Skill: video-explainer

Explain any technical topic as a narrated animated video. Produces a ~2-4 minute MP4 with synchronized Kokoro TTS narration, animated diagrams, and a dark-themed educational presentation style.

How to invoke

User says something like:

  • "make a video explaining X"
  • "explain Y as a video"
  • "video explainer for Z"

Your workflow

  1. Understand the topic — ask clarifying questions if the scope is unclear. Identify: what is being explained, who the audience is, and what the key insight is.

  2. Design 7 scenes using the scene types below. The canonical arc is:

    • Scene 1: title — hook + framing
    • Scene 2: stack — high-level mental model / layers
    • Scene 3: flow_table — the main rule/routing/configuration mechanism
    • Scene 4: cards — the components/services/functions involved
    • Scene 5: route_table — the internal API / decision tree / mapping
    • Scene 6: trace — animate a single concrete example end-to-end
    • Scene 7: insights — 5 key takeaways

    Adapt scene types to the topic. Not all topics need all 7; fewer is fine.

  3. Write narration per scene. TTS rules:

    • Expand all abbreviations: "API" → "A P I", "DNS" → "D N S"
    • Expand punctuation read aloud: "." in domain → "dot", "/" → "slash"
    • Expand numbers and versions: "v2" → "version two", "Gen2" → "Gen Two", "4GiB" → "four gigabytes"
    • Use "Let us" not "Let's" (TTS handles contractions poorly)
    • Target 40-70 words per scene. Presenter pace, not rushing.
    • Each scene's narration should stand alone; assume the viewer can read the slide.
  4. Generate the config JSON following the schema below.

  5. Run the generator (from the directory containing generate.py):

    python3.12 generate.py /path/to/config.json --out ./output/my_topic.mp4

    The --out flag is optional; by default the video lands in ./output/<output_name>.mp4.

  6. Report the output path. If the video needs adjustments, edit the config and rerun.
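
The narration rules in step 3 are easy to forget when writing by hand, so a small pre-check can help. The sketch below is not part of generate.py: `normalize_narration`, `within_word_target` and the `ABBREVIATIONS` set are hypothetical names, and only the abbreviation, contraction and word-count rules are covered.

```python
import re

# Hypothetical helper, NOT part of generate.py. The abbreviation set is an
# illustrative assumption; extend it for the topic being explained.
ABBREVIATIONS = {"API", "DNS"}


def normalize_narration(text: str) -> str:
    """Apply two of the TTS rules: spell out abbreviations, avoid "Let's"."""

    def spell_out(match: re.Match) -> str:
        word = match.group(0)
        # "API" -> "A P I"; unknown all-caps words are left untouched.
        return " ".join(word) if word in ABBREVIATIONS else word

    text = re.sub(r"\b[A-Z]{2,}\b", spell_out, text)
    text = re.sub(r"\bLet's\b", "Let us", text)
    text = re.sub(r"\blet's\b", "let us", text)
    return text


def within_word_target(text: str, lo: int = 40, hi: int = 70) -> bool:
    """Check the 40-70 word target on the text as written."""
    return lo <= len(text.split()) <= hi


print(normalize_narration("Let's query the DNS API."))
# prints: Let us query the D N S A P I.
```

Run the word-count check before expanding abbreviations, since spelling out "API" turns one word into three tokens.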


Config JSON Schema

{
  "meta": {
    "title": "Human-readable title",
    "output_name": "snake_case_filename",   // used for work dir + default output
    "theme": "purple"                       // informational only
  },
  "scenes": [ /* array of scene objects */ ]
}
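
The `//` annotations in the schema listings are for the reader only. generate.py reads the file with `json.load`, which rejects comments, so a real config must omit them. A quick sanity check before a long render might look like the sketch below; `check_config` is a hypothetical helper rather than something generate.py provides, and the messages are warnings, since the engine tolerates both a missing `output_name` (it falls back to the config file name) and a missing narration (it inserts silence).

```python
# Hypothetical pre-flight check, NOT part of generate.py.
# Scene type names are the seven documented in this file.
KNOWN_SCENE_TYPES = {
    "title", "stack", "flow_table", "cards",
    "route_table", "trace", "insights",
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable warnings (empty if none)."""
    warnings = []
    if not config.get("meta", {}).get("output_name"):
        warnings.append("meta.output_name is missing")
    scenes = config.get("scenes", [])
    if not scenes:
        warnings.append("scenes is empty")
    for i, scene in enumerate(scenes):
        scene_type = scene.get("type")
        if scene_type not in KNOWN_SCENE_TYPES:
            warnings.append(f"scene {i}: unknown type {scene_type!r}")
        # Trace scenes may carry narration per waypoint instead.
        chunked = any(wp.get("narration") for wp in scene.get("waypoints", []))
        if not scene.get("narration") and not chunked:
            warnings.append(f"scene {i}: no narration")
    return warnings


example = {
    "meta": {"title": "How DNS works", "output_name": "dns"},
    "scenes": [{"type": "title", "title": "DNS", "narration": "Welcome."}],
}
print(check_config(example))  # prints: []
```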

Scene types

title — Opening title card

{
  "type": "title",
  "title": "Main large text (shown at ~38pt)",
  "subtitle": "Small tag above the title",       // optional pill badge
  "tagline": "Line below the title",             // optional
  "theme_color": "purple",                       // color key for background tint
  "tags": ["Tag 1", "Tag 2", "Tag 3", "Tag 4"], // animated pill badges (max 4)
  "tag_colors": ["blue","purple","green","yellow"], // optional, auto-assigned if omitted
  "source_note": "source: file.js",             // optional small text at bottom
  "narration": "..."
}

stack — Vertical list of layers (protocols, architecture tiers, pipeline stages)

{
  "type": "stack",
  "heading": "Scene heading",
  "layers": [
    {
      "num": "1",           // badge text (default: auto-numbered)
      "title": "Layer name",
      "body": "One-line description or code snippet",
      "meta": "Metaphor / right-aligned note",  // optional
      "color": "blue"       // color key, auto-assigned if omitted
    }
    // 3-6 layers work best
  ],
  "narration": "..."
}

flow_table — Horizontal flow diagram + rules/routing table below

{
  "type": "flow_table",
  "heading": "Scene heading",
  "flow": [
    {"label": "Node label", "sub": "Two-line\ndetail", "color": "blue"}
    // 2-4 nodes work best
  ],
  "metaphor": "Analogy sentence shown between flow and table",  // optional
  "table_header": ["Column A", "Column B"],
  "rows": [
    {
      "col1": "Pattern / key",
      "col2": "Target / value",
      "color": "purple",      // auto-assigned if omitted
      "highlight": false      // true = bold colored row (use for catch-all / default)
    }
    // 3-7 rows work best
  ],
  "narration": "..."
}

cards — Components / services / functions (featured layout or grid)

{
  "type": "cards",
  "heading": "Scene heading",
  "code_note": "Optional single line of code shown at top",  // optional
  "cards": [
    {
      "featured": true,            // ONE card can be featured (left-side big card)
      "title": "Component name",
      "file": "path/to/file.js",
      "desc": "What it does (plain English)",
      "specs": "2 CPU  |  2 GiB  |  max 100",  // optional specs line
      "color": "purple"
    },
    {
      "title": "Side component",
      "file": "other.js",
      "desc": "What it does",
      "specs": "...",
      "color": "yellow"
    }
    // With one featured: up to 5 side cards work
    // Without featured: up to 6 cards in grid layout
  ],
  "narration": "..."
}

route_table — Sidebar + method/path/handler routing table

{
  "type": "route_table",
  "heading": "Scene heading",
  "sidebar": {
    "label": "filename.js",       // optional; omit for no sidebar
    "sub": ["Express", "app"]     // optional sub-labels
  },
  "columns": ["METHOD", "PATH", "HANDLER"],   // column header labels (2 or 3)
  "routes": [
    {
      "col1": "GET",
      "col2": "/path/:param",
      "col3": "handlerFunction()",
      "color": "blue",
      "highlight": false    // true = bold colored row (use for catch-all / error)
    }
    // 5-10 rows work best
  ],
  "footer": "Optional small note at bottom",  // optional
  "narration": "..."
}

trace — Animated dot following a request/data flow

{
  "type": "trace",
  "heading": "Scene heading",
  "waypoints": [
    {
      "label": "Node label",
      "sub": "Detail text",
      "color": "blue",
      "row": 0,    // 0 = top row
      "col": 0,    // 0 = left column
      "cue": "phrase from narration"  // optional: word/phrase that cues this waypoint in the narration
    }
    // Engine auto-positions from (row, col). Use 3-5 rows, 3-5 cols.
    // Edges connect waypoints in array order.
    // Typical pattern: row 0 = L1, row 1 = L2, row 2 = L3, row 3 = L4
    // "cue": exact word/phrase from narration text that marks when this node is reached.
    //   Used for audio-synchronized dot timing. Falls back to label words if omitted.
    // Optionally add a per-waypoint "narration" field to enable chunked TTS, which
    // produces exact per-waypoint dot timing (one TTS call per waypoint).
  ],
  "bands": [
    {"label": "Layer 1: ...", "color": "blue"}
    // One band per row. Auto-height computed from number of bands.
  ],
  "success_label": "Success message on final node",
  "narration": "..."
}
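
To see why per-waypoint narration gives exact dot timing, it helps to work through the arithmetic: each waypoint activates when its audio chunk starts, so its scene fraction is the pre-roll plus the durations of all earlier chunks, divided by the scene length. The sketch below assumes a 2-second pre-roll and uses a hypothetical function name, `chunk_start_fracs`; the chunk durations are made-up numbers.

```python
from itertools import accumulate


def chunk_start_fracs(chunk_durs, scene_dur, preroll=2.0):
    """Fraction of the scene at which each narration chunk begins.

    chunk_durs: seconds of audio per waypoint, in waypoint order.
    scene_dur:  total scene length in seconds.
    preroll:    silence before the first chunk (assumed 2.0 s here).
    """
    if not chunk_durs:
        return []
    # Start of chunk k = preroll + sum of durations of chunks 0..k-1.
    starts = accumulate(chunk_durs[:-1], initial=preroll)
    return [t / scene_dur for t in starts]


# Three waypoints narrated for 1 s, 2 s and 1 s in a 10 s scene:
print(chunk_start_fracs([1.0, 2.0, 1.0], scene_dur=10.0))
# prints: [0.2, 0.3, 0.5]
```

With a single scene-level narration the engine can only estimate these fractions from where each cue appears in the text, which is why chunked narration is worth the extra TTS calls when timing matters.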

insights — Numbered key-takeaway cards (always the final scene)

{
  "type": "insights",
  "heading": "Key Insights",
  "insights": [
    {
      "badge": "[1]",
      "title": "Short bold statement",
      "body": "Two-line explanation\nSecond line here",
      "color": "purple"
    }
    // 4-6 insights work best
  ],
  "narration": "..."
}

Color keys

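| Key         | Background | Foreground |
|-------------|------------|------------|
| purple      | #7c3aed    | #a78bfa    |
| blue        | #1d4ed8    | #60a5fa    |
| green       | #065f46    | #34d399    |
| yellow      | #92400e    | #fbbf24    |
| orange      | #7c2d12    | #fb923c    |
| red         | #7f1d1d    | #f87171    |
| deep_purple | #6b21a8    | #d8b4fe    |
| teal        | #0f4c5c    | #67e8f9    |
| pink        | #831843    | #f9a8d4    |
| gray        | #161b22    | #8b949e    |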

If color is omitted, the engine cycles through: purple → blue → green → yellow → orange → red → deep_purple → teal
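
As a rough illustration of the cycle above, the sketch below assigns a color to each item by its position in the list. Whether the engine indexes by position or only advances the cycle for items without an explicit color is not stated here, so the position-based choice is an assumption, and `assign_colors` is a hypothetical name.

```python
# Cycle order as documented above; pink and gray are only used when named.
COLOR_CYCLE = [
    "purple", "blue", "green", "yellow",
    "orange", "red", "deep_purple", "teal",
]


def assign_colors(items):
    """Return one color key per item, honoring explicit "color" fields.

    Assumption: an item without a color takes the cycle entry at its own
    list position (index modulo the cycle length).
    """
    return [
        item.get("color") or COLOR_CYCLE[i % len(COLOR_CYCLE)]
        for i, item in enumerate(items)
    ]


layers = [{"title": "Client"}, {"title": "Resolver", "color": "pink"}, {"title": "Root"}]
print(assign_colors(layers))  # prints: ['purple', 'pink', 'green']
```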


Dependencies

# Python 3.12
python3.12 -m pip install -r requirements.txt
# or, explicitly:
python3.12 -m pip install kokoro soundfile matplotlib numpy
python3.12 -m pip install misaki "misaki[en]"
python3.12 -m pip install "https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl"
# ffmpeg (for video assembly)
brew install ffmpeg      # macOS
# apt-get install ffmpeg # Debian/Ubuntu

If Kokoro is unavailable, the engine falls back to edge-tts (cloud), and if no TTS backend is available it produces a silent video.
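
The fallback order can be checked ahead of a run with a few lines of Python. This is a sketch of the decision only, not the engine's `init_tts()`: `pick_backend` and `detect_backend` are hypothetical names, a successful import lookup does not guarantee the backend will actually load, and network availability (which edge-tts also needs) is not tested.

```python
import importlib.util
import shutil


def pick_backend(has_kokoro: bool, has_edge_tts: bool, has_ffmpeg: bool) -> str:
    """Mirror the documented preference: Kokoro, then edge-tts, then silent."""
    if has_kokoro:
        return "kokoro"
    # The README lists ffmpeg as a requirement for the edge-tts fallback.
    if has_edge_tts and has_ffmpeg:
        return "edge-tts"
    return "silent"


def detect_backend() -> str:
    """Probe the local environment without importing the heavy packages."""
    return pick_backend(
        has_kokoro=importlib.util.find_spec("kokoro") is not None,
        has_edge_tts=importlib.util.find_spec("edge_tts") is not None,
        has_ffmpeg=shutil.which("ffmpeg") is not None,
    )


print("TTS backend that would likely be used:", detect_backend())
```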

Example

See example.json — a small generic config that explains "How DNS resolves a domain". Use it as a reference for JSON structure and narration style.

Output

Videos are saved to ./output/<output_name>.mp4 (relative to generate.py). Work files (frames, audio segments) are in ./output/<output_name>/.
