Vector Audio Format — Concept Notes

Core Idea

An audio file format that is to PCM what SVG is to bitmap: resolution-independent, mathematically defined, and inherently manipulable. Instead of storing discrete amplitude samples, store mathematical curve definitions (Bézier/B-spline control points) that describe the sound.

Key Insight

Raw PCM waveforms are too complex for efficient curve fitting. But if you first decompose audio via sinusoidal modeling (SMS — Serra & Smith, 1990), the resulting parameter trajectories (frequency, amplitude over time) are smooth, slowly-varying curves — exactly what Bézier curves represent efficiently.

Architecture: Three Track Types

1. Sinusoidal Tracks (tonal content)

Each track = one partial (harmonic or inharmonic).

freq: Bézier curve (Hz over time)
amp: Bézier curve (amplitude over time)
birth / death: start and end time

Handles: sustained notes, vocals, bass, pads, pitched instruments. A piano note might have 30 tracks × 2 curves (freq, amp) × ~5 control points each = ~300 floats for a full second (vs. 44,100 PCM samples at 44.1 kHz mono).
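A minimal sketch of how one sinusoidal track could be held in memory. It assumes each curve is a scalar Bézier whose parameter maps linearly onto a time range, which is one possible reading of "Bézier curve (Hz over time)" and not a settled design. The class names Curve and SinusoidalTrack are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def bezier(values: List[float], u: float) -> float:
    """Evaluate a scalar Bezier curve of any degree at u in [0, 1] (de Casteljau)."""
    pts = list(values)
    while len(pts) > 1:
        pts = [(1.0 - u) * a + u * b for a, b in zip(pts, pts[1:])]
    return pts[0]


@dataclass
class Curve:
    """Scalar Bezier over [t0, t1]; assumption: time maps linearly to u."""
    t0: float
    t1: float
    values: List[float]  # control values (Hz for freq, linear gain for amp)

    def at(self, t: float) -> float:
        u = (t - self.t0) / (self.t1 - self.t0)
        return bezier(self.values, min(1.0, max(0.0, u)))


@dataclass
class SinusoidalTrack:
    birth: float  # start time in seconds
    death: float  # end time in seconds
    freq: Curve
    amp: Curve


# One steady 440 Hz partial that decays to silence over one second.
track = SinusoidalTrack(
    birth=0.0,
    death=1.0,
    freq=Curve(0.0, 1.0, [440.0, 440.0, 440.0, 440.0]),
    amp=Curve(0.0, 1.0, [1.0, 0.5, 0.2, 0.0]),
)
```

With this layout a cubic curve costs four floats, so the storage estimate above depends directly on how many segments each trajectory needs.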

2. Noise Bands (textural/stochastic content)

Bandpass-filtered noise with amplitude envelopes.

freq_low / freq_high: frequency range (Hz)
amp: Bézier curve (amplitude envelope over time)

Alternative: a 2D Bézier surface (frequency × time → amplitude) for continuous spectral envelope modeling.

Handles: breath, bow scrape, snare wires, cymbal wash, consonants in speech.
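A rough sketch of rendering one noise band: white noise through a second-order band-pass filter, then shaped by the amplitude envelope. The filter coefficients follow the widely used audio-EQ-cookbook band-pass form. The function name, the choice of a single biquad (gentle slopes, not a brick-wall band), and passing the envelope as a plain callable instead of a Bézier curve are all assumptions made to keep the example short.

```python
import math
import random
from typing import Callable, List


def render_noise_band(freq_low: float, freq_high: float,
                      amp: Callable[[float], float], duration: float,
                      sample_rate: int = 44100, seed: int = 0) -> List[float]:
    """White noise -> biquad band-pass -> amplitude envelope."""
    rng = random.Random(seed)
    f0 = math.sqrt(freq_low * freq_high)   # geometric centre of the band
    q = f0 / (freq_high - freq_low)        # wider band -> lower Q
    w0 = 2.0 * math.pi * f0 / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    # Normalised band-pass coefficients (b1 is zero for this filter type).
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    x1 = x2 = y1 = y2 = 0.0
    out = []
    for n in range(round(duration * sample_rate)):
        x = rng.uniform(-1.0, 1.0)                      # white noise sample
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2        # band-pass filter
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(amp(n / sample_rate) * y)            # apply envelope
    return out


# Hi-hat-like band: 3-18 kHz with a fast exponential decay.
hi_hat = render_noise_band(3000.0, 18000.0, lambda t: math.exp(-50.0 * t), 0.1)
```

A sharper band edge would need a higher-order filter or FFT-domain shaping; the 2D surface alternative above would replace the fixed band with a time-varying spectral envelope.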

3. Transient Events (attacks, clicks, impacts)

Short broadband bursts (~1-5ms).

time: when it occurs
shape: Bézier curve (amplitude envelope)
spectrum: Bézier curve (spectral energy distribution)
Or: a tiny PCM snippet (few hundred samples)

Handles: drum stick impact, pick attack, plosives.
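For the PCM-snippet variant, placing a transient is just a mix at a sample offset. The sketch below assumes the output is a mutable list of floats and that the snippet is already at the output sample rate; add_transient is a hypothetical helper name.

```python
from typing import List


def add_transient(output: List[float], snippet: List[float],
                  time: float, sample_rate: int) -> None:
    """Mix a short PCM snippet into output, starting at the given time (seconds)."""
    start = round(time * sample_rate)
    for i, sample in enumerate(snippet):
        index = start + i
        if 0 <= index < len(output):   # drop anything past either end of the buffer
            output[index] += sample


buffer = [0.0] * 10
add_transient(buffer, [1.0, 0.5], time=0.5, sample_rate=10)
```

Because the snippet is placed, not resampled in time, it keeps its original length when the surrounding curves are stretched.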

Percussion Decomposition Examples

Kick: freq sweep 150→60Hz (sinusoidal) + beater click noise bands 1-5kHz + transient at t=0
Snare: short tone 150-250Hz (sinusoidal) + snare wire noise bands 1-15kHz + transient
Hi-hat: no sinusoidal tracks, noise bands 3-18kHz with fast decay + transient
Cymbal: inharmonic sinusoidal tracks + broadband noise bands with slow decay + transient

Killer Features (What This Enables)

Time stretching: re-parameterize the Bézier curves to a longer time range. Same control points, same frequencies, no pitch change. No phase vocoder artifacts. Transients stay sharp (they're point events, not stretched).
Pitch shifting: multiply all frequency curves by a constant. Done.
Harmonic editing: boost/suppress/remove individual partials.
Sound morphing: interpolate control points between two sounds.
Resolution independence: render at any sample rate.
Extreme compression: smooth parameter curves compress to very few control points (the piano estimate above is ~150:1 against 44.1 kHz mono PCM; simpler sounds could potentially reach 200:1-400:1).
Procedural variation: perturb control points slightly for natural-sounding variation.
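Several of these operations reduce to arithmetic on control data, because a Bézier curve is linear in its control values. The sketch below assumes a curve is stored as a (t0, t1, control values) tuple with time mapped linearly onto the Bézier parameter; the tuple layout and function names are hypothetical.

```python
from typing import List, Tuple

Curve = Tuple[float, float, List[float]]  # (t0, t1, control values)


def time_stretch(curve: Curve, factor: float) -> Curve:
    """Scale the time range; control values (and so frequencies) are unchanged."""
    t0, t1, values = curve
    return (t0 * factor, t1 * factor, list(values))


def pitch_shift(freq_curve: Curve, semitones: float) -> Curve:
    """Multiply every frequency control value by a constant ratio."""
    ratio = 2.0 ** (semitones / 12.0)
    t0, t1, values = freq_curve
    return (t0, t1, [v * ratio for v in values])


def morph(a: Curve, b: Curve, mix: float) -> Curve:
    """Interpolate two curves control point by control point (mix in [0, 1])."""
    if len(a[2]) != len(b[2]):
        raise ValueError("curves need the same number of control points")
    lerp = lambda x, y: (1.0 - mix) * x + mix * y
    return (lerp(a[0], b[0]), lerp(a[1], b[1]),
            [lerp(x, y) for x, y in zip(a[2], b[2])])


glide = (0.0, 1.0, [440.0, 450.0])
slow = time_stretch(glide, 2.0)        # same glide over two seconds
octave_up = pitch_shift(glide, 12.0)   # same shape, one octave higher
```

Morphing whole sounds would also need a rule for pairing tracks between the two sounds, which this sketch leaves out.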

Rendering Pipeline

For each output sample at time t:

Evaluate each sinusoidal track's freq(t) and amp(t) Bézier curves
Accumulate phase: phase(t) = 2π ∫₀ᵗ freq(τ) dτ (closed-form when freq is a polynomial Bézier in t: the integral is itself a Bézier of one degree higher)
output += amp(t) * sin(phase(t))
Add noise bands: generate white noise, bandpass filter, shape with amp curve
Add transients at their trigger times
Sum all components
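The sinusoidal part of this loop can be sketched as follows, under the assumption that freq and amp are scalar Bézier curves spanning the whole track with time mapped linearly to the curve parameter. In that case the phase integral is exact: the integral of a degree-n Bézier is a degree-(n+1) Bézier whose control values are running sums of the originals. Function names are hypothetical.

```python
import math
from typing import List


def bezier(values: List[float], u: float) -> float:
    """Evaluate a scalar Bezier curve at u in [0, 1] (de Casteljau)."""
    pts = list(values)
    while len(pts) > 1:
        pts = [(1.0 - u) * a + u * b for a, b in zip(pts, pts[1:])]
    return pts[0]


def integral_controls(values: List[float], duration: float) -> List[float]:
    """Control values of the running integral of a Bezier curve over `duration` seconds."""
    degree = len(values) - 1
    out = [0.0]
    for v in values:
        out.append(out[-1] + v * duration / (degree + 1))
    return out


def render_track(freq: List[float], amp: List[float],
                 duration: float, sample_rate: int) -> List[float]:
    """Render one partial: amp(t) * sin(2*pi * integral of freq)."""
    cycles = integral_controls(freq, duration)   # phase in cycles, not radians
    n_samples = round(duration * sample_rate)
    out = []
    for n in range(n_samples):
        u = n / n_samples
        phase = 2.0 * math.pi * bezier(cycles, u)
        out.append(bezier(amp, u) * math.sin(phase))
    return out


# Steady 100 Hz partial at full amplitude, one second at 8 kHz.
samples = render_track([100.0] * 4, [1.0] * 4, duration=1.0, sample_rate=8000)
```

A full renderer would sum many such tracks, then add the noise bands and transients. If control points carried their own time coordinates, the closed form would no longer apply directly and the curve would need to be inverted or sampled.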

The Format Is Essentially...

An additive synthesizer preset extracted from real audio. The file IS a synth patch. The encoder IS the analysis. The decoder IS the synth. It sits between MIDI (pure instructions, no timbre) and PCM (pure samples, no structure).

Where It Works Best vs. Where PCM Wins

This format wins: instruments, voice, synths, sound effects, game audio — structured sounds
PCM wins: rain, crowd noise, field recordings — unstructured/stochastic sounds
Analogous to SVG (illustrations) vs. PNG (photographs)

Analysis Pipeline (Encoder)

Window the signal into overlapping frames (20-50ms, hop 5-10ms)
FFT each frame → magnitude spectrum
Peak picking → find sinusoidal components (use parabolic interpolation for sub-bin accuracy)
Peak tracking across frames → form continuous sinusoidal tracks (birth/continuation/death)
Resynthesize sinusoidal part, subtract from original → residual
Model residual's spectral envelope per frame
Detect transients (onset detection)
Fit Bézier curves to all parameter trajectories (adaptive: more control points during vibrato/change, fewer during sustain)
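The peak-picking step with parabolic interpolation can be sketched like this. It fits a parabola through a local maximum and its two neighbours in a dB magnitude spectrum to estimate the true peak position between bins. The function names and the simple threshold rule are assumptions; a real encoder would likely take the spectrum from an STFT library.

```python
from typing import List, Tuple


def refine_peak(mags_db: List[float], k: int) -> Tuple[float, float]:
    """Parabolic interpolation around bin k; returns (fractional bin, peak height in dB)."""
    a, b, c = mags_db[k - 1], mags_db[k], mags_db[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0.0:               # flat neighbourhood: nothing to refine
        return float(k), b
    p = 0.5 * (a - c) / denom      # offset in bins, within [-0.5, 0.5] at a true peak
    return k + p, b - 0.25 * (a - c) * p


def pick_peaks(mags_db: List[float], threshold_db: float,
               sample_rate: float, fft_size: int) -> List[Tuple[float, float]]:
    """Find local maxima above a threshold; returns (frequency in Hz, height in dB)."""
    peaks = []
    for k in range(1, len(mags_db) - 1):
        is_peak = mags_db[k - 1] < mags_db[k] >= mags_db[k + 1]
        if is_peak and mags_db[k] > threshold_db:
            position, height = refine_peak(mags_db, k)
            peaks.append((position * sample_rate / fft_size, height))
    return peaks


# Synthetic spectrum: an exact parabola peaking between bins 10 and 11.
spectrum = [5.0 - (k - 10.3) ** 2 for k in range(20)]
found = pick_peaks(spectrum, threshold_db=0.0, sample_rate=1024.0, fft_size=1024)
```

The refined (frequency, amplitude) pairs per frame are what the tracking step then links into trajectories.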

Practical Build Path

Single note proof of concept: analyze a piano note with SMS (use Python sms-tools), fit Bézier curves to tracks, resynthesize, A/B test
Time stretch test: re-parameterize curves to 2x, compare to phase vocoder
Residual modeling: spectral envelope as Bézier curves, resynthesize as filtered noise
File format spec: define binary/JSON format for tracks + noise + transients
Percussion test: decompose a drum loop into the three track types
Polyphonic audio: use ML source separation (Demucs) as preprocessing, encode each stem independently
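For the file format step, a JSON proof of concept might look like the sketch below, which encodes the kick recipe from the percussion examples. The field names inside each entry (freq, amp, birth, death, freq_low, freq_high, time, shape, spectrum) come from the track descriptions above; the top-level keys, the "format" tag, and every numeric control value are made up for illustration.

```python
import json

kick = {
    "format": "vec-poc",          # hypothetical format tag
    "duration": 0.5,
    "sinusoidal_tracks": [
        {
            "birth": 0.0,
            "death": 0.5,
            "freq": [150.0, 90.0, 65.0, 60.0],   # sweep 150 -> 60 Hz
            "amp": [1.0, 0.8, 0.3, 0.0],
        }
    ],
    "noise_bands": [
        {
            "freq_low": 1000.0,
            "freq_high": 5000.0,
            "amp": [0.6, 0.1, 0.0, 0.0],          # beater click dies quickly
        }
    ],
    "transients": [
        {
            "time": 0.0,
            "shape": [1.0, 0.2, 0.0, 0.0],
            "spectrum": [0.2, 1.0, 0.6, 0.1],
        }
    ],
}

text = json.dumps(kick, indent=2)   # what would be written to disk
restored = json.loads(text)         # what the decoder would read back
```

Plain arrays of control values keep the proof of concept easy to inspect by hand; a binary layout could pack the same arrays later.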

Key Tools & Libraries

sms-tools (Python, Xavier Serra) — full SMS analysis/synthesis implementation
librosa (Python) — STFT, peak picking, onset detection
Loris (C++ with Python bindings) — sinusoidal modeling library
scipy.interpolate — B-spline fitting
scipy.optimize — least-squares curve fitting
Demucs (Meta) — ML source separation for polyphonic preprocessing

Essential Reading

McAulay & Quatieri (1986) — "Speech Analysis/Synthesis Based on a Sinusoidal Representation" (foundational peak tracking)
Serra & Smith (1990) — "Spectral Modeling Synthesis" (deterministic + stochastic decomposition)
Serra PhD thesis (1989) — full treatment, freely available
Driedger & Müller (2016) — "A Review of Time-Scale Modification of Music Signals" (survey of time stretching, good context)
Farin — "Curves and Surfaces for CAGD" (Bézier/B-spline math)
Zölzer — "DAFX: Digital Audio Effects" (spectral modeling, time stretching)

Open Questions

Optimal Bézier degree for parameter trajectories (cubic? quartic?)
Adaptive knot placement strategy — how to decide where to add control points
Perceptual error metric for curve fitting (frequency-weighted? psychoacoustic model?)
Cymbal/complex inharmonic sound quality ceiling
Real-time rendering performance for dense track counts
Could the residual use a different curve-based representation than spectral envelope + noise?
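A baseline for experimenting with the degree and error-metric questions: an unweighted least-squares fit of a single Bézier segment to a uniformly sampled trajectory, solved through the normal equations. This is only a starting point under simple assumptions (one segment, uniform sample spacing, plain squared error); it says nothing about adaptive placement or perceptual weighting. The function names are hypothetical.

```python
from math import comb
from typing import List


def bernstein_row(degree: int, u: float) -> List[float]:
    """Bernstein basis values of the given degree at u in [0, 1]."""
    return [comb(degree, i) * u ** i * (1.0 - u) ** (degree - i)
            for i in range(degree + 1)]


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_bezier(samples: List[float], degree: int = 3) -> List[float]:
    """Least-squares control values for one Bezier segment over the samples."""
    count = len(samples)
    rows = [bernstein_row(degree, k / (count - 1)) for k in range(count)]
    size = degree + 1
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    aty = [sum(r[i] * y for r, y in zip(rows, samples)) for i in range(size)]
    return solve(ata, aty)


# A linear glide from 440 Hz to 445 Hz sampled at 20 frames.
glide = [440.0 + 5.0 * k / 19 for k in range(20)]
controls = fit_bezier(glide)
```

Comparing the residual error for degree 3 against degree 4 on real trajectories would be one way to approach the first open question.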

rleroi/vector-audio-idea.md

rleroi commented Apr 14, 2026

.vec Format (JSON for PoC, binary later)
