wheeze.py
is a tool to help generate lyrics for music, but
in a music-first way. It can be thought of as the reverse of
the typical LLM-assisted approach.
(The first two letters come from OpenAI's whisper. I haven't invented a post-hoc rationale for the rest of the name yet.)
Generating lyrics with an LLM and then forcing the music to comply doesn't always produce a natural-sounding result. If the lyrics and music are generated by two separate networks, there will be square pegs to hammer into round holes.
Udio hallucinates really good vocals that just happen to be nonsense. But any generative AI will output nonsense if you read the data out too early. Udio's hallucinated vocals are just undercooked.
The idea for this started out as me typing out hallucinated vocalizations "as-is" into the lyrics box because that made inpainting perform much better. Giving the inpainting tool a complete map of the lyrics, hallucinations included, helps it find whatever it is you tell it to change.
Transcribing the vocals with software was the obvious thing to try next. I had already noticed remixing improves sloppy vocalizations if the lyrics are filled in, and I figured it should at least do something to full hallucinations, too.
It does do something.
Just a simple feedback loop, really. wheeze.py
aims to
automate the following steps:
- Get the vocals stem from a track. If you have a tool that can transcribe a non-stemmed track that's great, but Whisper will output bizarre things given silence and/or background noise. 1
- Transcribe the vocals with whatever works. Should not skip over input. The result can be pure alphabet soup or real words spelling out nonsense phrases, the only requirement is that it represents the vocalizations.
- Optionally edit the transcription. Stick to original
pacing and pronunciation. (
wheeze.py
uses GPT-4o.) - Remix the track with the new lyrics. Optimal settings vary based on astrological sign.
- Go to back to step 1 with the remixed result as your new source track.
wheeze.py
takes an audio file, calls OpenAI's Whisper
API to transcribe it, generates an edited transcription
with GPT-4o, and finally prints both versions.
Currently there aren't many features. I'll be adding an option to pass genre and thematical hints to GPT-4o for it to consider when editing the transcription. It would also be nice to provide remix knob settings suggestions to go along with the edited lyrics. There's no Udio API (yet?) and clickety-clicking is eating the worms eating my brain.
Footnotes
-
Examples:
- Trademark statement of the OSHO foundation repeated over and over.
["The Comfort of Ministry Blanket"]
repeated over and over.