Created
February 16, 2026 18:35
# OpenWakeWord Integration Plan for voice-ui

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                       Always Running                        │
│  ┌─────────────┐                                            │
│  │ Microphone  │──▶ Audio Stream (16 kHz, 16-bit PCM)       │
│  └─────────────┘            │                               │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │  OpenWakeWord    │  (~5% CPU on Pi)    │
│                    │  (80 ms frames)  │                     │
│                    └────────┬─────────┘                     │
│                             │ score > threshold             │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │  Wake Detected!  │──▶ Play chime?      │
│                    └────────┬─────────┘                     │
│                             │                               │
└─────────────────────────────┼───────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                          On Demand                          │
│                    ┌──────────────────┐                     │
│                    │  Record Command  │  (until silence/    │
│                    │  (VAD or timeout)│   2-3 sec max)      │
│                    └────────┬─────────┘                     │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │     Whisper      │                     │
│                    └────────┬─────────┘                     │
│                             ▼                               │
│                    ┌──────────────────┐                     │
│                    │  Agent/OpenClaw  │                     │
│                    └──────────────────┘                     │
└─────────────────────────────────────────────────────────────┘
```
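
The numbers in the diagram fix the size of each buffer the always-running half has to handle. A quick check of the arithmetic, assuming mono capture (the diagram does not state a channel count):

```python
SAMPLE_RATE_HZ = 16_000    # from the diagram
SAMPLE_WIDTH_BYTES = 2     # 16-bit PCM
FRAME_MS = 80              # OpenWakeWord frame length from the diagram
CHANNELS = 1               # assumption: mono input

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000
bytes_per_frame = samples_per_frame * SAMPLE_WIDTH_BYTES * CHANNELS
frames_per_second = 1000 / FRAME_MS

print(samples_per_frame)   # 1280
print(bytes_per_frame)     # 2560
print(frames_per_second)   # 12.5
```

So the wake word model sees 12.5 frames per second, each 1280 samples long.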

## Implementation Steps

### 1. Train custom wake word model

```bash
# On a Linux box with GPU (or Colab)
# Pick something distinctive: "hey nimbus", "ok computer", etc.
python train.py --training_config my_wakeword.yaml --generate_clips
python train.py --training_config my_wakeword.yaml --augment_clips
python train.py --training_config my_wakeword.yaml --train_model
# Output: my_wakeword.onnx / my_wakeword.tflite
```

### 2. Add wake word listener to voice-ui

```python
import openwakeword
from openwakeword.model import Model


class WakeWordListener:
    def __init__(self, model_path, threshold=0.5):
        # model_path is the custom model produced in step 1
        self.model = Model(wakeword_models=[model_path])
        self.threshold = threshold
        self.audio_buffer = []

    def process_frame(self, audio_frame):
        """Called every 80 ms with 16 kHz, 16-bit PCM audio (1280 samples)."""
        # predict() returns a dict mapping each model name to its score
        prediction = self.model.predict(audio_frame)

        # Check if any wake word triggered
        for name, score in prediction.items():
            if score > self.threshold:
                return True  # Wake detected
        return False
```
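
A minimal sketch of how captured bytes might be cut into 80 ms frames and handed to a detector. The names `iter_frames`, `first_wake_frame`, and `fake_process_frame` are hypothetical and not part of voice-ui or OpenWakeWord; the stub stands in for `WakeWordListener.process_frame` so the example runs without a model. With the real listener, each frame would presumably first be converted to an int16 array (for example with `np.frombuffer(frame, dtype=np.int16)`).

```python
from typing import Callable, Iterator

FRAME_BYTES = 1280 * 2  # 80 ms of 16 kHz, 16-bit mono PCM


def iter_frames(pcm: bytes, frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Yield consecutive full frames; a trailing partial frame is dropped."""
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def first_wake_frame(pcm: bytes, process_frame: Callable[[bytes], bool]) -> int:
    """Return the index of the first frame that triggers, or -1 if none does."""
    for index, frame in enumerate(iter_frames(pcm)):
        if process_frame(frame):
            return index
    return -1


def fake_process_frame(frame: bytes) -> bool:
    """Stand-in detector for illustration only: fires on any non-silent frame."""
    return any(frame)


silence = bytes(FRAME_BYTES)    # one frame of zeros
noise = b"\x01\x00" * 1280      # one frame of non-zero samples
print(first_wake_frame(silence * 3 + noise + silence, fake_process_frame))  # 3
```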

### 3. State machine

```
IDLE ──(wake detected)──▶ LISTENING ──(silence/timeout)──▶ PROCESSING ──▶ IDLE
  ▲                                                              │
  └──────────────────────────────────────────────────────────────┘
```
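
One way to encode these transitions is a lookup table keyed by state and event. This is a sketch, not the voice-ui implementation: the event names are hypothetical, and `"done"` in particular is an assumed label for the unlabeled PROCESSING to IDLE arrow.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.IDLE, "wake_detected"): State.LISTENING,
    (State.LISTENING, "silence"): State.PROCESSING,
    (State.LISTENING, "timeout"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,  # assumed event name
}


def next_state(state: State, event: str) -> State:
    """Return the new state; events that do not apply leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)


state = State.IDLE
for event in ["silence", "wake_detected", "wake_detected", "timeout", "done"]:
    state = next_state(state, event)
    print(event, "->", state.name)
```

Ignoring events that do not apply means a second wake detection while already LISTENING has no effect, which matches the single path drawn in the diagram.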

### 4. Audio pipeline changes

- **Current:** Start recording on trigger
- **New:** Always capture audio in ring buffer, feed to OpenWakeWord
- **On wake:** Keep recording, detect end-of-speech (VAD or silence), then transcribe
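
The ring buffer and end-of-speech logic could look roughly like this. It is a sketch under stated assumptions: `CommandRecorder` is a hypothetical name, the end-of-speech test is a plain RMS energy threshold standing in for a real VAD, and every constant except the 80 ms frame size is a guess to be tuned.

```python
import math
from array import array
from collections import deque

FRAME_SAMPLES = 1280          # 80 ms at 16 kHz
PRE_ROLL_FRAMES = 6           # assumption: keep about 0.5 s from before the wake
SILENCE_RMS = 500.0           # assumption: tune per microphone
SILENCE_FRAMES_TO_STOP = 10   # assumption: 0.8 s of quiet ends the command
MAX_COMMAND_FRAMES = 37       # about 3 s, the upper bound in the diagram


def rms(frame) -> float:
    """Root-mean-square level of one frame of int16 samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


class CommandRecorder:
    """Keeps a pre-roll ring buffer while idle, then records until end of speech."""

    def __init__(self) -> None:
        self.ring = deque(maxlen=PRE_ROLL_FRAMES)
        self.recording = None    # list of frames while a command is being captured
        self.quiet_run = 0       # consecutive quiet frames seen so far
        self.command_frames = 0  # frames captured since the wake word

    def feed(self, frame, wake_detected=False):
        """Feed one frame; returns the captured frames when the command ends."""
        if self.recording is None:
            self.ring.append(frame)  # oldest frame falls off automatically
            if wake_detected:
                self.recording = list(self.ring)  # pre-roll plus the wake frame
                self.quiet_run = 0
                self.command_frames = 0
            return None
        self.recording.append(frame)
        self.command_frames += 1
        self.quiet_run = self.quiet_run + 1 if rms(frame) < SILENCE_RMS else 0
        if (self.quiet_run >= SILENCE_FRAMES_TO_STOP
                or self.command_frames >= MAX_COMMAND_FRAMES):
            finished, self.recording = self.recording, None
            self.ring.clear()
            return finished
        return None


loud = array("h", [2000] * FRAME_SAMPLES)
quiet = array("h", [0] * FRAME_SAMPLES)
recorder = CommandRecorder()
for _ in range(8):
    recorder.feed(quiet)                 # idle: only the last 6 frames are kept
recorder.feed(loud, wake_detected=True)
result = None
for frame in [loud] * 5 + [quiet] * 10:
    result = recorder.feed(frame) or result
print(len(result))  # 21 = 6 buffered + 5 speech + 10 trailing quiet frames
```

Using `deque(maxlen=...)` keeps memory bounded while idle, and copying the ring on wake means the start of the command is not lost if the user keeps talking straight after the wake word.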

## Wake Word Suggestions

For a "Nimbus" theme:

- `"hey nimbus"` (3 syllables, distinctive)
- `"ok nimbus"`
- `"nimbus"` (single word, might get more false positives)

General tips:

- 2-3 syllables minimum
- Avoid common words
- Hard consonants help (k, t, p)

## Training Data Requirements

| Data Type | Source | Size |
|-----------|--------|------|
| Room impulse responses | MIT dataset | ~100MB |
| Background noise | Audioset, FMA | ~1-2GB |
| Negative features | ACAV100M (HuggingFace) | ~500MB |
| Validation set | Pre-computed | ~50MB |

## Training Time Estimates (Google Colab T4)

| Step | 1000 samples | 5000 samples |
|------|--------------|--------------|
| Generate clips | ~10 min | ~45 min |
| Augment clips | ~5 min | ~20 min |
| Train model | ~20 min | ~2 hrs |
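
Summing the columns gives a rough end-to-end figure for one training run. The inputs are the approximate values from the table, with "~2 hrs" read as 120 minutes, so the totals are only as good as those estimates:

```python
# Minutes per step, taken from the table ("~2 hrs" read as 120 min).
estimates = {
    1000: {"generate": 10, "augment": 5, "train": 20},
    5000: {"generate": 45, "augment": 20, "train": 120},
}

totals = {n: sum(steps.values()) for n, steps in estimates.items()}
for n, total in totals.items():
    print(f"{n} samples: ~{total} min (~{total / 60:.1f} h)")
# 1000 samples: ~35 min (~0.6 h)
# 5000 samples: ~185 min (~3.1 h)
```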

## Resources

- [OpenWakeWord GitHub](https://github.com/dscripka/openWakeWord)
- [Training Notebook (Colab)](https://colab.research.google.com/github/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb)
- [Pre-trained models](https://github.com/dscripka/openWakeWord/releases)
- [HuggingFace Demo](https://huggingface.co/spaces/davidscripka/openWakeWord)