@arthyn
Created February 16, 2026 18:35
# OpenWakeWord Integration Plan for voice-ui
## Architecture
```
┌──────────────────────────────────────────────┐
│ Always Running                               │
│                                              │
│  ┌────────────┐                              │
│  │ Microphone │──▶ Audio stream              │
│  └────────────┘    (16 kHz, 16-bit PCM)      │
│         │                                    │
│         ▼                                    │
│  ┌──────────────────┐                        │
│  │ OpenWakeWord     │  (~5% CPU on a Pi)     │
│  │ (80 ms frames)   │                        │
│  └────────┬─────────┘                        │
│           │ score > threshold                │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Wake detected!   │──▶ Play chime?         │
│  └────────┬─────────┘                        │
└───────────┼──────────────────────────────────┘
            │
┌───────────┼──────────────────────────────────┐
│ On Demand ▼                                  │
│  ┌──────────────────┐                        │
│  │ Record command   │  (until silence /      │
│  │ (VAD or timeout) │   2-3 s max)           │
│  └────────┬─────────┘                        │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Whisper          │                        │
│  └────────┬─────────┘                        │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Agent/OpenClaw   │                        │
│  └──────────────────┘                        │
└──────────────────────────────────────────────┘
```
## Implementation Steps
### 1. Train custom wake word model
```bash
# Run on a Linux box with a GPU (or in Google Colab)
# Pick something distinctive: "hey nimbus", "ok computer", etc.
python train.py --training_config my_wakeword.yaml --generate_clips
python train.py --training_config my_wakeword.yaml --augment_clips
python train.py --training_config my_wakeword.yaml --train_model
# Output: my_wakeword.onnx / my_wakeword.tflite
```
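The `--training_config` file above is a YAML document describing the phrase, sample counts, and output paths. A rough sketch of the kind of fields involved is below; the field names here are assumptions from memory, so verify them against the config shipped with the openWakeWord training notebook before use.

```yaml
# my_wakeword.yaml (illustrative -- check the openWakeWord repo for the real schema)
target_phrase: ["hey nimbus"]   # phrase to synthesize positive clips for
model_name: my_wakeword         # basename of the exported .onnx / .tflite
n_samples: 5000                 # number of synthetic positive clips
output_dir: ./my_wakeword       # where generated clips and models are written
```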
### 2. Add wake word listener to voice-ui
```python
import numpy as np
from openwakeword.model import Model

class WakeWordListener:
    def __init__(self, model_path, threshold=0.5):
        self.model = Model(wakeword_models=[model_path])
        self.threshold = threshold

    def process_frame(self, audio_frame: np.ndarray) -> bool:
        """Called every 80 ms with a frame of 16 kHz, 16-bit PCM audio
        (an int16 numpy array, e.g. 1280 samples)."""
        prediction = self.model.predict(audio_frame)
        # predict() returns {model_name: score}; fire if any model clears the threshold
        for name, score in prediction.items():
            if score > self.threshold:
                return True  # wake word detected
        return False
```
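At 16 kHz, one 80 ms frame is 1280 samples, i.e. 2560 bytes of 16-bit PCM. A stdlib-only sketch of slicing a raw PCM buffer into frames of that size (the `frames` helper is illustrative, not part of openWakeWord):

```python
FRAME_SAMPLES = 1280               # 80 ms at 16 kHz
BYTES_PER_SAMPLE = 2               # 16-bit PCM
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE  # 2560 bytes

def frames(pcm: bytes):
    """Yield complete 80 ms frames from a raw 16-bit PCM buffer."""
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield pcm[offset:offset + FRAME_BYTES]

# Slightly more than 160 ms of silence yields two complete frames;
# the incomplete tail is dropped (in practice it would be buffered).
chunks = list(frames(b"\x00" * (FRAME_BYTES * 2 + 500)))
```

Each chunk would then be converted to an int16 numpy array and passed to `process_frame`.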
### 3. State machine
```
IDLE ──(wake detected)──▶ LISTENING ──(silence/timeout)──▶ PROCESSING ──▶ IDLE
```
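One way to sketch this state machine in Python (state and event names are illustrative, not existing voice-ui identifiers):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()

# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.IDLE, "wake_detected"): State.LISTENING,
    (State.LISTENING, "silence"): State.PROCESSING,
    (State.LISTENING, "timeout"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,
}

def step(state: State, event: str) -> State:
    """Return the next state; events that don't apply are ignored."""
    return TRANSITIONS.get((state, event), state)

# Walk one full cycle: wake -> command ends -> transcription done.
state = State.IDLE
for event in ["wake_detected", "silence", "done"]:
    state = step(state, event)
```

Ignoring inapplicable events (rather than raising) keeps the loop robust when, say, a stray wake detection arrives mid-transcription.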
### 4. Audio pipeline changes
- **Current:** Start recording on trigger
- **New:** Always capture audio in ring buffer, feed to OpenWakeWord
- **On wake:** Keep recording, detect end-of-speech (VAD or silence), then transcribe
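The "always capture" piece can be as simple as a fixed-size ring buffer of recent frames, so transcription can include audio from just before the wake word fired. A minimal sketch using `collections.deque` (the pre-roll length and frame size are illustrative choices):

```python
from collections import deque

FRAME_MS = 80
FRAME_BYTES = 2560                            # 80 ms of 16 kHz, 16-bit PCM
PREROLL_MS = 960                              # keep ~1 s of audio before the wake word
ring = deque(maxlen=PREROLL_MS // FRAME_MS)   # 12 frames

def on_audio_frame(frame: bytes):
    ring.append(frame)                        # oldest frame falls off automatically

# Simulate ~3 s of incoming frames; only the last 12 survive.
for i in range(37):
    on_audio_frame(bytes([i]) * FRAME_BYTES)

preroll = b"".join(ring)                      # prepend this to the command recording
```

`deque(maxlen=...)` gives O(1) appends with automatic eviction, so the listener never grows memory while idle.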
## Wake Word Suggestions
For "Nimbus" themed:
- `"hey nimbus"` (3 syllables, distinctive)
- `"ok nimbus"`
- `"nimbus"` (single word, might get more false positives)
General tips:
- 2-3 syllables minimum
- Avoid common words
- Hard consonants help (k, t, p)
## Training Data Requirements
| Data Type | Source | Size |
|-----------|--------|------|
| Room impulse responses | MIT dataset | ~100 MB |
| Background noise | AudioSet, FMA (Free Music Archive) | ~1-2 GB |
| Negative features | ACAV100M (HuggingFace) | ~500 MB |
| Validation set | Pre-computed | ~50 MB |
## Training Time Estimates (Google Colab T4)
| Step | 1000 samples | 5000 samples |
|------|--------------|--------------|
| Generate clips | ~10 min | ~45 min |
| Augment clips | ~5 min | ~20 min |
| Train model | ~20 min | ~2 hrs |
## Resources
- [OpenWakeWord GitHub](https://github.com/dscripka/openWakeWord)
- [Training Notebook (Colab)](https://colab.research.google.com/github/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb)
- [Pre-trained models](https://github.com/dscripka/openWakeWord/releases)
- [HuggingFace Demo](https://huggingface.co/spaces/davidscripka/openWakeWord)