@arthyn
Created February 16, 2026 18:35
# OpenWakeWord Integration Plan for voice-ui
## Architecture
```
┌──────────────────────────────────────────────┐
│ Always Running                               │
│                                              │
│  ┌────────────┐                              │
│  │ Microphone │──▶ Audio stream              │
│  └────────────┘    (16 kHz, 16-bit PCM)      │
│         │                                    │
│         ▼                                    │
│  ┌──────────────────┐                        │
│  │ OpenWakeWord     │  (~5% CPU on a Pi)     │
│  │ (80 ms frames)   │                        │
│  └────────┬─────────┘                        │
│           │ score > threshold                │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Wake detected!   │──▶ Play chime?         │
│  └────────┬─────────┘                        │
└───────────┼──────────────────────────────────┘
            │
┌───────────┼──────────────────────────────────┐
│ On Demand ▼                                  │
│  ┌──────────────────┐                        │
│  │ Record command   │  (until silence /      │
│  │ (VAD or timeout) │   2-3 s max)           │
│  └────────┬─────────┘                        │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Whisper          │                        │
│  └────────┬─────────┘                        │
│           ▼                                  │
│  ┌──────────────────┐                        │
│  │ Agent/OpenClaw   │                        │
│  └──────────────────┘                        │
└──────────────────────────────────────────────┘
```
## Implementation Steps
### 1. Train custom wake word model
```bash
# Run on a Linux box with a GPU (or in Google Colab)
# Pick something distinctive: "hey nimbus", "ok computer", etc.
python train.py --training_config my_wakeword.yaml --generate_clips
python train.py --training_config my_wakeword.yaml --augment_clips
python train.py --training_config my_wakeword.yaml --train_model
# Output: my_wakeword.onnx / my_wakeword.tflite
```
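The `--training_config` file above is a YAML document describing the phrase, sample counts, and output paths. A rough sketch of the kind of fields involved is below; the field names here are assumptions from memory, so verify them against the config shipped with the openWakeWord training notebook before use.

```yaml
# my_wakeword.yaml (illustrative -- check the openWakeWord repo for the real schema)
target_phrase: ["hey nimbus"]   # phrase to synthesize positive clips for
model_name: my_wakeword         # basename of the exported .onnx / .tflite
n_samples: 5000                 # number of synthetic positive clips
output_dir: ./my_wakeword       # where generated clips and models are written
```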
### 2. Add wake word listener to voice-ui
```python
import numpy as np
from openwakeword.model import Model

class WakeWordListener:
    def __init__(self, model_path, threshold=0.5):
        self.model = Model(wakeword_models=[model_path])
        self.threshold = threshold

    def process_frame(self, audio_frame: np.ndarray) -> bool:
        """Called every 80 ms with a frame of 16 kHz, 16-bit PCM audio
        (an int16 numpy array, e.g. 1280 samples)."""
        prediction = self.model.predict(audio_frame)
        # predict() returns {model_name: score}; fire if any model clears the threshold
        for name, score in prediction.items():
            if score > self.threshold:
                return True  # wake word detected
        return False
```
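At 16 kHz, one 80 ms frame is 1280 samples, i.e. 2560 bytes of 16-bit PCM. A stdlib-only sketch of slicing a raw PCM buffer into frames of that size (the `frames` helper is illustrative, not part of openWakeWord):

```python
FRAME_SAMPLES = 1280               # 80 ms at 16 kHz
BYTES_PER_SAMPLE = 2               # 16-bit PCM
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE  # 2560 bytes

def frames(pcm: bytes):
    """Yield complete 80 ms frames from a raw 16-bit PCM buffer."""
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield pcm[offset:offset + FRAME_BYTES]

# Slightly more than 160 ms of silence yields two complete frames;
# the incomplete tail is dropped (in practice it would be buffered).
chunks = list(frames(b"\x00" * (FRAME_BYTES * 2 + 500)))
```

Each chunk would then be converted to an int16 numpy array and passed to `process_frame`.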
### 3. State machine
```
IDLE ──(wake detected)──▶ LISTENING ──(silence/timeout)──▶ PROCESSING ──▶ IDLE
```
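One way to sketch this state machine in Python (state and event names are illustrative, not existing voice-ui identifiers):

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    LISTENING = auto()
    PROCESSING = auto()

# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.IDLE, "wake_detected"): State.LISTENING,
    (State.LISTENING, "silence"): State.PROCESSING,
    (State.LISTENING, "timeout"): State.PROCESSING,
    (State.PROCESSING, "done"): State.IDLE,
}

def step(state: State, event: str) -> State:
    """Return the next state; events that don't apply are ignored."""
    return TRANSITIONS.get((state, event), state)

# Walk one full cycle: wake -> command ends -> transcription done.
state = State.IDLE
for event in ["wake_detected", "silence", "done"]:
    state = step(state, event)
```

Ignoring inapplicable events (rather than raising) keeps the loop robust when, say, a stray wake detection arrives mid-transcription.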
### 4. Audio pipeline changes
- **Current:** Start recording on trigger
- **New:** Always capture audio in ring buffer, feed to OpenWakeWord
- **On wake:** Keep recording, detect end-of-speech (VAD or silence), then transcribe
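The "always capture" piece can be as simple as a fixed-size ring buffer of recent frames, so transcription can include audio from just before the wake word fired. A minimal sketch using `collections.deque` (the pre-roll length and frame size are illustrative choices):

```python
from collections import deque

FRAME_MS = 80
FRAME_BYTES = 2560                            # 80 ms of 16 kHz, 16-bit PCM
PREROLL_MS = 960                              # keep ~1 s of audio before the wake word
ring = deque(maxlen=PREROLL_MS // FRAME_MS)   # 12 frames

def on_audio_frame(frame: bytes):
    ring.append(frame)                        # oldest frame falls off automatically

# Simulate ~3 s of incoming frames; only the last 12 survive.
for i in range(37):
    on_audio_frame(bytes([i]) * FRAME_BYTES)

preroll = b"".join(ring)                      # prepend this to the command recording
```

`deque(maxlen=...)` gives O(1) appends with automatic eviction, so the listener never grows memory while idle.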
## Wake Word Suggestions
For "Nimbus" themed:
- `"hey nimbus"` (3 syllables, distinctive)
- `"ok nimbus"`
- `"nimbus"` (single word, might get more false positives)
General tips:
- 2-3 syllables minimum
- Avoid common words
- Hard consonants help (k, t, p)
## Training Data Requirements
| Data Type | Source | Size |
|-----------|--------|------|
| Room impulse responses | MIT dataset | ~100 MB |
| Background noise | AudioSet, FMA (Free Music Archive) | ~1-2 GB |
| Negative features | ACAV100M (HuggingFace) | ~500 MB |
| Validation set | Pre-computed | ~50 MB |
## Training Time Estimates (Google Colab T4)
| Step | 1000 samples | 5000 samples |
|------|--------------|--------------|
| Generate clips | ~10 min | ~45 min |
| Augment clips | ~5 min | ~20 min |
| Train model | ~20 min | ~2 hrs |
## Resources
- [OpenWakeWord GitHub](https://github.com/dscripka/openWakeWord)
- [Training Notebook (Colab)](https://colab.research.google.com/github/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb)
- [Pre-trained models](https://github.com/dscripka/openWakeWord/releases)
- [HuggingFace Demo](https://huggingface.co/spaces/davidscripka/openWakeWord)