badlogic/README.md

Created June 14, 2025 20:41

Yakety Documentation - LLM-optimized docs with concrete file references

README.md

Yakety

Real-time speech-to-text application with hotkey recording and local Whisper transcription. Records audio while holding a keyboard shortcut, transcribes using on-device AI, and pastes text directly into the active application.

Key Entry Points

src/main.c - Application entry point and transcription pipeline
CMakeLists.txt - Build system with whisper.cpp integration
src/audio.c - Audio recording and processing core
src/transcription.cpp - Whisper model integration

Quick Start

# Build release version
cmake --preset release
cmake --build --preset release

# Run CLI version
./build/bin/yakety-cli

# Run GUI version (macOS/Windows)
open ./build/bin/Yakety.app  # macOS (an .app bundle is a directory, so launch it with open)
./build/bin/Yakety.exe  # Windows

Documentation

Project Overview - Core purpose, technology stack, platform requirements
Architecture - System organization, component map, data flow patterns
Build System - CMake configuration, build presets, platform setup
Development - Code style, patterns, implementation guidelines
Testing - GUI test suite, dialog validation, test execution
Deployment - Packaging, distribution, remote deployment

Platform Support

macOS 14.0+ (Apple Silicon) - Cocoa/SwiftUI interface
Windows 10+ - Win32 native interface
Linux - Experimental CLI support

Technology Stack

Audio: miniaudio (cross-platform capture)
Speech Recognition: whisper.cpp (local AI inference)
Build System: CMake 3.20+ with Ninja/Visual Studio
GUI: Platform-native (Cocoa/SwiftUI on macOS, Win32 on Windows)

architecture.md

Yakety Architecture

Overview

Yakety is a real-time voice transcription application built with a cross-platform C/C++ core and platform-specific UI layers. The system follows a layered architecture with clear separation between business logic, platform abstraction, and native implementations. The core design prioritizes low-latency audio processing, efficient memory management, and responsive user interaction through a unified hotkey system.

The application operates in two modes: a console CLI for development and testing, and a GUI tray application for production use. Both modes share the same core transcription pipeline but differ in their initialization and user interaction patterns. The system integrates whisper.cpp, a C/C++ port of OpenAI's Whisper model, for speech recognition, providing local processing without cloud dependencies.

Component Map

Core Business Logic (`src/`)

Main Application: main.c (lines 329-390) - Entry point and initialization flow
Audio Processing: audio.c, audio.h - Real-time audio capture and buffering
Transcription Engine: transcription.cpp, transcription.h - Whisper.cpp integration
Model Management: models.c, models.h - Model loading and fallback logic
Input Handling: keylogger.h - Cross-platform hotkey detection
Menu System: menu.c, menu.h - Tray/menubar interface

Platform Abstraction Layer (`src/`)

Application Framework: app.h - Cross-platform app lifecycle management
Preferences: preferences.c, preferences.h - Configuration persistence
Utilities: utils.h - Platform-agnostic helper functions
Dialog System: dialog.h - Native dialog abstractions

macOS Implementation (`src/mac/`)

App Backend: app.m - NSApplication integration and event loop
UI Dialogs: dialogs/*.swift - SwiftUI-based native dialogs
System Integration: menu.m, clipboard.m, overlay.m - Cocoa services
Input Capture: keylogger.c - Carbon event monitoring
Threading: dispatch.m, dispatch.h - GCD-based async execution

Windows Implementation (`src/windows/`)

App Backend: app.c - Win32 application and message loop
UI Components: dialog.c, overlay.c - Win32 GUI elements
System Services: menu.c, clipboard.c - Windows shell integration
Input Capture: keylogger.c - Low-level keyboard hooks

Build System

CMake Configuration: CMakeLists.txt (lines 1-535) - Cross-platform build
Whisper Integration: cmake/BuildWhisper.cmake - Whisper.cpp compilation
Platform Setup: cmake/PlatformSetup.cmake - Platform-specific configuration

Key Files

Core Headers and Data Structures

src/app.h - Application lifecycle management

APP_ENTRY_POINT macro (lines 7-43): Platform-specific main() generation
app_main() function (line 46): Unified entry point for CLI and GUI modes
AppReadyCallback typedef (line 48): Deferred initialization pattern

src/keylogger.h - Input event handling

KeyCombination struct (lines 17-20): Multi-key hotkey support
KeyCallback typedef (line 8): Event handler function signature
KeyInfo struct (lines 11-14): Platform-agnostic key representation

src/transcription.h - Speech processing interface

transcription_process() (line 15): Main audio-to-text pipeline
transcription_init() (line 8): Whisper model initialization
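C-callable interface via an extern "C" wrapper around the C++ implementation; thread safety comes from the mutex guarding the Whisper context (ctx_mutex in src/transcription.cpp)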

src/models.h - Model management

models_load() (line 7): Unified model loading with fallback logic
models_get_current_path() (line 10): Active model path resolution

src/model_definitions.h - Model catalog and metadata

ModelInfo struct (lines 6-12): Model metadata for UI and downloads
DOWNLOADABLE_MODELS[] array (lines 15-30): Available models with URLs
SUPPORTED_LANGUAGES[] array (lines 49-65): Language configuration

Implementation Files

src/main.c - Application bootstrap and flow control

on_app_ready() (lines 254-283): Deferred initialization sequence
setup_keylogger() (lines 120-148): Permission handling and hotkey setup
process_recorded_audio() (lines 169-215): Complete transcription pipeline
AppState struct (lines 28-31): Recording state management

src/transcription.cpp - Whisper.cpp integration

whisper_context *ctx (line 17): Global Whisper model instance
utils_mutex_t *ctx_mutex (line 18): Thread safety for model access
null_log_callback() (lines 29-34): Whisper log suppression

src/preferences.c - Configuration persistence

Cross-platform config file handling with JSON-like key-value storage
KeyCombination serialization for hotkey preferences
Platform-specific config directory resolution

Data Flow

Initialization Sequence

Entry Point: main() → app_main() (main.c:329)
Core Setup: Logging, preferences, signal handlers (main.c:340-360)
Platform Init: app_init() calls platform-specific initialization
Deferred Loading: on_app_ready() callback triggered after event loop starts
Model Loading: models_load() → transcription_init() (models.c:24-42)
UI Setup: Menu creation, keylogger initialization with permissions
Ready State: Application monitoring for hotkey events

Transcription Pipeline

Input Trigger: Hotkey press detected by platform keylogger (keylogger.c)
Recording Start: on_key_press() → audio_recorder_start() (main.c:217-231)
Audio Capture: Platform-specific audio recording via miniaudio
Recording Stop: on_key_release() → process_recorded_audio() (main.c:233-250)
Audio Processing: audio_recorder_get_samples() retrieves float buffer
Speech Recognition: transcription_process() → Whisper inference (transcription.cpp:15)
Text Output: clipboard_copy() → clipboard_paste() for immediate insertion
UI Feedback: Overlay shows "Recording" and "Transcribing" states
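
The press/release flow above can be read as a small state machine. The following is a minimal sketch, not code from the repository: the `Pipeline` type, the `PIPELINE_*` states, and the `pipeline_*` functions are hypothetical names that only mirror the order of the steps listed above. Ignoring a key press while a transcription is still running is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical states mirroring the overlay feedback: idle, "Recording", "Transcribing". */
typedef enum {
    PIPELINE_IDLE,
    PIPELINE_RECORDING,
    PIPELINE_TRANSCRIBING
} PipelineState;

typedef struct {
    PipelineState state;
    size_t sample_count; /* samples captured during the current recording */
} Pipeline;

/* Hotkey pressed: start a new recording only when idle. */
bool pipeline_on_key_press(Pipeline *p) {
    assert(p != NULL);
    if (p->state != PIPELINE_IDLE) {
        return false; /* assumption: presses are ignored while busy */
    }
    p->sample_count = 0;
    p->state = PIPELINE_RECORDING;
    return true;
}

/* Audio callback: count incoming samples only while recording. */
void pipeline_on_audio(Pipeline *p, size_t samples) {
    assert(p != NULL);
    if (p->state == PIPELINE_RECORDING) {
        p->sample_count += samples;
    }
}

/* Hotkey released: stop capturing and move on to transcription. */
bool pipeline_on_key_release(Pipeline *p) {
    assert(p != NULL);
    if (p->state != PIPELINE_RECORDING) {
        return false;
    }
    p->state = PIPELINE_TRANSCRIBING;
    return true;
}

/* Transcription finished and text pasted: return to idle. */
void pipeline_on_transcription_done(Pipeline *p) {
    assert(p != NULL);
    p->state = PIPELINE_IDLE;
}
```

In this model, audio that arrives after the key release is dropped, which matches the idea that the buffer handed to transcription is fixed once recording stops.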

Cross-Platform Abstraction

Main Thread Dispatch: macOS uses dispatch_async(dispatch_get_main_queue()), Windows uses PostMessage()
Event Loop Integration: macOS NSRunLoop, Windows GetMessage()/DispatchMessage()
Permission Handling: macOS Accessibility API, Windows UAC/Admin privileges
Resource Management: macOS app bundles with Info.plist, Windows resource files (.rc)

Threading Model

Main Thread: UI, event handling, clipboard operations
Audio Thread: Real-time audio capture (managed by miniaudio)
Background Thread: Whisper inference (CPU/GPU intensive)
Synchronization: Mutex protection for Whisper context, atomic operations for state flags

The architecture emphasizes minimal latency for the complete transcription cycle while maintaining thread safety and platform compatibility across macOS and Windows environments.

build-system.md

Build System Documentation

Overview

Yakety uses CMake 3.20+ as its primary build system with multi-language support (C, C++, Swift). The build system is configured through:

Main CMake file: CMakeLists.txt - Primary build configuration
Build presets: CMakePresets.json - Platform-specific build configurations
Helper modules: cmake/ directory with modular build logic
- cmake/BuildWhisper.cmake - Whisper.cpp dependency management
- cmake/PlatformSetup.cmake - Platform-specific libraries and frameworks
- cmake/GenerateIcons.cmake - Asset generation from SVG sources

The system automatically handles whisper.cpp dependency building, model downloading, icon generation, and platform-specific configurations.

Build Workflows

Quick Start Commands

Development builds:

# Release build (recommended for development)
cmake --preset release
cmake --build --preset release

# Debug build
cmake --preset debug
cmake --build --preset debug

Windows debugging with Visual Studio:

# Only on Windows - enables Visual Studio debugging
cmake --preset vs-debug
cmake --build --preset vs-debug

Distribution packaging:

# Build and package for current platform
cmake --build --preset release
cmake --build build --target package

# Platform-specific packages
cmake --build build --target package-macos    # macOS only
cmake --build build --target package-windows  # Windows only

# Upload to server (requires SSH access)
cmake --build build --target upload

Build Targets

The build system generates these executables in build/bin/:

yakety-cli - Command-line interface
yakety-app - GUI application (platform-specific bundle)
recorder - Audio recording utility
transcribe - Standalone transcription tool
test-* - Platform-specific test executables (macOS only)

Asset Generation

Icons are automatically generated from assets/yakety.svg:

# Manual icon regeneration (automatic during build)
cmake --build build --target generate_icons

Requires: rsvg-convert (librsvg) and magick (ImageMagick)

Platform Setup

macOS Requirements

Xcode Command Line Tools: Required for Swift compiler
macOS 14.0+: Minimum deployment target
Apple Silicon (ARM64): Target architecture
System frameworks: Automatically linked
- CoreFoundation, AppKit, AudioToolbox, AVFoundation
- Metal frameworks for GPU acceleration

Dependencies:

# Install build tools via Homebrew
brew install librsvg imagemagick ninja cmake

Windows Requirements

Visual Studio 2022: For MSVC compiler and debugging
CMake 3.20+: Build system
Ninja: Fast builds (included in VS2022)
Vulkan SDK: Optional GPU acceleration

Environment setup:

Set VULKAN_SDK environment variable for GPU support
Use winvs.bat script for proper Visual Studio environment

Windows/WSL Remote Development

For development from macOS to Windows via SSH:

Setup scripts:

wsl/start-wsl-ssh.bat - Run as Administrator on Windows
wsl/setup-wsl-ssh.sh - Configure SSH in WSL

Sync and build workflow:

# 1. Sync source files (excludes build directories)
rsync -av --exclude='build/' --exclude='build-debug/' --exclude='whisper.cpp/' \
  . [email protected]:/mnt/c/workspaces/yakety/

# 2. Configure build
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
  c:\\workspaces\\winvs.bat && cmake --preset release'"

# 3. Build
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  /mnt/c/Windows/System32/cmd.exe /c 'cd c:\\workspaces\\yakety && \
  c:\\workspaces\\winvs.bat && cmake --build --preset release'"

# 4. Run CLI
ssh [email protected] "cd /mnt/c/workspaces/yakety && \
  build/bin/yakety-cli.exe"

Linux Requirements (Experimental)

GCC/Clang: C/C++ compiler
ALSA/PulseAudio: Audio system libraries
CMake 3.20+, Ninja: Build tools

Reference

CMake Presets

From CMakePresets.json:

Configure presets:

release - Ninja generator, Release build, build/ directory
debug - Ninja generator, Debug build, build-debug/ directory
vs-debug - Visual Studio 2022, Windows-only debugging

Build presets:

release - Build release configuration
debug - Build debug configuration
vs-debug - Build Windows VS debug configuration

Whisper.cpp Integration

Automatic dependency management via cmake/BuildWhisper.cmake:

Auto-clone: Downloads whisper.cpp from GitHub if missing
Platform optimization:
- macOS: Metal GPU acceleration, ARM64 architecture
- Windows: Native CPU optimization, optional Vulkan GPU
Model download: Automatically downloads ggml-base-q8_0.bin (110MB)
Static linking: All whisper libraries statically linked

Code Signing (macOS)

Automatic ad-hoc signing via cmake/PlatformSetup.cmake:

# Manual signing
./sign-app.sh  # Signs and removes quarantine

Troubleshooting

Whisper.cpp build failures:

Verify internet connection for auto-download
Check disk space (whisper.cpp ~500MB + model ~110MB)
On Windows: Ensure Visual Studio environment is loaded

Swift compilation warnings:

Incremental compilation disabled via CMAKE_Swift_FLAGS
Normal for mixed C/Swift projects

Icon generation failures:

Install librsvg: brew install librsvg (macOS) or apt install librsvg2-bin (Linux)
Install ImageMagick: brew install imagemagick (macOS)

Windows Vulkan not detected:

Install Vulkan SDK and set VULKAN_SDK environment variable
Restart command prompt after installation

Linking errors:

Clean build directories: rm -rf build build-debug
Rebuild whisper.cpp: rm -rf whisper.cpp/build
On Windows: Match Debug/Release configuration with whisper.cpp

deployment.md

Deployment Documentation

Overview

Yakety provides multiple packaging and distribution options for cross-platform deployment. The build system includes automated packaging targets for creating distributable archives, DMG installers, and deployment to remote servers.

Package Types

CLI Distribution Packages

Target: package-cli-macos | package-cli-windows
Output: yakety-cli-{platform}.zip
Location: ${CMAKE_BINARY_DIR}/
Contents: CLI tools (yakety-cli, recorder, transcribe), models, assets

App Distribution Packages

Target: package-app-macos | package-app-windows
Output: macOS: Yakety-macos.dmg + Yakety-macos.zip | Windows: Yakety-windows.zip
Location: ${CMAKE_BINARY_DIR}/
Contents: Application bundles with embedded resources

Universal Package Target

Target: package
Behavior: Platform-conditional (package-macos on Darwin, package-windows on Windows)

Platform Deployment

macOS

# Build release binaries
cmake --preset release
cmake --build --preset release

# Create CLI distribution
cmake --build build --target package-cli-macos
# Output: build/yakety-cli-macos.zip

# Create app distribution with DMG
cmake --build build --target package-app-macos
# Output: build/Yakety-macos.dmg, build/Yakety-macos.zip

# Create all packages
cmake --build build --target package-macos

DMG Creation Process:

Copies app bundle to temp directory
Creates Applications symlink for drag-install
Generates DMG with hdiutil create
Creates compressed ZIP of DMG

Code Signing: Automatic ad-hoc signing with codesign --force --deep --sign -

Windows

# Build release binaries
cmake --preset release
cmake --build --preset release

# Create CLI distribution
cmake --build build --target package-cli-windows
# Output: build/yakety-cli-windows.zip

# Create app distribution
cmake --build build --target package-app-windows
# Output: build/Yakety-windows.zip

# Create all packages
cmake --build build --target package-windows

Windows-specific:

CLI executable: yakety-cli.exe
GUI executable: Yakety.exe (WIN32 app without console)
Vulkan acceleration support (if VULKAN_SDK available)

Remote Deployment

# Upload packages to server
cmake --build build --target upload

Upload Destinations:

Target Server: [email protected]
Path: /home/badlogic/mariozechner.at/html/uploads/
Method: Windows: SCP via batch script | Unix: rsync

Website Deployment

Frontend-only Deploy:

cd website
./publish.sh

Full Deploy with Server Restart:

cd website
./publish.sh server

Docker Control:

cd website/docker
./control.sh start      # Production mode
./control.sh startdev   # Development mode
./control.sh stop       # Stop services
./control.sh logs       # View logs
./control.sh restart    # Restart services

Reference

Build Outputs

Binary Directory: ${CMAKE_BINARY_DIR}/bin/
CLI Tools: yakety-cli, recorder, transcribe
GUI Apps: Yakety.app (macOS bundle) | Yakety.exe (Windows)
Models: bin/models/ggml-base-q8_0.bin
Assets: bin/menubar.png

Distribution Archives

macOS CLI: yakety-cli-macos.zip
macOS App: Yakety-macos.dmg, Yakety-macos.zip
Windows CLI: yakety-cli-windows.zip
Windows App: Yakety-windows.zip

Website Configuration

Production Domain: yakety.ai, www.yakety.ai
SSL: Let's Encrypt via nginx-proxy
Server Stack: Docker (Nginx + Node.js)
Deployment: rsync to slayer.marioslab.io

Build Presets

Release: cmake --preset release (Ninja, optimized)
Debug: cmake --preset debug (Ninja, debugging symbols)
VS Debug: cmake --preset vs-debug (Visual Studio, Windows only)

WSL/Remote Development

Target: Windows machine at 192.168.1.21
Sync Command: rsync -av --exclude='build/' --exclude='whisper.cpp/' . [email protected]:/mnt/c/workspaces/yakety/
Build via SSH: Uses cmd.exe with winvs.bat environment setup

development.md

Development Guide

Overview

Yakety is a voice transcription tool using a C-style C++ architecture with platform abstraction. The codebase follows C conventions with minimal C++ usage (only for whisper.cpp integration). Core features:

Cross-platform: macOS (Objective-C/Swift) and Windows (Win32 API)
Singleton patterns: Audio recorder, preferences, models
Platform abstraction: Clean separation between core logic and platform code
Minimal dependencies: Uses system APIs directly

Code Style

C-Style C++ Conventions

File Extensions:

.c - Pure C code
.cpp - C++ code (only when whisper.cpp features needed)
.m - Objective-C (macOS platform layer)
.swift - SwiftUI dialogs (macOS only)

Naming Conventions:

// Functions: module_action_object
bool audio_recorder_init(void);           // src/audio.c:82
int keylogger_set_combination(combo);     // Keylogger API
void preferences_set_string(key, value);  // src/preferences.h:24

// Types: CamelCase with descriptive names
typedef struct {
    ma_device device;
    float *buffer;
    bool is_recording;        // Atomic access required
} AudioRecorder;              // src/audio.c:14-32

// Constants: UPPER_CASE
#define WHISPER_SAMPLE_RATE 16000        // src/audio.c:10
#define MIN_RECORDING_DURATION 0.1       // src/main.c:26

C-Style Casting:

AudioRecorder *recorder = (AudioRecorder *) pDevice->pUserData;  // src/audio.c:39
const float *input = (const float *) pInput;                     // src/audio.c:45

Header Guards:

#ifndef AUDIO_H
#define AUDIO_H
// ... content ...
#endif // AUDIO_H

Common Patterns

Singleton Pattern

Audio Recorder (src/audio.h, src/audio.c):

// Global singleton instance
static AudioRecorder *g_recorder = NULL;

bool audio_recorder_init(void) {
    if (g_recorder) {
        return false; // Already initialized
    }
    g_recorder = (AudioRecorder *) calloc(1, sizeof(AudioRecorder));
    // ... initialization ...
    return true; // Report success once setup completes
}

void audio_recorder_cleanup(void) {
    if (!g_recorder) return;
    // ... cleanup ...
    free(g_recorder);
    g_recorder = NULL;
}

Platform Abstraction

Directory Structure:

src/
├── audio.h/c          # Cross-platform core logic
├── utils.h            # Platform abstraction interface
├── mac/               # macOS implementations
│   ├── app.m          # NSApplication handling
│   ├── utils.m        # Platform-specific utilities
│   └── dialogs/       # SwiftUI dialog implementations
└── windows/           # Windows implementations
    ├── app.c          # Win32 application handling
    └── utils.c        # Platform-specific utilities

Interface Pattern (src/utils.h):

// Cross-platform interface
void utils_open_accessibility_settings(void);
bool utils_set_launch_at_login(bool enabled);
double utils_get_time(void);

// Platform implementations differ:
// - src/mac/utils.m: Uses NSWorkspace, CFAbsoluteTimeGetCurrent
// - src/windows/utils.c: Uses ShellExecute, GetTickCount64
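
To illustrate how one declaration in a shared header can map to different platform bodies, here is a minimal single-file sketch. It is not the project's implementation: the project keeps each platform in its own file, whereas this sketch uses `#ifdef` so it compiles anywhere. The name `sketch_get_time` is hypothetical, and the non-Windows branch uses POSIX `clock_gettime` (an assumption made to stay in plain C) rather than `CFAbsoluteTimeGetCurrent`.

```c
#if !defined(_WIN32) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L /* expose clock_gettime under strict -std=c11 */
#endif

#include <assert.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <time.h>
#endif

/* Hypothetical stand-in for a utils_get_time()-style function:
 * returns a monotonically non-decreasing time in seconds. */
double sketch_get_time(void) {
#ifdef _WIN32
    /* Milliseconds since system start, converted to seconds. */
    return (double) GetTickCount64() / 1000.0;
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void) rc; /* silence unused-variable warnings when NDEBUG is set */
    return (double) ts.tv_sec + (double) ts.tv_nsec / 1e9;
#endif
}
```

Callers only see the single function, so code such as a recording-duration measurement can subtract two results without knowing which platform API produced them.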

App Initialization Pattern (src/app.h):

typedef void (*AppReadyCallback)(void);
int app_init(const char *name, const char *version, bool is_console, AppReadyCallback on_ready);

// Platform-specific implementations:
// - src/mac/app.m: Uses NSApplication, NSApplicationDelegate
// - src/windows/app.c: Uses CreateWindow, message pump

Thread Safety

Atomic Operations (src/audio.c:41, src/utils.h:43-46):

// Thread-safe boolean access
bool utils_atomic_read_bool(bool *ptr);
void utils_atomic_write_bool(bool *ptr, bool value);

// Usage in audio callback (audio thread → main thread)
if (!utils_atomic_read_bool(&recorder->is_recording)) {
    return;
}
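
The document does not show how the atomic helpers are implemented. As a rough sketch of the idea, the following uses C11 `<stdatomic.h>`; the `sketch_*` names and the `SketchRecorder` type are hypothetical, and the sketch takes an `atomic_bool *` where the real helpers take a plain `bool *`. Compilers without C11 atomics would need a platform-specific alternative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recorder with a flag shared between threads. */
typedef struct {
    atomic_bool is_recording; /* written by the main thread, read by the audio thread */
    size_t frames_seen;       /* touched only by the audio callback in this sketch */
} SketchRecorder;

/* Atomically read the flag. */
bool sketch_atomic_read_bool(atomic_bool *ptr) {
    assert(ptr != NULL);
    return atomic_load(ptr);
}

/* Atomically write the flag. */
void sketch_atomic_write_bool(atomic_bool *ptr, bool value) {
    assert(ptr != NULL);
    atomic_store(ptr, value);
}

/* Stand-in for an audio callback: drops frames unless recording is active. */
void sketch_data_callback(SketchRecorder *recorder, size_t frame_count) {
    assert(recorder != NULL);
    if (!sketch_atomic_read_bool(&recorder->is_recording)) {
        return;
    }
    recorder->frames_seen += frame_count;
}
```

The flag is the only state both threads touch here; larger shared data, such as the sample buffer itself, would need its own synchronization.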

Error Handling

Return Code Pattern:

// Success: 0, Failure: -1 or non-zero
int audio_recorder_start(void);          // src/audio.h:17
int keylogger_init(callbacks, userdata); // Returns 0 on success

// Boolean for simple operations
bool audio_recorder_init(void);          // src/audio.h:10
bool preferences_init(void);             // src/preferences.h:9

Error Propagation (reset shared state, then return the failure code):

if (ma_device_start(&recorder->device) != MA_SUCCESS) {
    utils_atomic_write_bool(&recorder->is_recording, false);
    return -1;
}

SwiftUI Dialog Pattern

Modal Dialog Implementation (src/mac/dialogs/dialog_utils.swift:18-23):

func runModalDialog<T: View, StateType: ModalDialogState>(
    content: T,
    state: StateType,
    windowSize: NSSize = NSSize(width: 400, height: 200),
    windowTitle: String = ""
) -> StateType.ResultType

Dialog State Protocol:

protocol ModalDialogState: ObservableObject {
    associatedtype ResultType
    var isCompleted: Bool { get set }
    var result: ResultType { get }
    func reset()
}

Workflows

Adding a New Feature

Core Logic - Implement in src/ using C-style conventions
Platform Interface - Add declarations to appropriate header (e.g., src/utils.h)
Platform Implementation - Implement in src/mac/ and src/windows/
Integration - Wire up in src/main.c app lifecycle

Platform-Specific Dialog

macOS: Create SwiftUI view in src/mac/dialogs/
Windows: Implement Win32 dialog in src/windows/dialog.c
Interface: Add C function declaration in src/dialog.h

Audio Processing

Audio pipeline follows whisper.cpp requirements:

#define WHISPER_SAMPLE_RATE 16000  // Fixed 16kHz
#define WHISPER_CHANNELS 1         // Mono only

Recording flow (src/main.c:168-215):

Key Press → audio_recorder_start() → data_callback() fills buffer
Key Release → audio_recorder_stop() → get_samples() → transcription_process()
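
With a fixed 16 kHz mono format, the length of a recording follows directly from its sample count. The sketch below shows that arithmetic and a minimum-length check. The two constants come from this document; the `sketch_*` helpers are hypothetical, and both the unit of `MIN_RECORDING_DURATION` (seconds) and its use to skip very short recordings are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define WHISPER_SAMPLE_RATE 16000   /* samples per second, mono */
#define MIN_RECORDING_DURATION 0.1  /* assumed to be in seconds */

/* Hypothetical helper: duration in seconds of a mono recording. */
double sketch_duration_seconds(size_t sample_count) {
    return (double) sample_count / (double) WHISPER_SAMPLE_RATE;
}

/* Hypothetical helper: true when the recording is worth transcribing. */
bool sketch_is_long_enough(size_t sample_count) {
    return sketch_duration_seconds(sample_count) >= MIN_RECORDING_DURATION;
}
```

Under these assumptions, a tap that captures 800 samples (0.05 s) would be skipped, while one second of audio (16000 samples) would go on to transcription.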

Reference

File Organization

Core Modules:

src/main.c - App entry point and lifecycle (329-391)
src/audio.c/h - Audio recording singleton
src/preferences.c/h - Configuration management
src/models.c/h - Whisper model loading
src/transcription.cpp/h - Whisper.cpp integration (C++)

Platform Abstraction:

src/utils.h - Cross-platform interface definitions
src/app.h - Application framework interface
src/mac/ - macOS implementations (Objective-C/Swift)
src/windows/ - Windows implementations (Win32 C)

Build System:

CMakeLists.txt - Main build configuration
cmake/PlatformSetup.cmake - Platform-specific setup
cmake/BuildWhisper.cmake - Whisper.cpp integration

Key Constants

#define WHISPER_SAMPLE_RATE 16000           // Audio format for transcription
#define WHISPER_CHANNELS 1                  // Mono audio
#define MIN_RECORDING_DURATION 0.1          // Minimum recording length
#define PERMISSION_RETRY_DELAY_MS 500       // macOS permission retry delay

Build Presets

Development:

cmake --preset debug    # Debug build with Ninja
cmake --preset release  # Release build with Ninja

Windows Debugging:

cmake --preset vs-debug # Visual Studio generator for debugging

Common Issues

macOS Accessibility Permissions:

Handle in src/main.c:78-117 with dialog prompts
Retry mechanism for permission granting

Thread Safety:

Audio callback runs on separate thread
Use utils_atomic_* for shared state access
Main app state in src/main.c:28-33

Model Loading:

Single unified function: models_load() in src/models.c
Handles download dialogs and fallback logic
Path management through preferences system
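
The exact fallback rules of `models_load()` are not spelled out here. As one plausible reading, the sketch below prefers a user-selected model path and falls back to a bundled model when the preferred file is missing. Every name in it (`sketch_file_exists`, `sketch_choose_model`) is hypothetical, and the order of the checks is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if the file can be opened for reading. */
bool sketch_file_exists(const char *path) {
    assert(path != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return false;
    }
    fclose(f);
    return true;
}

/* Returns the path to load, or NULL when neither candidate is usable.
 * The returned pointer is one of the arguments; nothing is allocated. */
const char *sketch_choose_model(const char *preferred, const char *bundled) {
    if (preferred != NULL && preferred[0] != '\0' && sketch_file_exists(preferred)) {
        return preferred;
    }
    if (bundled != NULL && sketch_file_exists(bundled)) {
        return bundled;
    }
    return NULL; /* the caller could offer a download dialog at this point */
}
```

Returning `NULL` instead of failing inside the helper leaves the decision about prompting for a download to the caller.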

Memory Management:

Consistent use of malloc/free for C compatibility
Audio buffer auto-resizing in src/audio.c:54-66
Caller owns returned buffers (e.g., audio_recorder_get_samples)
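
The two buffer conventions above (a buffer that grows on demand, and returned buffers owned by the caller) can be sketched as follows. This is an illustration rather than the code in src/audio.c: the `SketchBuffer` type, the `sketch_buffer_*` functions, the initial capacity of 1024 samples, and the doubling growth policy are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable sample buffer. */
typedef struct {
    float *data;
    size_t count;    /* samples currently stored */
    size_t capacity; /* samples that fit before the next resize */
} SketchBuffer;

/* Append n samples, growing the buffer by doubling when it is full. */
bool sketch_buffer_append(SketchBuffer *buf, const float *samples, size_t n) {
    assert(buf != NULL);
    if (n == 0) {
        return true;
    }
    assert(samples != NULL);
    if (buf->count + n > buf->capacity) {
        size_t new_capacity = buf->capacity ? buf->capacity : 1024;
        while (new_capacity < buf->count + n) {
            new_capacity *= 2;
        }
        float *grown = (float *) realloc(buf->data, new_capacity * sizeof(float));
        if (grown == NULL) {
            return false; /* the old buffer is still valid */
        }
        buf->data = grown;
        buf->capacity = new_capacity;
    }
    memcpy(buf->data + buf->count, samples, n * sizeof(float));
    buf->count += n;
    return true;
}

/* Return a malloc'd copy of the samples; the caller must free() it. */
float *sketch_buffer_copy(const SketchBuffer *buf, size_t *out_count) {
    assert(buf != NULL && out_count != NULL);
    *out_count = 0;
    if (buf->count == 0) {
        return NULL;
    }
    float *copy = (float *) malloc(buf->count * sizeof(float));
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, buf->data, buf->count * sizeof(float));
    *out_count = buf->count;
    return copy;
}
```

Handing out a copy means the recorder can keep reusing or resizing its internal buffer without invalidating memory the caller still holds.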

project-overview.md

Project Overview

Overview

Yakety is a real-time speech-to-text application that provides instant transcription through keyboard shortcuts. It records audio while a hotkey is held down, transcribes the speech using OpenAI's Whisper model, and automatically pastes the transcribed text into the active application. The application is designed for efficient voice-to-text input across desktop workflows.

The project targets both CLI and GUI usage patterns, supporting macOS and Windows with platform-specific implementations. It integrates whisper.cpp for on-device transcription, so audio never leaves the machine and no cloud service is required. In GUI mode the application runs from a system tray (menubar) interface and monitors the keyboard system-wide to detect the recording hotkey.

Key Files

src/main.c: Primary application entry point containing initialization sequence, audio processing pipeline, and keyboard event handling (lines 254-388)
src/app.h: Cross-platform application framework with platform-specific entry point macros (lines 6-43) and async execution utilities
CMakeLists.txt: Build system configuration managing whisper.cpp integration (lines 28-32), platform-specific compilation (lines 48-85), and distribution packaging (lines 358-535)
src/transcription.cpp: Whisper model integration and audio processing core (lines 49-100)

Technology Stack

Audio Processing: miniaudio library for cross-platform audio capture in src/audio.c with 16kHz mono configuration (lines 9-11)
Speech Recognition: whisper.cpp integration for local transcription processing in src/transcription.cpp (lines 14-15)
Platform Abstraction: C-style C++ implementation with platform-specific modules in src/mac/ and src/windows/
Build System: CMake with custom modules in cmake/ directory, supporting Ninja and Visual Studio generators
GUI Framework:
- macOS: Objective-C/Swift UI in src/mac/dialogs/ with SwiftUI dialogs
- Windows: Win32 API in src/windows/ with native dialog implementations

Platform Support

macOS Requirements:

Minimum macOS 14.0 (Apple Silicon only, set in CMakeLists.txt line 22)
Accessibility permissions for keyboard monitoring (handled in src/main.c lines 78-117)
Metal acceleration support via ggml-metal library integration
System tray menubar interface in src/mac/menu.m

Windows Requirements:

Windows 10+ with Visual Studio 2022 build tools
Optional Vulkan support for GPU acceleration
WSL development environment supported via scripts in wsl/ directory
System tray interface in src/windows/menu.c

Cross-Platform Components:

Keyboard monitoring: src/mac/keylogger.c and src/windows/keylogger.c
Audio recording: src/audio.c with platform-specific audio device handling
Preferences storage: src/preferences.c with platform-specific configuration paths
Model management: src/models.c with bundled and downloadable Whisper models defined in src/model_definitions.h

testing.md

Testing Documentation

Overview

Yakety uses manual interactive tests for GUI components and dialog validation. All test programs are located in /src/tests/ and test specific platform dialog implementations using the app's GUI framework integration.

Test File Locations: /src/tests/test_*.c

Test Types

Dialog Integration Tests

Manual GUI tests that validate platform-specific dialog implementations:

/src/tests/test_model_dialog.c - Models & Language selection dialog
/src/tests/test_keycombination_dialog.c - Hotkey capture dialog with keylogger integration
/src/tests/test_download_dialog.c - Model download progress dialog

Test Structure

All tests follow this pattern:

Initialize platform app framework (app_init)
Set up test-specific dependencies (keylogger, callbacks)
Execute dialog function with test parameters
Validate results and clean up resources
Exit with status code

Running Tests

Prerequisites

# Configure and build project first
cmake --preset release  # or debug
cmake --build --preset release

Test Execution Commands

Model Dialog Test:

./build/bin/test-model-dialog

Expected: GUI dialog opens for model/language selection, prints selected values or cancellation.

Key Combination Dialog Test:

./build/bin/test-keycombination-dialog

Expected: GUI dialog captures key combinations, prints key codes and modifier flags.

Download Dialog Test:

./build/bin/test-download-dialog

Expected: Downloads test file (1KB from httpbin.org), shows progress dialog, cleans up temp file.

Platform Availability

Tests are only built and available on macOS (requires Cocoa/SwiftUI frameworks). Windows tests would require separate implementations using platform-specific dialog APIs.

Reference

Test File Organization

src/tests/
├── test_model_dialog.c           # Model selection dialog test
├── test_keycombination_dialog.c  # Hotkey capture dialog test
└── test_download_dialog.c        # Download progress dialog test

CMake Test Targets

Test executables are defined in /CMakeLists.txt lines 502-532:

# Test programs (macOS only)
add_executable(test-model-dialog src/tests/test_model_dialog.c)
add_executable(test-keycombination-dialog src/tests/test_keycombination_dialog.c)  
add_executable(test-download-dialog src/tests/test_download_dialog.c)

Test Dependencies

Platform library: Core app and dialog functions
Cocoa framework: macOS GUI integration (-framework Cocoa)
SwiftUI framework: Modern dialog implementations (-framework SwiftUI)
Keylogger: Required for hotkey capture testing

Test Output Location

All test executables built to: /build/bin/test-*

Creating New Tests

Add test source file in /src/tests/test_<feature>.c
Follow existing test pattern with app_init, test logic, app_cleanup
Add CMake target in main /CMakeLists.txt (macOS section)
Link required platform libraries and frameworks
Ensure proper cleanup and exit codes for automation

update-docs.md

Update Documentation

You will generate LLM-optimized documentation with concrete file references and flexible formatting.

Your Task

Create documentation that allows humans and LLMs to:

Understand project purpose - what the project does and why
Get architecture overview - how the system is organized
Build on all platforms - build instructions with file references
Add features/subsystems - following established patterns with examples
Debug applications - troubleshoot issues with specific file locations
Test and add tests - run existing tests and create new ones
Deploy and distribute - package and deploy the software

Required Documentation Structure

Each document MUST include:

Timestamp Header - Hidden comment with last update timestamp
Brief Overview (2-3 paragraphs max)
Key Files & Examples - Concrete file references for each major topic
Common Workflows - Practical guidance with file locations
Reference Information - Quick lookup tables with file paths

Timestamp Format

Each generated file MUST start with:

<!-- Generated: YYYY-MM-DD HH:MM:SS UTC -->

Process

You will:

Analyze the codebase systematically across 6 key areas (merging development+patterns)
Create or update docs in docs/*.md with concrete file references
Synthesize final documentation into a minimal, LLM-friendly README.md
Eliminate all duplication across files

Analysis Methodology

For each area, agents should:

Examine key files: Look for build configs, test files, deployment scripts, main source files
Extract file references: Note specific files, line numbers, and examples
Identify patterns: Find repeated structures, naming conventions, common workflows
Make content LLM-friendly: Token-efficient, reference-heavy, practical examples

Specific File Requirements

Issue the following Task calls in parallel:

Project Overview (docs/project-overview.md): STRUCTURE:

Overview: What the project is, core purpose, key value proposition (2-3 paragraphs)
Key Files: Main entry points (src/main.c, src/app.h, CMakeLists.txt)
Technology Stack: Core technologies with specific file examples
Platform Support: Requirements with platform-specific file locations

Architecture (docs/architecture.md): STRUCTURE:

Overview: High-level system organization (2-3 paragraphs)
Component Map: Major components with their source file locations
Key Files: Core headers and implementations with brief descriptions
Data Flow: How information flows with specific function/file references

Build System (docs/build-system.md): STRUCTURE:

Overview: CMake system with file references (CMakeLists.txt, CMakePresets.json)
Build Workflows: Common tasks with specific commands and config files
Platform Setup: Platform-specific requirements with file paths
Reference: Build targets, presets, and troubleshooting with file locations

Testing (docs/testing.md): STRUCTURE:

Overview: Testing approach with test file locations (src/tests/*)
Test Types: Different test categories with specific file examples
Running Tests: Commands with file paths and expected outputs
Reference: Test file organization, CMake test targets

Development (docs/development.md): STRUCTURE:

Overview: Development environment, code style, patterns (merge with old patterns.md)
Code Style: Conventions with specific file examples (show actual code from codebase)
Common Patterns: Implementation patterns with file references (singleton pattern in src/audio.h, platform abstraction in src/mac/, src/windows/)
Workflows: Development tasks with concrete file locations and examples
Reference: File organization, naming conventions, common issues with specific files

Deployment (docs/deployment.md): STRUCTURE:

Overview: Packaging and distribution with script references
Package Types: Different packages with CMake targets and output locations
Platform Deployment: Platform-specific packaging with file paths
Reference: Deployment scripts, output locations, server configurations

Critical Requirements

LLM-OPTIMIZED FORMAT

Token efficient: Avoid redundant explanations, focus on essential information
Concrete file references: Always include specific file paths, line numbers when helpful
Flexible formatting: Use subsections, code blocks, examples instead of rigid step-by-step
Pattern examples: Show actual code from the codebase, not generic examples

NO DUPLICATION

Each piece of information appears in EXACTLY ONE file
Build information only in build-system.md
Code style and patterns only in development.md
Deployment information only in deployment.md
Cross-reference other files using: "See docs/filename.md"

FILE REFERENCE FORMAT

Always include specific file references:

**Audio System** - Core implementation in src/audio.h (lines 15-45), platform backends in src/mac/audio.m and src/windows/audio.c

**Build Configuration** - Main CMakeLists.txt (lines 67-89), presets in CMakePresets.json

**Model Management** - Interface in src/models.h, implementation in src/models.c (model_download function at line 134)

PRACTICAL EXAMPLES

Use actual code from the codebase:

// From src/audio.h:23-27
typedef struct {
    bool recording;
    float *buffer;
    int sample_rate;
} AudioState;
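
The path:line-range notation used in the comment above (src/audio.h:23-27) is easy to check mechanically. The sketch below parses such a reference; FileRef and parse_file_ref are hypothetical names introduced for illustration and do not exist in the codebase.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type, not part of the Yakety codebase. */
typedef struct {
    char path[256];
    int first_line;
    int last_line;
} FileRef;

/* Parses "path:first-last" or "path:line".
 * Returns 1 on success, 0 otherwise. Characters after the numbers are ignored. */
static int parse_file_ref(const char *text, FileRef *out) {
    const char *colon = strrchr(text, ':');
    if (colon == NULL || colon == text) {
        return 0; /* no line part, or empty path */
    }
    size_t len = (size_t)(colon - text);
    if (len >= sizeof out->path) {
        return 0; /* path too long for the fixed buffer */
    }
    memcpy(out->path, text, len);
    out->path[len] = '\0';

    int first = 0, last = 0;
    int n = sscanf(colon + 1, "%d-%d", &first, &last);
    if (n == 1) {
        last = first; /* single line reference */
    } else if (n != 2) {
        return 0;
    }
    if (first < 1 || last < first) {
        return 0;
    }
    out->first_line = first;
    out->last_line = last;
    return 1;
}
```

A reference checker could combine this with a line count of the named file to flag ranges that no longer exist after a refactor.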

Final Steps

After all tasks complete:

Read all docs/*.md files and create README.md with:
- Project description (2-3 sentences max)
- Key entry points (src/main.c, CMakeLists.txt, etc.)
- Quick build commands
- Documentation links with brief descriptions of what LLMs will find useful
- Keep it under 50 lines total
Duplication check: Scan all files and remove any duplicated information
File reference check: Ensure all file paths are accurate and helpful
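
The 50-line limit on README.md can be verified with a small check. This is a minimal sketch; count_lines and is_under_line_limit are hypothetical helper names, not functions from the project.

```c
#include <stdio.h>

/* Counts lines in a file; a final line without '\n' still counts.
 * Returns -1 if the file cannot be opened. */
static long count_lines(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    long lines = 0;
    int c;
    int last = '\n'; /* treat an empty file as having zero lines */
    while ((c = fgetc(f)) != EOF) {
        if (c == '\n') {
            lines++;
        }
        last = c;
    }
    if (last != '\n') {
        lines++; /* unterminated final line */
    }
    fclose(f);
    return lines;
}

/* Returns 1 if the file exists and has fewer than max_lines lines. */
static int is_under_line_limit(const char *path, long max_lines) {
    long n = count_lines(path);
    return n >= 0 && n < max_lines;
}
```

For the requirement above, the call would be is_under_line_limit("README.md", 50).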

Agent Instructions

Each agent must:

Read existing file if it exists to understand current content
Analyze relevant codebase files systematically
Extract specific file references throughout analysis:
- Note important headers, source files, configuration files
- Include line numbers for key functions/sections when helpful
- Reference actual code examples from the codebase
Create LLM-friendly content:
- Token-efficient writing (no redundant explanations)
- Concrete file paths and examples throughout
- Flexible formatting (subsections, code blocks, practical guidance)
- Focus on what LLMs need to understand and work with the code
Include practical workflows with specific file references
Create reference sections with file locations and line numbers
Update timestamp at the top with current UTC time
Read generated file and revise for accuracy and completeness

Success criteria: Each file should be a practical reference that helps LLMs quickly understand the codebase and find the right files for specific tasks.

Special note for development.md: Merge content from both old development.md and patterns.md into a single comprehensive development guide with implementation patterns.

The coordinating agent must:

Wait for all agents to complete
Read all generated files
Remove any duplication found
Create a minimal, LLM-optimized README.md with key file references
Update README.md timestamp with current UTC time
Delete docs/patterns.md since it's merged into development.md
