Implementation Documentation for Agentic LLM Workflow: macOS ScreenMate (SwiftUI First - Direct VLM, In-Memory Screenshot, Custom Prompts)
Develop a native macOS application ("ScreenMate") that:
- Runs as a menubar accessory application (no Dock icon).
- Provides advanced image understanding functionality triggered by a screenshot, capturing the image into memory (as an NSImage) and processing it using a locally loaded Vision Language Model (VLM) via MLX Swift, with an option for users to provide custom prompts. (OCR is one of its capabilities.)
- Features a main interface in a menubar popover panel.
- Features a "Custom Prompt" floating panel allowing users to input their own VLM prompts for image processing.
- Allows configuration for auto-starting at login and selecting a VLM model from a predefined list.
- Uses SwiftUI for UI components where feasible, and AppKit for system integrations and panel management.
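
The configuration and custom-prompt requirements above can be sketched as a small settings type. This is only an illustrative sketch, not the ScreenMate implementation: the type names (`VLMModelOption`, `ScreenMateSettings`), the model identifiers, and the default OCR-style prompt are all hypothetical placeholders, and nothing here touches MLX Swift or the login-item APIs.

```swift
import Foundation

// Hypothetical description of one entry in the predefined VLM model list.
struct VLMModelOption: Equatable {
    let id: String          // placeholder identifier, not a real model name
    let displayName: String
}

// Hypothetical settings container covering the configurable items in the list above.
struct ScreenMateSettings {
    // Assumed predefined list; real entries would name actual MLX-compatible models.
    static let availableModels: [VLMModelOption] = [
        VLMModelOption(id: "example-vlm-small", displayName: "Example VLM (small)"),
        VLMModelOption(id: "example-vlm-large", displayName: "Example VLM (large)"),
    ]

    // Assumed fallback prompt used when the user supplies no custom prompt.
    static let defaultPrompt = "Extract all text visible in this image."

    var selectedModelID: String = ScreenMateSettings.availableModels[0].id
    var launchAtLogin: Bool = false

    // Resolve the stored identifier, falling back to the first model if it is unknown.
    var selectedModel: VLMModelOption {
        ScreenMateSettings.availableModels.first(where: { $0.id == selectedModelID })
            ?? ScreenMateSettings.availableModels[0]
    }

    // Use the custom prompt when it contains non-whitespace text, else the default.
    func resolvePrompt(custom: String?) -> String {
        let trimmed = (custom ?? "").trimmingCharacters(in: .whitespacesAndNewlines)
        return trimmed.isEmpty ? ScreenMateSettings.defaultPrompt : trimmed
    }
}
```

In this sketch, the "Custom Prompt" panel would pass its text field contents to `resolvePrompt(custom:)`, so an empty or whitespace-only entry falls back to the default prompt rather than sending a blank prompt to the model.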