# Implementation Documentation for Agentic LLM Workflow: macOS ScreenMate (SwiftUI First - Direct VLM, In-Memory Screenshot, Custom Prompts)
## 1. Overall Project Goal:
Develop a native macOS application ("ScreenMate") that:
* Runs as a menubar accessory application (no Dock icon).
* Provides advanced image understanding functionality triggered by a screenshot, capturing the image **into memory (as an `NSImage`)** and processing it using a **locally loaded Vision Language Model (VLM) via MLX Swift**, with an option for users to provide **custom prompts**. (OCR is one of its capabilities.)
* Features a main interface in a menubar popover panel.
* Features a "Custom Prompt" floating panel allowing users to input their own VLM prompts for image processing.
* Allows configuration for auto-starting at login and **selecting a VLM model from a predefined list**.
* Uses SwiftUI for UI components where feasible, and AppKit for system integrations and panel management.
import mlx.core as mx
import numpy as np
from transformers import PreTrainedTokenizer, AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
from torch import Tensor
from typing import List, Dict, Any, Tuple
from mlx_lm.utils import load
def tokenize_texts(
import mlx.core as mx
import numpy as np
from transformers import PreTrainedTokenizer, AutoModel, AutoTokenizer
import torch
import torch.nn.functional as F
from torch import Tensor
from typing import List, Dict, Any
from mlx_lm.utils import load
def tokenize_texts(
{
  "schemaType": "ModelInvocationLog",
  "schemaVersion": "1.0",
  "timestamp": "2024-05-30T06:22:26Z",
  "accountId": "<account-id>",
  "identity": {
    "arn": "<identity-arn>"
  },
  "region": "us-west-2",
  "requestId": "de4843a4-8b97-46a9-b005-878dfdf0a123",
Install with `pip install mlx-sharding`.
For the shard node:
Run `mlx-sharding-server --model mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx --start-layer 14 --end-layer 27`
For the primary node:
Run `mlx-sharding-api --model mlx-community/DeepSeek-Coder-V2-Lite-Instruct-4bit-mlx --start-layer 0 --end-layer 14 --llm-shard-addresses <your shard node address>`
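The two commands above divide the model at layer 14: the primary node is given `--start-layer 0 --end-layer 14` and the shard node `--start-layer 14 --end-layer 27`. Read as half-open ranges (an assumption drawn from the flag values, not from the tool's documentation), the same split can be computed for any number of nodes. `split_layers` below is a hypothetical helper written for illustration and is not part of mlx-sharding.

```python
def split_layers(num_layers: int, num_nodes: int) -> list[tuple[int, int]]:
    """Divide layer indices 0..num_layers into contiguous half-open ranges.

    Hypothetical helper, not part of mlx-sharding. Earlier nodes receive
    one extra layer when the count does not divide evenly.
    """
    if num_nodes < 1 or num_layers < num_nodes:
        raise ValueError("need at least one layer per node")
    base, extra = divmod(num_layers, num_nodes)
    ranges = []
    start = 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # 27 layers over two nodes reproduces the 0-14 / 14-27 split shown above.
    for start, end in split_layers(27, 2):
        print(f"--start-layer {start} --end-layer {end}")
```

An even split by layer count is only a starting point; nodes with different amounts of memory may need uneven ranges.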
def extract_arguments(json_str):
    json_str = json_str.replace("'", '"')
    start_index = json_str.find('"arguments":') + len('"arguments":')
    start_of_json = json_str.find("{", start_index)
    end_of_json = json_str.rfind("}")
    if start_of_json != -1 and end_of_json != -1:
        extracted = json_str[start_of_json:end_of_json]
        if (extracted.startswith("'") and extracted.endswith("'")) or (
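The truncated snippet above locates the object that follows `"arguments":` by searching for braces. As a point of comparison, here is a minimal sketch of a different technique: letting the standard library's `json.JSONDecoder.raw_decode` parse the first complete JSON value starting at that brace. The name `extract_arguments_json` is hypothetical, and unlike the original this sketch does not rewrite single quotes, so it assumes the input already uses valid JSON quoting.

```python
import json
from typing import Any, Optional


def extract_arguments_json(text: str) -> Optional[Any]:
    """Return the parsed value that follows '"arguments":', or None.

    Hypothetical alternative to brace searching: raw_decode stops at the
    end of the first complete JSON value, so trailing text is ignored.
    """
    key = '"arguments":'
    key_pos = text.find(key)
    if key_pos == -1:
        return None
    brace_pos = text.find("{", key_pos + len(key))
    if brace_pos == -1:
        return None
    try:
        value, _end = json.JSONDecoder().raw_decode(text, brace_pos)
    except json.JSONDecodeError:
        return None
    return value


if __name__ == "__main__":
    call = '{"name": "get_weather", "arguments": {"city": "Paris", "days": 2}}'
    print(extract_arguments_json(call))
```

Because the decoder tracks nesting and string escapes itself, braces inside string values do not confuse it.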
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments, BitsAndBytesConfig
from datasets import load_dataset
model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
dataset = load_dataset("glaiveai/glaive-function-calling-v2", split="train")
def formatting_prompts_func(example):
    output_texts = []
```sh
Initial Setup:
+-------------------+     +---------------+
| Text Sequence     |     | Raw Images    |
| [T1, <IMG>, T2,   |     | [Image1,      |
|  T3, <IMG>, T4]   |     |  Image2]      |
+-------------------+     +---------------+
Step 1: Convert Text and <IMG> Tokens to Embeddings
+---------------------------------------------------------+
```
function escapeShellArg(str) {
  return "'" + str.replace(/'/g, "'\\''") + "'";
}
const removeBackticks = (str) => {
  // remove leading backticks
  str = str.replace(/^(```\n|```)/g, "");
  // remove trailing backticks and everything after
  const index = str.lastIndexOf("```");
[
  {
    "name": "TheBloke/dolphin-2.1-mistral-7B-GPTQ",
    "displayName": "TheBloke/dolphin-2.1-mistral-7B-GPTQ",
    "description": "Mistral 7B is a new Apache 2.0 model, released by Mistral AI that outperforms Llama2 13B in benchmarks.",
    "websiteUrl": "https://huggingface.co/ehartford/dolphin-2.1-mistral-7b",
    "userMessageToken": "<|im_start|>user\n",
    "userMessageEndToken": "<|im_end|>\n",
    "assistantMessageToken": "<|im_start|>assistant\n",
    "assistantMessageEndToken": "<|im_end|>\n",
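The model entry above defines start and end tokens for user and assistant turns. How the consuming application joins them is not shown, so the following is only a sketch under the assumption that each turn is wrapped in its start and end token and that the prompt ends with an open assistant turn. `build_prompt` and `CONFIG` are hypothetical names; the four token values are copied from the entry.

```python
# Token fields copied from the model entry; everything else is illustrative.
CONFIG = {
    "userMessageToken": "<|im_start|>user\n",
    "userMessageEndToken": "<|im_end|>\n",
    "assistantMessageToken": "<|im_start|>assistant\n",
    "assistantMessageEndToken": "<|im_end|>\n",
}


def build_prompt(config: dict, messages: list[tuple[str, str]]) -> str:
    """Wrap each (role, text) turn in its tokens, then open an assistant turn.

    Hypothetical helper: assumes simple concatenation with no system prompt.
    """
    parts = []
    for role, text in messages:
        if role == "user":
            parts.append(config["userMessageToken"] + text + config["userMessageEndToken"])
        elif role == "assistant":
            parts.append(config["assistantMessageToken"] + text + config["assistantMessageEndToken"])
        else:
            raise ValueError(f"unsupported role: {role}")
    # Leave the final assistant turn open so the model generates the reply.
    parts.append(config["assistantMessageToken"])
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt(CONFIG, [("user", "Hello")]))
```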