@MicahZoltu
Last active May 6, 2025 14:00
local-ai

Save the compose file below as local-ai-docker-compose.yml, then start the stack with:

docker-compose --file local-ai-docker-compose.yml up

Once it finishes its initial startup, navigate to http://localhost:3000/. Once in, click the "Select a model" dropdown, type or paste qwen3:30b-a3b into the search box, and click the button for "download from ollama.com". The download takes quite a while, but the model is saved in a Docker volume, so it will be available (and set as the default) on future startups.
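If you prefer to skip the UI download step, you can pull the model from the command line instead. This is a sketch using the standard `docker-compose exec` and `ollama pull` commands; it assumes the stack is already running:

```shell
# Pull the model directly into the running ollama container.
# "ollama" is the service name from the compose file below; the model
# lands in the same named volume the UI download would use.
docker-compose --file local-ai-docker-compose.yml exec ollama ollama pull qwen3:30b-a3b
```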

The default task prompts are overridden to start with /no_think so that thinking output doesn't break the structured results of supplementary tasks (title generation, tagging, query generation, and so on); hopefully future releases will handle thinking models better and these overrides can be deleted. Thinking remains enabled by default for normal (non-overridden) prompts.
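The /no_think workaround exists because thinking models wrap their reasoning in tags before the final answer, which breaks tasks that expect strict JSON. If you call the model yourself, a minimal sketch of stripping such a block before parsing (the `<think>...</think>` tag format is an assumption based on Qwen3's output style):

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Drop any <think>...</think> block a thinking model may emit,
    then parse the remaining text as JSON."""
    cleaned = re.sub(r"<think>.*?</think>", "", raw, flags=re.DOTALL).strip()
    return json.loads(cleaned)

# A thinking model's reply to the title-generation prompt might look like:
raw_reply = '<think>\nThe chat is about stocks.\n</think>\n{ "title": "Stock Market Trends" }'
print(extract_json(raw_reply))  # {'title': 'Stock Market Trends'}
```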

name: 'local-ai'
services:
  openwebui:
    image: 'ghcr.io/open-webui/open-webui:v0.6.6-cuda@sha256:bc307f2847d28c215270f67a8e88389c1339d210e452ad1854f60b0c6bb1100f'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    ports:
      - target: '8080'
        published: '3000'
        protocol: 'tcp'
    volumes:
      - type: 'volume'
        source: 'open-webui'
        target: '/app/backend/data'
    environment:
      USE_CUDA_DOCKER: 'True'
      WEBUI_AUTH: 'False'
      BYPASS_MODEL_ACCESS_CONTROL: 'True'
      OLLAMA_BASE_URL: 'http://ollama:11434'
      DEFAULT_MODELS: 'qwen3:30b-a3b'
      TASK_MODEL: 'qwen3:30b-a3b'
      IMAGE_GENERATION_ENGINE: 'comfyui'
      ENABLE_IMAGE_GENERATION: 'True'
      COMFYUI_BASE_URL: 'comfyui:8188'
      TITLE_GENERATION_PROMPT_TEMPLATE: |
        /no_think
        ### Task:
        Generate a concise, 3-5 word title with an emoji summarizing the chat history.
        ### Guidelines:
        - The title should clearly represent the main theme or subject of the conversation.
        - Use emojis that enhance understanding of the topic, but avoid quotation marks or special formatting.
        - Write the title in the chat's primary language; default to English if multilingual.
        - Prioritize accuracy over excessive creativity; keep it clear and simple.
        ### Output:
        JSON format: { "title": "your concise title here" }
        ### Examples:
        - { "title": "📉 Stock Market Trends" },
        - { "title": "🍪 Perfect Chocolate Chip Recipe" },
        - { "title": "Evolution of Music Streaming" },
        - { "title": "Remote Work Productivity Tips" },
        - { "title": "Artificial Intelligence in Healthcare" },
        - { "title": "🎮 Video Game Development Insights" }
        ### Chat History:
        <chat_history>
        {{MESSAGES:END:2}}
        </chat_history>
      TOOLS_FUNCTION_CALLING_PROMPT_TEMPLATE: |
        /no_think
        Available Tools: {{TOOLS}}
        Your task is to choose and return the correct tool(s) from the list of available tools based on the query. Follow these guidelines:
        - Return only the JSON object, without any additional text or explanation.
        - If no tools match the query, return an empty array:
          {
            "tool_calls": []
          }
        - If one or more tools match the query, construct a JSON response containing a "tool_calls" array with objects that include:
          - "name": The tool's name.
          - "parameters": A dictionary of required parameters and their corresponding values.
        The format for the JSON response is strictly:
        {
          "tool_calls": [
            {"name": "toolName1", "parameters": {"key1": "value1"}},
            {"name": "toolName2", "parameters": {"key2": "value2"}}
          ]
        }
      AUTOCOMPLETE_GENERATION_PROMPT_TEMPLATE: |
        /no_think
        ### Task:
        You are an autocompletion system. Continue the text in `<text>` based on the **completion type** in `<type>` and the given language.
        ### **Instructions**:
        1. Analyze `<text>` for context and meaning.
        2. Use `<type>` to guide your output:
           - **General**: Provide a natural, concise continuation.
           - **Search Query**: Complete as if generating a realistic search query.
        3. Start as if you are directly continuing `<text>`. Do **not** repeat, paraphrase, or respond as a model. Simply complete the text.
        4. Ensure the continuation:
           - Flows naturally from `<text>`.
           - Avoids repetition, overexplaining, or unrelated ideas.
        5. If unsure, return: `{ "text": "" }`.
        ### **Output Rules**:
        - Respond only in JSON format: `{ "text": "<your_completion>" }`.
        ### **Examples**:
        #### Example 1:
        Input:
        <type>General</type>
        <text>The sun was setting over the horizon, painting the sky</text>
        Output:
        { "text": "with vibrant shades of orange and pink." }
        #### Example 2:
        Input:
        <type>Search Query</type>
        <text>Top-rated restaurants in</text>
        Output:
        { "text": "New York City for Italian cuisine." }
        ---
        ### Context:
        <chat_history>
        {{MESSAGES:END:6}}
        </chat_history>
        <type>{{TYPE}}</type>
        <text>{{PROMPT}}</text>
        #### Output:
      TAGS_GENERATION_PROMPT_TEMPLATE: |
        /no_think
        ### Task:
        Generate 1-3 broad tags categorizing the main themes of the chat history, along with 1-3 more specific subtopic tags.
        ### Guidelines:
        - Start with high-level domains (e.g. Science, Technology, Philosophy, Arts, Politics, Business, Health, Sports, Entertainment, Education)
        - Consider including relevant subfields/subdomains if they are strongly represented throughout the conversation
        - If content is too short (less than 3 messages) or too diverse, use only ["General"]
        - Use the chat's primary language; default to English if multilingual
        - Prioritize accuracy over specificity
        ### Output:
        JSON format: { "tags": ["tag1", "tag2", "tag3"] }
        ### Chat History:
        <chat_history>
        {{MESSAGES:END:6}}
        </chat_history>
      QUERY_GENERATION_PROMPT_TEMPLATE: |
        /no_think
        ### Task:
        Analyze the chat history to determine the necessity of generating search queries, in the given language. By default, **prioritize generating 1-3 broad and relevant search queries** unless it is absolutely certain that no additional information is required. The aim is to retrieve comprehensive, updated, and valuable information even with minimal uncertainty. If no search is unequivocally needed, return an empty list.
        ### Guidelines:
        - Respond **EXCLUSIVELY** with a JSON object. Any form of extra commentary, explanation, or additional text is strictly prohibited.
        - When generating search queries, respond in the format: { "queries": ["query1", "query2"] }, ensuring each query is distinct, concise, and relevant to the topic.
        - If and only if it is entirely certain that no useful results can be retrieved by a search, return: { "queries": [] }.
        - Err on the side of suggesting search queries if there is **any chance** they might provide useful or updated information.
        - Be concise and focused on composing high-quality search queries, avoiding unnecessary elaboration, commentary, or assumptions.
        - Today's date is: {{CURRENT_DATE}}.
        - Always prioritize providing actionable and broad queries that maximize informational coverage.
        ### Output:
        Strictly return in JSON format:
        {
          "queries": ["query1", "query2"]
        }
        ### Chat History:
        <chat_history>
        {{MESSAGES:END:6}}
        </chat_history>
      IMAGE_PROMPT_GENERATION_PROMPT_TEMPLATE: |
        /no_think
        ### Task:
        Generate a detailed prompt for an image generation task based on the given language and context. Describe the image as if you were explaining it to someone who cannot see it. Include relevant details, colors, shapes, and any other important elements.
        ### Guidelines:
        - Be descriptive and detailed, focusing on the most important aspects of the image.
        - Avoid making assumptions or adding information not present in the image.
        - Use the chat's primary language; default to English if multilingual.
        - If the image is too complex, focus on the most prominent elements.
        ### Output:
        Strictly return in JSON format:
        {
          "prompt": "Your detailed description here."
        }
        ### Chat History:
        <chat_history>
        {{MESSAGES:END:6}}
        </chat_history>
      RAG_TEMPLATE: |
        /no_think
        ### Task:
        Respond to the user query using the provided context, incorporating inline citations in the format [id] **only when the <source> tag includes an explicit id attribute** (e.g., <source id="1">).
        ### Guidelines:
        - If you don't know the answer, clearly state that.
        - If uncertain, ask the user for clarification.
        - Respond in the same language as the user's query.
        - If the context is unreadable or of poor quality, inform the user and provide the best possible answer.
        - If the answer isn't present in the context but you possess the knowledge, explain this to the user and provide the answer using your own understanding.
        - **Only include inline citations using [id] (e.g., [1], [2]) when the <source> tag includes an id attribute.**
        - Do not cite if the <source> tag does not contain an id attribute.
        - Do not use XML tags in your response.
        - Ensure citations are concise and directly related to the information provided.
        ### Example of Citation:
        If the user asks about a specific topic and the information is found in a source with a provided id attribute, the response should include the citation like in the following example:
        * "According to the study, the proposed method increases efficiency by 20% [1]."
        ### Output:
        Provide a clear and direct response to the user's query, including inline citations in the format [id] only when the <source> tag with id attribute is present in the context.
        <context>
        {{CONTEXT}}
        </context>
        <user_query>
        {{QUERY}}
        </user_query>
  ollama:
    image: 'ollama/ollama:0.6.8@sha256:50ab2378567a62b811a2967759dd91f254864c3495cbe50576bd8a85bc6edd56'
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    volumes:
      - type: 'volume'
        source: 'ollama'
        target: '/root/.ollama'
  # comfyui:
  #   image: 'ghcr.io/ai-dock/comfyui:v2-cuda-12.1.1-base-22.04-v0.2.2@sha256:9f99d5383690f85f3f8eb8ccdde41ca3edfbfaecf41dcf53291741c1e8db297e'
  #   deploy:
  #     resources:
  #       reservations:
  #         devices:
  #           - driver: nvidia
  #             count: 1
  #             capabilities: [gpu]
  #   volumes:
  #     - type: 'volume'
  #       source: 'comfyui'
  #       target: '/workspace'
volumes:
  open-webui:
  ollama:
  comfyui:
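To stop the stack, Ctrl+C or run `down`; the named volumes (including the downloaded model) survive that and only disappear if you remove them explicitly. The volume names below are an assumption based on Compose's default `<project>_<volume>` naming with the `local-ai` project name:

```shell
# Stop and remove the containers; named volumes are preserved.
docker-compose --file local-ai-docker-compose.yml down

# Only if you want to reclaim disk space (this deletes the downloaded model):
docker volume rm local-ai_ollama local-ai_open-webui local-ai_comfyui
```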