Getting Started with OpenWeb-UI and Ollama using Docker Compose.

This gist shows how to get started with OpenWeb-UI and Ollama using Docker Compose, and how to interact with the models hosted by Ollama through both the web UI and the API.

This setup runs on CPU only.

Docker Compose

We define two services in our compose definition:

  1. Ollama: runs ollama inside a container and persists the downloaded models to a named volume.
  2. OpenWebUI: points to the ollama service and allows all CORS origins (CORS_ALLOW_ORIGIN=*).
services:
  ollama:
    container_name: ollama
    image: ollama/ollama:latest
    restart: unless-stopped
    tty: true
    pull_policy: always
    volumes:
      - ollama:/root/.ollama

  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: open-webui
    restart: unless-stopped
    ports:
      - 3000:8080
    environment: # https://docs.openwebui.com/getting-started/env-configuration/
      - CORS_ALLOW_ORIGIN=*
      - OLLAMA_BASE_URL=http://ollama:11434
      - OLLAMA_API_BASE_URL=http://ollama:11434
      - WEBUI_SECRET_KEY=
    depends_on:
      - ollama
    volumes:
      - open-webui:/app/backend/data
    extra_hosts:
      - host.docker.internal:host-gateway

volumes:
  ollama: {}
  open-webui: {}

From the directory containing the docker-compose.yaml file, run:

docker compose up -d

The initial startup will take some time; you can follow the logs with:

docker compose logs -f

Once you see Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit) in the logs, you can access OpenWeb-UI on port 3000.

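If you would rather script the wait than watch the logs, here is a minimal sketch that polls the UI until it answers on the mapped port (assuming port 3000 from the compose file above):

import time
import requests

url = "http://localhost:3000"

# Poll OpenWeb-UI for up to ~2 minutes (60 attempts, 2 seconds apart)
for attempt in range(60):
    try:
        if requests.get(url, timeout=2).status_code == 200:
            print("OpenWeb-UI is up")
            break
    except requests.exceptions.ConnectionError:
        pass
    time.sleep(2)
else:
    print("OpenWeb-UI did not come up in time")
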
You will be dropped into the main screen.

There are no models configured yet, so we will do that next.

Download Models from Ollama

From your profile, select "Admin Settings".

Then select "Settings" and then "Connections".

You should see your "Ollama" API connection. Click the manage icon on the right of it.

Now we would like to download the following models:

  • llama3.2
  • gemma:2b
  • mistral:7b

First, type in llama3.2 and download it.

This will take some time, but after it finishes, download gemma:2b and mistral:7b as well.
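
If you would rather script the downloads than click through the UI, here is a minimal sketch (assuming the container is named ollama, as in the compose file above) that pulls the same models with the Ollama CLI inside the container:

import subprocess

# The models used in this gist
models = ["llama3.2", "gemma:2b", "mistral:7b"]

for model in models:
    # Equivalent to: docker exec ollama ollama pull <model>
    subprocess.run(["docker", "exec", "ollama", "ollama", "pull", model], check=True)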

You can view a list of available models and their descriptions in the Ollama model library at https://ollama.com/library.

Once all models have been downloaded, you can close that dialog and move to "Settings" -> "Models", where you should see your models.

Test out OpenWeb-UI

Select your model at the top, then ask a question to test that everything is working.

API Access

To create an API key, we need to go into the user account settings: select the user at the bottom left, then select "Settings" and then "Account".

You can create an API key and then copy its value. For this demonstration I will set it as an environment variable in my terminal:

export OWU_APIKEY=sk-a1f7fxoxoxoxoxoxoxoxoxoad72

Then we can consult the OpenWeb-UI documentation for the available API endpoints: https://docs.openwebui.com/getting-started/api-endpoints

Interact with OpenWeb-UI API

A quick test is to use the "get models" API endpoint to view all the models that we downloaded:

curl -H "Authorization: Bearer $OWU_APIKEY" http://localhost:3000/api/models

And a filtered response will look like this:

{
  "data": [
    {
      "id": "mistral:7b",
      "name": "mistral:7b",
      "object": "model",
      "created": 1739805581,
      "owned_by": "ollama",
      "ollama": {
        "name": "mistral:7b",
        "model": "mistral:7b",
        "modified_at": "2025-02-17T15:07:12.854653452Z",
        "size": 4113301824,
        "digest": "f974a74358d62a017b37c6f424fcdf2744ca02926c4f952513ddf474b2fa5091",
        "details": {
          "parent_model": "",
          "format": "gguf",
          "family": "llama",
          "families": [
            "llama"
          ],
          "parameter_size": "7.2B",
          "quantization_level": "Q4_0"
        },
        "urls": [
          0
        ]
      },
      "actions": []
    },
    {
      "id": "gemma:2b",
      "name": "gemma:2b",
      "object": "model",
      "...": ""
    },
    {
      "id": "llama3.2:latest",
      "name": "llama3.2:latest",
      "object": "model",
      "...": ""
    },
    {
      "id": "arena-model",
      "name": "Arena Model",
      "...": ""
    }
  ]
}
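
If you only need the model IDs out of that response, here is a quick sketch against the same endpoint (it reads the OWU_APIKEY environment variable exported above):

import os
import requests

headers = {"Authorization": f"Bearer {os.environ['OWU_APIKEY']}"}

response = requests.get("http://localhost:3000/api/models", headers=headers)
response.raise_for_status()

# Print just the IDs from the "data" list
for model in response.json().get("data", []):
    print(model["id"])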

Then we can also do a POST request to /api/chat/completions, the chat completion endpoint, specifying one of the downloaded models:

curl -s -XPOST \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $OWU_APIKEY" \
  http://localhost:3000/api/chat/completions \
  -d '
  {
    "model": "gemma:2b", 
    "messages": [
      {"role": "user", "content": "what is the capital of australia?"}
    ]
  }'

And the response:

{
  "id": "gemma:2b-61b3aed1-77cf-4e62-86d4-b058bf9af5fd",
  "created": 1739805878,
  "model": "gemma:2b",
  "choices": [
    {
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop",
      "message": {
        "content": "The capital of Australia is Canberra. It is the political, economic, and administrative center of Australia.",
        "role": "assistant"
      }
    }
  ],
  "object": "chat.completion",
  "usage": {
    "response_token/s": 10,
    "prompt_token/s": 39.94,
    "total_duration": 4722390078,
    "load_duration": 1892429681,
    "prompt_eval_count": 29,
    "prompt_eval_duration": 726000000,
    "eval_count": 21,
    "eval_duration": 2101000000,
    "approximate_total": "0h0m4s"
  }
}

Python Requests with API

The same chat completion request can be made from Python using the requests library:

>>> import requests
>>> headers = {"content-type": "application/json", "Authorization": "Bearer sk-a1f7fdxxxxxxxxxxxxxxxad72"}
>>> request_body = {"model":"gemma:2b", "messages": [{"role": "user", "content": "what is the capital of australia?"}]}
>>> response = requests.post("http://localhost:3000/api/chat/completions", headers=headers, json=request_body)

>>> response.status_code
200
>>> response.json()
{'id': 'gemma:2b-8aa600a9-4de5-423d-b906-25ce41693324', 'created': 1739823542, 'model': 'gemma:2b', 'choices': [{'index': 0, 'logprobs': None, 'finish_reason': 'stop', 'message': {'content': 'The capital of Australia is Canberra. It is a city in the Australian Capital Territory, which is a self-governing territory within the Commonwealth of Australia.', 'role': 'assistant'}}], 'object': 'chat.completion', 'usage': {'response_token/s': 9.62, 'prompt_token/s': 40.45, 'total_duration': 5660808687, 'load_duration': 1615716699, 'prompt_eval_count': 29, 'prompt_eval_duration': 717000000, 'eval_count': 32, 'eval_duration': 3325000000, 'approximate_total': '0h0m5s'}}

>>> response.json().get('choices')[0].get('message').get('content')
'The capital of Australia is Canberra. It is a city in the Australian Capital Territory, which is a self-governing territory within the Commonwealth of Australia.'
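
For longer answers you may want to stream tokens as they arrive instead of waiting for the full response. Here is a sketch of that, assuming the endpoint honours the OpenAI-style "stream": true parameter and returns server-sent events (verify against the OpenWeb-UI API docs linked above before relying on it):

import json
import os
import requests

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {os.environ['OWU_APIKEY']}",
}
body = {
    "model": "gemma:2b",
    "stream": True,  # assumption: OpenAI-style streaming flag
    "messages": [{"role": "user", "content": "what is the capital of australia?"}],
}

with requests.post("http://localhost:3000/api/chat/completions",
                   headers=headers, json=body, stream=True) as response:
    for raw_line in response.iter_lines():
        line = raw_line.decode()
        if not line.startswith("data: "):
            continue  # skip keep-alives and empty lines
        payload = line[len("data: "):]
        if payload.strip() == "[DONE]":
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
print()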

Ollama Container

We can also run ollama CLI commands inside the container:

docker exec -it ollama sh -c 'ollama list'

Which will show:

NAME               ID              SIZE      MODIFIED
mistral:7b         f974a74358d6    4.1 GB    30 minutes ago
gemma:2b           b50d6c999e59    1.7 GB    43 minutes ago
llama3.2:latest    a80c4f17acd5    2.0 GB    47 minutes ago
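
Other ollama subcommands can be run the same way. For example, a small sketch that prints the details of each downloaded model with ollama show, again through docker exec:

import subprocess

models = ["llama3.2", "gemma:2b", "mistral:7b"]

for model in models:
    print(f"--- {model} ---")
    # Equivalent to: docker exec ollama ollama show <model>
    subprocess.run(["docker", "exec", "ollama", "ollama", "show", model], check=True)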