  1. Install Ollama

    brew install ollama
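
    Homebrew covers macOS. On Linux, the official install script from ollama.com does the same job (worth checking the docs for the current command):

    curl -fsSL https://ollama.com/install.sh | sh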
  2. Start Ollama

    We'll start Ollama in server mode, so we get an HTTP interface that any client can consume.

    # OLLAMA_MAX_LOADED_MODELS: max models kept loaded in memory at once
    # OLLAMA_NUM_PARALLEL: parallel requests each model can serve
    # OLLAMA_ORIGINS: CORS origins allowed to call the HTTP API
    # OLLAMA_FLASH_ATTENTION: enable flash attention on supported hardware
    OLLAMA_MAX_LOADED_MODELS=4 \
    OLLAMA_NUM_PARALLEL=10 \
    OLLAMA_ORIGINS="*" \
    OLLAMA_FLASH_ATTENTION=1 \
    ollama serve
    # output
    2024/06/28 15:48:32 routes.go:1064: INFO server config env="map[OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:true OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE: OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:4 OLLAMA_MAX_QUEUE:512 OLLAMA_MAX_VRAM:0 OLLAMA_MODELS:/Users/iguatemi/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[\\* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR:]"
    time=2024-06-28T15:48:32.326+01:00 level=INFO source=images.go:730 msg="total blobs: 29"
    time=2024-06-28T15:48:32.328+01:00 level=INFO source=images.go:737 msg="total unused blobs removed: 0"
    [GIN-debug] [WARNING] Creating an Engine instance with the Logger and Recovery middleware already attached.
    
    [GIN-debug] [WARNING] Running in "debug" mode. Switch to "release" mode in production.
    - using env:	export GIN_MODE=release
    - using code:	gin.SetMode(gin.ReleaseMode)
    
    [GIN-debug] POST   /api/pull                 --> github.com/ollama/ollama/server.(*Server).PullModelHandler-fm (5 handlers)
    [GIN-debug] POST   /api/generate             --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (5 handlers)
    [GIN-debug] POST   /api/chat                 --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (5 handlers)
    [GIN-debug] POST   /api/embeddings           --> github.com/ollama/ollama/server.(*Server).EmbeddingsHandler-fm (5 handlers)
    [GIN-debug] POST   /api/create               --> github.com/ollama/ollama/server.(*Server).CreateModelHandler-fm (5 handlers)
    [GIN-debug] POST   /api/push                 --> github.com/ollama/ollama/server.(*Server).PushModelHandler-fm (5 handlers)
    [GIN-debug] POST   /api/copy                 --> github.com/ollama/ollama/server.(*Server).CopyModelHandler-fm (5 handlers)
    [GIN-debug] DELETE /api/delete               --> github.com/ollama/ollama/server.(*Server).DeleteModelHandler-fm (5 handlers)
    [GIN-debug] POST   /api/show                 --> github.com/ollama/ollama/server.(*Server).ShowModelHandler-fm (5 handlers)
    [GIN-debug] POST   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).CreateBlobHandler-fm (5 handlers)
    [GIN-debug] HEAD   /api/blobs/:digest        --> github.com/ollama/ollama/server.(*Server).HeadBlobHandler-fm (5 handlers)
    [GIN-debug] GET    /api/ps                   --> github.com/ollama/ollama/server.(*Server).ProcessHandler-fm (5 handlers)
    [GIN-debug] POST   /v1/chat/completions      --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
    [GIN-debug] GET    /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
    [GIN-debug] GET    /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
    [GIN-debug] GET    /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
    [GIN-debug] HEAD   /                         --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func1 (5 handlers)
    [GIN-debug] HEAD   /api/tags                 --> github.com/ollama/ollama/server.(*Server).ListModelsHandler-fm (5 handlers)
    [GIN-debug] HEAD   /api/version              --> github.com/ollama/ollama/server.(*Server).GenerateRoutes.func2 (5 handlers)
    time=2024-06-28T15:48:32.330+01:00 level=INFO source=routes.go:1111 msg="Listening on 127.0.0.1:11434 (version 0.1.46)"
    time=2024-06-28T15:48:32.331+01:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/var/folders/rw/0syygy1528v0515md9pnvt9r0000gp/T/ollama1714798501/runners
    time=2024-06-28T15:48:32.357+01:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [metal]"
    time=2024-06-28T15:48:32.403+01:00 level=INFO source=types.go:98 msg="inference compute" id=0 library=metal compute="" driver=0.0 name="" total="21.3 GiB" available="21.3 GiB"
    ...

    You can list all available configuration options by running ollama serve --help.
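
    A quick way to confirm the server is up is to hit one of the read-only endpoints from the route listing above; these need no model loaded:

    curl http://127.0.0.1:11434/api/version
    curl http://127.0.0.1:11434/api/tags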

  3. Test it

    Ollama pulls model weights automatically the first time you run a model. You can browse the available models at ollama.com/library

    ollama run llama3 "Hello, world!"
    # output
    pulling manifest
    pulling 6a0746a1ec1a... 100% ▕██████████████▏ 4.7 GB
    pulling 4fa551d4f938... 100% ▕██████████████▏  12 KB
    pulling 8ab4849b038c... 100% ▕██████████████▏  254 B
    pulling 577073ffcc6c... 100% ▕██████████████▏  110 B
    pulling 3f8eb4da87fa... 100% ▕██████████████▏  485 B
    verifying sha256 digest
    writing manifest
    removing any unused layers
    success
    Hello there! It's great to meet you!
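
    Since the server is HTTP-first, the same prompt can go through the native API. The /api/generate route appears in the server log above; setting stream to false returns a single JSON response instead of a token stream:

    curl http://127.0.0.1:11434/api/generate -d '{
      "model": "llama3",
      "prompt": "Hello, world!",
      "stream": false
    }'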
  4. In VS Code or JetBrains IDEs, you can use the Continue extension to wire your editor up to the local server (see the API example below).
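
    The route listing also shows an OpenAI-compatible /v1/chat/completions endpoint, which is the kind of API many editor integrations speak. A minimal check, assuming llama3 is already pulled:

    curl http://127.0.0.1:11434/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3",
        "messages": [{"role": "user", "content": "Say hello in one sentence."}]
      }'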
