@nerdalert
Last active March 15, 2026 04:17
MaaS Istio External Mode Routing Validation

All three models (local + OpenAI + Anthropic) work through the MaaS gateway using the same sk-oai-* API key minted via the MaaS API.

Demo: External Model Routing with Istio ServiceEntry & DestinationRule

I didn't add model listing to this validation, but you can see an example of the modifications required in MaaS in egress-ai-gateway-poc/patches/maas-api-external-model-listing.patch. The patch adds ConfigMap-based external model listing to the MaaS API: it reads an external-model-registry ConfigMap in the MaaS namespace and merges those models into the GET /v1/models response. I tested this a couple of weeks ago with ghcr.io/nerdalert/maas-api:external-models.
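The merge the patch performs can be sketched roughly like this. This is a hypothetical illustration, not the actual MaaS API code: the function names and ConfigMap value shapes are assumptions, with models following the OpenAI-style model-list format.

```python
# Hypothetical sketch: models declared in an external-model-registry
# ConfigMap (one JSON object per value) are merged into the GET /v1/models
# payload alongside locally served models.
import json

def merge_models(local_models, configmap_data):
    """Merge ConfigMap-declared external models into the /v1/models response."""
    external = [json.loads(v) for v in configmap_data.values()]
    seen = {m["id"] for m in local_models}
    merged = local_models + [m for m in external if m["id"] not in seen]
    return {"object": "list", "data": merged}

models = merge_models(
    [{"id": "facebook/opt-125m", "object": "model"}],
    {"openai": '{"id": "gpt-4o-mini", "object": "model"}',
     "anthropic": '{"id": "claude-sonnet-4-20250514", "object": "model"}'},
)
```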

Environment

| Component | Version / Detail |
| --- | --- |
| Platform | OpenShift 4.20.6 (ROSA on AWS, us-east-1) |
| Istio | v1.29-latest via Sail Operator 1.29.0 |
| MaaS | Deployed via quickstart with sample model |
| Gateway | maas-default-gateway in openshift-ingress |
| Auth | Kuadrant AuthPolicy (MaaS API key -> provider key injection) |

Setup

export MAAS_HOST=$(kubectl get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.spec.listeners[0].hostname}')
export MAAS_URL=$(kubectl get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')
export API_KEY=$(curl -sSk -X POST "https://${MAAS_HOST}/maas-api/v1/api-keys" -H "Authorization: Bearer $(oc whoami -t)" -H "Content-Type: application/json" -d '{"name":"test"}' | jq -r '.key')

1. Local Model (facebook/opt-125m)

curl -sk "https://${MAAS_HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"facebook/opt-125m","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
{
  "id": "chatcmpl-450d1d11-7c3d-5aae-a4a7-eea43ea5b250",
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "To be "
      }
    }
  ],
  "usage": {
    "prompt_tokens": 1,
    "completion_tokens": 2,
    "total_tokens": 3
  }
}

2. OpenAI (gpt-4o-mini)

curl -s "http://${MAAS_URL}/external/openai/v1/chat/completions" -H "Host: ${MAAS_HOST}" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}],"max_tokens":5,"temperature":0}'
{
  "id": "chatcmpl-DJD5AMFiNeS3A6dcabz97xyLhJsUL",
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I"
      },
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13
  },
  "system_fingerprint": "fp_a1681c17ec"
}

3. Anthropic (claude-sonnet-4-20250514)

curl -s "http://${MAAS_URL}/external/anthropic/v1/messages" -H "Host: ${MAAS_HOST}" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}],"max_tokens":5}'
{
  "model": "claude-sonnet-4-20250514",
  "id": "msg_01KH2kuMRhB99ZMZ26mkXaiZ",
  "content": [
    {
      "type": "text",
      "text": "Hello! How are you"
    }
  ],
  "stop_reason": "max_tokens",
  "usage": {
    "input_tokens": 8,
    "output_tokens": 5
  }
}

Results

| Model | Provider | Token | Response | Status |
| --- | --- | --- | --- | --- |
| facebook/opt-125m | Local (KServe) | sk-oai-* | "To be " | PASS |
| gpt-4o-mini | OpenAI | sk-oai-* | "Hello! How can I" | PASS |
| claude-sonnet-4-20250514 | Anthropic | sk-oai-* | "Hello! How are you" | PASS |

All three models use the same MaaS API key. The client never provides the provider API key — the gateway injects it.
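From the client's side, that claim looks like this: identical headers for every route, only the path changes. A minimal sketch, where the key value and the endpoint map are placeholders mirroring the curls above.

```python
# Client-side view: one MaaS-minted key works for all three routes.
# The gateway swaps in provider credentials; the client never holds them.
MAAS_KEY = "sk-oai-example"  # placeholder, not a real secret

ENDPOINTS = {
    "local": "/llm/facebook-opt-125m-simulated/v1/chat/completions",
    "openai": "/external/openai/v1/chat/completions",
    "anthropic": "/external/anthropic/v1/messages",
}

def client_headers():
    # Identical for every provider -- per-provider auth is the gateway's job.
    return {"Authorization": f"Bearer {MAAS_KEY}",
            "Content-Type": "application/json"}

headers_by_provider = {name: client_headers() for name in ENDPOINTS}
```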

Request Flow

1. Client sends:     Authorization: Bearer sk-oai-*  (MaaS API key)

2. AuthPolicy:       Validates API key via maas-api callback
                     Replaces Authorization with provider key

3. HTTPRoute:        Matches path, rewrites URL, sets Host header

4. Istio resources:  ExternalName Svc -> ServiceEntry -> DestinationRule (TLS)

5. Provider:         Receives HTTPS request with valid credentials
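Steps 1–4 can be modeled as a toy function. This is only a sketch: a static key set stands in for the maas-api callback, hard-coded routes stand in for the HTTPRoute/Istio config, all keys are placeholders, and for brevity it injects a Bearer header for both providers even though Anthropic's real injected header is x-api-key.

```python
# Toy model of the request flow: validate MaaS key, inject provider key,
# rewrite the URL, set the upstream Host header.
VALID_MAAS_KEYS = {"sk-oai-example"}
PROVIDER_KEYS = {"openai": "sk-openai-secret", "anthropic": "sk-ant-secret"}
ROUTES = {
    "/external/openai/": ("api.openai.com", "openai"),
    "/external/anthropic/": ("api.anthropic.com", "anthropic"),
}

def gateway(request):
    token = request["headers"].get("Authorization", "").removeprefix("Bearer ")
    if token not in VALID_MAAS_KEYS:                    # step 2: AuthPolicy check
        return {"status": 401}
    for prefix, (host, provider) in ROUTES.items():
        if request["path"].startswith(prefix):
            headers = dict(request["headers"])
            headers["Authorization"] = f"Bearer {PROVIDER_KEYS[provider]}"  # key swap
            headers["Host"] = host                      # step 3: Host header
            return {"status": 200,
                    "path": request["path"].removeprefix(prefix[:-1]),  # URL rewrite
                    "headers": headers}
    return {"status": 404}

out = gateway({"path": "/external/openai/v1/chat/completions",
               "headers": {"Authorization": "Bearer sk-oai-example"}})
```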

Provider Differences

| Aspect | Local | OpenAI | Anthropic |
| --- | --- | --- | --- |
| Endpoint | /llm/.../v1/chat/completions | /external/openai/v1/chat/completions | /external/anthropic/v1/messages |
| Protocol | HTTPS (in-cluster) | HTTP -> HTTPS (TLS origination) | HTTP -> HTTPS (TLS origination) |
| Auth injected | N/A (in-cluster) | Authorization: Bearer &lt;openai-key&gt; | x-api-key: &lt;anthropic-key&gt; |
| Extra headers | None | Host: api.openai.com | Host: api.anthropic.com, anthropic-version |
| Istio resources | None needed | ServiceEntry + DestinationRule + ExternalName Svc | Same |
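The auth-injection differences above reduce to a small header builder. A sketch only: key values are placeholders, and the anthropic-version date is an assumed example value.

```python
# Per-provider injected headers, mirroring the "Auth injected" and
# "Extra headers" rows of the table.
def provider_headers(provider: str, key: str) -> dict:
    """Headers the gateway would inject for each upstream provider."""
    if provider == "openai":
        return {"Host": "api.openai.com",
                "Authorization": f"Bearer {key}"}
    if provider == "anthropic":
        return {"Host": "api.anthropic.com",
                "x-api-key": key,
                "anthropic-version": "2023-06-01"}  # assumed version date
    return {}  # local (in-cluster) models: nothing injected
```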

Raw Output

export MAAS_HOST=$(kubectl get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.spec.listeners[0].hostname}')
export MAAS_URL=$(kubectl get gateway maas-default-gateway -n openshift-ingress -o jsonpath='{.status.addresses[0].value}')
export API_KEY=$(curl -sSk -X POST "https://${MAAS_HOST}/maas-api/v1/api-keys" -H "Authorization: Bearer $(oc whoami -t)" -H "Content-Type: application/json" -d '{"name":"test"}' | jq -r '.key')

# 1. Local model (facebook/opt-125m)
curl -sk "https://${MAAS_HOST}/llm/facebook-opt-125m-simulated/v1/chat/completions" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"facebook/opt-125m","messages":[{"role":"user","content":"Hello"}],"max_tokens":10}'
{"id":"chatcmpl-5f547bf0-9f2a-5a75-a41b-2e254521f8b3","created":1773471237,"model":"facebook/opt-125m","usage":{"prompt_tokens":1,"completion_tokens":8,"total_tokens":9},"object":"chat.completion","kv_transfer_params":null,"choices":[{"index":0,"finish_reason":"stop","message":{"role":"assistant","content":"I am your AI assistant, how can "}}]}

# 2. OpenAI (gpt-4o-mini)
curl -s "http://${MAAS_URL}/external/openai/v1/chat/completions" -H "Host: ${MAAS_HOST}" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}],"max_tokens":5}'
{
  "id": "chatcmpl-DJDApznNYlh1O0GtTAqHVnJc4YKJ4",
  "object": "chat.completion",
  "created": 1773471243,
  "model": "gpt-4o-mini-2024-07-18",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I",
        "refusal": null,
        "annotations": []
      },
      "logprobs": null,
      "finish_reason": "length"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 5,
    "total_tokens": 13,
    "prompt_tokens_details": {
      "cached_tokens": 0,
      "audio_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0,
      "audio_tokens": 0,
      "accepted_prediction_tokens": 0,
      "rejected_prediction_tokens": 0
    }
  },
  "service_tier": "default",
  "system_fingerprint": "fp_a1681c17ec"
}

# 3. Anthropic (claude-sonnet-4-20250514)
curl -s "http://${MAAS_URL}/external/anthropic/v1/messages" -H "Host: ${MAAS_HOST}" -H "Authorization: Bearer ${API_KEY}" -H "Content-Type: application/json" -d '{"model":"claude-sonnet-4-20250514","messages":[{"role":"user","content":"Hello"}],"max_tokens":5}'
{"model":"claude-sonnet-4-20250514","id":"msg_014kmQoaKKKLhKTM9D3jD3Dx","type":"message","role":"assistant","content":[{"type":"text","text":"Hello! How are you"}],"stop_reason":"max_tokens","stop_sequence":null,"usage":{"input_tokens":8,"cache_creation_input_tokens":0,"cache_read_input_tokens":0,"cache_creation":{"ephemeral_5m_input_tokens":0,"ephemeral_1h_input_tokens":0},"output_tokens":5,"service_tier":"standard","inference_geo":"not_available"}}