
MCP Server Infrastructure Reference

A practical reference for understanding how to host, share, and compose MCP servers — especially in combination with self-hosted LLMs — for personal and small-project use.

Last researched: February 2026
Latest MCP spec version: 2025-11-25 (stable)
Spec home: https://modelcontextprotocol.io/specification/2025-11-25
GitHub org: https://github.com/modelcontextprotocol
Governance: MCP is now a Series of LF Projects, LLC (Linux Foundation)


Table of Contents

  1. MCP Protocol Evolution
  2. Transport Standards
  3. Sharing Your MCP Servers
  4. Hosting Options for Remote MCP Servers
  5. Self-Hosted LLMs + MCP
  6. The "AI Service Gateway" Pattern
  7. Key Tools & Projects
  8. Architecture Diagrams
  9. Feasibility Assessment
  10. Getting Started Recipes

1. MCP Protocol Evolution

MCP (Model Context Protocol) was introduced by Anthropic in November 2024 as an open standard for connecting LLM applications to external data and tools. It has evolved rapidly:

| Spec Version | Date | Key Additions |
|---|---|---|
| 2024-11-05 | Nov 2024 | Initial release. stdio + HTTP+SSE transports. Basic tools, resources, prompts. |
| 2025-03-26 | Mar 2025 | Streamable HTTP introduced (replacing HTTP+SSE). OAuth 2.1-based authorization framework. Tool annotations. |
| 2025-06-18 | Jun 2025 | Elicitation. Structured output for tools. Security hardening (servers classified as OAuth resource servers). |
| 2025-11-25 | Nov 2025 | Latest stable. Async Tasks. OpenID Connect Discovery. Icons metadata. Incremental scope consent. URL mode elicitation. Tool calling in sampling. Client ID Metadata Documents. SDK tiering. Formal governance. |

What "Standardization" Means Now

  • MCP is governed under the Linux Foundation (LF Projects, LLC) with formal working groups and interest groups.
  • An SDK tiering system has been established with clear requirements for feature support and maintenance.
  • The spec now uses JSON Schema 2020-12 as the default dialect.
  • OAuth 2.1 is the standard auth mechanism for remote servers.

Sources

  • Anthropic: Original creator, continues active development
  • GitHub: https://github.com/modelcontextprotocol — canonical spec and SDKs
  • Microsoft: Major adopter — Azure Functions MCP hosting, VS Code Copilot integration, Azure API Center MCP Registry
  • Google: Cloud Run MCP deployment guides
  • Docker: MCP catalog with 60+ servers, OAuth support built-in

2. Transport Standards

The protocol defines two standard transports (plus custom):

stdio (Local)

  • Client launches MCP server as a subprocess
  • Communication via stdin/stdout (JSON-RPC messages)
  • Best for: local tools, VS Code extensions, CLI integrations
  • This is what you're likely using now with your local MCP servers in VS Code

Streamable HTTP (Remote) — The Current Standard

  • Replaced the older HTTP+SSE transport as of spec 2025-03-26
  • Server exposes a single HTTP endpoint (e.g., https://example.com/mcp)
  • Client sends JSON-RPC messages via HTTP POST
  • Server can respond with application/json (single response) or text/event-stream (SSE stream)
  • Client can GET the endpoint to open an SSE stream for server-initiated messages
  • Supports session management via MCP-Session-Id header
  • Supports resumability via SSE event IDs and Last-Event-ID
  • Protocol version negotiated via MCP-Protocol-Version header
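
In practice, one client turn is just a JSON-RPC message POSTed to that endpoint. A minimal sketch with httpx (the endpoint and headers are illustrative; a real client would use an MCP SDK, send initialize first, and carry the MCP-Session-Id header the server returns):

# One Streamable HTTP round trip (illustrative; use an MCP SDK in practice)
import httpx

MCP_ENDPOINT = "https://example.com/mcp"

payload = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
headers = {
    "Content-Type": "application/json",
    # The client must accept both a plain JSON response and an SSE stream
    "Accept": "application/json, text/event-stream",
    "MCP-Protocol-Version": "2025-11-25",
}

resp = httpx.post(MCP_ENDPOINT, json=payload, headers=headers)

if resp.headers.get("content-type", "").startswith("application/json"):
    print(resp.json())   # single JSON-RPC response
else:
    print(resp.text)     # text/event-stream: an SSE stream of events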

Security requirements for Streamable HTTP:

  • Servers MUST validate the Origin header (prevents DNS rebinding)
  • Local servers SHOULD bind to 127.0.0.1 only (not 0.0.0.0)
  • Servers SHOULD implement proper authentication (OAuth 2.1 recommended)
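
A rough sketch of what the first two requirements look like in a hand-rolled server (FastAPI and the allowed-origin list are assumptions; SDK-based servers typically handle this for you):

# Sketch: Origin validation + loopback-only binding for a local Streamable HTTP server
import uvicorn
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

# Browser requests from unexpected Origins are refused (DNS-rebinding defense);
# non-browser clients that send no Origin header are allowed through.
ALLOWED_ORIGINS = {"http://localhost", "http://127.0.0.1", None}

@app.middleware("http")
async def reject_unknown_origins(request: Request, call_next):
    if request.headers.get("origin") not in ALLOWED_ORIGINS:
        return JSONResponse({"error": "forbidden origin"}, status_code=403)
    return await call_next(request)

if __name__ == "__main__":
    uvicorn.run(app, host="127.0.0.1", port=8080)  # bind to loopback, not 0.0.0.0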

Backwards Compatibility

  • Servers can host both old HTTP+SSE endpoints and the new Streamable HTTP endpoint simultaneously
  • Clients can auto-detect transport by attempting a POST first, falling back to GET+SSE

3. Sharing Your MCP Servers

You have several options for sharing your HTTP-based MCP servers, ordered from simplest to most involved:

Option A: Share the Code (Easiest)

  • Publish your MCP server source to GitHub/GitLab
  • Others clone and run locally, adding to their own VS Code settings.json or .vscode/mcp.json
  • Include an mcp.json config snippet in your README so consumers can copy-paste (see the example after this list)
  • This is the most common approach in the MCP community today
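
For example, a copy-pasteable snippet for a stdio-based server might look like this (the server name, command, and file layout are placeholders; the exact location and schema depend on the client, e.g. VS Code reads .vscode/mcp.json):

{
  "servers": {
    "my-example-server": {
      "type": "stdio",
      "command": "python",
      "args": ["server.py"]
    }
  }
}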

Option B: Containerize with Docker

  • Package your MCP server as a Docker image
  • Publish to Docker Hub or GitHub Container Registry (GHCR)
  • Others run it with a single docker run command
  • Docker has a built-in MCP catalog with 60+ servers and OAuth support
  • Example: docker run -p 8080:8080 yourname/your-mcp-server

Option C: Deploy as a Remote MCP Server

  • Host on a cloud platform (see Section 4)
  • Share the URL — anyone with an MCP client can connect
  • Use OAuth 2.1 or API keys for access control
  • This is the "remote MCP server" pattern — the standard is designed for this

Option D: Register in an MCP Registry

  • Azure API Center now supports MCP server registries
  • GitHub Copilot enterprise/org admins can set an MCP registry URL so all org members discover your servers automatically
  • Format: https://{api-center-name}.data.{location}.azure-apicenter.ms/workspaces/default/v0/servers
  • This is currently enterprise/org-level only (not individual user level)

Option E: VS Code Workspace Configuration

  • For team sharing, commit a .vscode/mcp.json file to your repo
  • Or use VS Code Profiles to bundle MCP server configs + settings and share them
  • Others import the profile and get your MCP server configuration automatically

4. Hosting Options for Remote MCP Servers

Azure Functions (Microsoft) — Public Preview

  • Self-hosted MCP servers deploy as Azure Functions custom handlers
  • No code changes needed — your existing MCP server code works as-is
  • Add a host.json with configurationProfile: "mcp-custom-handler"
  • Supports .NET, Java, JavaScript, Python, TypeScript
  • Integrates with Azure AI Foundry agents
  • Can register in Azure API Center for discovery
  • Docs: https://learn.microsoft.com/en-us/azure/azure-functions/self-hosted-mcp-servers
  • The Azure MCP Server repo (github.com/Azure/azure-mcp) is archived — development moved to github.com/microsoft/mcp
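
A minimal host.json for the mcp-custom-handler profile mentioned above might look like the sketch below (the customHandler block and executable path are assumptions for a Python server; check the linked docs for the exact schema):

{
  "version": "2.0",
  "configurationProfile": "mcp-custom-handler",
  "customHandler": {
    "description": {
      "defaultExecutablePath": "python",
      "arguments": ["server.py"]
    }
  }
}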

Google Cloud Run

  • Deploy your MCP server as a container
  • Use the Cloud Run proxy for authenticated tunnels to your server
  • Full guide available: "Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes"
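
Roughly, deployment plus an authenticated local tunnel looks like this (service name, region, and port are placeholders):

# Deploy the container from source, keeping the service private
gcloud run deploy my-mcp-server --source . --region us-central1 --no-allow-unauthenticated

# Open an authenticated local proxy to the private service
gcloud run services proxy my-mcp-server --region us-central1 --port 8080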

Cloudflare Workers

  • Deploy as a Cloudflare Worker (edge compute)
  • Uses Cloudflare KV for OAuth token storage
  • Cloudflare Tunnel + Zero Trust for on-premises servers with secure remote access
  • Good for lightweight MCP servers that don't need heavy compute

Docker / Self-Hosted VPS

  • Run anywhere Docker runs (DigitalOcean, Linode, Hetzner, home server, etc.)
  • Use Cloudflare Tunnel or ngrok to expose local servers securely
  • Full control, lowest cost, most flexibility
  • Ideal for your personal learning projects
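
For quick experiments, either of these exposes a locally running MCP server on a temporary public URL (the port is a placeholder; for anything long-lived, configure a named Cloudflare Tunnel with authentication instead):

# Cloudflare quick tunnel (ephemeral URL, no account configuration required)
cloudflared tunnel --url http://localhost:8080

# or ngrok
ngrok http 8080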

Kubernetes (AKS, EKS, GKE, self-hosted)

  • Microsoft highlights Kubernetes-native deployment for MCP
  • MCP gateway handles session-aware routing
  • Best for scaling multiple MCP servers

5. Self-Hosted LLMs + MCP

This is where your core question lives: Can you bundle a self-hosted LLM with MCP servers and serve the combined capability via an API?

Yes, absolutely. Here's the landscape:

Ollama

  • Open-source LLM runner, extremely popular for local/self-hosted use
  • Runs models like Llama 3.2, Mistral, Qwen, Phi, DeepSeek, etc.
  • Exposes an OpenAI-compatible REST API at http://localhost:11434
  • Supports tool calling (function calling) — essential for MCP integration
  • Docker-friendly: docker run ollama/ollama
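
Because the API is OpenAI-compatible, any OpenAI client library can talk to a local Ollama instance by overriding the base URL, which is what the bridge projects below build on. A minimal sketch (model name and prompt are placeholders; assumes the model has already been pulled):

# Call a local Ollama instance through its OpenAI-compatible endpoint
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is unused but required by the client

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)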

Ollama + MCP Bridge Projects

| Project | What It Does |
|---|---|
| ollama-mcp-bridge (jonigl/ollama-mcp-bridge) | Drop-in proxy for the Ollama API that transparently adds all MCP server tools to every /api/chat request. Your apps talk to Ollama's API as normal, but MCP tools are automatically available. |
| MCP-Bridge (SecretiveShell/MCP-Bridge) | Exposes MCP tools via an OpenAI-compatible API. Connect your MCP servers + any inference server (Ollama, vLLM, etc.) and get a unified API. Supports API key auth. |
| mcpo (open-webui/mcpo) | Converts any MCP server into a standard RESTful OpenAPI endpoint. Dead simple: uvx mcpo --port 8000 -- python your_mcp_server.py |
| Open WebUI | Full chat UI (like ChatGPT) that connects to Ollama. Supports MCP servers via the mcpo bridge. |

Open WebUI + Ollama + MCP (Docker Compose Stack)

A popular pattern is running all three together:

# Simplified example — see full configs in project repos
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["8080:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
  
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
  
  your-mcp-server:
    build: ./your-mcp-server
    command: ["mcpo", "--port", "8000", "--", "python", "server.py"]
    ports: ["8000:8000"]

Other Self-Hosted LLM Options

| Platform | Notes |
|---|---|
| Ollama | Easiest to set up. Great model library. Best for personal/learning use. |
| vLLM | High-performance serving. Better throughput for production. OpenAI-compatible API. |
| llama.cpp / llama-server | Lightweight C++ inference. Good for resource-constrained environments. |
| LocalAI | OpenAI-compatible API, supports multiple model formats, has built-in tool calling. |
| LM Studio | Desktop app with server mode. Easy model management. OpenAI-compatible API. |
| Onyx (formerly Danswer) | Enterprise search + RAG platform. Can integrate with MCP for tool access. |
| Jan | Desktop app, self-hosted, OpenAI-compatible API. |

6. The "AI Service Gateway" Pattern — Your Use Case

"It would be great if I could encapsulate the functionality of MCP and LLM within a service of sorts and serve it via API so that I don't have to include an LLM in all of my tiny applications."

This is not only feasible — it's becoming a recognized architecture pattern. Here's how it works:

The Pattern

┌─────────────────────────────────────────────┐
│           Your Tiny Applications            │
│  (Web apps, CLI tools, mobile apps, etc.)   │
│                                             │
│  Simple REST calls:                         │
│  POST /api/ask  { "question": "..." }       │
│  POST /api/do   { "action": "...", ... }    │
└──────────────────┬──────────────────────────┘
                   │ HTTP/REST
                   ▼
┌─────────────────────────────────────────────┐
│         Your AI Service Gateway             │
│  (FastAPI / ASP.NET / Express / etc.)       │
│                                             │
│  - Receives plain REST requests             │
│  - Orchestrates LLM + MCP tools             │
│  - Returns structured responses             │
│  - Handles auth, rate limiting, logging     │
└──────┬────────────────────┬─────────────────┘
       │                    │
       ▼                    ▼
┌──────────────┐   ┌──────────────────────┐
│  Self-Hosted │   │  Your MCP Servers    │
│  LLM         │   │                      │
│  (Ollama)    │   │  - Weather tool      │
│              │   │  - DB query tool     │
│  llama3.2    │   │  - File search tool  │
│  mistral     │   │  - Custom tools...   │
│  etc.        │   │                      │
└──────────────┘   └──────────────────────┘

Why This Works Well

  1. Separation of concerns: Your tiny apps stay tiny. They make simple HTTP calls.
  2. Single LLM instance: One Ollama instance serves all your apps. No redundancy.
  3. MCP tools are shared: All your MCP servers are available to the gateway, which decides when to use them.
  4. Model flexibility: Swap LLMs without touching your apps (switch from Llama to Mistral, etc.).
  5. Cost efficiency: One machine running the LLM, many apps consuming it.

Why It's Becoming Popular

  • MCP-Bridge and ollama-mcp-bridge are exactly this pattern packaged as open-source tools
  • The MCP gateway pattern is well-documented and gaining traction (Portkey, Lasso, Kong, Gravitee all offer MCP gateway products)
  • Docker Compose makes it trivial to run the full stack locally
  • For learning and personal projects, this is arguably the ideal architecture

Why It's Not Universal (Yet)

  • Tool calling quality varies by model. Smaller open-source models (7B-13B) are less reliable at deciding when and how to call tools compared to GPT-4 or Claude.
  • Latency: Self-hosted LLMs on consumer hardware are slower than cloud APIs (though fine for personal use).
  • Maintenance burden: You're running and updating the LLM, the gateway, and the MCP servers.
  • Context window limits: Open-source models often have smaller context windows.
  • For production/public apps, most teams still use cloud LLM APIs. For personal/learning use, self-hosted is excellent.

Implementation Approach

The simplest path to your goal:

  1. Run Ollama with a tool-calling-capable model (Llama 3.2, Mistral, Qwen 2.5)
  2. Use MCP-Bridge or ollama-mcp-bridge as your gateway — it already does the hard work of:
    • Connecting to your MCP servers
    • Translating MCP tools into OpenAI-compatible function schemas
    • Running the tool-calling loop (LLM decides to call tool → gateway calls MCP server → result goes back to LLM)
    • Exposing an OpenAI-compatible API your apps can call
  3. Your tiny apps just call the gateway's /v1/chat/completions endpoint — standard OpenAI API format
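
From a tiny app's point of view, that is just the standard OpenAI client pointed at the gateway (the gateway URL, port, and API key handling below are assumptions that depend on which bridge you run):

# A "tiny app" calling the gateway like any OpenAI-compatible API
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

resp = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "What's the weather in Chicago?"}],
)
print(resp.choices[0].message.content)  # any MCP tool calls happen inside the gateway, invisibly to the app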

Or build your own thin gateway:

# Conceptual example (Python + FastAPI + Ollama + FastMCP); URLs and model name are placeholders
from fastapi import FastAPI
from fastmcp import Client as MCPClient
from ollama import AsyncClient
from pydantic import BaseModel

app = FastAPI()
llm = AsyncClient()                      # Ollama, defaults to http://localhost:11434
MCP_URL = "http://localhost:8080/mcp"    # your MCP server's Streamable HTTP endpoint

class AskRequest(BaseModel):
    question: str

def convert_mcp_to_ollama_tools(tools):
    # Translate MCP tool definitions into the OpenAI-style function schemas Ollama expects
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,
            },
        }
        for t in tools
    ]

@app.post("/api/ask")
async def ask(req: AskRequest):
    messages = [{"role": "user", "content": req.question}]

    async with MCPClient(MCP_URL) as mcp:
        # 1. Get available tools from the MCP server
        tools = await mcp.list_tools()

        # 2. Ask the LLM with those tools available
        response = await llm.chat(
            model="llama3.2",
            messages=messages,
            tools=convert_mcp_to_ollama_tools(tools),
        )

        # 3. While the LLM wants to call tools, execute them via MCP and feed results back
        while response.message.tool_calls:
            messages.append(response.message)
            for call in response.message.tool_calls:
                result = await mcp.call_tool(call.function.name, call.function.arguments)
                messages.append({
                    "role": "tool",
                    "name": call.function.name,
                    "content": str(result),
                })
            response = await llm.chat(model="llama3.2", messages=messages)  # follow-up call for the final answer

    return {"answer": response.message.content}

7. Key Tools & Projects

MCP SDKs (Official)

| Language | Repo |
|---|---|
| TypeScript | github.com/modelcontextprotocol/typescript-sdk |
| Python | github.com/modelcontextprotocol/python-sdk |
| .NET (C#) | github.com/modelcontextprotocol/csharp-sdk |
| Java/Kotlin | github.com/modelcontextprotocol/java-sdk |

Bridges & Gateways

| Tool | Purpose |
|---|---|
| mcpo | MCP → OpenAPI REST proxy (by the Open WebUI team) |
| MCP-Bridge | MCP tools → OpenAI-compatible API |
| ollama-mcp-bridge | Drop-in Ollama proxy with MCP tools |
| openapi-to-mcp | Convert any OpenAPI/Swagger spec into an MCP server |
| Lasso MCP Gateway | Open-source proxy/orchestration layer for multiple MCP servers |
| MetaMCP | Manage and aggregate multiple MCP servers |

Platforms with MCP Support

| Platform | MCP Integration |
|---|---|
| VS Code + GitHub Copilot | Native MCP client. .vscode/mcp.json config. |
| Claude Desktop | Native MCP client (Anthropic) |
| Cursor | Native MCP support |
| Windsurf | Native MCP support |
| Open WebUI | Via the mcpo bridge |
| n8n | MCP server for workflow automation |
| Docker | MCP catalog with 60+ servers |

8. Architecture Diagrams

Current Setup (Local Only)

┌─────────────┐     stdio/HTTP      ┌──────────────────┐
│   VS Code   │◄───────────────────►│ Your MCP Servers │
│ + Copilot   │                     │ (local machine)  │
└─────────────┘                     └──────────────────┘

Shared Remote Setup

┌─────────────┐                     ┌──────────────────┐
│   VS Code   │                     │ Your MCP Servers │
│   (User A)  │◄──Streamable HTTP──►│ (Cloud/VPS)      │
└─────────────┘         ▲           │                  │
                        │           │ OAuth 2.1 auth   │
┌─────────────┐         │           │ Streamable HTTP  │
│   VS Code   │─────────┘           └──────────────────┘
│   (User B)  │
└─────────────┘

Full Self-Hosted AI Service

┌──────────┐ ┌──────────┐ ┌──────────┐
│ App 1    │ │ App 2    │ │ App 3    │
│ (web)    │ │ (CLI)    │ │ (mobile) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
     │            │            │
     └────────────┼────────────┘
                  │  REST API
                  ▼
     ┌────────────────────────┐
     │   AI Service Gateway   │
     │   (MCP-Bridge /        │
     │    ollama-mcp-bridge / │
     │    custom FastAPI)     │
     │                        │
     │  OpenAI-compatible API │
     └───────┬──────┬─────────┘
             │      │
     ┌───────▼──┐ ┌─▼──────────────┐
     │  Ollama  │ │  MCP Servers   │
     │  LLM     │ │  (your tools)  │
     └──────────┘ └────────────────┘

9. Feasibility Assessment

"Can I share my MCP servers?"

Yes. Multiple well-supported paths exist:

  • Source code sharing (simplest)
  • Docker images (portable)
  • Remote hosting with Streamable HTTP (most powerful)
  • MCP registries via Azure API Center (enterprise)

"Can I host my own LLM + MCP as a unified service?"

Yes, very feasible. This is an active and growing pattern:

  • ollama-mcp-bridge and MCP-Bridge are production-quality implementations of exactly this
  • Docker Compose makes the entire stack deployable in one command
  • Tool calling works well with mid-size open models such as Llama 3.1 (8B), Mistral, and Qwen 2.5

"Is a REST API backed by self-hosted LLM + MCP popular?"

Growing but niche. Here's the honest picture:

  • Very popular in the hobbyist/tinkerer/learning community
  • Common in privacy-sensitive enterprise environments
  • Less common for public-facing products (cloud LLM APIs still dominate there)
  • The MCP gateway pattern (centralized AI service) is rapidly gaining adoption in 2025-2026
  • For your stated use case (learning, small apps, personal enjoyment) — this is ideal

Hardware Considerations for Self-Hosting

| Model Size | RAM Needed | GPU (Optional) | Response Speed |
|---|---|---|---|
| 1-3B (Phi, Qwen-mini) | 4-8 GB | Not needed | Fast |
| 7-8B (Llama 3.1, Mistral) | 8-16 GB | 6+ GB VRAM helps | Good |
| 13-14B | 16-32 GB | 8+ GB VRAM | Moderate |
| 70B+ | 64+ GB | 24+ GB VRAM | Slow without GPU |

For learning and small projects, a 7-8B model on a machine with 16GB RAM works well.


10. Getting Started Recipes

Recipe 1: Share your existing MCP server via Docker

FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["python", "server.py"]
docker build -t my-mcp-server .
docker run -p 8080:8080 my-mcp-server

Others add to VS Code settings.json:

{
  "mcp": {
    "servers": {
      "your-server": {
        "url": "http://your-server-address:8080/mcp"
      }
    }
  }
}

Recipe 2: Ollama + MCP-Bridge (Quickest Unified Service)

# 1. Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2

# 2. Start MCP-Bridge pointing to Ollama + your MCP servers
# (see github.com/SecretiveShell/MCP-Bridge for full config)
docker run -d -p 8000:8000 \
  -v ./config.json:/app/config.json \
  mcpbridge/mcpbridge

# 3. Your apps call the bridge's OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "What is the weather?"}]}'

Recipe 3: Full Stack with Open WebUI (Chat UI + LLM + MCP)

# See: github.com/open-webui/open-webui
# Uses mcpo to bridge MCP servers into Open WebUI
docker compose up -d  # with the compose file from Section 5

Further Reading
