A practical reference for understanding how to host, share, and compose MCP servers — especially in combination with self-hosted LLMs — for personal and small-project use.
Last researched: February 2026
Latest MCP spec version: 2025-11-25 (stable)
Spec home: https://modelcontextprotocol.io/specification/2025-11-25
GitHub org: https://github.com/modelcontextprotocol
Governance: MCP is now a Series of LF Projects, LLC (Linux Foundation)
- MCP Protocol Evolution
- Transport Standards
- Sharing Your MCP Servers
- Hosting Options for Remote MCP Servers
- Self-Hosted LLMs + MCP
- The "AI Service Gateway" Pattern
- Key Tools & Projects
- Architecture Diagrams
- Feasibility Assessment
- Getting Started Recipes
MCP (Model Context Protocol) was introduced by Anthropic in November 2024 as an open standard for connecting LLM applications to external data and tools. It has evolved rapidly:
| Spec Version | Date | Key Additions |
|---|---|---|
| 2024-11-05 | Nov 2024 | Initial release. stdio + HTTP+SSE transports. Basic tools, resources, prompts. |
| 2025-03-26 | Mar 2025 | Streamable HTTP introduced (replacing HTTP+SSE). Tool annotations. |
| 2025-06-18 | Jun 2025 | OAuth 2.1 authorization. Elicitation. Structured output for tools. Security hardening. |
| 2025-11-25 | Nov 2025 | Latest stable. Async Tasks. OpenID Connect Discovery. Icons metadata. Incremental scope consent. URL mode elicitation. Tool calling in sampling. Client ID Metadata Documents. SDK tiering. Formal governance. |
- MCP is governed under the Linux Foundation (LF Projects, LLC) with formal working groups and interest groups.
- An SDK tiering system has been established with clear requirements for feature support and maintenance.
- The spec now uses JSON Schema 2020-12 as the default dialect.
- OAuth 2.1 is the standard auth mechanism for remote servers.
- Anthropic: Original creator, continues active development
- GitHub: https://github.com/modelcontextprotocol — canonical spec and SDKs
- Microsoft: Major adopter — Azure Functions MCP hosting, VS Code Copilot integration, Azure API Center MCP Registry
- Google: Cloud Run MCP deployment guides
- Docker: MCP catalog with 60+ servers, OAuth support built-in
The protocol defines two standard transports (plus custom):
- stdio: the client launches the MCP server as a subprocess
- Communication via `stdin`/`stdout` (JSON-RPC messages)
- Best for: local tools, VS Code extensions, CLI integrations
- This is what you're likely using now with your local MCP servers in VS Code
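The official Python SDK's client side makes the stdio mechanics concrete. A minimal sketch, assuming the python-sdk is installed and `server.py` is a placeholder for any stdio MCP server script:

```python
# Sketch of a stdio MCP client using the official python-sdk.
# The client spawns the server as a subprocess and speaks JSON-RPC
# over its stdin/stdout. "server.py" is a placeholder script name.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(command="python", args=["server.py"])

async def main():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```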
- Streamable HTTP: replaced the older HTTP+SSE transport as of spec `2025-03-26`
- Server exposes a single HTTP endpoint (e.g., `https://example.com/mcp`)
- Client sends JSON-RPC messages via HTTP POST
- Server can respond with `application/json` (single response) or `text/event-stream` (SSE stream)
- Client can GET the endpoint to open an SSE stream for server-initiated messages
- Supports session management via the `MCP-Session-Id` header
- Supports resumability via SSE event IDs and `Last-Event-ID`
- Protocol version negotiated via the `MCP-Protocol-Version` header
Security requirements for Streamable HTTP:
- Servers MUST validate the `Origin` header (prevents DNS rebinding)
- Local servers SHOULD bind to `127.0.0.1` only (not `0.0.0.0`)
- Servers SHOULD implement proper authentication (OAuth 2.1 recommended)
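To make the Origin requirement concrete, here is a minimal sketch of the check as HTTP middleware. FastAPI/Starlette and the allow-list values are assumptions; the spec mandates the validation, not the framework:

```python
# Minimal sketch: reject requests whose Origin header is not on an allow-list.
# FastAPI/Starlette middleware is an assumption; any HTTP framework works.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
ALLOWED_ORIGINS = {"http://localhost", "http://127.0.0.1"}  # adjust to your clients

@app.middleware("http")
async def validate_origin(request: Request, call_next):
    origin = request.headers.get("origin")
    # Browsers send Origin on cross-site requests; non-browser clients may omit it.
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse(status_code=403, content={"error": "origin not allowed"})
    return await call_next(request)

# Mount your /mcp endpoint on `app` as usual, then bind to loopback only:
#   uvicorn my_server:app --host 127.0.0.1 --port 8080
```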
- Servers can host both old HTTP+SSE endpoints and the new Streamable HTTP endpoint simultaneously
- Clients can auto-detect transport by attempting a POST first, falling back to GET+SSE
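To get a feel for what a Streamable HTTP exchange looks like on the wire, here is a rough sketch using Python's `requests` library. The endpoint URL is a placeholder, and a real client should use an MCP SDK rather than hand-rolled JSON-RPC:

```python
# Rough sketch of one Streamable HTTP round trip (initialize request).
# The URL is a placeholder; prefer an MCP SDK for real clients.
import requests

MCP_URL = "https://example.com/mcp"  # placeholder endpoint

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-11-25",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}

resp = requests.post(
    MCP_URL,
    json=initialize,
    # The client must accept both response modes (JSON or SSE).
    headers={"Accept": "application/json, text/event-stream"},
    timeout=30,
)

# If the server assigns a session, echo it back (plus MCP-Protocol-Version)
# as headers on subsequent requests.
session_id = resp.headers.get("MCP-Session-Id")
print(resp.status_code, session_id)
print(resp.text)  # a JSON body or an SSE stream, depending on Content-Type
```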
You have several options for sharing your HTTP-based MCP servers, ordered from simplest to most involved:
- Publish your MCP server source to GitHub/GitLab
- Others clone and run locally, adding to their own VS Code `settings.json` or `.vscode/mcp.json`
- Include an `mcp.json` config snippet in your README so consumers can copy-paste
- This is the most common approach in the MCP community today
- Package your MCP server as a Docker image
- Publish to Docker Hub or GitHub Container Registry (GHCR)
- Others run it with a single `docker run` command
- Docker has a built-in MCP catalog with 60+ servers and OAuth support
- Example: `docker run -p 8080:8080 yourname/your-mcp-server`
- Host on a cloud platform (see Section 4)
- Share the URL — anyone with an MCP client can connect
- Use OAuth 2.1 or API keys for access control
- This is the "remote MCP server" pattern — the standard is designed for this
- Azure API Center now supports MCP server registries
- GitHub Copilot enterprise/org admins can set an MCP registry URL so all org members discover your servers automatically
- Format: `https://{api-center-name}.data.{location}.azure-apicenter.ms/workspaces/default/v0/servers`
- This is currently enterprise/org-level only (not individual user level)
- For team sharing, commit a `.vscode/mcp.json` file to your repo
- Or use VS Code Profiles to bundle MCP server configs + settings and share them
- Others import the profile and get your MCP server configuration automatically
- Self-hosted MCP servers deploy as Azure Functions custom handlers
- No code changes needed — your existing MCP server code works as-is
- Add a `host.json` with `configurationProfile: "mcp-custom-handler"`
- Supports .NET, Java, JavaScript, Python, TypeScript
- Integrates with Azure AI Foundry agents
- Can register in Azure API Center for discovery
- Docs: https://learn.microsoft.com/en-us/azure/azure-functions/self-hosted-mcp-servers
- The Azure MCP Server repo (`github.com/Azure/azure-mcp`) is archived — development moved to `github.com/microsoft/mcp`
- Deploy your MCP server as a container
- Use the Cloud Run proxy for authenticated tunnels to your server
- Full guide available: "Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes"
- Deploy as a Cloudflare Worker (edge compute)
- Uses Cloudflare KV for OAuth token storage
- Cloudflare Tunnel + Zero Trust for on-premises servers with secure remote access
- Good for lightweight MCP servers that don't need heavy compute
- Run anywhere Docker runs (DigitalOcean, Linode, Hetzner, home server, etc.)
- Use Cloudflare Tunnel or ngrok to expose local servers securely
- Full control, lowest cost, most flexibility
- Ideal for your personal learning projects
- Microsoft highlights Kubernetes-native deployment for MCP
- MCP gateway handles session-aware routing
- Best for scaling multiple MCP servers
This is where your core question lives: Can you bundle a self-hosted LLM with MCP servers and serve the combined capability via an API?
Yes, absolutely. Here's the landscape:
- Open-source LLM runner, extremely popular for local/self-hosted use
- Runs models like Llama 3.2, Mistral, Qwen, Phi, DeepSeek, etc.
- Exposes a REST API at `http://localhost:11434`, with OpenAI-compatible endpoints under `/v1` (see the sketch below)
- Supports tool calling (function calling) — essential for MCP integration
- Docker-friendly: `docker run ollama/ollama`
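A minimal sketch of that OpenAI-compatible surface, assuming Ollama is running locally and the `llama3.2` model has already been pulled:

```python
# Sketch: talking to Ollama through its OpenAI-compatible /v1 endpoints.
# Assumes `ollama pull llama3.2` has been run. Ollama ignores the API key,
# but the openai client library requires one to be set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```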
| Project | What It Does |
|---|---|
| ollama-mcp-bridge (jonigl/ollama-mcp-bridge) | Drop-in proxy for the Ollama API that transparently adds all MCP server tools to every `/api/chat` request. Your apps talk to Ollama's API as normal, but MCP tools are automatically available. |
| MCP-Bridge (SecretiveShell/MCP-Bridge) | Exposes MCP tools via an OpenAI-compatible API. Connect your MCP servers + any inference server (Ollama, vLLM, etc.) and get a unified API. Supports API key auth. |
| mcpo (open-webui/mcpo) | Converts any MCP server into a standard RESTful OpenAPI endpoint. Dead simple: `uvx mcpo --port 8000 -- python your_mcp_server.py` |
| Open WebUI | Full chat UI (like ChatGPT) that connects to Ollama. Supports MCP servers via mcpo bridge. |
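As a sketch of consuming an mcpo-wrapped server: once mcpo is running (e.g. `uvx mcpo --port 8000 -- python your_mcp_server.py`), each tool becomes a plain REST route with generated OpenAPI docs. The `get_weather` route and its `city` parameter below are hypothetical; check `/docs` or `/openapi.json` for the routes your own tools actually produce:

```python
# Sketch: calling an MCP tool through mcpo's generated REST API.
# The "get_weather" route and its parameters are hypothetical examples.
import requests

BASE = "http://localhost:8000"  # where mcpo is listening

# Discover the generated endpoints (mcpo is FastAPI-based, so it serves an OpenAPI schema).
schema = requests.get(f"{BASE}/openapi.json", timeout=10).json()
print(list(schema.get("paths", {}).keys()))

# Call one tool as an ordinary POST with a JSON body of its arguments.
result = requests.post(f"{BASE}/get_weather", json={"city": "Berlin"}, timeout=30)
print(result.json())
```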
A popular pattern is running all three together:
```yaml
# Simplified example — see full configs in project repos
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["8080:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
  your-mcp-server:
    build: ./your-mcp-server
    command: ["mcpo", "--port", "8000", "--", "python", "server.py"]
    ports: ["8000:8000"]
```

| Platform | Notes |
|---|---|
| Ollama | Easiest to set up. Great model library. Best for personal/learning use. |
| vLLM | High-performance serving. Better throughput for production. OpenAI-compatible API. |
| llama.cpp / llama-server | Lightweight C++ inference. Good for resource-constrained environments. |
| LocalAI | OpenAI-compatible API, supports multiple model formats, has built-in tool calling. |
| LM Studio | Desktop app with server mode. Easy model management. OpenAI-compatible API. |
| Onyx (formerly Danswer) | Enterprise search + RAG platform. Can integrate with MCP for tool access. |
| Jan | Desktop app, self-hosted, OpenAI-compatible API. |
"It would be great if I could encapsulate the functionality of MCP and LLM within a service of sorts and serve it via API so that I don't have to include an LLM in all of my tiny applications."
This is not only feasible — it's becoming a recognized architecture pattern. Here's how it works:
┌─────────────────────────────────────────────┐
│ Your Tiny Applications │
│ (Web apps, CLI tools, mobile apps, etc.) │
│ │
│ Simple REST calls: │
│ POST /api/ask { "question": "..." } │
│ POST /api/do { "action": "...", ... } │
└──────────────────┬──────────────────────────┘
│ HTTP/REST
▼
┌─────────────────────────────────────────────┐
│ Your AI Service Gateway │
│ (FastAPI / ASP.NET / Express / etc.) │
│ │
│ - Receives plain REST requests │
│ - Orchestrates LLM + MCP tools │
│ - Returns structured responses │
│ - Handles auth, rate limiting, logging │
└──────┬────────────────────┬─────────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────────────┐
│ Self-Hosted │ │ Your MCP Servers │
│ LLM │ │ │
│ (Ollama) │ │ - Weather tool │
│ │ │ - DB query tool │
│ llama3.2 │ │ - File search tool │
│ mistral │ │ - Custom tools... │
│ etc. │ │ │
└──────────────┘ └──────────────────────┘
- Separation of concerns: Your tiny apps stay tiny. They make simple HTTP calls.
- Single LLM instance: One Ollama instance serves all your apps. No redundancy.
- MCP tools are shared: All your MCP servers are available to the gateway, which decides when to use them.
- Model flexibility: Swap LLMs without touching your apps (switch from Llama to Mistral, etc.).
- Cost efficiency: One machine running the LLM, many apps consuming it.
- MCP-Bridge and ollama-mcp-bridge are exactly this pattern packaged as open-source tools
- The MCP gateway pattern is well-documented and gaining traction (Portkey, Lasso, Kong, Gravitee all offer MCP gateway products)
- Docker Compose makes it trivial to run the full stack locally
- For learning and personal projects, this is arguably the ideal architecture
- Tool calling quality varies by model. Smaller open-source models (7B-13B) are less reliable at deciding when and how to call tools compared to GPT-4 or Claude.
- Latency: Self-hosted LLMs on consumer hardware are slower than cloud APIs (though fine for personal use).
- Maintenance burden: You're running and updating the LLM, the gateway, and the MCP servers.
- Context window limits: Open-source models often have smaller context windows.
- For production/public apps, most teams still use cloud LLM APIs. For personal/learning use, self-hosted is excellent.
The simplest path to your goal:
- Run Ollama with a tool-calling-capable model (Llama 3.2, Mistral, Qwen 2.5)
- Use MCP-Bridge or ollama-mcp-bridge as your gateway — it already does the hard work of:
- Connecting to your MCP servers
- Translating MCP tools into OpenAI-compatible function schemas
- Running the tool-calling loop (LLM decides to call tool → gateway calls MCP server → result goes back to LLM)
- Exposing an OpenAI-compatible API your apps can call
- Your tiny apps just call the gateway's `/v1/chat/completions` endpoint — standard OpenAI API format (see the sketch below)
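For example, a tiny app's entire AI integration can be a handful of lines against the gateway. A sketch, assuming the bridge from the steps above is listening on port 8000 and serving `llama3.2` (adjust both to your configuration):

```python
# Sketch: a tiny app calling the AI service gateway's OpenAI-compatible API.
# Port 8000 and the model name are assumptions; whether an API key is needed
# depends on how the bridge is configured.
from openai import OpenAI

gateway = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

reply = gateway.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
)
# The gateway runs the tool-calling loop against your MCP servers;
# the app only sees the final answer.
print(reply.choices[0].message.content)
```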
Or build your own thin gateway:
```python
# Conceptual example (Python + FastAPI + Ollama + FastMCP)
from fastapi import FastAPI
from fastmcp import Client as MCPClient
from pydantic import BaseModel
import ollama

app = FastAPI()
MCP_URL = "http://localhost:8080/mcp"

class AskRequest(BaseModel):
    question: str

def convert_mcp_to_ollama_tools(tools):
    # Map MCP tool metadata onto the OpenAI-style function schema Ollama expects.
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,
            },
        }
        for t in tools
    ]

@app.post("/api/ask")
async def ask(req: AskRequest):
    # 1. Get available tools from MCP servers
    async with MCPClient(MCP_URL) as mcp:
        tools = await mcp.list_tools()
    # 2. Ask the LLM with the tools available
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": req.question}],
        tools=convert_mcp_to_ollama_tools(tools),
    )
    # 3. If the LLM wants to call a tool, execute it via MCP
    if response.message.tool_calls:
        for call in response.message.tool_calls:
            async with MCPClient(MCP_URL) as mcp:
                result = await mcp.call_tool(call.function.name, call.function.arguments)
            # Feed result back to the LLM for a final answer
            # ... (tool calling loop)
    return {"answer": response.message.content}
```

| Language | Repo |
|---|---|
| TypeScript | github.com/modelcontextprotocol/typescript-sdk |
| Python | github.com/modelcontextprotocol/python-sdk |
| .NET (C#) | github.com/modelcontextprotocol/csharp-sdk |
| Java/Kotlin | github.com/modelcontextprotocol/java-sdk |
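For orientation, a minimal tool server with the official Python SDK looks roughly like this; the `FastMCP` helper ships with the python-sdk, and the `add` tool is a placeholder:

```python
# Minimal MCP server sketch using the python-sdk's FastMCP helper.
# The "add" tool is a placeholder; run with `python server.py`.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; HTTP transports are also supported
```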
| Tool | Purpose |
|---|---|
| mcpo | MCP → OpenAPI REST proxy (by Open WebUI team) |
| MCP-Bridge | MCP tools → OpenAI-compatible API |
| ollama-mcp-bridge | Drop-in Ollama proxy with MCP tools |
| openapi-to-mcp | Convert any OpenAPI/Swagger spec into an MCP server |
| Lasso MCP Gateway | Open-source proxy/orchestration layer for multiple MCP servers |
| MetaMCP | Manage and aggregate multiple MCP servers |
| Platform | MCP Integration |
|---|---|
| VS Code + GitHub Copilot | Native MCP client. .vscode/mcp.json config. |
| Claude Desktop | Native MCP client (Anthropic) |
| Cursor | Native MCP support |
| Windsurf | Native MCP support |
| Open WebUI | Via mcpo bridge |
| n8n | MCP server for workflow automation |
| Docker | MCP catalog with 60+ servers |
┌─────────────┐ stdio/HTTP ┌──────────────────┐
│ VS Code │◄───────────────────►│ Your MCP Servers │
│ + Copilot │ │ (local machine) │
└─────────────┘ └──────────────────┘
┌─────────────┐ ┌──────────────────┐
│ VS Code │ │ Your MCP Servers │
│ (User A) │◄──Streamable HTTP──►│ (Cloud/VPS) │
└─────────────┘ ▲ │ │
│ │ OAuth 2.1 auth │
┌─────────────┐ │ │ Streamable HTTP │
│ VS Code │─────────┘ └──────────────────┘
│ (User B) │
└─────────────┘
┌──────────┐ ┌──────────┐ ┌──────────┐
│ App 1 │ │ App 2 │ │ App 3 │
│ (web) │ │ (CLI) │ │ (mobile) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└────────────┼────────────┘
│ REST API
▼
┌────────────────────────┐
│ AI Service Gateway │
│ (MCP-Bridge / │
│ ollama-mcp-bridge / │
│ custom FastAPI) │
│ │
│ OpenAI-compatible API │
└───────┬──────┬─────────┘
│ │
┌───────▼──┐ ┌─▼──────────────┐
│ Ollama │ │ MCP Servers │
│ LLM │ │ (your tools) │
└──────────┘ └────────────────┘
Can you share your MCP servers with others? Yes. Multiple well-supported paths exist:
- Source code sharing (simplest)
- Docker images (portable)
- Remote hosting with Streamable HTTP (most powerful)
- MCP registries via Azure API Center (enterprise)
Can you bundle a self-hosted LLM with your MCP servers behind a single API? Yes, very feasible. This is an active and growing pattern:
- ollama-mcp-bridge and MCP-Bridge are production-quality implementations of exactly this
- Docker Compose makes the entire stack deployable in one command
- Tool calling works well with Llama 3.2 (8B+), Mistral, Qwen 2.5
How common is this kind of setup? Growing but niche. Here's the honest picture:
- Very popular in the hobbyist/tinkerer/learning community
- Common in privacy-sensitive enterprise environments
- Less common for public-facing products (cloud LLM APIs still dominate there)
- The MCP gateway pattern (centralized AI service) is rapidly gaining adoption in 2025-2026
- For your stated use case (learning, small apps, personal enjoyment) — this is ideal
| Model Size | RAM Needed | GPU (Optional) | Response Speed |
|---|---|---|---|
| 1-3B (Phi, Qwen-mini) | 4-8 GB | Not needed | Fast |
| 7-8B (Llama 3.2, Mistral) | 8-16 GB | 6+ GB VRAM helps | Good |
| 13-14B | 16-32 GB | 8+ GB VRAM | Moderate |
| 70B+ | 64+ GB | 24+ GB VRAM | Slow without GPU |
For learning and small projects, a 7-8B model on a machine with 16GB RAM works well.
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["python", "server.py"]
```

```bash
docker build -t my-mcp-server .
docker run -p 8080:8080 my-mcp-server
```

Others add to VS Code `settings.json`:
```json
{
  "mcp": {
    "servers": {
      "your-server": {
        "url": "http://your-server-address:8080/mcp"
      }
    }
  }
}
```

```bash
# 1. Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2
# 2. Start MCP-Bridge pointing to Ollama + your MCP servers
# (see github.com/SecretiveShell/MCP-Bridge for full config)
docker run -d -p 8000:8000 \
-v ./config.json:/app/config.json \
mcpbridge/mcpbridge
# 3. Your apps call the bridge's OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3.2", "messages": [{"role": "user", "content": "What is the weather?"}]}'# See: github.com/open-webui/open-webui
# Uses mcpo to bridge MCP servers into Open WebUI
docker compose up -d # with the compose file from Section 5- MCP Specification (latest): https://modelcontextprotocol.io/specification/2025-11-25
- MCP for Beginners (Microsoft): https://github.com/microsoft/mcp-for-beginners
- MCP Servers in VS Code: https://code.visualstudio.com/mcp
- Azure Functions MCP Hosting: https://learn.microsoft.com/en-us/azure/azure-functions/self-hosted-mcp-servers
- MCP-Bridge (OpenAI-compatible MCP gateway): https://github.com/SecretiveShell/MCP-Bridge
- ollama-mcp-bridge: https://github.com/jonigl/ollama-mcp-bridge
- mcpo (MCP to OpenAPI): https://github.com/open-webui/mcpo
- Building Agentic AI with MCP + Ollama: https://dev.to/ajitkumar/building-your-first-agentic-ai-complete-guide-to-mcp-ollama-tool-calling-2o8g
- Modern AI Integrations - MCP + REST + Local LLMs (3-part series): Search "Modern AI Integrations: MCP Server Meets REST API and Local LLMs" on Medium