A practical reference for understanding how to host, share, and compose MCP servers — especially in combination with self-hosted LLMs — for personal and small-project use.
Last researched: February 2026
Latest MCP spec version: 2025-11-25 (stable)
Spec home: https://modelcontextprotocol.io/specification/2025-11-25
GitHub org: https://github.com/modelcontextprotocol
Governance: MCP is now a Series of LF Projects, LLC (Linux Foundation)
- MCP Protocol Evolution
- Transport Standards
- Sharing Your MCP Servers
- Hosting Options for Remote MCP Servers
- Self-Hosted LLMs + MCP
- The "AI Service Gateway" Pattern
- Key Tools & Projects
- Architecture Diagrams
- Feasibility Assessment
- Getting Started Recipes
MCP (Model Context Protocol) was introduced by Anthropic in November 2024 as an open standard for connecting LLM applications to external data and tools. It has evolved rapidly:
| Spec Version | Date | Key Additions |
|---|---|---|
| 2024-11-05 | Nov 2024 | Initial release. stdio + HTTP+SSE transports. Basic tools, resources, prompts. |
| 2025-03-26 | Mar 2025 | Streamable HTTP introduced (replacing HTTP+SSE). Tool annotations. |
| 2025-06-18 | Jun 2025 | OAuth 2.1 authorization. Elicitation. Structured output for tools. Security hardening. |
| 2025-11-25 | Nov 2025 | Latest stable. Async Tasks. OpenID Connect Discovery. Icons metadata. Incremental scope consent. URL mode elicitation. Tool calling in sampling. Client ID Metadata Documents. SDK tiering. Formal governance. |
- MCP is governed under the Linux Foundation (LF Projects, LLC) with formal working groups and interest groups.
- An SDK tiering system has been established with clear requirements for feature support and maintenance.
- The spec now uses JSON Schema 2020-12 as the default dialect.
- OAuth 2.1 is the standard auth mechanism for remote servers.
- Anthropic: Original creator, continues active development
- GitHub: https://github.com/modelcontextprotocol — canonical spec and SDKs
- Microsoft: Major adopter — Azure Functions MCP hosting, VS Code Copilot integration, Azure API Center MCP Registry
- Google: Cloud Run MCP deployment guides
- Docker: MCP catalog with 60+ servers, OAuth support built-in
The protocol defines two standard transports (plus custom):
- stdio: the client launches the MCP server as a subprocess
- Communication via `stdin`/`stdout` (JSON-RPC messages)
- Best for: local tools, VS Code extensions, CLI integrations
- This is what you're likely using now with your local MCP servers in VS Code
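The official Python SDK's client side makes the stdio mechanics concrete. A minimal sketch, assuming the python-sdk is installed and `server.py` is a placeholder for any stdio MCP server script:

```python
# Sketch of a stdio MCP client using the official python-sdk.
# The client spawns the server as a subprocess and speaks JSON-RPC
# over its stdin/stdout. "server.py" is a placeholder script name.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

params = StdioServerParameters(command="python", args=["server.py"])

async def main():
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

asyncio.run(main())
```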
- Streamable HTTP: replaced the older HTTP+SSE transport as of spec `2025-03-26`
- Server exposes a single HTTP endpoint (e.g., `https://example.com/mcp`)
- Client sends JSON-RPC messages via HTTP POST
- Server can respond with `application/json` (single response) or `text/event-stream` (SSE stream)
- Client can GET the endpoint to open an SSE stream for server-initiated messages
- Supports session management via the `MCP-Session-Id` header
- Supports resumability via SSE event IDs and `Last-Event-ID`
- Protocol version negotiated via the `MCP-Protocol-Version` header
Security requirements for Streamable HTTP:
- Servers MUST validate the `Origin` header (prevents DNS rebinding)
- Local servers SHOULD bind to `127.0.0.1` only (not `0.0.0.0`)
- Servers SHOULD implement proper authentication (OAuth 2.1 recommended)
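To make the Origin requirement concrete, here is a minimal sketch of the check as HTTP middleware. FastAPI/Starlette and the allow-list values are assumptions; the spec mandates the validation, not the framework:

```python
# Minimal sketch: reject requests whose Origin header is not on an allow-list.
# FastAPI/Starlette middleware is an assumption; any HTTP framework works.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()
ALLOWED_ORIGINS = {"http://localhost", "http://127.0.0.1"}  # adjust to your clients

@app.middleware("http")
async def validate_origin(request: Request, call_next):
    origin = request.headers.get("origin")
    # Browsers send Origin on cross-site requests; non-browser clients may omit it.
    if origin is not None and origin not in ALLOWED_ORIGINS:
        return JSONResponse(status_code=403, content={"error": "origin not allowed"})
    return await call_next(request)

# Mount your /mcp endpoint on `app` as usual, then bind to loopback only:
#   uvicorn my_server:app --host 127.0.0.1 --port 8080
```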
- Servers can host both old HTTP+SSE endpoints and the new Streamable HTTP endpoint simultaneously
- Clients can auto-detect transport by attempting a POST first, falling back to GET+SSE
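To get a feel for what a Streamable HTTP exchange looks like on the wire, here is a rough sketch using Python's `requests` library. The endpoint URL is a placeholder, and a real client should use an MCP SDK rather than hand-rolled JSON-RPC:

```python
# Rough sketch of one Streamable HTTP round trip (initialize request).
# The URL is a placeholder; prefer an MCP SDK for real clients.
import requests

MCP_URL = "https://example.com/mcp"  # placeholder endpoint

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-11-25",
        "capabilities": {},
        "clientInfo": {"name": "demo-client", "version": "0.1.0"},
    },
}

resp = requests.post(
    MCP_URL,
    json=initialize,
    # The client must accept both response modes (JSON or SSE).
    headers={"Accept": "application/json, text/event-stream"},
    timeout=30,
)

# If the server assigns a session, echo it back (plus MCP-Protocol-Version)
# as headers on subsequent requests.
session_id = resp.headers.get("MCP-Session-Id")
print(resp.status_code, session_id)
print(resp.text)  # a JSON body or an SSE stream, depending on Content-Type
```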
You have several options for sharing your HTTP-based MCP servers, ordered from simplest to most involved:
- Publish your MCP server source to GitHub/GitLab
- Others clone and run locally, adding to their own VS Code `settings.json` or `.vscode/mcp.json`
- Include an `mcp.json` config snippet in your README so consumers can copy-paste
- This is the most common approach in the MCP community today
- Package your MCP server as a Docker image
- Publish to Docker Hub or GitHub Container Registry (GHCR)
- Others run it with a single `docker run` command
- Docker has a built-in MCP catalog with 60+ servers and OAuth support
- Example: `docker run -p 8080:8080 yourname/your-mcp-server`
- Host on a cloud platform (see Section 4)
- Share the URL — anyone with an MCP client can connect
- Use OAuth 2.1 or API keys for access control
- This is the "remote MCP server" pattern — the standard is designed for this
- Azure API Center now supports MCP server registries
- GitHub Copilot enterprise/org admins can set an MCP registry URL so all org members discover your servers automatically
- Format: `https://{api-center-name}.data.{location}.azure-apicenter.ms/workspaces/default/v0/servers`
- This is currently enterprise/org-level only (not individual user level)
- For team sharing, commit a `.vscode/mcp.json` file to your repo
- Or use VS Code Profiles to bundle MCP server configs + settings and share them
- Others import the profile and get your MCP server configuration automatically
- Self-hosted MCP servers deploy as Azure Functions custom handlers
- No code changes needed — your existing MCP server code works as-is
- Add a `host.json` with `configurationProfile: "mcp-custom-handler"`
- Supports .NET, Java, JavaScript, Python, TypeScript
- Integrates with Azure AI Foundry agents
- Can register in Azure API Center for discovery
- Docs: https://learn.microsoft.com/en-us/azure/azure-functions/self-hosted-mcp-servers
- The Azure MCP Server repo (`github.com/Azure/azure-mcp`) is archived — development moved to `github.com/microsoft/mcp`
- Deploy your MCP server as a container
- Use the Cloud Run proxy for authenticated tunnels to your server
- Full guide available: "Build and Deploy a Remote MCP Server to Google Cloud Run in Under 10 Minutes"
- Deploy as a Cloudflare Worker (edge compute)
- Uses Cloudflare KV for OAuth token storage
- Cloudflare Tunnel + Zero Trust for on-premises servers with secure remote access
- Good for lightweight MCP servers that don't need heavy compute
- Run anywhere Docker runs (DigitalOcean, Linode, Hetzner, home server, etc.)
- Use Cloudflare Tunnel or ngrok to expose local servers securely
- Full control, lowest cost, most flexibility
- Ideal for your personal learning projects
- Microsoft highlights Kubernetes-native deployment for MCP
- MCP gateway handles session-aware routing
- Best for scaling multiple MCP servers
This is where your core question lives: Can you bundle a self-hosted LLM with MCP servers and serve the combined capability via an API?
Yes, absolutely. Here's the landscape:
- Open-source LLM runner, extremely popular for local/self-hosted use
- Runs models like Llama 3.2, Mistral, Qwen, Phi, DeepSeek, etc.
- Exposes a REST API at `http://localhost:11434`, with OpenAI-compatible endpoints under `/v1` (see the sketch below)
- Supports tool calling (function calling) — essential for MCP integration
- Docker-friendly: `docker run ollama/ollama`
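A minimal sketch of that OpenAI-compatible surface, assuming Ollama is running locally and the `llama3.2` model has already been pulled:

```python
# Sketch: talking to Ollama through its OpenAI-compatible /v1 endpoints.
# Assumes `ollama pull llama3.2` has been run. Ollama ignores the API key,
# but the openai client library requires one to be set.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```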
| Project | What It Does |
|---|---|
| ollama-mcp-bridge (jonigl/ollama-mcp-bridge) | Drop-in proxy for the Ollama API that transparently adds all MCP server tools to every `/api/chat` request. Your apps talk to Ollama's API as normal, but MCP tools are automatically available. |
| MCP-Bridge (SecretiveShell/MCP-Bridge) | Exposes MCP tools via an OpenAI-compatible API. Connect your MCP servers + any inference server (Ollama, vLLM, etc.) and get a unified API. Supports API key auth. |
| mcpo (open-webui/mcpo) | Converts any MCP server into a standard RESTful OpenAPI endpoint. Dead simple: `uvx mcpo --port 8000 -- python your_mcp_server.py` |
| Open WebUI | Full chat UI (like ChatGPT) that connects to Ollama. Supports MCP servers via mcpo bridge. |
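As a sketch of consuming an mcpo-wrapped server: once mcpo is running (e.g. `uvx mcpo --port 8000 -- python your_mcp_server.py`), each tool becomes a plain REST route with generated OpenAPI docs. The `get_weather` route and its `city` parameter below are hypothetical; check `/docs` or `/openapi.json` for the routes your own tools actually produce:

```python
# Sketch: calling an MCP tool through mcpo's generated REST API.
# The "get_weather" route and its parameters are hypothetical examples.
import requests

BASE = "http://localhost:8000"  # where mcpo is listening

# Discover the generated endpoints (mcpo is FastAPI-based, so it serves an OpenAPI schema).
schema = requests.get(f"{BASE}/openapi.json", timeout=10).json()
print(list(schema.get("paths", {}).keys()))

# Call one tool as an ordinary POST with a JSON body of its arguments.
result = requests.post(f"{BASE}/get_weather", json={"city": "Berlin"}, timeout=30)
print(result.json())
```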
A popular pattern is running all three together:
```yaml
# Simplified example — see full configs in project repos
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports: ["8080:8080"]
    environment:
      - OLLAMA_BASE_URL=http://ollama:11434
  ollama:
    image: ollama/ollama:latest
    ports: ["11434:11434"]
  your-mcp-server:
    build: ./your-mcp-server
    command: ["mcpo", "--port", "8000", "--", "python", "server.py"]
    ports: ["8000:8000"]
```

| Platform | Notes |
|---|---|
| Ollama | Easiest to set up. Great model library. Best for personal/learning use. |
| vLLM | High-performance serving. Better throughput for production. OpenAI-compatible API. |
| llama.cpp / llama-server | Lightweight C++ inference. Good for resource-constrained environments. |
| LocalAI | OpenAI-compatible API, supports multiple model formats, has built-in tool calling. |
| LM Studio | Desktop app with server mode. Easy model management. OpenAI-compatible API. |
| Onyx (formerly Danswer) | Enterprise search + RAG platform. Can integrate with MCP for tool access. |
| Jan | Desktop app, self-hosted, OpenAI-compatible API. |
"It would be great if I could encapsulate the functionality of MCP and LLM within a service of sorts and serve it via API so that I don't have to include an LLM in all of my tiny applications."
This is not only feasible — it's becoming a recognized architecture pattern. Here's how it works:
┌─────────────────────────────────────────────┐
│ Your Tiny Applications │
│ (Web apps, CLI tools, mobile apps, etc.) │
│ │
│ Simple REST calls: │
│ POST /api/ask { "question": "..." } │
│ POST /api/do { "action": "...", ... } │
└──────────────────┬──────────────────────────┘
│ HTTP/REST
▼
┌─────────────────────────────────────────────┐
│ Your AI Service Gateway │
│ (FastAPI / ASP.NET / Express / etc.) │
│ │
│ - Receives plain REST requests │
│ - Orchestrates LLM + MCP tools │
│ - Returns structured responses │
│ - Handles auth, rate limiting, logging │
└──────┬────────────────────┬─────────────────┘
│ │
▼ ▼
┌──────────────┐ ┌──────────────────────┐
│ Self-Hosted │ │ Your MCP Servers │
│ LLM │ │ │
│ (Ollama) │ │ - Weather tool │
│ │ │ - DB query tool │
│ llama3.2 │ │ - File search tool │
│ mistral │ │ - Custom tools... │
│ etc. │ │ │
└──────────────┘ └──────────────────────┘
- Separation of concerns: Your tiny apps stay tiny. They make simple HTTP calls.
- Single LLM instance: One Ollama instance serves all your apps. No redundancy.
- MCP tools are shared: All your MCP servers are available to the gateway, which decides when to use them.
- Model flexibility: Swap LLMs without touching your apps (switch from Llama to Mistral, etc.).
- Cost efficiency: One machine running the LLM, many apps consuming it.
- MCP-Bridge and ollama-mcp-bridge are exactly this pattern packaged as open-source tools
- The MCP gateway pattern is well-documented and gaining traction (Portkey, Lasso, Kong, Gravitee all offer MCP gateway products)
- Docker Compose makes it trivial to run the full stack locally
- For learning and personal projects, this is arguably the ideal architecture
- Tool calling quality varies by model. Smaller open-source models (7B-13B) are less reliable at deciding when and how to call tools compared to GPT-4 or Claude.
- Latency: Self-hosted LLMs on consumer hardware are slower than cloud APIs (though fine for personal use).
- Maintenance burden: You're running and updating the LLM, the gateway, and the MCP servers.
- Context window limits: Open-source models often have smaller context windows.
- For production/public apps, most teams still use cloud LLM APIs. For personal/learning use, self-hosted is excellent.
The simplest path to your goal:
- Run Ollama with a tool-calling-capable model (Llama 3.2, Mistral, Qwen 2.5)
- Use MCP-Bridge or ollama-mcp-bridge as your gateway — it already does the hard work of:
- Connecting to your MCP servers
- Translating MCP tools into OpenAI-compatible function schemas
- Running the tool-calling loop (LLM decides to call tool → gateway calls MCP server → result goes back to LLM)
- Exposing an OpenAI-compatible API your apps can call
- Your tiny apps just call the gateway's `/v1/chat/completions` endpoint — standard OpenAI API format (see the sketch below)
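For example, a tiny app's entire AI integration can be a handful of lines against the gateway. A sketch, assuming the bridge from the steps above is listening on port 8000 and serving `llama3.2` (adjust both to your configuration):

```python
# Sketch: a tiny app calling the AI service gateway's OpenAI-compatible API.
# Port 8000 and the model name are assumptions; whether an API key is needed
# depends on how the bridge is configured.
from openai import OpenAI

gateway = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

reply = gateway.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "What's the weather in Berlin right now?"}],
)
# The gateway runs the tool-calling loop against your MCP servers;
# the app only sees the final answer.
print(reply.choices[0].message.content)
```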
Or build your own thin gateway:
```python
# Conceptual example (Python + FastAPI + Ollama + FastMCP)
from fastapi import FastAPI
from fastmcp import Client as MCPClient
from pydantic import BaseModel
import ollama

app = FastAPI()
MCP_URL = "http://localhost:8080/mcp"

class AskRequest(BaseModel):
    question: str

def convert_mcp_to_ollama_tools(tools):
    # Map MCP tool metadata onto the OpenAI-style function schema Ollama expects.
    return [
        {
            "type": "function",
            "function": {
                "name": t.name,
                "description": t.description or "",
                "parameters": t.inputSchema,
            },
        }
        for t in tools
    ]

@app.post("/api/ask")
async def ask(req: AskRequest):
    # 1. Get available tools from MCP servers
    async with MCPClient(MCP_URL) as mcp:
        tools = await mcp.list_tools()
    # 2. Ask the LLM with the tools available
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": req.question}],
        tools=convert_mcp_to_ollama_tools(tools),
    )
    # 3. If the LLM wants to call a tool, execute it via MCP
    if response.message.tool_calls:
        for call in response.message.tool_calls:
            async with MCPClient(MCP_URL) as mcp:
                result = await mcp.call_tool(call.function.name, call.function.arguments)
            # Feed result back to the LLM for a final answer
            # ... (tool calling loop)
    return {"answer": response.message.content}
```

| Language | Repo |
|---|---|
| TypeScript | github.com/modelcontextprotocol/typescript-sdk |
| Python | github.com/modelcontextprotocol/python-sdk |
| .NET (C#) | github.com/modelcontextprotocol/csharp-sdk |
| Java/Kotlin | github.com/modelcontextprotocol/java-sdk |
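For orientation, a minimal tool server with the official Python SDK looks roughly like this; the `FastMCP` helper ships with the python-sdk, and the `add` tool is a placeholder:

```python
# Minimal MCP server sketch using the python-sdk's FastMCP helper.
# The "add" tool is a placeholder; run with `python server.py`.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; HTTP transports are also supported
```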
| Tool | Purpose |
|---|---|
| mcpo | MCP → OpenAPI REST proxy (by Open WebUI team) |
| MCP-Bridge | MCP tools → OpenAI-compatible API |
| ollama-mcp-bridge | Drop-in Ollama proxy with MCP tools |
| openapi-to-mcp | Convert any OpenAPI/Swagger spec into an MCP server |
| Lasso MCP Gateway | Open-source proxy/orchestration layer for multiple MCP servers |
| MetaMCP | Manage and aggregate multiple MCP servers |
| Platform | MCP Integration |
|---|---|
| VS Code + GitHub Copilot | Native MCP client. .vscode/mcp.json config. |
| Claude Desktop | Native MCP client (Anthropic) |
| Cursor | Native MCP support |
| Windsurf | Native MCP support |
| Open WebUI | Via mcpo bridge |
| n8n | MCP server for workflow automation |
| Docker | MCP catalog with 60+ servers |
┌─────────────┐ stdio/HTTP ┌──────────────────┐
│ VS Code │◄───────────────────►│ Your MCP Servers │
│ + Copilot │ │ (local machine) │
└─────────────┘ └──────────────────┘
┌─────────────┐ ┌──────────────────┐
│ VS Code │ │ Your MCP Servers │
│ (User A) │◄──Streamable HTTP──►│ (Cloud/VPS) │
└─────────────┘ ▲ │ │
│ │ OAuth 2.1 auth │
┌─────────────┐ │ │ Streamable HTTP │
│ VS Code │─────────┘ └──────────────────┘
│ (User B) │
└─────────────┘
┌──────────┐ ┌──────────┐ ┌──────────┐
│ App 1 │ │ App 2 │ │ App 3 │
│ (web) │ │ (CLI) │ │ (mobile) │
└────┬─────┘ └────┬─────┘ └────┬─────┘
│ │ │
└────────────┼────────────┘
│ REST API
▼
┌────────────────────────┐
│ AI Service Gateway │
│ (MCP-Bridge / │
│ ollama-mcp-bridge / │
│ custom FastAPI) │
│ │
│ OpenAI-compatible API │
└───────┬──────┬─────────┘
│ │
┌───────▼──┐ ┌─▼──────────────┐
│ Ollama │ │ MCP Servers │
│ LLM │ │ (your tools) │
└──────────┘ └────────────────┘
Can you share your MCP servers with others? Yes. Multiple well-supported paths exist:
- Source code sharing (simplest)
- Docker images (portable)
- Remote hosting with Streamable HTTP (most powerful)
- MCP registries via Azure API Center (enterprise)
Can you bundle a self-hosted LLM with your MCP servers behind a single API? Yes, very feasible. This is an active and growing pattern:
- ollama-mcp-bridge and MCP-Bridge are production-quality implementations of exactly this
- Docker Compose makes the entire stack deployable in one command
- Tool calling works well with Llama 3.2 (8B+), Mistral, Qwen 2.5
How common is this kind of setup? Growing but niche. Here's the honest picture:
- Very popular in the hobbyist/tinkerer/learning community
- Common in privacy-sensitive enterprise environments
- Less common for public-facing products (cloud LLM APIs still dominate there)
- The MCP gateway pattern (centralized AI service) is rapidly gaining adoption in 2025-2026
- For your stated use case (learning, small apps, personal enjoyment) — this is ideal
| Model Size | RAM Needed | GPU (Optional) | Response Speed |
|---|---|---|---|
| 1-3B (Phi, Qwen-mini) | 4-8 GB | Not needed | Fast |
| 7-8B (Llama 3.2, Mistral) | 8-16 GB | 6+ GB VRAM helps | Good |
| 13-14B | 16-32 GB | 8+ GB VRAM | Moderate |
| 70B+ | 64+ GB | 24+ GB VRAM | Slow without GPU |
For learning and small projects, a 7-8B model on a machine with 16GB RAM works well.
```dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
EXPOSE 8080
CMD ["python", "server.py"]
```

```bash
docker build -t my-mcp-server .
docker run -p 8080:8080 my-mcp-server
```

Others add to VS Code `settings.json`:
```json
{
  "mcp": {
    "servers": {
      "your-server": {
        "url": "http://your-server-address:8080/mcp"
      }
    }
  }
}
```

```bash
# 1. Start Ollama
docker run -d -p 11434:11434 --name ollama ollama/ollama
docker exec ollama ollama pull llama3.2
# 2. Start MCP-Bridge pointing to Ollama + your MCP servers
# (see github.com/SecretiveShell/MCP-Bridge for full config)
docker run -d -p 8000:8000 \
-v ./config.json:/app/config.json \
mcpbridge/mcpbridge
# 3. Your apps call the bridge's OpenAI-compatible API
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3.2", "messages": [{"role": "user", "content": "What is the weather?"}]}'# See: github.com/open-webui/open-webui
# Uses mcpo to bridge MCP servers into Open WebUI
docker compose up -d # with the compose file from Section 5- MCP Specification (latest): https://modelcontextprotocol.io/specification/2025-11-25
- MCP for Beginners (Microsoft): https://github.com/microsoft/mcp-for-beginners
- MCP Servers in VS Code: https://code.visualstudio.com/mcp
- Azure Functions MCP Hosting: https://learn.microsoft.com/en-us/azure/azure-functions/self-hosted-mcp-servers
- MCP-Bridge (OpenAI-compatible MCP gateway): https://github.com/SecretiveShell/MCP-Bridge
- ollama-mcp-bridge: https://github.com/jonigl/ollama-mcp-bridge
- mcpo (MCP to OpenAPI): https://github.com/open-webui/mcpo
- Building Agentic AI with MCP + Ollama: https://dev.to/ajitkumar/building-your-first-agentic-ai-complete-guide-to-mcp-ollama-tool-calling-2o8g
- Modern AI Integrations - MCP + REST + Local LLMs (3-part series): Search "Modern AI Integrations: MCP Server Meets REST API and Local LLMs" on Medium