Since every LLM agent today can search URLs and reference a documentation site, you might be wondering…
- Why did I choose to build (and serve for free) a remote MCP server for ATprotocol documentation that is already available online?
- How can it help ATproto developers?
Allow me to explain.
For a deeper dive see Cloudflare’s definition: https://developers.cloudflare.com/ai-search/concepts/what-is-rag/
Think of Retrieval-Augmented Generation (RAG) as adding indexing to your source content.
RAG makes LLM queries more efficient the same way that adding an index or view to an unindexed SQL table makes your SQL queries more efficient.
By pre-processing the source content into an index of vector embeddings, you get more efficient retrieval of content that is specifically related to your query.
- Combines multiple reference sources into a single index
- Source content supersedes outdated training data in responses
- Responses are grounded in retrieved data, thus reducing hallucinations
- De-duplicates processing the same source content in subsequent context windows
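The retrieval step behind those benefits can be sketched with a toy example. The vectors below are tiny hand-made stand-ins for real model embeddings, and the chunk texts are illustrative; a real system would call an embedding model for both the index and the query:

```typescript
// Toy RAG retrieval: rank pre-embedded chunks by cosine similarity.
// Embeddings here are hand-made 3-dimensional stand-ins for real ones.
type Chunk = { text: string; embedding: number[] };

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// The index is built once, ahead of time -- this is the "pre-processing".
const index: Chunk[] = [
  { text: "Lexicons define ATproto schemas", embedding: [0.9, 0.1, 0.0] },
  { text: "PDS hosts user repositories",     embedding: [0.1, 0.9, 0.1] },
  { text: "Firehose streams repo events",    embedding: [0.0, 0.2, 0.9] },
];

// At query time, only the query itself needs to be embedded.
function retrieve(queryEmbedding: number[], topK: number): Chunk[] {
  return [...index]
    .sort((x, y) =>
      cosineSimilarity(queryEmbedding, y.embedding) -
      cosineSimilarity(queryEmbedding, x.embedding))
    .slice(0, topK);
}

const top = retrieve([0.85, 0.15, 0.05], 1);
console.log(top[0].text); // nearest chunk to the query vector
```

The expensive part (embedding every chunk of every source) happens once at index-build time; each query only pays for embedding the query string itself.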
Model Context Protocol is an open-source standard from Anthropic for connecting AI apps to external systems. Released in late 2024, it has rapidly gained widespread adoption. More at https://modelcontextprotocol.io/docs/getting-started/intro.
MCP servers expose specific tools, resources, and prompts to generative AI agents through the Model Context Protocol standard.
If you have ever asked an LLM chat agent to include results from a specific site, then you have used a URL in-prompt search approach.
When you ask for references or pass a URL directly in your prompt, the model fetches and processes each URL's content in real time.
Here's the key point about what typically happens to that search-result text: any text the LLM directly processes has to be both tokenized and embedded.
By creating a single RAG index with the data already tokenized and embedded, you can query multiple documentation sources without having to re-tokenize and re-embed that data for each source URL.
MCP servers can be run locally or remotely hosted in the cloud. Most MCP servers are designed to be installed and run locally, but some "official" servers are offered remotely.
Instead of every MCP user performing the separate steps of crawling, tokenizing, and generating vector embeddings for multiple documentation sites, and then having to keep their local index updated with changes, a remote service performs that work on a cron schedule.
🌱Essentially “pre-processing” the RAG index, and then “caching” it for any MCP-compatible LLM in the world to consume. 🌱
Inspired by Cloudflare’s MCP Documentation Server, and wanting to play with MCP, I decided to build a (somewhat) “official” remote MCP server for the ATprotocol documentation.
I wanted to make it easier for ATprotocol developers (including myself) to query the ATproto docs.
🐙Github: https://github.com/immber/mcp-atproto-docs
📍https://mcp-atproto-docs.immber.workers.dev/mcp
My remote ATproto documentation MCP server breaks down into three parts…
The atproto-docs-worker is a separate Cloudflare Worker that recursively crawls and saves documentation to an R2 bucket. It runs on a weekly cron schedule.
To view or request changes to the list of resources currently being included, you can visit that repo. As of March 2026, this is the list of sources:
The RAG index is where the documentation files saved in R2 get pre-processed, i.e., tokenized and turned into vector embeddings.
I’m using Cloudflare’s Vectorize tool to auto-generate the vector embeddings. Vectorize is connected to the R2 bucket and watches for changes, so when R2 files are updated, it automatically re-indexes them.
Raw HTML text files are chunked and transformed into an index of vector embeddings using Cloudflare’s instance of the open-source Qwen3 model (specifically `@cf/qwen/qwen3-embedding-0.6b`).
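Chunking itself is simple in principle: split the raw text into fixed-size windows with some overlap, so that sentences straddling a boundary still appear whole in at least one chunk. A minimal sketch, where the 512/64 character sizes are illustrative defaults and not Vectorize's actual parameters:

```typescript
// Minimal fixed-size chunker with overlap. The 512/64 defaults are
// illustrative, not what Vectorize actually uses internally.
function chunkText(text: string, size = 512, overlap = 64): string[] {
  const chunks: string[] = [];
  const step = size - overlap; // how far the window advances each time
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

console.log(chunkText("a".repeat(1000)).length); // 3 overlapping chunks
```

Each chunk is then sent to the embedding model individually, and the resulting vectors (plus a pointer back to the source chunk) form the index.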
Find out more about Vectorize from Cloudflare here: https://developers.cloudflare.com/vectorize/
Cloudflare has a bunch of MCP and other AI app templates on GitHub that can easily be deployed as Cloudflare Workers. For reference, I used the `remote-mcp-authless` one from https://github.com/cloudflare/ai/tree/main/demos for this project.
The MCP server runs as a Cloudflare Worker that exposes an `/sse` and an `/mcp` endpoint.
Once connected, a tool called “search_documentation” becomes available to your LLM. The tool accepts an input string as the query to be searched.
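Under the hood, an MCP client invokes that tool with a JSON-RPC `tools/call` request, per the Model Context Protocol spec. A sketch of the message shape a client sends to the `/mcp` endpoint (the query string here is illustrative):

```typescript
// Shape of an MCP tools/call request, per the Model Context Protocol
// JSON-RPC spec. The query string is illustrative.
const toolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "search_documentation",
    arguments: { query: "How do lexicons define XRPC endpoints?" },
  },
};

console.log(JSON.stringify(toolCallRequest, null, 2));
```

Your LLM client builds and sends this for you; you never write it by hand, but it is useful to know that the tool name and its `arguments` object are all that travels over the wire.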
Then AI Search (formerly Cloudflare's AutoRAG) uses a different model, Llama 3.3, together with the pre-processed RAG index to generate a response to the search query.
If your LLM tool of choice supports MCP, then you can install remote servers by simply adding the URL endpoint to your MCP server configuration list.
In Claude, for example, that looks like editing your claude_desktop_config.json file to add the URL.
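Claude Desktop's config file launches local (stdio) servers, so the common pattern for remote servers is to proxy through the `mcp-remote` npm package. A sketch of the relevant `claude_desktop_config.json` entry, where the server name `atproto-docs` is an arbitrary label of your choosing:

```json
{
  "mcpServers": {
    "atproto-docs": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp-atproto-docs.immber.workers.dev/mcp"]
    }
  }
}
```

Clients that support remote servers natively (such as Claude's web connector settings) can skip the proxy and take the URL directly.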
| Server GitHub/Source | Description | Local/remote | Authentication | Tools |
|---|---|---|---|---|
| immber/mcp-atproto-docs | A remote MCP server to query the official ATproto documentation available online | Remote (Cloudflare) | None | search_documentation |
| lexicongarden/mcp | Lexicon Garden provides a Model Context Protocol (MCP) endpoint. MCP allows AI assistants to browse lexicon schemas, validate data, and interact with ATProtocol services directly. | Remote | MCP oAuth for invoke_xrpc | describe_lexicon create_record_cid invoke_xrpc facet_text validate_lexicon |
| Ashex/atproto-mcp | MCP server providing a searchable knowledge base for the AT Protocol ecosystem — protocol documentation, lexicon schemas, Bluesky developer API docs, and cookbook examples — powered by txtai semantic search. | Local (python) | None | search_atproto_docs get_lexicon list_lexicons search_lexicons get_cookbook_example list_cookbook_examples search_bsky_api refresh_sources |
| cameronrye/atproto-mcp | An MCP server that gives LLMs direct access to the AT Protocol ecosystem, enabling seamless interaction with Bluesky and other AT Protocol-based social networks. | LLM Client or Local (npm) | Supports both authenticated and unauthenticated modes: start immediately with public data access (search posts, view profiles), or add authentication for full functionality (write operations, private data, feeds). Not a direct-use REST API or SDK for application developers. | 57 MCP tools across multiple categories. Public: data retrieval, OAuth management. Authentication required: social operations, data retrieval, content management, list management, moderation, real-time streaming & intelligence, batch operations, analytics & insights, content discovery, composite operations, rich media, enhanced moderation |
| brianellin/bsky-mcp-server (+16 forks) | A Model Context Protocol server that connects to Bluesky and provides tools to interact with the ATProtocol. | Local (Smithery or pnpm) | App Password stored in MCP env vars | get-pinned-feeds get-timeline-posts get-feed-posts get-list-posts get-user-posts get-profile get-follows get-followers get-liked-posts get-trends get-post-thread convert-url-to-uri search-posts search-people search-feeds like-post create-post follow-user |
| gwbischof/bluesky-social-mcp | An MCP server for interacting with the Bluesky social network via the atproto client. | Local (python) | App Password stored in MCP env vars | check_auth_status get_profile get_follows get_followers follow_user unfollow_user mute_user unmute_user resolve_handle get_timeline get_author_feed get_post_thread like_post unlike_post get_likes repost unrepost get_reposted_by send_post send_image send_images send_video delete_post get_post get_posts |
| semioz/bluesky-mcp | A Model Context Protocol (MCP) server for Bluesky that can post on your behalf by using the AT Protocol. | Local (Smithery) | App Password stored in MCP env vars | login create-post get-post get-posts delete-post like-post unlike-post repost unrepost get-profile get-timeline prompt: "format-timeline" |
| briangershon/bluesky-daily-mcp | An MCP server to help you surface the most interesting topics from your Bluesky follows daily. | Local (npm) | App Password stored in MCP env vars | a tool to retrieve all posts from your follows for a given day; sample prompts for analyzing posts |
| skywatch-bsky/claude-skills | A Claude marketplace containing skills for interacting with osprey rules | Local | None | Not an MCP server; a collection of Claude skills |

