@immber
Last active March 28, 2026 17:18
Stop Hallucinating the ATprotocol

Grounding your AI Agents with the Official @proto Docs

💙 An ATmosphere Conference 2026 Lightning Talk 💙 by Jessie Rushing

Since every LLM agent today has URL search capability and can reference a documentation site, you might be wondering…

  • Why did I choose to build (and serve for free) a remote MCP server for ATprotocol documentation that is already available online?
  • How can it help ATproto developers?

Allow me to explain.

First, what is RAG?

For a deeper dive see Cloudflare’s definition: https://developers.cloudflare.com/ai-search/concepts/what-is-rag/

Think of Retrieval-Augmented Generation (RAG) as adding indexing to your source content.

RAG makes LLM queries more efficient in the same way that adding an index (or a materialized view) to an unindexed SQL table makes your SQL queries more efficient.

By pre-processing the source content into an index of vector embeddings, you get more efficient retrieval of content that is specifically related to your query.

  • Combines multiple reference sources into single index
  • Source content supersedes outdated training data in responses
  • Responses are grounded in retrieved data, thus reducing hallucinations
  • Avoids re-processing the same source content in subsequent context windows
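
As a rough illustration of the retrieval step, here is a minimal Python sketch. It assumes a toy bag-of-words "embedding" in place of a real embedding model, and the three documentation chunks are hypothetical examples, not text from the actual index. The point is only that the chunks are embedded once, and each query is then compared against the stored vectors.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector.
    A real index would use a neural embedding model instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical documentation chunks, embedded once up front (the "index").
CHUNKS = [
    "a lexicon is a schema language for describing records and xrpc methods",
    "a did is a persistent identifier for an account",
    "a pds hosts the repository that stores records for an account",
]
INDEX = [(chunk, embed(chunk)) for chunk in CHUNKS]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Ground the LLM prompt in the retrieved chunks only."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(build_prompt("what is a lexicon schema"))
```

Only the query is embedded at question time; the documentation itself is never re-processed.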

Quickly, what is an MCP server?

Model Context Protocol (MCP) is an open source standard from Anthropic for connecting AI apps to external systems. It was released in late 2024 and has rapidly gained widespread adoption. More at https://modelcontextprotocol.io/docs/getting-started/intro.

MCP servers expose specific tools, resources, and prompts to generative AI agents through the Model Context Protocol.

Let’s compare “URL Search” vs “MCP with RAG” for querying documentation

URL search (in prompt) approach

If you have ever asked an LLM chat agent to include results from a specific site, then you have used a URL in-prompt search approach.

When you ask for references or pass a URL directly in your prompt, the model fetches and processes each URL’s content in real-time.

Here's what typically happens to the search results text:

[Diagram: prompt-in-search]

The key point is that any text the LLM directly processes has to be both tokenized and embedded.

MCP with RAG approach

By creating a single RAG index with the data already tokenized and embedded, you can query multiple documentation sources without having to re-tokenize and re-embed that data for each source URL.

[Diagram: mcp-rag-search]

♻️ Why a remote MCP server is better for the documentation use case

💻Local vs 🌐Remote

MCP servers can be run locally or remotely hosted in the cloud. Most MCP servers are designed to be installed and run locally, but some "official" servers are offered remotely.

Instead of every MCP user separately crawling, tokenizing, and generating vector embeddings for multiple documentation sites, and then having to keep their local index updated with changes, a remote service performs that work on a cron schedule.

🌱Essentially “pre-processing” the RAG index, and then “caching” it for any MCP-compatible LLM in the world to consume. 🌱

Why I built an ATproto Docs MCP Server

Inspired by Cloudflare’s MCP Documentation Server, and wanting to play with MCP, I decided to build a (somewhat) “official” remote MCP server for the ATprotocol documentation.

I wanted to make it easier for ATprotocol developers (including myself) to query the ATproto docs.

We can just build things

🐙GitHub: https://github.com/immber/mcp-atproto-docs
📍https://mcp-atproto-docs.immber.workers.dev/mcp

My remote ATproto documentation MCP server breaks down into 3 parts…

[Diagram: mcpServerArch]
  1. Cron Scraper

The atproto-docs-worker is a separate Cloudflare Worker that recursively crawls and saves documentation to an R2 bucket. It runs on a weekly cron schedule.
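
To make the crawl step concrete, here is a minimal Python sketch of a same-host crawler. It is not the worker's actual code: `crawl`, `fetch`, and `save` are hypothetical names, the traversal uses a queue rather than recursion, and `save` merely stands in for writing a page to a bucket such as R2. The example runs against a small in-memory "site" so it needs no network access.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch, save, max_pages=100):
    """Visit pages on the same host as start_url, saving each page once.
    fetch(url) returns HTML or None; save(url, html) stores the page."""
    host = urlparse(start_url).netloc
    queue, seen = [start_url], set()
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)
        if html is None:
            continue
        save(url, html)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before comparing.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).netloc == host and link not in seen:
                queue.append(link)
    return seen

# Hypothetical two-page site; the off-site link is ignored.
site = {
    "https://docs.example/": '<a href="/guide">Guide</a> <a href="https://other.example/x">x</a>',
    "https://docs.example/guide": '<a href="/#top">Home</a>',
}
saved = {}
crawl("https://docs.example/", site.get, saved.__setitem__)
print(sorted(saved))
```

Passing `fetch` and `save` in as functions keeps the traversal logic separate from the network and storage details.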

To view or request changes to the list of resources currently being included, you can visit that repo. As of March 2026, this is the list of sources:

[Image: docs-source-urls]
  2. RAG Index

The RAG index is where the documentation files saved in R2 get pre-processed, i.e. tokenized and turned into vector embeddings.

I’m using Cloudflare’s Vectorize tool to auto-generate the vector embeddings. Vectorize is connected to the R2 bucket, and is watching for changes so that when R2 files are updated, it automatically re-indexes them.

Raw HTML text files are chunked and transformed into an index of vector embeddings using Cloudflare’s instance of the open-source Qwen-3 model. (Specifically @cf/qwen/qwen3-embedding-0.6b)
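
Chunking can be sketched as a sliding window over the text. The function below is a simplified illustration, not Cloudflare's actual chunker: `chunk_words` is a hypothetical name, and the window size and overlap are assumed values, since the real settings are not stated here. Each resulting chunk would then be passed to the embedding model.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, where consecutive
    windows share `overlap` words so context is not cut mid-thought."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

# Tiny example with a 4-word window and 1 word of overlap.
sample = " ".join(str(i) for i in range(10))
for chunk in chunk_words(sample, size=4, overlap=1):
    print(chunk)
```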

Find out more about Vectorize from Cloudflare here: https://developers.cloudflare.com/vectorize/

  3. MCP Server

Cloudflare has a bunch of MCP and other AI app templates on GitHub that can easily be deployed as Cloudflare Workers. For this project I used the remote-mcp-authless template from https://github.com/cloudflare/ai/tree/main/demos as a reference.

The MCP Server runs as a Cloudflare Worker that exposes an /sse and an /mcp endpoint.

Once connected, a tool called “search_documentation” becomes available to your LLM. The tool accepts an input string as the query to be searched.

Then AI Search (formerly Cloudflare’s AutoRAG) uses the pre-processed RAG index and a different model, Llama-3.3, to generate a response to the search query.
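
For reference, this is roughly the JSON-RPC message an MCP client sends when it calls the tool. The `tools/call` method and the `name`/`arguments` structure come from the MCP specification; the argument key `query` is an assumption, since the tool's input schema is not shown here. A real client also performs an initialization handshake first, which this sketch leaves out.

```python
import json

def build_tool_call(query: str, request_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request that invokes search_documentation."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "search_documentation",
            # "query" is an assumed argument name, not confirmed by the server's schema.
            "arguments": {"query": query},
        },
    }
    return json.dumps(payload)

print(build_tool_call("How do I resolve a handle to a DID?"))
```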

How to install in various agents

If your LLM tool of choice supports MCP, then you can install remote servers by simply adding the URL endpoint to your MCP server configuration list.

In Claude, for example, that looks like editing your claude_desktop_config.json file to add the URL:

[Screenshot: mcp-config-screenshot]
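
One possible shape for that config entry is sketched below in Python, which prints the resulting JSON. It assumes the common pattern of bridging to a remote server through the `mcp-remote` npm package launched with `npx`; the exact entry in the screenshot may differ. The server key `atproto-docs` and the helper `add_remote_server` are hypothetical names.

```python
import json

SERVER_URL = "https://mcp-atproto-docs.immber.workers.dev/mcp"

def add_remote_server(config: dict, name: str, url: str) -> dict:
    """Return a copy of a Claude Desktop style config with one more
    remote server, bridged through the mcp-remote proxy (an assumption)."""
    servers = dict(config.get("mcpServers", {}))
    servers[name] = {"command": "npx", "args": ["mcp-remote", url]}
    return {**config, "mcpServers": servers}

# "atproto-docs" is an arbitrary label; pick any name you like.
updated = add_remote_server({"mcpServers": {}}, "atproto-docs", SERVER_URL)
print(json.dumps(updated, indent=2))
```

The printed JSON is what would be merged into claude_desktop_config.json, followed by a restart of the client.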

A short list of other MCP servers available for ATprotocol

| Server GitHub/Source | Description | Local/remote | Authentication | Tools |
| --- | --- | --- | --- | --- |
| immber/mcp-atproto-docs | A Remote MCP server to query the official ATproto documentation available online | Remote (Cloudflare) | None | search_documentation |
| lexicongarden/mcp | Lexicon Garden provides a Model Context Protocol (MCP) endpoint. MCP allows AI assistants to browse lexicon schemas, validate data, and interact with ATProtocol services directly. | Remote MCP | OAuth for invoke_xrpc | describe_lexicon, create_record_cid, invoke_xrpc, facet_text, validate_lexicon |
| Ashex/atproto-mcp | MCP server providing a searchable knowledge base for the AT Protocol ecosystem (protocol documentation, lexicon schemas, Bluesky developer API docs, and cookbook examples), powered by txtai semantic search. | Local (python) | None | search_atproto_docs, get_lexicon, list_lexicons, search_lexicons, get_cookbook_example, list_cookbook_examples, search_bsky_api, refresh_sources |
| cameronrye/atproto-mcp | A MCP server that gives LLMs direct access to the AT Protocol ecosystem, enabling seamless interaction with Bluesky and other AT Protocol-based social networks. | LLM Client or Local (npm) | Supports both authenticated and unauthenticated modes: start immediately with public data access (search posts, view profiles), or add authentication for full functionality (write operations, private data, feeds). *NOT - A direct-use REST API or SDK for application developers | 57 MCP tools across multiple categories: Public Tools: Data Retrieval OAuth Management Authentication Required Tools: Social Operations Data Retrieval Content Management List Management Moderation Real-time Streaming & Intelligence Batch Operations Analytics & Insights Content Discovery Composite Operations Rich Media Enhanced Moderation |
| brianellin/bsky-mcp-server (+16 forks) | A Model Context Protocol server that connects to Bluesky and provides tools to interact with the ATProtocol. | Local (Smithery or pnpm) | App Password stored in MCP env vars | get-pinned-feeds, get-timeline-posts, get-feed-posts, get-list-posts, get-user-posts, get-profile, get-follows, get-followers, get-liked-posts, get-trends, get-post-thread, convert-url-to-uri, search-posts, search-people, search-feeds, like-post, create-post, follow-user |
| gwbischof/bluesky-social-mcp | An MCP server for interacting with the Bluesky social network via the atproto client. | Local (python) | App Password stored in MCP env vars | check_auth_status, get_profile, get_follows, get_followers, follow_user, unfollow_user, mute_user, unmute_user, resolve_handle, get_timeline, get_author_feed, get_post_thread, like_post, unlike_post, get_likes, repost, unrepost, get_reposted_by, send_post, send_image, send_images, send_video, delete_post, get_post, get_posts |
| semioz/bluesky-mcp | A Model Context Protocol (MCP) server for Bluesky that can post on your behalf by using the AT Protocol. | Local (Smithery) | App Password stored in MCP env vars | login, create-post, get-post, get-posts, delete-post, like-post, unlike-post, repost, unrepost, get-profile, get-timeline; prompt: "format-timeline" |
| briangershon/bluesky-daily-mcp | An MCP Server to help you surface the most interesting topics from your Bluesky follows daily. | Local (npm) | App Password stored in MCP env vars | A tool to retrieve all posts from your follows for a given day; sample prompts for analyzing posts |
| skywatch-bsky/claude-skills | A Claude marketplace containing skills for interacting with osprey rules | Local | None | Not an MCP server; a collection of Claude skills |