This guide explains how to replicate the A2A Deep Research system on another machine using Ollama for local LLM inference.
The A2A Deep Research system is a multi-agent research framework that:
- Breaks down research questions into tasks (Planning Agent)
- Researches each task independently (Research Agent)
- Synthesizes findings into a comprehensive report (Synthesis Agent)
- Coordinates the workflow (Orchestrator)
Architecture:
Client Request → Orchestrator (8100)
↓
Planning Agent (8101) → Task breakdown
↓
Research Agent (8102) → Task research (x N tasks)
↓
Synthesis Agent (8103) → Final report
↓
Output: Markdown + PDF + Trace
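Conceptually, the orchestrator drives a plan → research → synthesize loop over the three agents. The sketch below is purely illustrative and is not the project's orchestrator code; it assumes the planning/research/synthesis agents accept the same message/send JSON-RPC call that the orchestrator itself exposes (shown later in this guide), which may differ from the real agent-to-agent protocol.

```python
# Conceptual sketch only — NOT the actual orchestrator implementation.
# Assumes each agent accepts the same JSON-RPC "message/send" call as the orchestrator.
import asyncio
import httpx

async def call_agent(url: str, text: str) -> str:
    """POST one user message to an agent's /rpc endpoint and return its text reply."""
    payload = {
        "jsonrpc": "2.0", "method": "message/send", "id": "1",
        "params": {"message": {"messageId": "msg-1", "role": "user",
                               "parts": [{"kind": "text", "text": text}]}},
    }
    async with httpx.AsyncClient(timeout=600.0) as client:
        resp = await client.post(f"{url}/rpc", json=payload)
        resp.raise_for_status()
        return resp.json()["result"]["parts"][0]["text"]

async def run_research(goal: str) -> str:
    plan = await call_agent("http://localhost:8101", goal)             # Planning Agent
    tasks = [line for line in plan.splitlines() if line.strip()]       # naive task split
    findings = await asyncio.gather(                                   # Research Agent, one call per task
        *(call_agent("http://localhost:8102", task) for task in tasks)
    )
    return await call_agent("http://localhost:8103", "\n\n".join(findings))  # Synthesis Agent

# report = asyncio.run(run_research("Compare REST and GraphQL APIs"))
```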
- OS: Linux (Ubuntu 20.04+ recommended)
- Python: 3.10+
- RAM: 16GB minimum (32GB+ recommended for larger models)
- GPU: Optional but recommended for faster inference
- Disk: 20GB+ for models
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Start Ollama service
ollama serve
# Verify installation
ollama --version

Choose models based on your hardware. Recommended options:
For 16GB RAM (CPU or modest GPU):
ollama pull qwen2.5:14b
ollama pull deepseek-r1:14b

For 32GB+ RAM or better GPU:
ollama pull qwen2.5:32b
ollama pull deepseek-r1:32b
ollama pull qwen2.5-coder:32b

Lightweight option (8GB RAM):
ollama pull qwen2.5:7b
ollama pull deepseek-r1:7b

Verify models are available:
ollama list
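Besides ollama list, you can confirm the pulled models programmatically through Ollama's /api/tags endpoint (the same endpoint used in Troubleshooting below), once httpx is installed (it is in requirements.txt further down). A small sketch; edit the model names to match whatever you chose above:

```python
# Check that the models you plan to reference in config/models.yaml are pulled.
# REQUIRED is an example set — adjust it to the models you chose above.
import httpx

REQUIRED = {"qwen2.5:32b", "deepseek-r1:32b"}

tags = httpx.get("http://localhost:11434/api/tags", timeout=10.0).json()
available = {m["name"] for m in tags.get("models", [])}

missing = REQUIRED - available
print("All required models available." if not missing else f"Missing models: {sorted(missing)}")
```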
# Create project directory
mkdir -p ~/a2a-research
cd ~/a2a-research
# Copy the following structure:
# src/a2a_research/
# ├── __init__.py
# ├── config.py
# ├── base_agent.py
# ├── orchestrator.py
# └── agents/
# ├── __init__.py
# ├── planning.py
# ├── research.py
# └── synthesis.py
# config/
# ├── models.yaml
# └── prompts.yaml
# scripts/
# ├── start-services.sh
# └── stop-services.sh
# requirements.txt

cd ~/a2a-research
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt
# Install PDF generation support (optional but recommended)
pip install weasyprint
# System dependency for weasyprint (Ubuntu/Debian)
sudo apt install -y libpango-1.0-0 libpangocairo-1.0-0 pandoc

requirements.txt:
fastapi>=0.104.0
uvicorn>=0.24.0
httpx>=0.25.0
pydantic>=2.0.0
python-json-logger>=2.0.0
pyyaml>=6.0
weasyprint>=60.0
Edit src/a2a_research/config.py to point to Ollama:
# Change these lines (around line 29-31):
# Old (LiteLLM):
# LITELLM_URL = "http://localhost:14000"
# LITELLM_API_KEY = "sk-xxxxx"
# New (Ollama):
LITELLM_URL = "http://localhost:11434" # Ollama default port
LITELLM_API_KEY = "ollama"              # Ollama doesn't require a real key
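If you want to point at a remote Ollama host without editing the file again, the two settings can fall back to environment variables instead. This is an optional sketch, not part of the original config.py; the variable names OLLAMA_HOST and OLLAMA_API_KEY are assumptions:

```python
# Optional: read the Ollama endpoint from the environment instead of hardcoding it.
# OLLAMA_HOST / OLLAMA_API_KEY are illustrative names, not required by the project.
import os

LITELLM_URL = os.environ.get("OLLAMA_HOST", "http://localhost:11434")
LITELLM_API_KEY = os.environ.get("OLLAMA_API_KEY", "ollama")  # placeholder; Ollama ignores the key
```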
Edit src/a2a_research/base_agent.py to use Ollama's API format. Find the call_llm method and update the endpoint:
async def call_llm(self, prompt: str, system_prompt: Optional[str] = None) -> str:
    """Call the LLM via Ollama API."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    # Ollama's OpenAI-compatible chat endpoint
    url = f"{LITELLM_URL}/v1/chat/completions"
    payload = {
        "model": self.config.model,
        "messages": messages,
        "max_tokens": 4096,
        "stream": False,
    }

    async with httpx.AsyncClient(timeout=600.0) as client:  # longer timeout for local models
        response = await client.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {LITELLM_API_KEY}"},
        )
        response.raise_for_status()
        result = response.json()

    return result["choices"][0]["message"]["content"]
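Before wiring the agents up, it is worth confirming that Ollama's OpenAI-compatible endpoint answers with one of your pulled models. A standalone smoke test of the same endpoint the method above calls; swap the model name for one that ollama list shows on your machine:

```python
# Quick smoke test of Ollama's OpenAI-compatible chat endpoint.
# Replace the model name with one you actually pulled.
import httpx

resp = httpx.post(
    "http://localhost:11434/v1/chat/completions",
    json={
        "model": "qwen2.5:14b",  # example — use a model from 'ollama list'
        "messages": [{"role": "user", "content": "Reply with the single word: ready"}],
        "max_tokens": 16,
        "stream": False,
    },
    timeout=120.0,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```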
Edit config/models.yaml to use your Ollama models:

# Model assignments for Ollama
# Use exact model names from 'ollama list'
orchestrator: qwen2.5:32b # or qwen2.5:14b for less RAM
planning: deepseek-r1:32b # Good at reasoning/planning
research: qwen2.5:32b # Good at information retrieval
synthesis: deepseek-r1:32b   # Good at writing/synthesis

Alternative lightweight config:
orchestrator: qwen2.5:7b
planning: deepseek-r1:7b
research: qwen2.5:7b
synthesis: deepseek-r1:7b
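The mapping is simply one Ollama model name per agent role. How the project's config.py consumes this file is not shown here; the loader below is only a sketch of the idea, with a hypothetical helper name, assuming the keys above:

```python
# Sketch of how an agent could resolve its model from config/models.yaml.
# load_model_for() is a hypothetical helper, not part of the original codebase.
from pathlib import Path
import yaml

def load_model_for(agent_name: str, path: str = "config/models.yaml") -> str:
    models = yaml.safe_load(Path(path).read_text())
    return models[agent_name]          # e.g. "qwen2.5:32b" for "research"

print(load_model_for("planning"))      # -> deepseek-r1:32b with the default mapping
```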
Edit config/prompts.yaml to customize agent behavior:

orchestrator: |
  You are a research orchestrator. Coordinate the research workflow.
  Break complex topics into manageable research tasks.

planning: |
  You are a research planning specialist.
  Given a research goal, create 5-7 specific research tasks.
  Format each task clearly with a title and description.

  Research Goal: {{research_goal}}

research: |
  You are a research specialist.
  Investigate the following task thoroughly.
  Provide detailed findings with sources where possible.

  Task: {{task_description}}

synthesis: |
  You are a research synthesis specialist.
  Combine all research findings into a comprehensive, well-structured report.
  Include an executive summary, main findings, and conclusions.
  Cite sources where available.
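The {{research_goal}} and {{task_description}} placeholders are filled in by the agents at request time. Exactly how the project renders them is not shown here; a plain string substitution along these lines would be a reasonable stand-in (render_prompt is a hypothetical helper):

```python
# Minimal placeholder substitution for the prompts above.
# render_prompt() is a hypothetical helper; the real agents may do this differently.
from pathlib import Path
import yaml

def render_prompt(agent: str, path: str = "config/prompts.yaml", **values: str) -> str:
    template = yaml.safe_load(Path(path).read_text())[agent]
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template

print(render_prompt("planning", research_goal="Compare REST and GraphQL APIs"))
```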
scripts/start-services.sh:

#!/bin/bash
set -e
PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"
# Activate venv
source venv/bin/activate
export PYTHONPATH="$PROJECT_DIR/src:$PYTHONPATH"
# Create directories
mkdir -p logs .pids
echo "Starting A2A Research Agents..."
# Start agents
python -m a2a_research.agents.planning > logs/planning.log 2>&1 &
echo $! > .pids/planning.pid
echo " Planning Agent started (PID: $!)"
python -m a2a_research.agents.research > logs/research.log 2>&1 &
echo $! > .pids/research.pid
echo " Research Agent started (PID: $!)"
python -m a2a_research.agents.synthesis > logs/synthesis.log 2>&1 &
echo $! > .pids/synthesis.pid
echo " Synthesis Agent started (PID: $!)"
sleep 2 # Let agents initialize
python -m a2a_research.orchestrator > logs/orchestrator.log 2>&1 &
echo $! > .pids/orchestrator.pid
echo " Orchestrator started (PID: $!)"
echo ""
echo "All services started!"
echo " Orchestrator: http://localhost:8100"
echo " Planning: http://localhost:8101"
echo " Research: http://localhost:8102"
echo " Synthesis: http://localhost:8103"#!/bin/bash
PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"
echo "Stopping A2A Research Agents..."
for pidfile in .pids/*.pid; do
  if [ -f "$pidfile" ]; then
    pid=$(cat "$pidfile")
    if kill -0 "$pid" 2>/dev/null; then
      kill "$pid"
      echo " Stopped $(basename "$pidfile" .pid) (PID: $pid)"
    fi
    rm "$pidfile"
  fi
done
echo "All services stopped."Make scripts executable:
chmod +x scripts/start-services.sh scripts/stop-services.sh

# 1. Ensure Ollama is running
ollama serve # In a separate terminal, or run as service
# 2. Start the research agents
./scripts/start-services.sh
# 3. Verify services are running
curl http://localhost:8100/health
curl http://localhost:8101/health
curl http://localhost:8102/health
curl http://localhost:8103/health

Send a research request to the orchestrator's JSON-RPC endpoint:

curl -s -X POST http://localhost:8100/rpc \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "message/send",
"id": "1",
"params": {
"message": {
"messageId": "msg-1",
"role": "user",
"parts": [{"kind": "text", "text": "What are the key differences between REST and GraphQL APIs?"}]
}
}
  }' | jq -r '.result.parts[0].text'

Or from Python:

import httpx
import json

def research(topic: str) -> str:
    response = httpx.post(
        "http://localhost:8100/rpc",
        json={
            "jsonrpc": "2.0",
            "method": "message/send",
            "id": "1",
            "params": {
                "message": {
                    "messageId": "msg-1",
                    "role": "user",
                    "parts": [{"kind": "text", "text": topic}]
                }
            }
        },
        timeout=600.0  # long timeout for research
    )
    result = response.json()
    return result["result"]["parts"][0]["text"]

# Run research
output = research("Explain quantum computing basics")
print(output)

Reports are saved to the reports/ directory:
reports/
├── 20260123_1430_your-research-topic.md # Markdown report
├── 20260123_1430_your-research-topic.pdf # PDF report
└── 20260123_1430_your-research-topic.trace.md # Workflow trace
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Check Ollama logs
journalctl -u ollama -f

# Check agent logs
tail -f logs/orchestrator.log
tail -f logs/planning.log
tail -f logs/research.log
tail -f logs/synthesis.log

If you run out of memory:

- Use smaller models (7b instead of 32b)
- Reduce max_tokens in config
- Ensure no other heavy processes are running
If responses are slow:

- Local models are slower than cloud APIs
- Consider GPU acceleration
- Use quantized models (e.g., qwen2.5:14b-q4_0)
To use as a Claude Code skill, add to your .claude/settings.json:
{
  "skills": {
    "a2a-researcher": {
      "type": "subagent",
      "description": "Deep research using local Ollama models",
      "command": "curl -s -X POST http://localhost:8100/rpc -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"message/send\",\"id\":\"1\",\"params\":{\"message\":{\"messageId\":\"msg-1\",\"role\":\"user\",\"parts\":[{\"kind\":\"text\",\"text\":\"$PROMPT\"}]}}}'"
    }
  }
}

a2a-research/
├── src/
│ └── a2a_research/
│ ├── __init__.py
│ ├── config.py # LLM connection settings
│ ├── base_agent.py # Base agent class with LLM calls
│ ├── orchestrator.py # Workflow coordinator
│ └── agents/
│ ├── __init__.py
│ ├── planning.py # Task breakdown
│ ├── research.py # Information gathering
│ └── synthesis.py # Report generation
├── config/
│ ├── models.yaml # Agent-to-model mapping
│ └── prompts.yaml # System prompts
├── scripts/
│ ├── start-services.sh
│ └── stop-services.sh
├── reports/ # Generated reports
├── logs/ # Runtime logs
├── .pids/ # Process IDs
├── venv/ # Python virtual environment
└── requirements.txt
| Component | Port | Purpose |
|---|---|---|
| Ollama | 11434 | LLM inference |
| Orchestrator | 8100 | Workflow coordination |
| Planning Agent | 8101 | Task decomposition |
| Research Agent | 8102 | Information gathering |
| Synthesis Agent | 8103 | Report writing |
| Command | Description |
|---|---|
| ollama serve | Start Ollama |
| ollama list | List available models |
| ./scripts/start-services.sh | Start all agents |
| ./scripts/stop-services.sh | Stop all agents |
| curl localhost:8100/health | Check orchestrator health |