
A2A Deep Research System - Replication Guide

This guide explains how to replicate the A2A Deep Research system on another machine using Ollama for local LLM inference.


Overview

The A2A Deep Research system is a multi-agent research framework that:

  • Breaks down research questions into tasks (Planning Agent)
  • Researches each task independently (Research Agent)
  • Synthesizes findings into a comprehensive report (Synthesis Agent)
  • Coordinates the workflow (Orchestrator)

Architecture:

Client Request → Orchestrator (8100)
                      ↓
               Planning Agent (8101) → Task breakdown
                      ↓
               Research Agent (8102) → Task research (x N tasks)
                      ↓
               Synthesis Agent (8103) → Final report
                      ↓
               Output: Markdown + PDF + Trace

Prerequisites

  • OS: Linux (Ubuntu 20.04+ recommended)
  • Python: 3.10+
  • RAM: 16GB minimum (32GB+ recommended for larger models)
  • GPU: Optional but recommended for faster inference
  • Disk: 20GB+ for models

Step 1: Install Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve

# Verify installation
ollama --version

Pull Required Models

Choose models based on your hardware. Recommended options:

For 16GB RAM (CPU or modest GPU):

ollama pull qwen2.5:14b
ollama pull deepseek-r1:14b

For 32GB+ RAM or better GPU:

ollama pull qwen2.5:32b
ollama pull deepseek-r1:32b
ollama pull qwen2.5-coder:32b

Lightweight option (8GB RAM):

ollama pull qwen2.5:7b
ollama pull deepseek-r1:7b

Verify models are available:

ollama list
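
If you prefer to check programmatically, the short sketch below queries Ollama's /api/tags endpoint (the same endpoint used later in Troubleshooting) and prints the model tags it reports. It assumes Ollama is on its default port 11434 and uses httpx, which is already in requirements.txt.

import httpx

# Each entry's "name" is the tag you would put in config/models.yaml
resp = httpx.get("http://localhost:11434/api/tags", timeout=10.0)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available Ollama models:", models)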

Step 2: Clone/Copy the Project

# Create project directory
mkdir -p ~/a2a-research
cd ~/a2a-research

# Copy the following structure:
# src/a2a_research/
#   ├── __init__.py
#   ├── config.py
#   ├── base_agent.py
#   ├── orchestrator.py
#   └── agents/
#       ├── __init__.py
#       ├── planning.py
#       ├── research.py
#       └── synthesis.py
# config/
#   ├── models.yaml
#   └── prompts.yaml
# scripts/
#   ├── start-services.sh
#   └── stop-services.sh
# requirements.txt

Step 3: Setup Python Environment

cd ~/a2a-research

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Install PDF generation support (optional but recommended)
pip install weasyprint

# System dependency for weasyprint (Ubuntu/Debian)
# System dependencies for weasyprint, plus pandoc (Ubuntu/Debian)
sudo apt install -y libpango-1.0-0 libpangocairo-1.0-0 pandoc

requirements.txt:

fastapi>=0.104.0
uvicorn>=0.24.0
httpx>=0.25.0
pydantic>=2.0.0
python-json-logger>=2.0.0
pyyaml>=6.0
weasyprint>=60.0
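
After installing, a quick sanity check inside the venv confirms the core packages import cleanly (run in a Python REPL or save as a small script):

# Sanity check: import the core dependencies and print their versions
import fastapi, httpx, pydantic, uvicorn, yaml

print("fastapi ", fastapi.__version__)
print("uvicorn ", uvicorn.__version__)
print("httpx   ", httpx.__version__)
print("pydantic", pydantic.VERSION)
print("pyyaml  ", yaml.__version__)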

Step 4: Configure for Ollama

4.1 Update config.py

Edit src/a2a_research/config.py to point to Ollama:

# Change these lines (around line 29-31):

# Old (LiteLLM):
# LITELLM_URL = "http://localhost:14000"
# LITELLM_API_KEY = "sk-xxxxx"

# New (Ollama):
LITELLM_URL = "http://localhost:11434"  # Ollama default port
LITELLM_API_KEY = "ollama"              # Ollama doesn't require a real key
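
For reference, a minimal config.py along these lines could look like the sketch below. The environment-variable fallbacks are an optional convenience added here for illustration (the names OLLAMA_URL and OLLAMA_API_KEY are not part of the original file); the defaults match the Ollama values above.

import os

# LLM endpoint settings; env-var overrides are an optional, illustrative addition
LITELLM_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
LITELLM_API_KEY = os.environ.get("OLLAMA_API_KEY", "ollama")  # Ollama ignores the key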

4.2 Update base_agent.py

Edit src/a2a_research/base_agent.py to use Ollama's API format.

Find the call_llm method and update the endpoint:

async def call_llm(self, prompt: str, system_prompt: Optional[str] = None) -> str:
    """Call the LLM via Ollama API."""

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    # Ollama endpoint
    url = f"{LITELLM_URL}/v1/chat/completions"

    payload = {
        "model": self.config.model,
        "messages": messages,
        "max_tokens": 4096,
        "stream": False
    }

    async with httpx.AsyncClient(timeout=600.0) as client:  # Longer timeout for local models
        response = await client.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
        )
        response.raise_for_status()
        result = response.json()
        return result["choices"][0]["message"]["content"]
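
Before wiring this into the agents, it can help to hit the same OpenAI-compatible endpoint directly and confirm a model responds. The snippet below is a standalone smoke test, assuming Ollama on its default port and a model you have already pulled (qwen2.5:14b is used here purely as an example):

import asyncio
import httpx

async def smoke_test() -> None:
    # Same payload shape as call_llm above, sent straight to Ollama
    payload = {
        "model": "qwen2.5:14b",  # any tag from 'ollama list'
        "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
        "max_tokens": 16,
        "stream": False,
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/v1/chat/completions",
            json=payload,
            headers={"Authorization": "Bearer ollama"},
        )
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])

asyncio.run(smoke_test())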

4.3 Configure Model Assignments

Edit config/models.yaml to use your Ollama models:

# Model assignments for Ollama
# Use exact model names from 'ollama list'

orchestrator: qwen2.5:32b          # or qwen2.5:14b for less RAM
planning: deepseek-r1:32b          # Good at reasoning/planning
research: qwen2.5:32b              # Good at information retrieval
synthesis: deepseek-r1:32b         # Good at writing/synthesis

Alternative lightweight config:

orchestrator: qwen2.5:7b
planning: deepseek-r1:7b
research: qwen2.5:7b
synthesis: deepseek-r1:7b
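
A minimal way code might read this mapping, assuming it uses pyyaml (already in requirements.txt); the helper name load_model_map is illustrative, not taken from the project source:

from pathlib import Path
import yaml

def load_model_map(path: str = "config/models.yaml") -> dict:
    """Return the agent-name -> Ollama-model mapping from models.yaml."""
    return yaml.safe_load(Path(path).read_text())

models = load_model_map()
print(models["planning"])  # e.g. "deepseek-r1:32b"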

Step 5: Configure Prompts (Optional)

Edit config/prompts.yaml to customize agent behavior:

orchestrator: |
  You are a research orchestrator. Coordinate the research workflow.
  Break complex topics into manageable research tasks.

planning: |
  You are a research planning specialist.
  Given a research goal, create 5-7 specific research tasks.
  Format each task clearly with a title and description.

  Research Goal: {{research_goal}}

research: |
  You are a research specialist.
  Investigate the following task thoroughly.
  Provide detailed findings with sources where possible.

  Task: {{task_description}}

synthesis: |
  You are a research synthesis specialist.
  Combine all research findings into a comprehensive, well-structured report.
  Include an executive summary, main findings, and conclusions.
  Cite sources where available.
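
The {{...}} placeholders are plain string substitutions. A hedged sketch of how a prompt might be loaded and filled (the render_prompt helper is illustrative, not from the project code):

from pathlib import Path
import yaml

def render_prompt(agent: str, **values: str) -> str:
    """Load an agent's prompt from prompts.yaml and fill its {{placeholders}}."""
    prompts = yaml.safe_load(Path("config/prompts.yaml").read_text())
    text = prompts[agent]
    for key, value in values.items():
        text = text.replace("{{" + key + "}}", value)
    return text

print(render_prompt("planning", research_goal="Compare REST and GraphQL APIs"))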

Step 6: Create Service Scripts

start-services.sh

#!/bin/bash
set -e

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"

# Activate venv
source venv/bin/activate
export PYTHONPATH="$PROJECT_DIR/src:$PYTHONPATH"

# Create directories
mkdir -p logs .pids

echo "Starting A2A Research Agents..."

# Start agents
python -m a2a_research.agents.planning > logs/planning.log 2>&1 &
echo $! > .pids/planning.pid
echo "  Planning Agent started (PID: $!)"

python -m a2a_research.agents.research > logs/research.log 2>&1 &
echo $! > .pids/research.pid
echo "  Research Agent started (PID: $!)"

python -m a2a_research.agents.synthesis > logs/synthesis.log 2>&1 &
echo $! > .pids/synthesis.pid
echo "  Synthesis Agent started (PID: $!)"

sleep 2  # Let agents initialize

python -m a2a_research.orchestrator > logs/orchestrator.log 2>&1 &
echo $! > .pids/orchestrator.pid
echo "  Orchestrator started (PID: $!)"

echo ""
echo "All services started!"
echo "  Orchestrator: http://localhost:8100"
echo "  Planning:     http://localhost:8101"
echo "  Research:     http://localhost:8102"
echo "  Synthesis:    http://localhost:8103"

stop-services.sh

#!/bin/bash

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"

echo "Stopping A2A Research Agents..."

for pidfile in .pids/*.pid; do
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile")
        if kill -0 "$pid" 2>/dev/null; then
            kill "$pid"
            echo "  Stopped $(basename "$pidfile" .pid) (PID: $pid)"
        fi
        rm "$pidfile"
    fi
done

echo "All services stopped."

Make scripts executable:

chmod +x scripts/start-services.sh scripts/stop-services.sh

Step 7: Start the System

# 1. Ensure Ollama is running
ollama serve  # In a separate terminal, or run as service

# 2. Start the research agents
./scripts/start-services.sh

# 3. Verify services are running
curl http://localhost:8100/health
curl http://localhost:8101/health
curl http://localhost:8102/health
curl http://localhost:8103/health
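
Equivalently, a small Python loop can poll all four health endpoints. It assumes each agent exposes GET /health as shown above and answers HTTP 200 when ready:

import httpx

SERVICES = {
    "orchestrator": 8100,
    "planning": 8101,
    "research": 8102,
    "synthesis": 8103,
}

for name, port in SERVICES.items():
    try:
        resp = httpx.get(f"http://localhost:{port}/health", timeout=5.0)
        status = "ok" if resp.status_code == 200 else f"HTTP {resp.status_code}"
    except httpx.HTTPError as exc:
        status = f"unreachable ({exc.__class__.__name__})"
    print(f"{name:<12} {status}")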

Step 8: Run a Research Query

Via curl:

curl -s -X POST http://localhost:8100/rpc \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "message/send",
    "id": "1",
    "params": {
      "message": {
        "messageId": "msg-1",
        "role": "user",
        "parts": [{"kind": "text", "text": "What are the key differences between REST and GraphQL APIs?"}]
      }
    }
  }' | jq -r '.result.parts[0].text'

Via Python:

import httpx
import json

def research(topic: str) -> str:
    response = httpx.post(
        "http://localhost:8100/rpc",
        json={
            "jsonrpc": "2.0",
            "method": "message/send",
            "id": "1",
            "params": {
                "message": {
                    "messageId": "msg-1",
                    "role": "user",
                    "parts": [{"kind": "text", "text": topic}]
                }
            }
        },
        timeout=600.0  # Long timeout for research
    )
    result = response.json()
    return result["result"]["parts"][0]["text"]

# Run research
output = research("Explain quantum computing basics")
print(output)

Output Files

Reports are saved to the reports/ directory:

reports/
├── 20260123_1430_your-research-topic.md       # Markdown report
├── 20260123_1430_your-research-topic.pdf      # PDF report
└── 20260123_1430_your-research-topic.trace.md # Workflow trace
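
Because filenames start with a timestamp, lexicographic order matches chronological order, so the most recent report can be picked up like this (a small convenience sketch, not part of the project code):

from pathlib import Path

# Markdown reports are named <YYYYMMDD_HHMM>_<topic>.md; skip the .trace.md files
reports = sorted(p for p in Path("reports").glob("*.md") if not p.name.endswith(".trace.md"))
if reports:
    latest = reports[-1]
    print(f"Latest report: {latest}")
    print(latest.read_text()[:500])  # preview the first 500 characters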

Troubleshooting

Ollama Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Check Ollama logs
journalctl -u ollama -f

Agent Not Responding

# Check agent logs
tail -f logs/orchestrator.log
tail -f logs/planning.log
tail -f logs/research.log
tail -f logs/synthesis.log

Out of Memory

  • Use smaller models (7b instead of 32b)
  • Reduce max_tokens in config
  • Ensure no other heavy processes are running

Slow Responses

  • Local models are slower than cloud APIs
  • Consider GPU acceleration
  • Use quantized models (e.g., qwen2.5:14b-q4_0)

Optional: Claude Code Integration

To use as a Claude Code skill, add to your .claude/settings.json:

{
  "skills": {
    "a2a-researcher": {
      "type": "subagent",
      "description": "Deep research using local Ollama models",
      "command": "curl -s -X POST http://localhost:8100/rpc -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"message/send\",\"id\":\"1\",\"params\":{\"message\":{\"messageId\":\"msg-1\",\"role\":\"user\",\"parts\":[{\"kind\":\"text\",\"text\":\"$PROMPT\"}]}}}'"
    }
  }
}

Directory Structure Reference

a2a-research/
├── src/
│   └── a2a_research/
│       ├── __init__.py
│       ├── config.py           # LLM connection settings
│       ├── base_agent.py       # Base agent class with LLM calls
│       ├── orchestrator.py     # Workflow coordinator
│       └── agents/
│           ├── __init__.py
│           ├── planning.py     # Task breakdown
│           ├── research.py     # Information gathering
│           └── synthesis.py    # Report generation
├── config/
│   ├── models.yaml             # Agent-to-model mapping
│   └── prompts.yaml            # System prompts
├── scripts/
│   ├── start-services.sh
│   └── stop-services.sh
├── reports/                    # Generated reports
├── logs/                       # Runtime logs
├── .pids/                      # Process IDs
├── venv/                       # Python virtual environment
└── requirements.txt

Quick Reference

Component        Port   Purpose
Ollama           11434  LLM inference
Orchestrator     8100   Workflow coordination
Planning Agent   8101   Task decomposition
Research Agent   8102   Information gathering
Synthesis Agent  8103   Report writing

Command                      Description
ollama serve                 Start Ollama
ollama list                  List available models
./scripts/start-services.sh  Start all agents
./scripts/stop-services.sh   Stop all agents
curl localhost:8100/health   Check orchestrator health