
A2A Deep Research System - Replication Guide

This guide explains how to replicate the A2A Deep Research system on another machine using Ollama for local LLM inference.


Overview

The A2A Deep Research system is a multi-agent research framework that:

  • Breaks down research questions into tasks (Planning Agent)
  • Researches each task independently (Research Agent)
  • Synthesizes findings into a comprehensive report (Synthesis Agent)
  • Coordinates the workflow (Orchestrator)

Architecture:

Client Request → Orchestrator (8100)
                      ↓
               Planning Agent (8101) → Task breakdown
                      ↓
               Research Agent (8102) → Task research (x N tasks)
                      ↓
               Synthesis Agent (8103) → Final report
                      ↓
               Output: Markdown + PDF + Trace

Prerequisites

  • OS: Linux (Ubuntu 20.04+ recommended)
  • Python: 3.10+
  • RAM: 16GB minimum (32GB+ recommended for larger models)
  • GPU: Optional but recommended for faster inference
  • Disk: 20GB+ for models

Step 1: Install Ollama

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama service
ollama serve

# Verify installation
ollama --version

Pull Required Models

Choose models based on your hardware. Recommended options:

For 16GB RAM (CPU or modest GPU):

ollama pull qwen2.5:14b
ollama pull deepseek-r1:14b

For 32GB+ RAM or better GPU:

ollama pull qwen2.5:32b
ollama pull deepseek-r1:32b
ollama pull qwen2.5-coder:32b

Lightweight option (8GB RAM):

ollama pull qwen2.5:7b
ollama pull deepseek-r1:7b

Verify models are available:

ollama list
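
If you prefer to check programmatically, the short sketch below queries Ollama's /api/tags endpoint (the same endpoint used later in Troubleshooting) and prints the model tags it reports. It assumes Ollama is on its default port 11434 and uses httpx, which is already in requirements.txt.

import httpx

# Each entry's "name" is the tag you would put in config/models.yaml
resp = httpx.get("http://localhost:11434/api/tags", timeout=10.0)
resp.raise_for_status()
models = [m["name"] for m in resp.json().get("models", [])]
print("Available Ollama models:", models)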

Step 2: Clone/Copy the Project

# Create project directory
mkdir -p ~/a2a-research
cd ~/a2a-research

# Copy the following structure:
# src/a2a_research/
#   ├── __init__.py
#   ├── config.py
#   ├── base_agent.py
#   ├── orchestrator.py
#   └── agents/
#       ├── __init__.py
#       ├── planning.py
#       ├── research.py
#       └── synthesis.py
# config/
#   ├── models.yaml
#   └── prompts.yaml
# scripts/
#   ├── start-services.sh
#   └── stop-services.sh
# requirements.txt

Step 3: Setup Python Environment

cd ~/a2a-research

# Create virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install --upgrade pip
pip install -r requirements.txt

# Install PDF generation support (optional but recommended)
pip install weasyprint

# System dependency for weasyprint (Ubuntu/Debian)
# System dependencies for weasyprint, plus pandoc (Ubuntu/Debian)
sudo apt install -y libpango-1.0-0 libpangocairo-1.0-0 pandoc

requirements.txt:

fastapi>=0.104.0
uvicorn>=0.24.0
httpx>=0.25.0
pydantic>=2.0.0
python-json-logger>=2.0.0
pyyaml>=6.0
weasyprint>=60.0
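
After installing, a quick sanity check inside the venv confirms the core packages import cleanly (run in a Python REPL or save as a small script):

# Sanity check: import the core dependencies and print their versions
import fastapi, httpx, pydantic, uvicorn, yaml

print("fastapi ", fastapi.__version__)
print("uvicorn ", uvicorn.__version__)
print("httpx   ", httpx.__version__)
print("pydantic", pydantic.VERSION)
print("pyyaml  ", yaml.__version__)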

Step 4: Configure for Ollama

4.1 Update config.py

Edit src/a2a_research/config.py to point to Ollama:

# Change these lines (around line 29-31):

# Old (LiteLLM):
# LITELLM_URL = "http://localhost:14000"
# LITELLM_API_KEY = "sk-xxxxx"

# New (Ollama):
LITELLM_URL = "http://localhost:11434"  # Ollama default port
LITELLM_API_KEY = "ollama"              # Ollama doesn't require a real key
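
For reference, a minimal config.py along these lines could look like the sketch below. The environment-variable fallbacks are an optional convenience added here for illustration (the names OLLAMA_URL and OLLAMA_API_KEY are not part of the original file); the defaults match the Ollama values above.

import os

# LLM endpoint settings; env-var overrides are an optional, illustrative addition
LITELLM_URL = os.environ.get("OLLAMA_URL", "http://localhost:11434")
LITELLM_API_KEY = os.environ.get("OLLAMA_API_KEY", "ollama")  # Ollama ignores the key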

4.2 Update base_agent.py

Edit src/a2a_research/base_agent.py to use Ollama's API format.

Find the call_llm method and update the endpoint:

async def call_llm(self, prompt: str, system_prompt: Optional[str] = None) -> str:
    """Call the LLM via Ollama API."""

    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": prompt})

    # Ollama endpoint
    url = f"{LITELLM_URL}/v1/chat/completions"

    payload = {
        "model": self.config.model,
        "messages": messages,
        "max_tokens": 4096,
        "stream": False
    }

    async with httpx.AsyncClient(timeout=600.0) as client:  # Longer timeout for local models
        response = await client.post(
            url,
            json=payload,
            headers={"Authorization": f"Bearer {LITELLM_API_KEY}"}
        )
        response.raise_for_status()
        result = response.json()
        return result["choices"][0]["message"]["content"]
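
Before wiring this into the agents, it can help to hit the same OpenAI-compatible endpoint directly and confirm a model responds. The snippet below is a standalone smoke test, assuming Ollama on its default port and a model you have already pulled (qwen2.5:14b is used here purely as an example):

import asyncio
import httpx

async def smoke_test() -> None:
    # Same payload shape as call_llm above, sent straight to Ollama
    payload = {
        "model": "qwen2.5:14b",  # any tag from 'ollama list'
        "messages": [{"role": "user", "content": "Reply with the single word: ok"}],
        "max_tokens": 16,
        "stream": False,
    }
    async with httpx.AsyncClient(timeout=120.0) as client:
        resp = await client.post(
            "http://localhost:11434/v1/chat/completions",
            json=payload,
            headers={"Authorization": "Bearer ollama"},
        )
        resp.raise_for_status()
        print(resp.json()["choices"][0]["message"]["content"])

asyncio.run(smoke_test())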

4.3 Configure Model Assignments

Edit config/models.yaml to use your Ollama models:

# Model assignments for Ollama
# Use exact model names from 'ollama list'

orchestrator: qwen2.5:32b          # or qwen2.5:14b for less RAM
planning: deepseek-r1:32b          # Good at reasoning/planning
research: qwen2.5:32b              # Good at information retrieval
synthesis: deepseek-r1:32b         # Good at writing/synthesis

Alternative lightweight config:

orchestrator: qwen2.5:7b
planning: deepseek-r1:7b
research: qwen2.5:7b
synthesis: deepseek-r1:7b
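
A minimal way code might read this mapping, assuming it uses pyyaml (already in requirements.txt); the helper name load_model_map is illustrative, not taken from the project source:

from pathlib import Path
import yaml

def load_model_map(path: str = "config/models.yaml") -> dict:
    """Return the agent-name -> Ollama-model mapping from models.yaml."""
    return yaml.safe_load(Path(path).read_text())

models = load_model_map()
print(models["planning"])  # e.g. "deepseek-r1:32b"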

Step 5: Configure Prompts (Optional)

Edit config/prompts.yaml to customize agent behavior:

orchestrator: |
  You are a research orchestrator. Coordinate the research workflow.
  Break complex topics into manageable research tasks.

planning: |
  You are a research planning specialist.
  Given a research goal, create 5-7 specific research tasks.
  Format each task clearly with a title and description.

  Research Goal: {{research_goal}}

research: |
  You are a research specialist.
  Investigate the following task thoroughly.
  Provide detailed findings with sources where possible.

  Task: {{task_description}}

synthesis: |
  You are a research synthesis specialist.
  Combine all research findings into a comprehensive, well-structured report.
  Include an executive summary, main findings, and conclusions.
  Cite sources where available.
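
The {{...}} placeholders are plain string substitutions. A hedged sketch of how a prompt might be loaded and filled (the render_prompt helper is illustrative, not from the project code):

from pathlib import Path
import yaml

def render_prompt(agent: str, **values: str) -> str:
    """Load an agent's prompt from prompts.yaml and fill its {{placeholders}}."""
    prompts = yaml.safe_load(Path("config/prompts.yaml").read_text())
    text = prompts[agent]
    for key, value in values.items():
        text = text.replace("{{" + key + "}}", value)
    return text

print(render_prompt("planning", research_goal="Compare REST and GraphQL APIs"))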

Step 6: Create Service Scripts

start-services.sh

#!/bin/bash
set -e

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"

# Activate venv
source venv/bin/activate
export PYTHONPATH="$PROJECT_DIR/src:$PYTHONPATH"

# Create directories
mkdir -p logs .pids

echo "Starting A2A Research Agents..."

# Start agents
python -m a2a_research.agents.planning > logs/planning.log 2>&1 &
echo $! > .pids/planning.pid
echo "  Planning Agent started (PID: $!)"

python -m a2a_research.agents.research > logs/research.log 2>&1 &
echo $! > .pids/research.pid
echo "  Research Agent started (PID: $!)"

python -m a2a_research.agents.synthesis > logs/synthesis.log 2>&1 &
echo $! > .pids/synthesis.pid
echo "  Synthesis Agent started (PID: $!)"

sleep 2  # Let agents initialize

python -m a2a_research.orchestrator > logs/orchestrator.log 2>&1 &
echo $! > .pids/orchestrator.pid
echo "  Orchestrator started (PID: $!)"

echo ""
echo "All services started!"
echo "  Orchestrator: http://localhost:8100"
echo "  Planning:     http://localhost:8101"
echo "  Research:     http://localhost:8102"
echo "  Synthesis:    http://localhost:8103"

stop-services.sh

#!/bin/bash

PROJECT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$PROJECT_DIR"

echo "Stopping A2A Research Agents..."

for pidfile in .pids/*.pid; do
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile")
        if kill -0 "$pid" 2>/dev/null; then
            kill "$pid"
            echo "  Stopped $(basename "$pidfile" .pid) (PID: $pid)"
        fi
        rm "$pidfile"
    fi
done

echo "All services stopped."

Make scripts executable:

chmod +x scripts/start-services.sh scripts/stop-services.sh

Step 7: Start the System

# 1. Ensure Ollama is running
ollama serve  # In a separate terminal, or run as service

# 2. Start the research agents
./scripts/start-services.sh

# 3. Verify services are running
curl http://localhost:8100/health
curl http://localhost:8101/health
curl http://localhost:8102/health
curl http://localhost:8103/health
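
Equivalently, a small Python loop can poll all four health endpoints. It assumes each agent exposes GET /health as shown above and answers HTTP 200 when ready:

import httpx

SERVICES = {
    "orchestrator": 8100,
    "planning": 8101,
    "research": 8102,
    "synthesis": 8103,
}

for name, port in SERVICES.items():
    try:
        resp = httpx.get(f"http://localhost:{port}/health", timeout=5.0)
        status = "ok" if resp.status_code == 200 else f"HTTP {resp.status_code}"
    except httpx.HTTPError as exc:
        status = f"unreachable ({exc.__class__.__name__})"
    print(f"{name:<12} {status}")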

Step 8: Run a Research Query

Via curl:

curl -s -X POST http://localhost:8100/rpc \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "message/send",
    "id": "1",
    "params": {
      "message": {
        "messageId": "msg-1",
        "role": "user",
        "parts": [{"kind": "text", "text": "What are the key differences between REST and GraphQL APIs?"}]
      }
    }
  }' | jq -r '.result.parts[0].text'

Via Python:

import httpx
import json

def research(topic: str) -> str:
    response = httpx.post(
        "http://localhost:8100/rpc",
        json={
            "jsonrpc": "2.0",
            "method": "message/send",
            "id": "1",
            "params": {
                "message": {
                    "messageId": "msg-1",
                    "role": "user",
                    "parts": [{"kind": "text", "text": topic}]
                }
            }
        },
        timeout=600.0  # Long timeout for research
    )
    result = response.json()
    return result["result"]["parts"][0]["text"]

# Run research
output = research("Explain quantum computing basics")
print(output)

Output Files

Reports are saved to the reports/ directory:

reports/
├── 20260123_1430_your-research-topic.md       # Markdown report
├── 20260123_1430_your-research-topic.pdf      # PDF report
└── 20260123_1430_your-research-topic.trace.md # Workflow trace
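
Because filenames start with a timestamp, lexicographic order matches chronological order, so the most recent report can be picked up like this (a small convenience sketch, not part of the project code):

from pathlib import Path

# Markdown reports are named <YYYYMMDD_HHMM>_<topic>.md; skip the .trace.md files
reports = sorted(p for p in Path("reports").glob("*.md") if not p.name.endswith(".trace.md"))
if reports:
    latest = reports[-1]
    print(f"Latest report: {latest}")
    print(latest.read_text()[:500])  # preview the first 500 characters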

Troubleshooting

Ollama Connection Issues

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Check Ollama logs
journalctl -u ollama -f

Agent Not Responding

# Check agent logs
tail -f logs/orchestrator.log
tail -f logs/planning.log
tail -f logs/research.log
tail -f logs/synthesis.log

Out of Memory

  • Use smaller models (7b instead of 32b)
  • Reduce max_tokens in config
  • Ensure no other heavy processes are running

Slow Responses

  • Local models are slower than cloud APIs
  • Consider GPU acceleration
  • Use quantized models (e.g., qwen2.5:14b-q4_0)

Optional: Claude Code Integration

To use as a Claude Code skill, add to your .claude/settings.json:

{
  "skills": {
    "a2a-researcher": {
      "type": "subagent",
      "description": "Deep research using local Ollama models",
      "command": "curl -s -X POST http://localhost:8100/rpc -H 'Content-Type: application/json' -d '{\"jsonrpc\":\"2.0\",\"method\":\"message/send\",\"id\":\"1\",\"params\":{\"message\":{\"messageId\":\"msg-1\",\"role\":\"user\",\"parts\":[{\"kind\":\"text\",\"text\":\"$PROMPT\"}]}}}'"
    }
  }
}

Directory Structure Reference

a2a-research/
├── src/
│   └── a2a_research/
│       ├── __init__.py
│       ├── config.py           # LLM connection settings
│       ├── base_agent.py       # Base agent class with LLM calls
│       ├── orchestrator.py     # Workflow coordinator
│       └── agents/
│           ├── __init__.py
│           ├── planning.py     # Task breakdown
│           ├── research.py     # Information gathering
│           └── synthesis.py    # Report generation
├── config/
│   ├── models.yaml             # Agent-to-model mapping
│   └── prompts.yaml            # System prompts
├── scripts/
│   ├── start-services.sh
│   └── stop-services.sh
├── reports/                    # Generated reports
├── logs/                       # Runtime logs
├── .pids/                      # Process IDs
├── venv/                       # Python virtual environment
└── requirements.txt

Quick Reference

Component        Port   Purpose
Ollama           11434  LLM inference
Orchestrator     8100   Workflow coordination
Planning Agent   8101   Task decomposition
Research Agent   8102   Information gathering
Synthesis Agent  8103   Report writing

Command                      Description
ollama serve                 Start Ollama
ollama list                  List available models
./scripts/start-services.sh  Start all agents
./scripts/stop-services.sh   Stop all agents
curl localhost:8100/health   Check orchestrator health