@donbr
Created May 27, 2025 06:16
open_deep_research_analysis.md

Open Deep Research: Advanced Architecture Questions

Question 1: Intelligent Search Tool Selection

Current Implementation

The Open Deep Research application currently uses configuration-driven search API selection:

```python
# Current approach in utils.py
async def select_and_execute_search(search_api: str, query_list: list[str], params_to_pass: dict):
    if search_api == "tavily":
        return await tavily_search.ainvoke({'queries': query_list}, **params_to_pass)
    elif search_api == "arxiv":
        search_results = await arxiv_search_async(query_list, **params_to_pass)
        # ... etc
```

Implementing Reasoning-Based Selection

To add intelligent search tool reasoning, implement a search router node:

```mermaid
graph LR
    A[Query Analysis] --> B{Search Router}
    B --> |Academic| C[ArXiv/PubMed]
    B --> |Medical| D[PubMed]
    B --> |Recent News| E[Tavily/Perplexity]
    B --> |Technical| F[Exa + Domain Filtering]
    B --> |General| G[Tavily]
```

Implementation Approach:

```python
async def route_search_apis(queries: list[str], topic: str) -> dict[str, list[str]]:
    """Route queries to appropriate search APIs based on content analysis"""

    # Analyze query characteristics
    academic_keywords = ["research", "study", "paper", "publication"]
    medical_keywords = ["disease", "treatment", "medical", "clinical"]
    recent_keywords = ["2024", "2025", "latest", "recent", "current"]

    api_routing = {}

    for query in queries:
        query_lower = query.lower()

        # Rule-based routing (could be replaced with LLM classification)
        if any(keyword in query_lower for keyword in academic_keywords):
            api_routing.setdefault("arxiv", []).append(query)
        elif any(keyword in query_lower for keyword in medical_keywords):
            api_routing.setdefault("pubmed", []).append(query)
        elif any(keyword in query_lower for keyword in recent_keywords):
            api_routing.setdefault("tavily", []).append(query)
        else:
            api_routing.setdefault("tavily", []).append(query)

    return api_routing
```
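To see the router in action, here is a self-contained check; the function is repeated in condensed form (the recent-news and fallback branches are merged, since both route to tavily) so the snippet runs on its own:

```python
import asyncio

async def route_search_apis(queries: list[str], topic: str) -> dict[str, list[str]]:
    """Condensed copy of the rule-based router above, for a standalone demo."""
    academic_keywords = ["research", "study", "paper", "publication"]
    medical_keywords = ["disease", "treatment", "medical", "clinical"]

    api_routing: dict[str, list[str]] = {}
    for query in queries:
        q = query.lower()
        if any(k in q for k in academic_keywords):
            api_routing.setdefault("arxiv", []).append(query)
        elif any(k in q for k in medical_keywords):
            api_routing.setdefault("pubmed", []).append(query)
        else:
            # recent-news keywords and everything else both fall through to tavily
            api_routing.setdefault("tavily", []).append(query)
    return api_routing

routing = asyncio.run(route_search_apis(
    ["peer-reviewed research on RLHF", "clinical trial outcomes", "latest AI funding news"],
    topic="AI landscape",
))
print(routing)
# → {'arxiv': ['peer-reviewed research on RLHF'], 'pubmed': ['clinical trial outcomes'], 'tavily': ['latest AI funding news']}
```

One weakness is immediately visible: keyword lists are brittle (a query like "latest clinical research" matches the academic branch first), which is exactly what motivates LLM-based routing.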

Advanced LLM-Based Routing:

```python
from typing import Dict, List

from pydantic import BaseModel, Field

class SearchAPIRouter(BaseModel):
    api_selections: Dict[str, List[str]] = Field(
        description="Mapping of search API to queries that should use it"
    )
    reasoning: str = Field(description="Explanation of routing decisions")

async def llm_route_searches(queries: list[str], topic: str) -> SearchAPIRouter:
    """Use LLM to intelligently route searches based on query analysis"""

    router_prompt = f"""
    Analyze these search queries for topic '{topic}' and route them to appropriate search APIs:

    Available APIs:
    - arxiv: Academic papers and research
    - pubmed: Medical and life sciences literature
    - tavily: General web search with recent content
    - perplexity: AI-powered search with analysis
    - exa: Semantic search with domain filtering

    Queries: {queries}

    Route each query to the most appropriate API based on:
    - Content domain (academic, medical, general)
    - Recency requirements
    - Source quality needs
    """

    # Implementation with structured output...
```
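Whatever helper produces the structured output, the model's reply is worth validating before use: API names must be known, and every query must be routed exactly once. A stdlib-only sketch of that validation, where the JSON string is a hand-written stand-in for a real model reply:

```python
import json

ALLOWED_APIS = {"arxiv", "pubmed", "tavily", "perplexity", "exa"}

def validate_routing(raw_reply: str, queries: list[str]) -> dict[str, list[str]]:
    """Parse a JSON routing reply and check it against the original query list."""
    reply = json.loads(raw_reply)
    selections = reply["api_selections"]
    unknown = set(selections) - ALLOWED_APIS
    if unknown:
        raise ValueError(f"unknown APIs: {sorted(unknown)}")
    routed = sorted(q for qs in selections.values() for q in qs)
    if routed != sorted(queries):
        raise ValueError("each query must be routed exactly once")
    return selections

# Hand-written stand-in for a model reply:
raw = ('{"api_selections": {"arxiv": ["transformer scaling laws"],'
       ' "tavily": ["AI funding news May 2025"]},'
       ' "reasoning": "academic vs. recent"}')
print(validate_routing(raw, ["transformer scaling laws", "AI funding news May 2025"]))
```

Failing loudly here, rather than silently dropping or duplicating queries, keeps downstream section research deterministic.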

Question 2: Command goto vs Conditional Edges

Why Command goto is Used

The Open Deep Research application uses Command instead of conditional edges for sophisticated routing control:

```mermaid
graph TB
    subgraph "Command Approach"
        A[human_feedback] --> B{Feedback Type}
        B --> |Boolean True| C[Parallel Dispatch<br/>Send to Multiple Sections]
        B --> |String Feedback| D[Update State +<br/>Route to Regenerate]
    end

    subgraph "Conditional Edge Limitation"
        E[Node] --> F{Simple Function}
        F --> |static return| G[Single Next Node]
    end
```

Key Architectural Advantages

1. Dynamic Parallel Dispatch:

```python
# Command enables this complex routing:
return Command(goto=[
    Send("build_section_with_web_research", {
        "topic": topic,
        "section": s,
        "search_iterations": 0
    })
    for s in sections if s.research  # Dynamic list creation
])
```

2. State Updates During Routing:

```python
# Command allows state mutations with routing decisions:
return Command(
    goto="generate_report_plan",
    update={"feedback_on_report_plan": [feedback]}  # State update
)
```

3. Complex Decision Logic:

  • A node returning a Command can use its full Python logic, including node-local results, to decide routing
  • Conditional-edge functions only choose a destination; they cannot update state
  • Command supports both a single destination and parallel dispatch via a list of Send() objects

Design Pattern Comparison

| Feature | Conditional Edge | Command |
| --- | --- | --- |
| Routing logic | Separate routing function | Full Python logic inside the node |
| State updates | No | Yes, with the update parameter |
| Parallel dispatch | Possible via Send() lists, but decided outside the node | Yes, via Send() lists from within the node |
| Dynamic destinations | Limited to what the routing function can see | Yes, computed at runtime from node-local results |
| Error handling | Limited to the routing function | Full exception handling in the node |
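The contrast can be made concrete without LangGraph. Below, Command and Send are hypothetical minimal stand-ins (plain dataclasses, not the library types), used only to show the shape of each pattern:

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Send:
    """Stand-in for langgraph's Send: a targeted dispatch with its own payload."""
    node: str
    payload: dict

@dataclass
class Command:
    """Stand-in for langgraph's Command: a destination plus an optional state update."""
    goto: Union[str, list]
    update: dict = field(default_factory=dict)

def feedback_router(state: dict) -> str:
    # Conditional-edge style: the routing function can only name the next node.
    return "generate_report_plan" if state["feedback"] else "dispatch_sections"

def human_feedback_node(state: dict) -> Command:
    # Command style: route AND update state in a single return value.
    if state["feedback"]:
        return Command(goto="generate_report_plan",
                       update={"feedback_on_report_plan": [state["feedback"]]})
    return Command(goto=[Send("build_section_with_web_research", {"section": s})
                         for s in state["sections"]])

cmd = human_feedback_node({"feedback": "add a risks section", "sections": []})
print(cmd.goto, cmd.update)
fan_out = human_feedback_node({"feedback": "", "sections": ["intro", "analysis"]})
print([s.node for s in fan_out.goto])
```

The key difference is visible in the return types: the conditional-edge function can only produce a node name, while the Command-returning node carries both the destination(s) and the state delta.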

Question 3: Conflicting Research Content

Current Limitations

The Open Deep Research application has limited conflict resolution:

  1. No Explicit Conflict Detection: The system doesn't identify contradictory information
  2. LLM-Dependent Synthesis: Relies on the LLM to handle conflicts during writing
  3. No Consistency Validation: Quality grading doesn't check for internal contradictions

Current Flow:

```mermaid
graph LR
    A[Multiple Sources] --> B[Deduplication by URL]
    B --> C[Format & Present to LLM]
    C --> D[LLM Synthesis]
    D --> E[Quality Grade Pass/Fail]
    E --> |Pass| F[Accept]
    E --> |Fail| G[More Research]
```

Enhanced Conflict Resolution Architecture

Add Conflict Detection Node:

```mermaid
graph LR
    A[Research Results] --> B[Conflict Detector]
    B --> |No Conflicts| C[Standard Synthesis]
    B --> |Conflicts Found| D[Conflict Resolver]
    D --> E[Reconciled Content]
    E --> F[Enhanced Section Writer]
```

Implementation Approach:

```python
from typing import List

from pydantic import BaseModel, Field

class ConflictAnalysis(BaseModel):
    has_conflicts: bool = Field(description="Whether conflicting information was found")
    conflict_details: List[str] = Field(description="Specific conflicts identified")
    resolution_strategy: str = Field(description="How to handle the conflicts")

async def detect_research_conflicts(source_str: str, topic: str) -> ConflictAnalysis:
    """Analyze research sources for conflicting information"""

    conflict_prompt = f"""
    Analyze the following research sources for topic '{topic}' and identify any conflicting information:

    {source_str}

    Look for:
    - Contradictory facts or statistics
    - Opposing viewpoints on the same aspect
    - Different conclusions from similar studies
    - Inconsistent timelines or dates
    """

    # LLM analysis for conflict detection...
```
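Before wiring in the LLM call, the surrounding control flow can be exercised with a deterministic stand-in. The toy detector below catches only one narrow pattern (the same metric reported with different percentages) and is purely illustrative, not a substitute for LLM analysis:

```python
import re
from collections import defaultdict

def flag_numeric_conflicts(source_str: str) -> list[str]:
    """Flag metrics like 'growth rate of 12%' reported with differing values."""
    seen: dict[str, set[str]] = defaultdict(set)
    for metric, value in re.findall(r"(\w+) rate of (\d+)%", source_str):
        seen[metric].add(value)
    return [f"{m}: conflicting values {sorted(v)}"
            for m, v in seen.items() if len(v) > 1]

sources = ("Source A reports a growth rate of 12% for 2024. "
           "Source B reports a growth rate of 20% for the same year.")
print(flag_numeric_conflicts(sources))
# → ["growth: conflicting values ['12', '20']"]
```

A stand-in like this also makes the conflict path unit-testable, which an LLM-backed detector is not.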

```python
from typing import Literal

from pydantic import BaseModel, Field

class ConflictResolution(BaseModel):
    resolution_type: Literal["present_both", "weight_by_credibility", "seek_additional_sources"]
    resolved_content: str = Field(description="Content with conflicts addressed")
    confidence_level: float = Field(description="Confidence in the resolution")

async def resolve_conflicts(conflicts: ConflictAnalysis, source_str: str) -> ConflictResolution:
    """Resolve identified conflicts in research content"""

    # Strategy names match the resolution_type literals above
    if conflicts.resolution_strategy == "present_both":
        # Create balanced presentation of conflicting viewpoints
        pass
    elif conflicts.resolution_strategy == "weight_by_credibility":
        # Prioritize more authoritative sources
        pass
    elif conflicts.resolution_strategy == "seek_additional_sources":
        # Request more targeted research to resolve conflicts
        pass
```
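The if/elif skeleton can also be written as a dispatch table, which keeps the strategy names in one place and fails loudly on an unknown strategy. The handler bodies here are placeholder stubs; in the real node each branch would be its own LLM call:

```python
from typing import Callable, Dict, List

def present_both(conflict_details: List[str], source_str: str) -> str:
    # Stub: surface the disagreement instead of hiding it.
    return source_str + "\n\nNote: sources disagree: " + "; ".join(conflict_details)

def weight_by_credibility(conflict_details: List[str], source_str: str) -> str:
    # Stub: a real handler would rank sources and keep the authoritative account.
    return source_str

def seek_additional_sources(conflict_details: List[str], source_str: str) -> str:
    # Stub: a real handler would trigger another research iteration.
    return source_str + "\n\n[flagged for follow-up research]"

RESOLVERS: Dict[str, Callable[[List[str], str], str]] = {
    "present_both": present_both,
    "weight_by_credibility": weight_by_credibility,
    "seek_additional_sources": seek_additional_sources,
}

def apply_resolution(strategy: str, conflict_details: List[str], source_str: str) -> str:
    try:
        return RESOLVERS[strategy](conflict_details, source_str)
    except KeyError:
        raise ValueError(f"unknown resolution strategy: {strategy!r}") from None

out = apply_resolution("present_both", ["market size: $3B vs $5B"], "Combined findings.")
print(out)
```

Keeping the table keys identical to the Literal values in ConflictResolution means an out-of-vocabulary strategy from the LLM surfaces as an explicit error rather than silently falling through.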

Integration Points

Enhanced Section Writer with Conflict Awareness:

```python
async def write_section_with_conflict_resolution(state: SectionState, config: RunnableConfig):
    # 1. Detect conflicts in source material
    conflicts = await detect_research_conflicts(state["source_str"], state["topic"])

    # 2. Resolve conflicts if found
    if conflicts.has_conflicts:
        resolution = await resolve_conflicts(conflicts, state["source_str"])
        enhanced_source_str = resolution.resolved_content
    else:
        enhanced_source_str = state["source_str"]

    # 3. Write section with conflict-aware content
    # ... existing section writing logic
```

Key Takeaways

  1. Search Intelligence: Move from configuration-driven to reasoning-based API selection using query analysis or LLM routing
  2. Command Architecture: Commands provide necessary flexibility for complex routing, state updates, and parallel dispatch that conditional edges cannot support
  3. Conflict Resolution: The current system lacks explicit conflict handling; dedicated conflict-detection and resolution nodes are a clear enhancement opportunity

These architectural decisions reflect the complexity of orchestrating multi-source research with quality control and human oversight.
