@donbr
Created May 27, 2025 06:16
open_deep_research_analysis.md

Open Deep Research: Advanced Architecture Questions

Question 1: Intelligent Search Tool Selection

Current Implementation

The Open Deep Research application currently uses configuration-driven search API selection:

```python
# Current approach in utils.py
async def select_and_execute_search(search_api: str, query_list: list[str], params_to_pass: dict):
    if search_api == "tavily":
        return await tavily_search.ainvoke({'queries': query_list}, **params_to_pass)
    elif search_api == "arxiv":
        search_results = await arxiv_search_async(query_list, **params_to_pass)
        # ... etc
```

Implementing Reasoning-Based Selection

To add intelligent search tool reasoning, implement a search router node:

```mermaid
graph LR
    A[Query Analysis] --> B{Search Router}
    B --> |Academic| C[ArXiv/PubMed]
    B --> |Medical| D[PubMed]
    B --> |Recent News| E[Tavily/Perplexity]
    B --> |Technical| F[Exa + Domain Filtering]
    B --> |General| G[Tavily]
```

Implementation Approach:

```python
async def route_search_apis(queries: list[str], topic: str) -> dict[str, list[str]]:
    """Route queries to appropriate search APIs based on content analysis"""

    # Analyze query characteristics
    academic_keywords = ["research", "study", "paper", "publication"]
    medical_keywords = ["disease", "treatment", "medical", "clinical"]
    recent_keywords = ["2024", "2025", "latest", "recent", "current"]

    api_routing = {}

    for query in queries:
        query_lower = query.lower()

        # Rule-based routing (could be replaced with LLM classification)
        if any(keyword in query_lower for keyword in academic_keywords):
            api_routing.setdefault("arxiv", []).append(query)
        elif any(keyword in query_lower for keyword in medical_keywords):
            api_routing.setdefault("pubmed", []).append(query)
        elif any(keyword in query_lower for keyword in recent_keywords):
            api_routing.setdefault("tavily", []).append(query)
        else:
            api_routing.setdefault("tavily", []).append(query)

    return api_routing
```
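To see the router in action, here is a self-contained check; the function is repeated in condensed form (the recent-news and fallback branches are merged, since both route to tavily) so the snippet runs on its own:

```python
import asyncio

async def route_search_apis(queries: list[str], topic: str) -> dict[str, list[str]]:
    """Condensed copy of the rule-based router above, for a standalone demo."""
    academic_keywords = ["research", "study", "paper", "publication"]
    medical_keywords = ["disease", "treatment", "medical", "clinical"]

    api_routing: dict[str, list[str]] = {}
    for query in queries:
        q = query.lower()
        if any(k in q for k in academic_keywords):
            api_routing.setdefault("arxiv", []).append(query)
        elif any(k in q for k in medical_keywords):
            api_routing.setdefault("pubmed", []).append(query)
        else:
            # recent-news keywords and everything else both fall through to tavily
            api_routing.setdefault("tavily", []).append(query)
    return api_routing

routing = asyncio.run(route_search_apis(
    ["peer-reviewed research on RLHF", "clinical trial outcomes", "latest AI funding news"],
    topic="AI landscape",
))
print(routing)
# → {'arxiv': ['peer-reviewed research on RLHF'], 'pubmed': ['clinical trial outcomes'], 'tavily': ['latest AI funding news']}
```

One weakness is immediately visible: keyword lists are brittle (a query like "latest clinical research" matches the academic branch first), which is exactly what motivates LLM-based routing.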

Advanced LLM-Based Routing:

```python
from typing import Dict, List

from pydantic import BaseModel, Field

class SearchAPIRouter(BaseModel):
    api_selections: Dict[str, List[str]] = Field(
        description="Mapping of search API to queries that should use it"
    )
    reasoning: str = Field(description="Explanation of routing decisions")

async def llm_route_searches(queries: list[str], topic: str) -> SearchAPIRouter:
    """Use LLM to intelligently route searches based on query analysis"""

    router_prompt = f"""
    Analyze these search queries for topic '{topic}' and route them to appropriate search APIs:

    Available APIs:
    - arxiv: Academic papers and research
    - pubmed: Medical and life sciences literature
    - tavily: General web search with recent content
    - perplexity: AI-powered search with analysis
    - exa: Semantic search with domain filtering

    Queries: {queries}

    Route each query to the most appropriate API based on:
    - Content domain (academic, medical, general)
    - Recency requirements
    - Source quality needs
    """

    # Implementation with structured output...
```
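Whatever helper produces the structured output, the model's reply is worth validating before use: API names must be known, and every query must be routed exactly once. A stdlib-only sketch of that validation, where the JSON string is a hand-written stand-in for a real model reply:

```python
import json

ALLOWED_APIS = {"arxiv", "pubmed", "tavily", "perplexity", "exa"}

def validate_routing(raw_reply: str, queries: list[str]) -> dict[str, list[str]]:
    """Parse a JSON routing reply and check it against the original query list."""
    reply = json.loads(raw_reply)
    selections = reply["api_selections"]
    unknown = set(selections) - ALLOWED_APIS
    if unknown:
        raise ValueError(f"unknown APIs: {sorted(unknown)}")
    routed = sorted(q for qs in selections.values() for q in qs)
    if routed != sorted(queries):
        raise ValueError("each query must be routed exactly once")
    return selections

# Hand-written stand-in for a model reply:
raw = ('{"api_selections": {"arxiv": ["transformer scaling laws"],'
       ' "tavily": ["AI funding news May 2025"]},'
       ' "reasoning": "academic vs. recent"}')
print(validate_routing(raw, ["transformer scaling laws", "AI funding news May 2025"]))
```

Failing loudly here, rather than silently dropping or duplicating queries, keeps downstream section research deterministic.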

Question 2: Command goto vs Conditional Edges

Why Command goto is Used

The Open Deep Research application uses Command instead of conditional edges for sophisticated routing control:

```mermaid
graph TB
    subgraph "Command Approach"
        A[human_feedback] --> B{Feedback Type}
        B --> |Boolean True| C[Parallel Dispatch<br/>Send to Multiple Sections]
        B --> |String Feedback| D[Update State +<br/>Route to Regenerate]
    end

    subgraph "Conditional Edge Limitation"
        E[Node] --> F{Simple Function}
        F --> |static return| G[Single Next Node]
    end
```

Key Architectural Advantages

1. Dynamic Parallel Dispatch:

```python
# Command enables this complex routing:
return Command(goto=[
    Send("build_section_with_web_research", {
        "topic": topic,
        "section": s,
        "search_iterations": 0
    })
    for s in sections if s.research  # Dynamic list creation
])
```

2. State Updates During Routing:

```python
# Command allows state mutations with routing decisions:
return Command(
    goto="generate_report_plan",
    update={"feedback_on_report_plan": [feedback]}  # State update
)
```

3. Complex Decision Logic:

  • A node returning a Command can use its full Python logic, including node-local results, to decide routing
  • Conditional-edge functions only choose a destination; they cannot update state
  • Command supports both a single destination and parallel dispatch via a list of Send() objects

Design Pattern Comparison

| Feature | Conditional Edge | Command |
| --- | --- | --- |
| Routing logic | Separate routing function | Full Python logic inside the node |
| State updates | No | Yes, with the update parameter |
| Parallel dispatch | Possible via Send() lists, but decided outside the node | Yes, via Send() lists from within the node |
| Dynamic destinations | Limited to what the routing function can see | Yes, computed at runtime from node-local results |
| Error handling | Limited to the routing function | Full exception handling in the node |
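The contrast can be made concrete without LangGraph. Below, Command and Send are hypothetical minimal stand-ins (plain dataclasses, not the library types), used only to show the shape of each pattern:

```python
from dataclasses import dataclass, field
from typing import Union

@dataclass
class Send:
    """Stand-in for langgraph's Send: a targeted dispatch with its own payload."""
    node: str
    payload: dict

@dataclass
class Command:
    """Stand-in for langgraph's Command: a destination plus an optional state update."""
    goto: Union[str, list]
    update: dict = field(default_factory=dict)

def feedback_router(state: dict) -> str:
    # Conditional-edge style: the routing function can only name the next node.
    return "generate_report_plan" if state["feedback"] else "dispatch_sections"

def human_feedback_node(state: dict) -> Command:
    # Command style: route AND update state in a single return value.
    if state["feedback"]:
        return Command(goto="generate_report_plan",
                       update={"feedback_on_report_plan": [state["feedback"]]})
    return Command(goto=[Send("build_section_with_web_research", {"section": s})
                         for s in state["sections"]])

cmd = human_feedback_node({"feedback": "add a risks section", "sections": []})
print(cmd.goto, cmd.update)
fan_out = human_feedback_node({"feedback": "", "sections": ["intro", "analysis"]})
print([s.node for s in fan_out.goto])
```

The key difference is visible in the return types: the conditional-edge function can only produce a node name, while the Command-returning node carries both the destination(s) and the state delta.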

Question 3: Conflicting Research Content

Current Limitations

The Open Deep Research application has limited conflict resolution:

  1. No Explicit Conflict Detection: The system doesn't identify contradictory information
  2. LLM-Dependent Synthesis: Relies on the LLM to handle conflicts during writing
  3. No Consistency Validation: Quality grading doesn't check for internal contradictions

Current Flow:

```mermaid
graph LR
    A[Multiple Sources] --> B[Deduplication by URL]
    B --> C[Format & Present to LLM]
    C --> D[LLM Synthesis]
    D --> E[Quality Grade Pass/Fail]
    E --> |Pass| F[Accept]
    E --> |Fail| G[More Research]
```

Enhanced Conflict Resolution Architecture

Add Conflict Detection Node:

```mermaid
graph LR
    A[Research Results] --> B[Conflict Detector]
    B --> |No Conflicts| C[Standard Synthesis]
    B --> |Conflicts Found| D[Conflict Resolver]
    D --> E[Reconciled Content]
    E --> F[Enhanced Section Writer]
```

Implementation Approach:

```python
from typing import List

from pydantic import BaseModel, Field

class ConflictAnalysis(BaseModel):
    has_conflicts: bool = Field(description="Whether conflicting information was found")
    conflict_details: List[str] = Field(description="Specific conflicts identified")
    resolution_strategy: str = Field(description="How to handle the conflicts")

async def detect_research_conflicts(source_str: str, topic: str) -> ConflictAnalysis:
    """Analyze research sources for conflicting information"""

    conflict_prompt = f"""
    Analyze the following research sources for topic '{topic}' and identify any conflicting information:

    {source_str}

    Look for:
    - Contradictory facts or statistics
    - Opposing viewpoints on the same aspect
    - Different conclusions from similar studies
    - Inconsistent timelines or dates
    """

    # LLM analysis for conflict detection...
```
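Before wiring in the LLM call, the surrounding control flow can be exercised with a deterministic stand-in. The toy detector below catches only one narrow pattern (the same metric reported with different percentages) and is purely illustrative, not a substitute for LLM analysis:

```python
import re
from collections import defaultdict

def flag_numeric_conflicts(source_str: str) -> list[str]:
    """Flag metrics like 'growth rate of 12%' reported with differing values."""
    seen: dict[str, set[str]] = defaultdict(set)
    for metric, value in re.findall(r"(\w+) rate of (\d+)%", source_str):
        seen[metric].add(value)
    return [f"{m}: conflicting values {sorted(v)}"
            for m, v in seen.items() if len(v) > 1]

sources = ("Source A reports a growth rate of 12% for 2024. "
           "Source B reports a growth rate of 20% for the same year.")
print(flag_numeric_conflicts(sources))
# → ["growth: conflicting values ['12', '20']"]
```

A stand-in like this also makes the conflict path unit-testable, which an LLM-backed detector is not.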

```python
from typing import Literal

from pydantic import BaseModel, Field

class ConflictResolution(BaseModel):
    resolution_type: Literal["present_both", "weight_by_credibility", "seek_additional_sources"]
    resolved_content: str = Field(description="Content with conflicts addressed")
    confidence_level: float = Field(description="Confidence in the resolution")

async def resolve_conflicts(conflicts: ConflictAnalysis, source_str: str) -> ConflictResolution:
    """Resolve identified conflicts in research content"""

    # Strategy names match the resolution_type literals above
    if conflicts.resolution_strategy == "present_both":
        # Create balanced presentation of conflicting viewpoints
        pass
    elif conflicts.resolution_strategy == "weight_by_credibility":
        # Prioritize more authoritative sources
        pass
    elif conflicts.resolution_strategy == "seek_additional_sources":
        # Request more targeted research to resolve conflicts
        pass
```
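The if/elif skeleton can also be written as a dispatch table, which keeps the strategy names in one place and fails loudly on an unknown strategy. The handler bodies here are placeholder stubs; in the real node each branch would be its own LLM call:

```python
from typing import Callable, Dict, List

def present_both(conflict_details: List[str], source_str: str) -> str:
    # Stub: surface the disagreement instead of hiding it.
    return source_str + "\n\nNote: sources disagree: " + "; ".join(conflict_details)

def weight_by_credibility(conflict_details: List[str], source_str: str) -> str:
    # Stub: a real handler would rank sources and keep the authoritative account.
    return source_str

def seek_additional_sources(conflict_details: List[str], source_str: str) -> str:
    # Stub: a real handler would trigger another research iteration.
    return source_str + "\n\n[flagged for follow-up research]"

RESOLVERS: Dict[str, Callable[[List[str], str], str]] = {
    "present_both": present_both,
    "weight_by_credibility": weight_by_credibility,
    "seek_additional_sources": seek_additional_sources,
}

def apply_resolution(strategy: str, conflict_details: List[str], source_str: str) -> str:
    try:
        return RESOLVERS[strategy](conflict_details, source_str)
    except KeyError:
        raise ValueError(f"unknown resolution strategy: {strategy!r}") from None

out = apply_resolution("present_both", ["market size: $3B vs $5B"], "Combined findings.")
print(out)
```

Keeping the table keys identical to the Literal values in ConflictResolution means an out-of-vocabulary strategy from the LLM surfaces as an explicit error rather than silently falling through.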

Integration Points

Enhanced Section Writer with Conflict Awareness:

```python
async def write_section_with_conflict_resolution(state: SectionState, config: RunnableConfig):
    # 1. Detect conflicts in source material
    conflicts = await detect_research_conflicts(state["source_str"], state["topic"])

    # 2. Resolve conflicts if found
    if conflicts.has_conflicts:
        resolution = await resolve_conflicts(conflicts, state["source_str"])
        enhanced_source_str = resolution.resolved_content
    else:
        enhanced_source_str = state["source_str"]

    # 3. Write section with conflict-aware content
    # ... existing section writing logic
```

Key Takeaways

  1. Search Intelligence: Move from configuration-driven to reasoning-based API selection using query analysis or LLM routing
  2. Command Architecture: Commands provide necessary flexibility for complex routing, state updates, and parallel dispatch that conditional edges cannot support
  3. Conflict Resolution: The current system lacks explicit conflict handling; dedicated conflict-detection and resolution nodes are a clear enhancement opportunity

These architectural decisions reflect the complexity of orchestrating multi-source research with quality control and human oversight.
