This document analyzes the warning message: "Anthropic structured output relies on forced tool calling, which is not supported when thinking is enabled" and provides evidence-based solutions for developers encountering this conflict.
The warning stems from a fundamental incompatibility between two features:
- Extended Thinking Mode: Enables Claude's internal reasoning capabilities
- Forced Tool Calling: Required for LangChain's structured output implementation
According to Anthropic's official documentation, extended thinking has specific limitations with tool use:
"Tool choice limitation: Tool use with thinking only supports tool_choice: {"type": "auto"} (the default) or tool_choice: {"type": "none"}. Using tool_choice: {"type": "any"} or tool_choice: {"type": "tool", "name": "..."} will result in an error because these options force tool use, which is incompatible with extended thinking."
Source: Anthropic Extended Thinking Documentation
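To make the limitation concrete, here is a minimal sketch using the official anthropic Python SDK; the tool definition is hypothetical and exists only to illustrate the request combination the documentation rejects:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical tool, used only to demonstrate the conflict
report_tool = {
    "name": "record_report",
    "description": "Record a structured report",
    "input_schema": {
        "type": "object",
        "properties": {"introduction": {"type": "string"}},
        "required": ["introduction"],
    },
}

# Per the documentation quoted above, this request errors: a forced
# tool_choice ("tool" or "any") is incompatible with extended thinking,
# which allows only "auto" or "none"
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=4000,
    thinking={"type": "enabled", "budget_tokens": 2000},
    tools=[report_tool],
    tool_choice={"type": "tool", "name": "record_report"},  # rejected
    messages=[{"role": "user", "content": "Write a short report."}],
)
```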
As documented in the Baz.co technical analysis, which cites LangChain's source code:
"Anthropic structured output relies on forced tool calling, which is not supported when thinking is enabled"
Source: Baz.co Structured Output Analysis
LangChain's structured output feature relies on forced tool calling to ensure schema compliance, creating this direct conflict.
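In other words, `with_structured_output` behaves roughly like binding the schema as a tool and forcing the model to call it, which is exactly the combination rejected when thinking is on. A hedged sketch of that equivalence:

```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field

class ReportSchema(BaseModel):
    sections: list[str] = Field(description="List of report sections")
    introduction: str = Field(description="Report introduction")

llm = ChatAnthropic(model="claude-3-7-sonnet-20250219")

# with_structured_output(ReportSchema) is approximately equivalent to
# forcing the schema's tool via tool_choice:
forced = llm.bind_tools([ReportSchema], tool_choice="ReportSchema")
```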
This issue affects:
- LangChain users implementing structured outputs with Anthropic models
- Claude 3.7 and 4.x users where thinking mode may be enabled by default or explicitly
- Production applications requiring reliable structured responses
Multiple developers have reported this issue across various platforms, and the technical details are documented in community resources, with the Baz.co analysis offering specific insight into the LangChain implementation.
Extended thinking is supported in:
- Claude Opus 4 (claude-opus-4-20250514)
- Claude Sonnet 4 (claude-sonnet-4-20250514)
- Claude Sonnet 3.7 (claude-3-7-sonnet-20250219)
Source: Anthropic Extended Thinking Documentation
Approach: Explicitly disable thinking to restore forced tool calling capability.
```python
from langchain_anthropic import ChatAnthropic
from pydantic import BaseModel, Field

class ReportSchema(BaseModel):
    sections: list[str] = Field(description="List of report sections")
    introduction: str = Field(description="Report introduction")

# Explicitly disable thinking so forced tool calling is available again
llm = ChatAnthropic(
    model="claude-3-7-sonnet-20250219",
    thinking={"type": "disabled"},  # key configuration
)

structured_llm = llm.with_structured_output(ReportSchema)
result = structured_llm.invoke("Draft a research plan on quantum error correction.")
# result is a validated ReportSchema instance
```
Pros:
- Highest reliability for structured output
- No parsing errors
- Full schema compliance
Cons:
- Loses enhanced reasoning capabilities
- May impact response quality for complex tasks
Approach: Use thinking mode with careful prompting and manual JSON extraction.
```python
import json
import re

from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

def extract_text_content(content) -> str:
    """Collect the text blocks from a thinking-enabled response."""
    if isinstance(content, str):
        return content
    return "".join(
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    )

async def generate_with_thinking_and_parsing(topic: str):
    llm = ChatAnthropic(
        model="claude-3-7-sonnet-20250219",
        max_tokens=5000,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 2000},
    )
    prompt = f"""Generate a research plan for: {topic}
CRITICAL: Your response must be valid JSON:
{{
    "sections": ["section1", "section2"],
    "introduction": "intro text"
}}"""
    response = await llm.ainvoke([HumanMessage(content=prompt)])

    # Pull JSON out of the text blocks (thinking blocks are skipped)
    content = extract_text_content(response.content)
    json_match = re.search(r"\{.*\}", content, re.DOTALL)
    if json_match:
        return json.loads(json_match.group())
    raise ValueError("No valid JSON found in model response")
```
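The extraction logic can be sanity-checked offline; the mock content list below is an assumption about the block shape LangChain returns when thinking is enabled (a thinking block followed by a text block):

```python
# Reuses json, re, and extract_text_content from the block above
mock_content = [
    {"type": "thinking", "thinking": "Let me plan the sections first..."},
    {"type": "text", "text": '{"sections": ["a", "b"], "introduction": "intro"}'},
]
text = extract_text_content(mock_content)  # the thinking block is skipped
parsed = json.loads(re.search(r"\{.*\}", text, re.DOTALL).group())
assert parsed["sections"] == ["a", "b"]
```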
Pros:
- Retains enhanced reasoning
- Can work with thinking mode
Cons:
- Less reliable parsing
- Requires error handling
- Potential OutputParserException
Approach: Use thinking for reasoning, then a separate model for structuring.
```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

async def reason_and_structure(topic: str):
    # Stage 1: open-ended reasoning with thinking enabled
    reasoning_llm = ChatAnthropic(
        model="claude-3-7-sonnet-20250219",
        max_tokens=6000,  # must exceed the thinking budget
        thinking={"type": "enabled", "budget_tokens": 3000},
    )
    reasoning = await reasoning_llm.ainvoke([
        HumanMessage(content=f"Think deeply about a research plan for: {topic}")
    ])
    analysis = extract_text_content(reasoning.content)  # helper from the previous example

    # Stage 2: structuring with Haiku (fast + reliable); thinking is off by
    # default, so forced tool calling works without an explicit opt-out
    structuring_llm = ChatAnthropic(model="claude-3-haiku-20240307")
    structured_llm = structuring_llm.with_structured_output(ReportSchema)
    return await structured_llm.ainvoke([
        HumanMessage(content=f"Structure this analysis: {analysis}")
    ])
```
Pros:
- Best of both worlds
- High reliability + enhanced reasoning
- Cost-effective (Haiku for structuring)
Cons:
- Increased latency
- Higher token usage
- Added complexity
Approach: Use LangChain's fallback mechanism for robust error handling.
```python
from langchain_anthropic import ChatAnthropic

# Primary: thinking-enabled model with manual JSON parsing
# (one way to build this is sketched below)
primary_chain = create_thinking_chain()

# Fallback: structured output without thinking (off by default on Haiku)
fallback_llm = ChatAnthropic(model="claude-3-haiku-20240307")
fallback_chain = fallback_llm.with_structured_output(ReportSchema)

# .with_fallbacks() is available on any Runnable and returns a
# RunnableWithFallbacks, so no explicit import is needed
reliable_chain = primary_chain.with_fallbacks([fallback_chain])
```
Source: LangChain Fallbacks Documentation
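The `create_thinking_chain` placeholder is not defined in the original snippet; one plausible definition (an assumption) wraps the async thinking-plus-parsing function from the earlier example in a `RunnableLambda`, which supports coroutine functions:

```python
from langchain_core.runnables import RunnableLambda

def create_thinking_chain():
    # Wrap the async generate_with_thinking_and_parsing function defined
    # earlier; RunnableLambda exposes coroutine functions via ainvoke
    return RunnableLambda(generate_with_thinking_and_parsing)

# Usage: result = await reliable_chain.ainvoke("quantum error correction")
```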
For production systems prioritizing reliability:
```python
from dataclasses import dataclass

@dataclass
class ProductionConfig:
    model: str = "claude-3-7-sonnet-20250219"
    thinking_enabled: bool = False  # prioritize reliability
    structured_output: bool = True
    fallback_model: str = "claude-3-haiku-20240307"
```
For environments prioritizing reasoning quality:
```python
from dataclasses import dataclass

@dataclass
class DevelopmentConfig:
    model: str = "claude-3-7-sonnet-20250219"
    thinking_enabled: bool = True  # enhanced reasoning
    thinking_budget: int = 3000
    use_two_stage: bool = True  # reason first, then structure
```
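A minimal sketch of how such a config could be mapped onto a model instance; the `build_llm` helper and the max_tokens heuristic are assumptions for illustration, not part of either library:

```python
from langchain_anthropic import ChatAnthropic

def build_llm(config) -> ChatAnthropic:
    # Hypothetical helper mapping the dataclass fields onto ChatAnthropic kwargs
    if getattr(config, "thinking_enabled", False):
        return ChatAnthropic(
            model=config.model,
            max_tokens=config.thinking_budget * 2,  # must exceed the budget
            thinking={"type": "enabled", "budget_tokens": config.thinking_budget},
        )
    return ChatAnthropic(model=config.model, thinking={"type": "disabled"})

llm = build_llm(ProductionConfig())
```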
Based on the warning message and community feedback:
```python
import logging

from langchain_core.exceptions import OutputParserException

logger = logging.getLogger(__name__)

# structured_llm comes from the first example; fallback_approach is a
# placeholder for whichever alternative strategy you adopt
async def robust_structured_generation(prompt: str):
    try:
        # Attempt structured output first
        return await structured_llm.ainvoke(prompt)
    except OutputParserException as e:
        # Handle the specific failure the warning points to
        logger.warning(f"Structured output failed: {e}")
        # Fall back to manual parsing or an alternative approach
        return await fallback_approach(prompt)
    except Exception as e:
        # Surface anything unexpected
        logger.error(f"Unexpected error: {e}")
        raise
```
According to Anthropic documentation:
"You're charged for the full thinking tokens generated by the original request, not the summary tokens."
Source: Anthropic Extended Thinking Documentation
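Because thinking tokens are billed in full, logging per-call token usage is worthwhile. A minimal sketch using the `usage_metadata` field that LangChain attaches to AI messages:

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import HumanMessage

llm = ChatAnthropic(
    model="claude-3-7-sonnet-20250219",
    max_tokens=5000,
    thinking={"type": "enabled", "budget_tokens": 2000},
)
response = llm.invoke([HumanMessage(content="Outline a research plan.")])

# usage_metadata reports token counts; with thinking enabled, the output
# count includes the fully billed thinking tokens
usage = response.usage_metadata
print(f"input={usage['input_tokens']} output={usage['output_tokens']}")
```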
- Thinking Mode: Higher latency due to reasoning steps
- Two-Stage Approach: Additional round-trip latency
- Fallbacks: Potential retry latency
The warning "Anthropic structured output relies on forced tool calling, which is not supported when thinking is enabled" represents a fundamental trade-off in current Anthropic models between enhanced reasoning capabilities and structured output reliability.
- For Production: Disable thinking mode when structured output is critical
- For Research: Use two-stage approach for both reasoning and reliability
- For Robustness: Implement fallback chains with multiple strategies
- For Cost Optimization: Use Haiku for structuring, Sonnet for reasoning
Anthropic may address this limitation in future API versions. Monitor their documentation for updates on thinking mode and tool calling compatibility.
- Anthropic Extended Thinking Documentation - Official documentation confirming tool choice limitations with thinking mode
- Baz.co Structured Output Analysis - Technical analysis citing LangChain source code
- LangChain Fallbacks Documentation - Official fallback implementation guidance
- Anthropic Models Overview - Model capabilities and supported features
Research conducted using MCP tools on May 25, 2025