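In normal mode, Claude generates the answer directly. With Extended Thinking enabled, Claude first reasons internally (exploring several lines of thought, checking its own work, deriving step by step) and only then gives the final answer. On complex problems this can improve answer quality noticeably.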
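When to use Extended Thinking
Good fit:
- Math and algorithm problems that need multi-step reasoning
- Complex system architecture design (several options to weigh)
- Security audits (the attack surface has to be enumerated exhaustively)
- Root-cause analysis of code bugs (the execution path has to be traced)
- Key decisions that need a high-confidence answer
Poor fit:
- Simple Q&A and format conversion
- Latency-sensitive real-time applications
- High-frequency calls on a limited budget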
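The two lists above can be folded into a small routing helper that decides per request whether to enable thinking. This is only a minimal sketch: the task labels, the `THINKING_TASKS` set, and the `thinking_param` function are hypothetical names made up for illustration, not part of the Anthropic SDK. Only the shape of the returned dict follows the `thinking` argument used in the basic usage example in this article.

```python
from typing import Optional

# Hypothetical task labels mirroring the "good fit" list; adapt to your own taxonomy.
THINKING_TASKS = {
    "math",
    "architecture",
    "security_audit",
    "bug_root_cause",
    "critical_decision",
}


def thinking_param(task: str, budget_tokens: int = 5000) -> Optional[dict]:
    """Return a `thinking` dict for tasks that benefit from it, else None."""
    if task in THINKING_TASKS:
        return {"type": "enabled", "budget_tokens": budget_tokens}
    # Simple Q&A, format conversion, latency-sensitive or high-volume calls:
    # leave thinking off.
    return None
```

When the helper returns `None`, the caller would simply leave the `thinking` argument out of the request.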
Supported models
- claude-opus-4-5 (best results)
- claude-sonnet-4-5 (best value for money)
Basic usage
```python
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # maximum number of tokens allotted to thinking
    },
    messages=[{
        "role": "user",
        "content": "Analyze the time complexity of this algorithm and suggest optimizations: [code]",
    }],
)

# Parse the response: thinking blocks come before the final text block
for block in response.content:
    if block.type == "thinking":
        print("Thinking process:")
        print(block.thinking[:500])  # first 500 characters of the reasoning
    elif block.type == "text":
        print("Answer:")
        print(block.text)  # final answer
```
How to set the thinking budget
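budget_tokens sets the maximum number of tokens Claude may spend on its internal thinking.
| Scenario | Suggested budget_tokens |
|---|---|
| Simple reasoning | 1024-3000 |
| Medium complexity | 5000-8000 |
| Complex architecture/algorithms | 10000-16000 |
| Extremely complex problems | 32000+ |
Note: max_tokens must be greater than budget_tokens, because the final answer needs room as well. The API also requires budget_tokens to be at least 1024.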
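The table and the max_tokens rule above can be combined so that the two values never get out of step. The sketch below is illustrative only: `BUDGETS`, `build_limits`, and the default `answer_tokens=4000` are assumptions introduced here, and the tier values are simply the upper end of each range in the table (32000 for the open-ended last row).

```python
# Upper end of each range from the table above; labels are hypothetical.
BUDGETS = {
    "simple": 3000,
    "medium": 8000,
    "complex": 16000,
    "extreme": 32000,
}


def build_limits(complexity: str, answer_tokens: int = 4000) -> dict:
    """Pick a thinking budget and derive a max_tokens that leaves room for the answer."""
    if answer_tokens <= 0:
        raise ValueError("answer_tokens must be positive so max_tokens > budget_tokens")
    budget = BUDGETS[complexity]
    return {
        # max_tokens covers thinking plus the visible answer
        "max_tokens": budget + answer_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget},
    }
```

The returned dict could then be unpacked into a request, for example `client.messages.create(model=..., messages=..., **build_limits("medium"))`.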
Streaming output (recommended for production)
```python
# Reuses `client` from the basic usage example
user_question = "[your question]"  # placeholder: replace with the real question

with client.messages.stream(
    model="claude-sonnet-4-5",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 5000},
    messages=[{"role": "user", "content": user_question}],
) as stream:
    thinking_shown = False
    for event in stream:
        if event.type == "content_block_start":
            if event.content_block.type == "thinking" and not thinking_shown:
                # Show a single marker instead of streaming the reasoning itself
                print("[Thinking...]", end="", flush=True)
                thinking_shown = True
            elif event.content_block.type == "text":
                print("\n[Answer]")
        elif event.type == "content_block_delta":
            # Only text deltas are printed; thinking deltas are skipped
            if event.delta.type == "text_delta":
                print(event.delta.text, end="", flush=True)
```
Three practical scenarios
Scenario 1: Root-cause analysis of a complex bug
```python
# Reuses `client` from the basic usage example
bug_prompt = """
This distributed system occasionally produces duplicate records.
Reproduce rate: ~0.1% under high load.
Stack: Redis distributed lock + PostgreSQL + async Python workers.

Analyze the root cause systematically. Consider:
1. Race conditions in the locking mechanism
2. Network partition scenarios
3. Worker crash/restart during transaction
4. Clock skew between services

Code: [paste relevant code]
"""

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=12000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": bug_prompt}],
)
```
Scenario 2: Choosing an architecture
```python
# Reuses `client` from the basic usage example
arch_prompt = """
We need to design a real-time notification system for 10M users.
Requirements: <100ms delivery, 99.9% reliability, support push/email/SMS.
Current stack: Python/FastAPI, PostgreSQL, Redis.

Evaluate these approaches:
1. WebSocket + Redis Pub/Sub
2. Server-Sent Events + message queue
3. Third-party service (Firebase/Pusher)

Give detailed tradeoff analysis and a final recommendation.
"""

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 12000},
    messages=[{"role": "user", "content": arch_prompt}],
)
```
Scenario 3: Comprehensive security audit
```python
# Reuses `client` from the basic usage example
security_prompt = """
Perform a comprehensive security audit of this authentication module.
Be exhaustive - think through every possible attack vector:
- Authentication bypass
- Session management flaws
- Injection vulnerabilities
- Cryptographic weaknesses
- Race conditions in auth flow

Code: [paste auth code]
"""

response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=10000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": security_prompt}],
)
```
Cost control
Thinking tokens are billed as regular output tokens (thinking_tokens * output_price).
| Model | Thinking token price (per million) |
|---|---|
| Sonnet 4.5 | $15.00 |
| Opus 4.5 | $25.00 |
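The billing rule above (thinking_tokens * output_price) is simple enough to check by hand, but a small helper makes it easier to compare budgets. This is a back-of-the-envelope sketch: `thinking_cost_usd` is a hypothetical helper, and the price is passed in as a parameter rather than hard-coded, since it has to come from the current pricing page.

```python
def thinking_cost_usd(thinking_tokens: int, output_price_per_mtok: float) -> float:
    """Cost in USD of the thinking portion of one response.

    Thinking tokens are billed at the model's output price, quoted per million tokens.
    """
    return thinking_tokens * output_price_per_mtok / 1_000_000


# Example using the Sonnet 4.5 row of the table ($15.00 per million):
# a response that uses 10,000 thinking tokens costs about $0.15 for thinking alone.
cost = thinking_cost_usd(10_000, 15.00)
print(f"${cost:.2f}")
```

Multiplying the per-call figure by the expected call volume gives a rough ceiling for the thinking share of a monthly bill.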
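Strategies:
- Start testing with a small budget (2000-5000); if the quality is good enough, there is no need to raise it
- Enable thinking only for genuinely complex problems, and turn it off for simple ones to save cost
- Sonnet with thinking is usually more cost-effective than Opus without thinking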
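The "start small, raise only if needed" strategy can be expressed as a short escalation loop. The sketch below deliberately avoids calling the API directly: `ask` and `good_enough` are hypothetical callables supplied by the caller (for example, `ask` could wrap `client.messages.create` with the given budget, and `good_enough` could be a test suite or a manual review step), and the default budget ladder is an assumption.

```python
from typing import Callable, Iterable, Tuple


def escalate_budget(
    ask: Callable[[int], str],
    good_enough: Callable[[str], bool],
    budgets: Iterable[int] = (2000, 5000, 10000),
) -> Tuple[int, str]:
    """Try increasing thinking budgets and stop at the first acceptable answer.

    Returns the last budget tried and its answer; if no budget passes the
    check, that is the answer from the largest budget.
    """
    budget, answer = 0, ""
    for budget in budgets:
        answer = ask(budget)
        if good_enough(answer):
            break
    return budget, answer
```

Each retry is billed in full, so a loop like this makes most sense while tuning a prompt offline; once a workable budget is known, it can be fixed in production.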
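Common misconceptions
Misconception 1: the thinking process can be edited. Thinking blocks are read-only: they cannot be prefilled or used in few-shot examples.
Misconception 2: a bigger budget is always better. budget_tokens is an upper bound, and Claude decides on its own how much of it to use, so an oversized budget only adds potential cost and latency without guaranteeing a better answer. Start at 5000 and adjust from there.
Misconception 3: every task should enable it. On simple tasks, thinking can actually reduce efficiency (overthinking).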
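To illustrate the first misconception above: when a conversation continues into another turn, one conservative pattern is to pass the previous response's content blocks back exactly as received rather than rewriting them. The helper below is a minimal sketch under that assumption; `with_assistant_turn` is a hypothetical name, and `response_content` stands for the `response.content` list returned by an earlier call.

```python
from typing import Any, Dict, List, Sequence


def with_assistant_turn(
    messages: List[Dict[str, Any]],
    response_content: Sequence[Any],
) -> List[Dict[str, Any]]:
    """Return a new message list with the assistant turn appended unmodified."""
    # The blocks go into a fresh list by reference: nothing is edited,
    # reordered, or dropped, and the caller's history is left untouched.
    return messages + [{"role": "assistant", "content": list(response_content)}]
```

A follow-up user message would then be appended to the returned list before the next request.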
Source: Extended Thinking - Anthropic official documentation