Readme

Claude Sonnet 4

Claude Sonnet 4 is a hybrid reasoning model that offers both near-instant responses and extended thinking capabilities. It significantly improves upon Claude Sonnet 3.7’s performance while maintaining efficiency for everyday use cases.

Key Capabilities

Dual Operating Modes

Standard mode: Fast responses for typical tasks
Extended thinking: Deep reasoning for complex problems (up to 64K tokens)

Core Features

Advanced coding capabilities with 72.7% performance on SWE-bench
Enhanced instruction following and steerability
Parallel tool execution
Memory improvements when given access to local files
Web search integration during extended thinking (beta)
65% reduction in shortcut/loophole behavior compared to Sonnet 3.7

Performance Benchmarks

Coding

SWE-bench Verified: 72.7%
Described as “state-of-the-art” for coding tasks

Reasoning (with extended thinking)

GPQA Diamond: 75.5% (70.0% without extended thinking)
MMMLU: 88.2% (85.4% without extended thinking)
MMMU: 77.6% (72.6% without extended thinking)
AIME: 40.0% (33.1% without extended thinking)

Pricing

Input: $3 per million tokens
Output: $15 per million tokens

Safety and Reliability

Implements AI Safety Level 3 (ASL-3) protections
Extensive testing and evaluation
Reduced tendency to use shortcuts or exploit loopholes
Thinking summaries available (condensed from full reasoning when needed)

Use Cases

Sonnet 4 is optimized for:

Daily coding tasks and development workflows
Complex instruction following
Multi-file codebase operations
Autonomous application development
Long-form reasoning and analysis
Agent-based workflows

Limitations

Does not match Claude Opus 4 performance in most domains
Extended thinking features require paid plans
Memory capabilities depend on developer-provided file access