
openai / gpt-4.1

OpenAI's Flagship GPT model for complex tasks.


GPT-4.1 is a high-performance language model optimized for real-world applications, delivering major improvements in coding, instruction following, and long-context comprehension. It supports up to 1 million tokens of context, has a June 2024 knowledge cutoff, and is designed to be more reliable and cost-effective across a wide range of use cases, from building intelligent agents to processing large codebases and documents. GPT-4.1 offers improved reasoning, faster output, and significantly enhanced formatting fidelity.


Key Capabilities

  • 1M token context window for large document/code handling
  • Improved instruction following, including format adherence, content control, and negative/ordered instructions
  • Top-tier performance in coding tasks and diffs
  • Optimized for agentic workflows, long-context reasoning, and tool use
  • Real-world tested across legal, financial, engineering, and developer tools

Benchmark Highlights

  • SWE-bench Verified (coding): 54.6%
  • MultiChallenge (instruction following): 38.3%
  • IFEval (format compliance): 87.4%
  • Video-MME (long video QA): 72.0%
  • Aider diff-format accuracy: 53%
  • Graphwalks (multi-hop reasoning): 62%

Use Cases

  • Building agentic systems with strong multi-turn coherence
  • Editing and understanding large codebases or diff formats
  • Complex data extraction from lengthy documents
  • Highly structured content generation
  • Multimodal reasoning tasks (e.g., charts, diagrams, videos)
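The structured-content use case above can be sketched as a Chat Completions request payload constrained by a JSON schema. This is a minimal sketch: the `invoice` schema and its field names are made-up illustrations, and only the overall payload shape follows the OpenAI API's structured-outputs format.

```python
# Sketch: a Chat Completions payload asking GPT-4.1 for schema-constrained
# JSON. The "invoice" schema is a hypothetical example; only the
# response_format shape mirrors the OpenAI structured-outputs API.
def build_structured_request(document_text: str) -> dict:
    schema = {
        "type": "object",
        "properties": {
            "vendor": {"type": "string"},
            "total": {"type": "number"},
            "line_items": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["vendor", "total", "line_items"],
        "additionalProperties": False,
    }
    return {
        "model": "gpt-4.1",
        "messages": [
            {"role": "system", "content": "Extract invoice fields as JSON."},
            {"role": "user", "content": document_text},
        ],
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "invoice", "strict": True, "schema": schema},
        },
    }

payload = build_structured_request("ACME Corp invoice ...")
print(payload["response_format"]["type"])  # json_schema
```

With `strict: True`, the model's reply is guaranteed to validate against the supplied schema, which is what makes "highly structured content generation" dependable in pipelines.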

🔧 Developer Notes

  • Available via OpenAI API only
  • Supports up to 32,768 output tokens
  • Compatible with prompt caching and Batch API
  • Designed for production-scale performance and reliability
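The limits quoted above (a roughly 1M-token context window and a 32,768-token output cap) can be turned into a simple pre-flight budget check. This is a sketch only: the 4-characters-per-token ratio is a rough heuristic, not an official tokenizer; use a real tokenizer such as tiktoken for accurate counts.

```python
# Sketch of a token-budget check against GPT-4.1's quoted limits:
# ~1,000,000-token context window and a 32,768-token output cap.
CONTEXT_WINDOW = 1_000_000
MAX_OUTPUT_TOKENS = 32_768

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token (assumption, not exact).
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_output: int = MAX_OUTPUT_TOKENS) -> bool:
    """True if the prompt plus the reserved output budget fits the window."""
    return estimate_tokens(prompt) + reserved_output <= CONTEXT_WINDOW

print(fits_in_context("hello " * 100))   # True: tiny prompt
print(fits_in_context("x" * 4_100_000))  # False: ~1.02M tokens before output
```

Reserving the full output cap up front is conservative; a caller that knows its responses are short can reserve less and fit correspondingly larger documents.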

🧪 Real-World Results

  • Windsurf: 60% higher accuracy on internal code benchmarks; smoother tool usage
  • Qodo: Better suggestions in 55% of pull request reviews, with higher precision and focus
  • Blue J: 53% more accurate on complex tax scenarios
  • Thomson Reuters: 17% improvement in long-document legal review
  • Carlyle: 50% better retrieval accuracy across large financial files