Official

openai / o4-mini

OpenAI's fast, lightweight reasoning model

  • Public
  • 245 runs
  • License

Pricing

Official model
Pricing for official models works differently from other models on Replicate. Instead of being billed by run time, you're billed by the number of input and output tokens, which makes pricing more predictable.

This model is priced by how many input tokens are sent and how many output tokens are generated.

Check out our docs for more information about how per-token pricing works on Replicate.
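
As a rough illustration of per-token billing, the sketch below estimates the cost of a single request from its token counts. The rates shown are placeholders, not o4-mini's actual prices; check the Replicate pricing docs for current figures.

```python
# Placeholder per-token rates for illustration only -- consult the
# Replicate pricing docs for o4-mini's actual input/output rates.
INPUT_PRICE_PER_TOKEN = 1.10 / 1_000_000   # assumed: $1.10 per 1M input tokens
OUTPUT_PRICE_PER_TOKEN = 4.40 / 1_000_000  # assumed: $4.40 per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under per-token billing."""
    return input_tokens * INPUT_PRICE_PER_TOKEN + output_tokens * OUTPUT_PRICE_PER_TOKEN

# A 2,000-token prompt that produces a 500-token reply.
print(f"${estimate_cost(2_000, 500):.6f}")
```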

Readme

OpenAI o4-mini is a fast, cost-efficient reasoning model designed for high-throughput tasks that benefit from advanced tool use, multimodal input, and strong analytical performance. It represents a major upgrade in the o-series line, offering high accuracy in math, coding, and visual tasks—all while maintaining low latency and usage cost.

Key Features

  • Optimized for math, code, and visual reasoning
  • Agentic tool use: able to use browsing, Python, and image generation tools within ChatGPT or via the API (see the function-calling sketch after this list)
  • Natural and conversational, with improved instruction following and memory
  • Ideal for applications requiring quick, reliable reasoning at scale
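
To make the agentic tool use bullet concrete, here is a minimal function-calling sketch using the OpenAI Python SDK's Chat Completions interface. The `get_weather` tool and its schema are invented for illustration; only the overall request shape follows the standard API.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A hypothetical tool definition -- the schema shape is standard
# Chat Completions function calling; the tool itself is made up.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Do I need an umbrella in Dublin today?"}],
    tools=tools,
)

# If the model decided to call the tool, the call arrives as JSON arguments.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```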

Benchmark Performance

  • AIME 2025 (no tools): 92.7%
  • AIME 2025 (with tools): 100% consensus@8
  • GPQA Diamond: 81.4%
  • SWE-Bench Verified: 68.1%
  • MMMU (Multimodal understanding): 81.6%
  • MathVista (Visual math): 72.0%
  • Scale MultiChallenge (instruction following): 43%
  • Humanity’s Last Exam (deep research with tools): 26.6%

Ideal Use Cases

  • Real-time assistant applications requiring light compute
  • Structured reasoning with tool use (web browsing, Python)
  • Lightweight document analysis or visual interpretation
  • High-throughput workflows with tight latency budgets

Access and Usage

  • Available in the Chat Completions API and Responses API
  • Supported in ChatGPT (available to Free users via Think mode) and in ChatGPT Team, Pro, Enterprise, and Edu
  • Accessible via function calling and tool integration in custom applications
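
For API access through Replicate itself, a call with the Python client might look like the sketch below. The `prompt` and `system_prompt` input fields are assumptions based on common Replicate model schemas; check this model's API tab for the exact input names.

```python
import replicate

# Requires REPLICATE_API_TOKEN in the environment.
# Input field names are assumed; consult the model's API schema.
output = replicate.run(
    "openai/o4-mini",
    input={
        "prompt": "Summarize the tradeoffs between quicksort and mergesort.",
        "system_prompt": "You are a concise technical assistant.",
    },
)

# Official language models on Replicate typically return output as an
# iterable of text chunks, which can be joined into a single string.
print("".join(output))
```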