Official

openai / gpt-4.1-nano

Fastest, most cost-effective GPT-4.1 model from OpenAI

  • Public
  • 658 runs
  • License

Pricing

Official model
Pricing for official models works differently from other models: instead of being billed by time, you’re billed per input and output token, which makes pricing more predictable.

This model is priced by how many input tokens are sent and how many output tokens are generated.

Check out our docs for more information about how per-token pricing works on Replicate.
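To make the per-token billing concrete, here is a minimal cost-estimation sketch. The per-million-token rates below are placeholders, not the model's official prices; check the Replicate pricing docs for current values.

```python
# Hypothetical per-token cost calculator for a model billed like
# gpt-4.1-nano. The rates are PLACEHOLDERS, not official prices.
INPUT_PRICE_PER_MTOK = 0.10   # assumed: dollars per 1M input tokens
OUTPUT_PRICE_PER_MTOK = 0.40  # assumed: dollars per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request: tokens in + tokens out."""
    cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_MTOK
    cost += (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_MTOK
    return cost

# 12k input tokens and 500 output tokens at the assumed rates:
print(f"${estimate_cost(12_000, 500):.6f}")
```

Because cost depends only on token counts, not wall-clock time, you can estimate a request's price before sending it.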

Readme

GPT‑4.1 nano is the fastest and most cost-efficient model in the GPT‑4.1 family. It delivers strong performance for lightweight tasks while supporting up to 1 million tokens of context. Designed for speed-critical and high-scale applications, nano is ideal for tasks like classification, autocomplete, and simple reasoning.

Key Features

  • Ultra-low latency and fast response times
  • Lowest cost in the GPT-4.1 lineup
  • Supports a 1 million token context window
  • Optimized for short prompts and high-volume usage
  • Competitive accuracy on key benchmarks

Benchmark Highlights

  • MMLU: 80.1%
  • GPQA: 50.3%
  • Aider Polyglot (diff format): 45%
  • MultiChallenge: 15%
  • IFEval: 75%

Use Cases

  • Text classification
  • Autocomplete and structured text generation
  • Fast Q&A over small or medium context
  • Low-latency applications at scale
  • Budget-sensitive or high-throughput tasks

Notes

  • Available via the OpenAI API
  • Not currently available in ChatGPT
  • Supports up to 1 million tokens of context
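The 1-million-token context limit noted above can be sanity-checked client-side before sending a request. A minimal sketch, using the rough 4-characters-per-token heuristic as an assumption (use a real tokenizer for exact counts):

```python
# Client-side check that a prompt fits in a 1M-token context window.
# approx_tokens uses the common ~4 chars/token heuristic -- an
# approximation, not the model's actual tokenizer.
CONTEXT_WINDOW = 1_000_000

def approx_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(prompt: str, reserved_output_tokens: int = 4_096) -> bool:
    """True if the prompt plus reserved output space fits in the window."""
    return approx_tokens(prompt) + reserved_output_tokens <= CONTEXT_WINDOW
```

Reserving headroom for the output tokens avoids requests that fit on input but get truncated mid-generation.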

GPT‑4.1 nano is built for developers who need speed, scale, and affordability.