GPT-4.1: A true x.1 Release

OpenAI’s latest model is a true x.1 release, with better instruction following, longer memory, and lower cost.

Apr 14, 2025

OpenAI released GPT-4.1 and it’s a real x.1 release. Better instruction following, longer memory, lower cost, and the best performance we’ve seen from any OpenAI model.

But: There’s no system card, no safety update, and it’s not even available in ChatGPT yet. Only the API. Weird.


A True x.1 Release

Comparison of GPT-4.1 to previous models
Comparison of GPT-4.1 to previous intelligence v. latency
Comparison of GPT-4.1 to previous models on coding tasks
GPT-4.1's performance on coding tasks compared to previous models
  • The name actually fits for once — this feels like a 4.1. It’s not a whole new model, but it’s sharper, faster, and much more capable in a few important ways.
  • Instruction following is much better: It handles long, complex video, Q&A, and follows weird edge case instructions — like “summarize a video without subtitles and find the emotional arc.” That’s hard. And it gets it right.
  • And it's built for developer use cases. And coding was a big focus.

1 Million Token Context: Haystack Mode

OpenAI performance chart showing GPT-4.1's performance on long context tasks
  • This is the biggest leap. 4.1 supports up to 1M tokens of context.
  • I usually have to switch to Gemini for long PDFs or webpages, but this changes everything.
  • Needle-in-the-haystack tests show it works — you can drop a sentence into a massive doc and it’ll still find it.
  • Caveat: Performance does degrade over long context. OpenAI’s own chart shows ~84% accuracy at 8k tokens, but only ~50% at 1M. Still — usable at that scale is a big deal.

Mini and Nano Might Be the Real Stars

Comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano on instruction following tasks
Instruction following performance of GPT-4.1, 4.1 Mini, and 4.1 Nano
Comparison of GPT-4.1 pricing with 4.1 Mini and 4.1 Nano
Pricing comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano
  • GPT-4.1 Mini is now about as good as GPT-4o — but cheaper and faster.
  • GPT-4.1 Nano is as good as 4o-mini was — perfect for micro-intelligence tasks (think categorization, info retrieval, content extraction, etc.).
  • These are going to be the new “daily drivers” for a lot of apps.

A Quiet Goodbye to GPT-4.5 Preview

  • GPT-4.5 Preview is being deprecated in the API. July 14 is the deadline.
  • I liked this model. But OpenAI says 4.1 beats or matches it in key areas, and with better latency and cost.
  • You can still use it in ChatGPT for now.

The Stealth Launch of “Quasar” and “Optimus”

A screenshot of the OpenRouter usage of new models
  • This was fun: OpenAI ran stealth tests of 4.1 under the names “Quasar” and “Optimus” through OpenRouter.
  • We all knew who was behind them… but it was fun speculating and testing.
  • Not totally clear why they did this — maybe to test expectations or avoid hype while collecting data?

Truely made for developer use-cases

  • Box ran evals using GPT-4.1 and saw a 27-point gain over 4o for document extraction.
  • That’s a big jump. Box is already rolling it out in production. This makes 4.1 the model of choice for structured data extraction and doc Q&A at the enterprise level. Expect more to follow.
  • Further, they've released a new prompting guide that shows proof of improvements to the model's performance. Typically I've ignored these guides as the tips are often presented as "best practices" but this one is actually useful.

don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT 4.1 specifically for those building agents... with a new recommendation for: - telling the model to be persistent (+20%) - dont self-inject/parse toolcalls (+2%) - prompted planning (+4%) - JSON BAD - use

Image
Image
Image
Image
ben
ben
@benhylak

o1 is mind-blowing when you know how to use it. it's really not a chat model -- you have to think of it more like a "report generator" (link to article below)

Image
1.9K
Reply

ChatGPT vs API?

  • You can only access 4.1 via API. Not even in ChatGPT Team or Enterprise yet.
  • @emollick points out something interesting: OpenAI seems to be splitting ChatGPT (high-power) from the API (cheap, dev-facing).
  • No system card. No big safety briefing. Is the API side is shipping fast while the product team catches up? Are the teams shipping as fast as they can, independently? Or is something else unfolding?

Parting Thoughts

GPT-4.1 is a nice win for developers. It's faster, cheaper, and performs better across the board. For most apps, this is the new default.

But it's a Monday and I think we'll see bigger things from openai this week.