GPT-4.1: A true x.1 Release

OpenAI’s latest model is a true x.1 release, with better instruction following, longer memory, and lower cost.

Apr 14, 2025

OpenAI released GPT-4.1 and it’s a real x.1 release. Better instruction following, longer memory, lower cost, and the best performance we’ve seen from any OpenAI model.

But: There’s no system card, no safety update, and it’s not even available in ChatGPT yet. Only the API. Weird.

A True x.1 Release

Comparison of GPT-4.1 to previous models — Comparison of GPT-4.1 to previous intelligence v. latency

Comparison of GPT-4.1 to previous models on coding tasks — GPT-4.1's performance on coding tasks compared to previous models

Comparison of GPT-4.1 to previous intelligence v. latency

GPT-4.1's performance on coding tasks compared to previous models

The name actually fits for once — this feels like a 4.1. It’s not a whole new model, but it’s sharper, faster, and much more capable in a few important ways.
Instruction following is much better: It handles long, complex video, Q&A, and follows weird edge case instructions — like “summarize a video without subtitles and find the emotional arc.” That’s hard. And it gets it right.
And it's built for developer use cases. And coding was a big focus.

1 Million Token Context: Haystack Mode

OpenAI performance chart showing GPT-4.1's performance on long context tasks

This is the biggest leap. 4.1 supports up to 1M tokens of context.
I usually have to switch to Gemini for long PDFs or webpages, but this changes everything.
Needle-in-the-haystack tests show it works — you can drop a sentence into a massive doc and it’ll still find it.
Caveat: Performance does degrade over long context. OpenAI’s own chart shows ~84% accuracy at 8k tokens, but only ~50% at 1M. Still — usable at that scale is a big deal.

Mini and Nano Might Be the Real Stars

Comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano on instruction following tasks — Instruction following performance of GPT-4.1, 4.1 Mini, and 4.1 Nano

Comparison of GPT-4.1 pricing with 4.1 Mini and 4.1 Nano — Pricing comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano

Instruction following performance of GPT-4.1, 4.1 Mini, and 4.1 Nano

Pricing comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano

GPT-4.1 Mini is now about as good as GPT-4o — but cheaper and faster.
GPT-4.1 Nano is as good as 4o-mini was — perfect for micro-intelligence tasks (think categorization, info retrieval, content extraction, etc.).
These are going to be the new “daily drivers” for a lot of apps.

A Quiet Goodbye to GPT-4.5 Preview

OpenAI Developers

@OpenAIDevs

·Follow

One last note: we’ll also begin deprecating GPT-4.5 Preview in the API today as GPT-4.1 offers improved or similar performance on many key capabilities at lower latency and cost. GPT-4.5 in the API will be turned off in three months, on July 14, to allow time to transition (and Show more

7:15 PM · Apr 14, 2025

353

Read 11 replies

GPT-4.5 Preview is being deprecated in the API. July 14 is the deadline.
I liked this model. But OpenAI says 4.1 beats or matches it in key areas, and with better latency and cost.
You can still use it in ChatGPT for now.

The Stealth Launch of “Quasar” and “Optimus”

A screenshot of the OpenRouter usage of new models

This was fun: OpenAI ran stealth tests of 4.1 under the names “Quasar” and “Optimus” through OpenRouter.
We all knew who was behind them… but it was fun speculating and testing.
Not totally clear why they did this — maybe to test expectations or avoid hype while collecting data?

Truly made for developer use-cases

Aaron Levie

@levie

·Follow

OpenAI dropped GPT-4.1 and it’s an insane leap ahead of GTP-4o. On the Box AI Enterprise Eval, it offers a 27 pt jump over GPT4o on data extraction, and is much better at doc and image Q&A. Box is rolling it out now in beta in the Box AI Studio, and GA shortly.

5:57 PM · Apr 14, 2025

342

Read 18 replies

Box ran evals using GPT-4.1 and saw a 27-point gain over 4o for document extraction.
That’s a big jump. Box is already rolling it out in production. This makes 4.1 the model of choice for structured data extraction and doc Q&A at the enterprise level. Expect more to follow.
Further, they've released a new prompting guide that shows proof of improvements to the model's performance. Typically I've ignored these guides as the tips are often presented as "best practices" but this one is actually useful.

swyx 🇬🇧

@swyx

·Follow

don't miss that OAI also published a prompting guide WITH RECEIPTS for GPT 4.1 specifically for those building agents... with a new recommendation for: - telling the model to be persistent (+20%) - dont self-inject/parse toolcalls (+2%) - prompted planning (+4%) - JSON BAD - use Show more

ben (is hiring engineers)

@benhylak

o1 is mind-blowing when you know how to use it. it's really not a chat model -- you have to think of it more like a "report generator" (link to article below)

6:29 PM · Apr 14, 2025

1.9K

Read 35 replies

ChatGPT vs API?

You can only access 4.1 via API. Not even in ChatGPT Team or Enterprise yet.
@emollick points out something interesting: OpenAI seems to be splitting ChatGPT (high-power) from the API (cheap, dev-facing).
No system card. No big safety briefing. Is the API side is shipping fast while the product team catches up? Are the teams shipping as fast as they can, independently? Or is something else unfolding?

Parting Thoughts

GPT-4.1 is a nice win for developers. It's faster, cheaper, and performs better across the board. For most apps, this is the new default.

But it's a Monday and I think we'll see bigger things from openai this week.

Blog...

A blog post is loading...

Jan 1, 2025

Loading…

GPT-4.1: A true x.1 Release

OpenAI’s latest model is a true x.1 release, with better instruction following, longer memory, and lower cost.

Apr 14, 2025

OpenAI released GPT-4.1 and it’s a real x.1 release. Better instruction following, longer memory, lower cost, and the best performance we’ve seen from any OpenAI model.

But: There’s no system card, no safety update, and it’s not even available in ChatGPT yet. Only the API. Weird.

A True x.1 Release

Comparison of GPT-4.1 to previous intelligence v. latency

GPT-4.1's performance on coding tasks compared to previous models

The name actually fits for once — this feels like a 4.1. It’s not a whole new model, but it’s sharper, faster, and much more capable in a few important ways.
Instruction following is much better: It handles long, complex video, Q&A, and follows weird edge case instructions — like “summarize a video without subtitles and find the emotional arc.” That’s hard. And it gets it right.
And it's built for developer use cases. And coding was a big focus.

1 Million Token Context: Haystack Mode

This is the biggest leap. 4.1 supports up to 1M tokens of context.
I usually have to switch to Gemini for long PDFs or webpages, but this changes everything.
Needle-in-the-haystack tests show it works — you can drop a sentence into a massive doc and it’ll still find it.
Caveat: Performance does degrade over long context. OpenAI’s own chart shows ~84% accuracy at 8k tokens, but only ~50% at 1M. Still — usable at that scale is a big deal.

Mini and Nano Might Be the Real Stars

Instruction following performance of GPT-4.1, 4.1 Mini, and 4.1 Nano

Pricing comparison of GPT-4.1, 4.1 Mini, and 4.1 Nano

GPT-4.1 Mini is now about as good as GPT-4o — but cheaper and faster.
GPT-4.1 Nano is as good as 4o-mini was — perfect for micro-intelligence tasks (think categorization, info retrieval, content extraction, etc.).
These are going to be the new “daily drivers” for a lot of apps.

A Quiet Goodbye to GPT-4.5 Preview

OpenAI Developers

@OpenAIDevs

·Follow

7:15 PM · Apr 14, 2025

353

Read 11 replies

GPT-4.5 Preview is being deprecated in the API. July 14 is the deadline.
I liked this model. But OpenAI says 4.1 beats or matches it in key areas, and with better latency and cost.
You can still use it in ChatGPT for now.

The Stealth Launch of “Quasar” and “Optimus”

This was fun: OpenAI ran stealth tests of 4.1 under the names “Quasar” and “Optimus” through OpenRouter.
We all knew who was behind them… but it was fun speculating and testing.
Not totally clear why they did this — maybe to test expectations or avoid hype while collecting data?

Truly made for developer use-cases

Aaron Levie

@levie

·Follow

5:57 PM · Apr 14, 2025

342

Read 18 replies

Box ran evals using GPT-4.1 and saw a 27-point gain over 4o for document extraction.
That’s a big jump. Box is already rolling it out in production. This makes 4.1 the model of choice for structured data extraction and doc Q&A at the enterprise level. Expect more to follow.
Further, they've released a new prompting guide that shows proof of improvements to the model's performance. Typically I've ignored these guides as the tips are often presented as "best practices" but this one is actually useful.

swyx 🇬🇧

@swyx

·Follow

ben (is hiring engineers)

@benhylak

o1 is mind-blowing when you know how to use it. it's really not a chat model -- you have to think of it more like a "report generator" (link to article below)

6:29 PM · Apr 14, 2025

1.9K

Read 35 replies

ChatGPT vs API?

You can only access 4.1 via API. Not even in ChatGPT Team or Enterprise yet.
@emollick points out something interesting: OpenAI seems to be splitting ChatGPT (high-power) from the API (cheap, dev-facing).
No system card. No big safety briefing. Is the API side is shipping fast while the product team catches up? Are the teams shipping as fast as they can, independently? Or is something else unfolding?

Parting Thoughts

GPT-4.1 is a nice win for developers. It's faster, cheaper, and performs better across the board. For most apps, this is the new default.

But it's a Monday and I think we'll see bigger things from openai this week.