GPT 5.5
OpenAI shipped GPT-5.5. Here’s what it does and where it fits.
OpenAI released GPT-5.5 this week. GPT-5.4 shipped a bit over a month ago, which means two significant model releases from the same lab in short time.
OpenAI calls it “smartest and most intuitive to use model yet.” What that means in practice: a faster, sharper thinker that uses fewer tokens for the same work, better at multi-step autonomous tasks, aimed squarely at enterprise workflows like coding, data analysis, and scientific research.
API pricing doubled. GPT-5.5 costs roughly 2x GPT-5.4 per token. Whether that is a cost increase or a cost decrease for you depends entirely on your workload.
Why I care about this one
I run a stack of models for different jobs. Claude for writing, agents, and anything where the reasoning matters as much as the output. GPT models for specific workloads where OpenAI has pulled ahead. Gemini for long-context and vision.
A major OpenAI release redraws the distribution of work across that stack. Sometimes the release moves my defaults. Sometimes it doesn’t. GPT-5.5 is mixed, and the pricing math is more interesting than the headline capability.
What it is
GPT-5.5 is OpenAI’s newest frontier model. Released this week, rolling out to Plus, Pro, Business, and Enterprise tiers in ChatGPT. A separate GPT-5.5 Pro version goes to the Pro, Business, and Enterprise tiers.
The headline capability is multi-step autonomy. You hand the model a messy task with several parts and unclear direction, and it plans the approach, uses tools to work on each piece, checks its own work, and iterates toward a result. Less prompt engineering, more outcome.
Named improvement areas:
Agentic coding workflows
Knowledge work like reports, analysis, and documents
Mathematical reasoning
Scientific and technical research, including drug discovery
Cybersecurity and digital defence
Operating software and navigating computer work
GPT-5.5 is reportedly more token efficient than GPT-5.4 on the same tasks. For long agentic workflows where token costs compound across turns, that matters more than the per-token price.
GPT-5.5 Pro differences are not disclosed in the announcement. OpenAI has said Pro exists and who gets it, not what it does differently.
How to use it
Three workflows where GPT-5.5 is the most likely fit based on the announcement.
Long agentic coding runs. If you ship agents that write, debug, and operate on code over long sessions, GPT-5.5 is worth testing as the default. Better multi-step autonomy plus token efficiency compounds on long runs, which is where simple per-token comparisons hide the real cost picture.
Data analysis and spreadsheet work. The announcement calls out creating documents and spreadsheets, analysing data, and operating software. Knowledge workers who live in Excel, Google Sheets, and BI tools will feel this one first, especially on multi-step analysis pipelines.
Scientific and technical research. A named improvement area. If your job involves literature review, hypothesis generation, or technical problem solving, A/B test against your current tool on a week’s worth of real queries.
What to keep on other models. Writing where voice matters. Long reasoning tasks where you want to watch the model work. Vision-heavy workflows. Long-context document synthesis where Gemini still wins. Vault agents, content drafts, and anything tuned to a specific voice. Those stay where they are until OpenAI ships capability improvements in those areas.
Where it’s strong
Token efficiency on long tasks. If a GPT-5.4 run cost you $2 in tokens, a GPT-5.5 run on the same task should cost less, even with the higher per-token rate, because the model does the work in fewer turns. The effect grows with task length.
Multi-step autonomy. Less hand-holding. Broader task descriptions produce usable outcomes. The detailed step-by-step prompt becomes optional rather than required. For agent design, this shifts where the effort goes: less time writing prompts, more time writing guardrails.
Enterprise positioning. The announcement reads legal-friendly. “Intelligence for real work” clears IT and compliance conversations faster than experimental framing. For anyone trying to get a model past security review, the marketing has operational value.
Where it falls short
API pricing doubled. GPT-5.5 costs 2x GPT-5.4 per token on the API. Even with token efficiency gains, high-volume shops running simple classification, summarisation, or retrieval may find the math does not work. Model-match instead of blanket-upgrade.
Benchmarks are vague. The announcement refers to “consistently scoring higher” than previous models and competitors including Gemini 3.1 Pro and Anthropic’s Claude Opus, but no specific benchmark names or numbers are published. Treat performance claims as marketing until independent evals land.
GPT-5.5 Pro is a black box. OpenAI has said Pro exists and who gets it, not what it does differently. If you are on a Pro tier, you are paying for capabilities that have not been disclosed. Expect details over the next two weeks.
Specific capabilities are not specified. “Agentic coding” and “scientific research” are listed at the category level without concrete examples. What types of agentic coding? What scientific domains? Test before you commit.
Strongest safeguards cuts both ways. The release ships with “strongest set of safeguards to date.” That is good for trust and friction for edge cases. Expect more refusals in workflows that touch sensitive topics. Test your specific use cases.
How to actually run the test
If you are on the API and considering switching defaults, don’t do it on vibes. Run a concrete evaluation.
Pick three of your highest-volume tasks. For each, run the same prompt through:
GPT-5.4 (current OpenAI default)
GPT-5.5 (new)
Your best alternative (Claude, Gemini, or whatever you currently run)
Record four numbers per run:
Tokens used. Efficiency claim meets reality.
Time to completion. Speed matters for UX even when cost does not.
Quality. Blind-score the outputs if you can. Your biases matter less than you think.
Total cost per completed task. The only number that matters at the end.
Run each task five times on each model. Average the numbers. Make the call on cost per completed task, not per-token rate.
If one of the three tasks shows GPT-5.5 winning on cost per completed outcome, migrate that workload and keep the others where they are. Model-match is more accurate than blanket upgrades for anyone running at meaningful volume.
My take
I am not switching my defaults on the announcement. I will test GPT-5.5 on three things: long agentic coding runs where token efficiency might pay off, spreadsheet and data analysis tasks where the announcement suggests a clear improvement, and a handful of scientific research queries as a second opinion alongside Claude.
The rest of my work stays where it is. Writing, vault agents, content drafts all stay on Claude. Long-context synthesis stays on Gemini. Image work stays on whichever model fits the task.
Two broader observations.
First, the release cadence is doing more work than the release. GPT-5.4 to GPT-5.5 in a matter of weeks is a pace that rewards tool flexibility over tool loyalty. Whatever you picked six months ago as your default is either still the best fit or silently behind, and the only way to know is to test.
Second, “fewer tokens for the same work” is the metric to watch. Per-token pricing has been the defining cost lever for two years. The new lever is per-task efficiency: how many turns the model needs to get there. A 2x per-token price can still be a win if the task completes in one turn instead of three. Track cost per outcome, not cost per call.
Where this lands
GPT-5.5 is a solid iteration. Expect improvements on specific workloads rather than a reset across the board. The pricing math needs to run against your actual usage before you commit.
For most Plus users, GPT-5.5 becomes the default inside ChatGPT automatically. For API-layer teams, the call is more interesting and more expensive.
Test on your workload. Don’t trust the marketing without evals.
- JC

