GPT-5.5 Price Increase: What It Costs

(openrouter.ai)

Comments

languid-photic 8 May 2026
We track performance vs. the all-in cost of completing real engineering tasks, rather than cost per token. [1]

Cost per token is misleading because, as others have noted, different models use tokens in very different ways. (Aside: this is also why tokens per second isn't a great metric.)

We found that 5.5 is about 1.5-2x more expensive overall. On a Pareto basis, only 5.5 xhigh is worth it for us; at the lower reasoning levels, 5.4 still edges it out on cost/performance.

We take a spec-driven approach and mostly work in TS (on product development), so if you use a more steer-y approach, or work in a different domain, YMMV.

[1] https://voratiq.com/leaderboard?x=cost
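
To make the "all-in cost" idea concrete, here is a minimal sketch. All prices and token counts below are invented for illustration; the point is that task cost sums spend across every agent turn, so per-token price alone doesn't determine it:

```python
# Hypothetical per-million-token prices in USD (illustrative only).
PRICES = {
    "gpt-5.4": {"input": 1.25, "output": 10.00},
    "gpt-5.5": {"input": 2.50, "output": 15.00},
}

def task_cost(model: str, turns: list[dict]) -> float:
    """All-in cost of one task: sum input and output token spend
    across every agent turn, not just a single response."""
    p = PRICES[model]
    return sum(
        t["input_tokens"] / 1e6 * p["input"]
        + t["output_tokens"] / 1e6 * p["output"]
        for t in turns
    )

# Two models with different per-token prices can end up closer (or
# further apart) on task cost depending on turns and verbosity.
run_a = [{"input_tokens": 40_000, "output_tokens": 6_000}] * 12  # 5.4: 12 turns
run_b = [{"input_tokens": 55_000, "output_tokens": 9_000}] * 7   # 5.5: 7 turns

print(task_cost("gpt-5.4", run_a))
print(task_cost("gpt-5.5", run_b))
```

With these made-up numbers the pricier model lands at roughly 1.4x the task cost despite a 2x per-token price gap, which is why the all-in ratio can differ from the sticker-price ratio.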

spprashant 47 minutes ago
I feel very lost in these threads. A lot of people talk about getting bad results from gpt 5.5 xH or Opus 4.7 xH.

And here I am daily-driving Sonnet 4.6 with medium or high thinking, and I'm actually thoroughly satisfied with the work it does. Perhaps it's the bite-sized pieces of work I give it, which fit my workflow better.

iceKirin 8 May 2026
I feel that recent LLM iterations haven't delivered an obvious qualitative leap. Have they hit a bottleneck this quickly?

XCSme 8 May 2026
~3.5x more expensive to run my benchmarks[0].

[0]: https://aibenchy.com/compare/openai-gpt-5-4-medium/openai-gp...

jsnell 8 May 2026
This doesn't seem to be controlling for the number of turns in any way. Am I missing something?

Stronger models needing fewer turns to achieve a task feels like a prime source of efficiency gains for agentic coding, more so than individual responses being shorter.
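
A toy calculation (all numbers invented) shows why controlling for turns matters: a model that costs more per response can still be cheaper per task if it finishes in fewer turns.

```python
# Toy numbers: per-turn cost (USD) and turns needed to finish a task.
def task_cost(cost_per_turn: float, turns: int) -> float:
    return cost_per_turn * turns

cheap_but_chatty = task_cost(0.08, 15)  # ~$1.20 per task
pricey_but_terse = task_cost(0.15, 6)   # ~$0.90 per task

# Break-even: the pricier model wins whenever its turn count stays
# below (0.08 / 0.15) * 15 = 8 turns.
breakeven_turns = (0.08 / 0.15) * 15
```

So a benchmark that only compares per-response or per-token cost would rank these two models backwards.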

gertlabs 20 hours ago
We observed slightly smaller outputs over long-horizon agentic coding for GPT 5.5, along with a significant improvement in overall response scores. For one-shot coding responses, GPT 5.5 was actually more verbose than GPT 5.4, but again the responses were significantly stronger. The cost increases reported by OpenRouter seem reasonably accurate (perhaps a bit optimistic) but, in my opinion, well worth it. GPT 5.5 has a wide lead over the #2 model at understanding complex scenarios.

Rankings at https://gertlabs.com/rankings?mode=agentic_coding. See the efficiency chart at the bottom.

boh 19 hours ago
New model releases are now like new iPhones--mostly imperceptible improvements at a higher price tag. That's one of the major benefits of open source: you can "freeze" the model you're using. Often the model you know wins over one that is different enough that you have to start from scratch with every major update. Most businesses want cost control and predictability over a cutting edge with limited evidence of profitable output outside of tech.
degutemesgen 20 hours ago
I do think recent models are too expensive to be used for customer-facing agentic workflows.

DeathArrow 4 hours ago
In terms of work done per dollar, new models from OpenAI and Anthropic are worse than the older models. They are trying to squeeze the customers.

For personal use I switched to coding plans that include GLM 5.1, Kimi K2.6, and Xiaomi MiMo V2.5 Pro, and I've never been happier. I said goodbye to both Claude Max and Cursor.

coalhouse 8 May 2026
It does seem like a step change in token efficiency, though based on the earlier Artificial Analysis reporting it's also quite the cost lottery, and I'm not sure I'm comfortable with that.

i_think_so 8 May 2026
Has any enterprising hacker here yet graphed price vs "output" over time since 2023, taking "quality" into account?

That's got to be a very tricky analysis given how subjective quality is. But I'm sure there are people trying to pin it down.