Show HN: Price Per Token – LLM API Pricing Data

339 points by alexellman 25 July 2025 | 130 comments

Comments

numlocked 25 July 2025

(I work at OpenRouter)

We have solved this problem by working with the providers to implement a prices and models API that we scrape, which is how we keep our marketplace up to date. It's been a journey; a year ago it was all happening through conversations in shared Slack channels!

The pricing landscape has become more complex as providers have introduced e.g. different prices for tokens depending on prompt length, caching, etc.

I do believe the right lens on this is actually the price per token by endpoint, not by model; there are fast/slow versions, thinking/non-thinking, etc. that can sometimes also vary by price.

The point of this comment is not to self promote, but we have put a huge amount of work into figuring all of this out, and have it all publicly available on OpenRouter (admittedly not in such a compact, pricing-focused format though!)

sophia01 25 July 2025

But the data is... wrong? Google Gemini 2.5 Flash-Lite costs $0.10/mtok input [1] but is shown here as $0.40/mtok?

[1] https://ai.google.dev/gemini-api/docs/pricing#gemini-2.5-fla...

awongh 25 July 2025

This is great, but as others have mentioned the UX problem is more complicated than this:

- for other models there are providers that serve the same model with different prices

- each provider optimizes for different parameters: speed, cost, etc.

- the same model can still be different quantizations

- some providers offer batch pricing and others do not (e.g., the Grok API does not)

And there are plenty of other parameters to filter on: thinking vs. non-thinking, multi-modal or not, etc., not to mention benchmark rankings.

https://artificialanalysis.ai gives a blended cost number, which helps with sorting a bit, but a blended figure for input/output costs is going to change depending on what you're doing.

I'm still holding my breath for a site that has a really nice comparison UI.

Someone please build it!

pierre 25 July 2025

The main issue is that tokens are not equivalent across providers / models, with huge disparities even within a single provider, beyond the tokenizer model:

- An image will take 10x the tokens on gpt-4o-mini vs gpt-4.

- On Gemini 2.5 Pro, output tokens are tokens, except if you are using structured output; then every character is counted as one token for billing.

- ...

Having the price per token is nice, but what is really needed is to know how much a given query / answer will cost you, as not all tokens are equal.

mythz 25 July 2025

There was a time when it was unbelievably frustrating to navigate the bunch of marketing pages required to find the cost of a newly announced model; now I just look at OpenRouter to find pricing.

CharlesW 25 July 2025

Site is down as I type this, but a shout-out to Simon Willison's LLM pricing calculator: https://www.llm-prices.com/

paradite 25 July 2025

It's actually more complex than just input and output tokens, there are more pricing rules by various providers:

- Off-peak pricing by DeepSeek

- Batch pricing by OpenAI and Anthropic

- Context window differentiated pricing by Google and Grok

- Thinking vs non-thinking token pricing by Qwen

- Input token tiered pricing by Qwen coder

I originally posted here: https://x.com/paradite_/status/1947932450212221427

criddell 25 July 2025

If you had a $2500ish budget for hardware, what types of models could you run locally? If $2500 isn't really enough, what would it take?

Are there any tutorials you can recommend for somebody interested in getting something running locally?

NitpickLawyer 25 July 2025

> The only place I am aware of is going to these provider's individual website pages to check the price per token.

Openrouter is a good alternative. Added bonus that you can also see where the open models come in, and can make an educated guess on the true cost / size of a model, and how likely it is that it's currently subsidised.

bananapub 25 July 2025

surprising that you didn't find any of the existing ones, including our own simonw's: https://www.llm-prices.com

callbacked 25 July 2025

Awesome list, any chance of adding OpenRouter? Looking at their website seems like it would be a pain to scrape all of that due to the site's layout.

nisegami 25 July 2025

How consistent is the tokenization across different model families? It always served as a mental hangup for me when comparing LLM inference pricing.

aaronharnly 25 July 2025

Can you gather historical information as well? I did a bit of spelunking of the Wayback Machine to gather a partial dataset for OpenAI, but mine is incomplete. Future planning is well-informed by understanding the trends — my rough calculation was that within a model family, prices drop by about 40-80% per 12 months.
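A minimal sketch of what that rough 40-80% range would imply if the drop compounded at a constant rate. The constant-rate assumption and the $10 starting price are illustrative only and do not come from any provider's data.

```javascript
// Project a price forward assuming a constant fractional drop per 12 months.
// `annualDrop` is a fraction, e.g. 0.4 for a 40% drop per year (an assumption,
// taken from the rough 40-80% estimate in the comment above).
function projectPrice(currentPrice, annualDrop, months) {
  if (annualDrop < 0 || annualDrop >= 1) {
    throw new RangeError("annualDrop must be in [0, 1)");
  }
  return currentPrice * Math.pow(1 - annualDrop, months / 12);
}

// Hypothetical starting price of $10 per 1M tokens, bracketed by both estimates.
for (const percent of [40, 80]) {
  const after12 = projectPrice(10, percent / 100, 12);
  const after24 = projectPrice(10, percent / 100, 24);
  console.log(
    `${percent}% per year: $${after12.toFixed(2)} after 12 months, $${after24.toFixed(2)} after 24`
  );
}
```

The gap between the two ends of the range widens quickly, which is one reason a fuller historical dataset would help narrow the estimate.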

can16358p 25 July 2025

Does anyone know why o1-pro is more expensive than o3-pro?

uponasmile 25 July 2025

Well done. The UX is solid. Clean, intuitive, and the use of color makes everything instantly clear

tekacs 25 July 2025

I've run into this a ton of times and these websites all kinda suck. Someone mentioned the OpenRouter /models endpoint in a sibling comment here, so I quickly threw this together just now. Please feel free to PR!

https://github.com/tekacs/llm-pricing

  llm-pricing

  Model                                     | Input | Output | Cache Read | Cache Write
  ------------------------------------------+-------+--------+------------+------------
  anthropic/claude-opus-4                   | 15.00 | 75.00  | 1.50       | 18.75      
  anthropic/claude-sonnet-4                 | 3.00  | 15.00  | 0.30       | 3.75       
  google/gemini-2.5-pro                     | 1.25  | 10.00  | N/A        | N/A        
  x-ai/grok-4                               | 3.00  | 15.00  | 0.75       | N/A        
  openai/gpt-4o                             | 2.50  | 10.00  | N/A        | N/A        
  ...

---

  llm-pricing calc 10000 200 -c 9500 opus-4 4.1

  Cost calculation: 10000 input + 200 output (9500 cached, 5m TTL)
  
  Model                      | Input     | Output    | Cache Read | Cache Write | Total    
  ---------------------------+-----------+-----------+------------+-------------+----------
  anthropic/claude-opus-4    | $0.007500 | $0.015000 | $0.014250  | $0.178125   | $0.214875
  openai/gpt-4.1             | $0.001000 | $0.001600 | $0.004750  | $0.000000   | $0.007350
  openai/gpt-4.1-mini        | $0.000200 | $0.000320 | $0.000950  | $0.000000   | $0.001470
  openai/gpt-4.1-nano        | $0.000050 | $0.000080 | $0.000237  | $0.000000   | $0.000367
  thudm/glm-4.1v-9b-thinking | $0.000018 | $0.000028 | $0.000333  | $0.000000   | $0.000378

---

  llm-pricing opus-4 -v

  === ANTHROPIC ===

  Model: anthropic/claude-opus-4
    Name: Anthropic: Claude Opus 4
    Description: Claude Opus 4 is benchmarked as the world's best coding model, at time of release, 
    bringing sustained performance on complex, long-running tasks and agent workflows. It sets new 
    benchmarks in software engineering, achieving leading results on SWE-bench (72.5%) and 
    Terminal-bench (43.2%).
    Pricing:
      Input: $15.00 per 1M tokens
      Output: $75.00 per 1M tokens
      Cache Read: $1.50 per 1M tokens
      Cache Write: $18.75 per 1M tokens
      Per Request: $0
      Image: $0.024
    Context Length: 200000 tokens
    Modality: text+image->text
    Tokenizer: Claude
    Max Completion Tokens: 32000
    Moderated: true
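The anthropic/claude-opus-4 row of the `calc` table earlier in this comment can be reproduced by hand. The sketch below only mirrors the arithmetic visible in that output (uncached input at the input rate, cached tokens charged at both the cache-read and cache-write rates). It is not the tool's actual source, and the function and field names are hypothetical.

```javascript
// Rates in USD per 1M tokens, copied from the verbose output above.
const opus4Rates = { input: 15.0, output: 75.0, cacheRead: 1.5, cacheWrite: 18.75 };

// Split a request's cost into the same columns as the calc table.
// Assumption: cached tokens are billed for both a read and a write,
// because that is what the table's numbers imply.
function requestCost(rates, inputTokens, outputTokens, cachedTokens = 0) {
  if (cachedTokens > inputTokens) {
    throw new RangeError("cachedTokens cannot exceed inputTokens");
  }
  const perToken = (pricePerMillion) => pricePerMillion / 1e6;
  const parts = {
    input: (inputTokens - cachedTokens) * perToken(rates.input),
    output: outputTokens * perToken(rates.output),
    cacheRead: cachedTokens * perToken(rates.cacheRead),
    cacheWrite: cachedTokens * perToken(rates.cacheWrite),
  };
  parts.total = parts.input + parts.output + parts.cacheRead + parts.cacheWrite;
  return parts;
}

// Same arguments as `llm-pricing calc 10000 200 -c 9500`.
console.log(requestCost(opus4Rates, 10000, 200, 9500));
```

In this example the cache-write column is by far the largest part of the total, so how a provider prices and expires its cache can matter more than the headline input price.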

ashwindharne 25 July 2025

KV caching is priced and managed quite differently between providers as well. Seeing as it becomes a huge chunk of the actual tokens used, wondering if there's an easy way to compare across providers.

lucasoshiro 25 July 2025

"OpenAI, Anthropic, Google and more", where "and more" = 0. Where's Gemma, DeepSeek, etc?

The UI, however, is really clean and straight to the point. I like the interface, but miss the content

kb_geek 25 July 2025

Nice! It would be good to also pull in leaderboard rankings and/or benchmarks for each of these models so we understand capability, perhaps from lmsys (not sure if there is a better source).

Fripplebubby 25 July 2025

Maybe I am blinded by my own use case, but I find the caching pricing and strategy (since different providers use a different implementation of caching as well as different pricing) to be a major factor rather than just the "raw" per token cost, and that is missing here, as well as on the Simon Willison site [1]. Do most people just not care, or not use caching enough for it to matter?

[1] https://llm-prices.com/

l5870uoo9y 25 July 2025

It appears that GPT-4.1 is missing, but nano and mini are there.

nikvdp 25 July 2025

there's also http://llmprices.dev. similar, but with a searchbox for quick filtering

james2doyle 25 July 2025

Nice. I think I prefer https://models.dev/ as it seems more complete

jalopy 25 July 2025

Super valuable resource - thanks!

What tools / experiments out there exist to exercise these cheaper models to output more tokens / use more CoT tokens to achieve the quality of more expensive models?

eg, Gemini 2.5 flash / pro ratio is 1 1/3 for input, 1/8 for output... Surely there's a way to ask Flash to critique its work more thoroughly to get to Pro level performance and still save money?

fronty 25 July 2025

We are working on a similar problem, https://apiraces.com, to personalize the cost calculation for your LLM API use case.

We have mostly uploaded the OpenRouter API models, but we are trying to do it in a useful way that personalizes the calculation and comparison. If anyone would like to test it or have a demo, we will be glad for any feedback.

binarymax 25 July 2025

Does anyone have an API that maintains a list of all model versions for a provider? I hand-update OpenAI into a JSON file that I use for cost reporting in my apps (and in an npm package called llm-primitives).

Here's the current version:

    const pricesPerMillion = {
        "o1-2024-12-17": { input: 15.00, output: 60.00 },
        "o1-mini-2024-09-12": { input: 1.10, output: 4.40 },
        "o3-mini-2025-01-31": { input: 1.10, output: 4.40 },
        "gpt-4.5-preview-2025-02-27": { input: 75.00, output: 150.00 },
        "gpt-4o": { input: 5.00, output: 15.00 },
        "gpt-4o-2024-08-06": { input: 2.50, output: 10.00 },
        "gpt-4o-2024-05-13": { input: 5.00, output: 15.00 },
        "gpt-4o-mini": { input: 0.15, output: 0.60 },
        "gpt-4o-mini-2024-07-18": { input: 0.15, output: 0.60 },
        "gpt-4-0613": { input: 30.00, output: 60.00 },
        "gpt-4-turbo-2024-04-09": { input: 10.00, output: 30.00 },
        "gpt-3.5-turbo": { input: 0.003, output: 0.006 },
        "gpt-4.1": { input: 2.00, output: 8.00 },
        "gpt-4.1-2025-04-14": { input: 2.00, output: 8.00 },
        "gpt-4.1-mini": { input: 0.40, output: 1.60 },
        "gpt-4.1-mini-2025-04-14": { input: 0.40, output: 1.60 },
        "gpt-4.1-nano": { input: 0.10, output: 0.40 },
        "gpt-4.1-nano-2025-04-14": { input: 0.10, output: 0.40 },
        "gpt-4o-audio-preview-2024-12-17": { input: 2.50, output: 10.00 },
        "gpt-4o-realtime-preview-2024-12-17": { input: 5.00, output: 20.00 },
        "gpt-4o-mini-audio-preview-2024-12-17": { input: 0.15, output: 0.60 },
        "gpt-4o-mini-realtime-preview-2024-12-17": { input: 0.60, output: 2.40 },
        "o1-pro-2025-03-19": { input: 150.00, output: 600.00 },
        "o3-pro-2025-06-10": { input: 20.00, output: 80.00 },
        "o3-2025-04-16": { input: 2.00, output: 8.00 },
        "o4-mini-2025-04-16": { input: 1.10, output: 4.40 },
        "codex-mini-latest": { input: 1.50, output: 6.00 },
        "gpt-4o-mini-search-preview-2025-03-11": { input: 0.15, output: 0.60 },
        "gpt-4o-search-preview-2025-03-11": { input: 2.50, output: 10.00 },
        "computer-use-preview-2025-03-11": { input: 3.00, output: 12.00 }
    };

I would love to replace this with an API call.
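One possible direction, sketched under assumptions: other comments in this thread mention the OpenRouter /models endpoint, and a response from it could be converted into the same shape as `pricesPerMillion`. The field names used here (`data`, `id`, `pricing.prompt`, `pricing.completion`, prices as per-token USD strings) and the URL in the comment are assumptions to verify against the live API. The sample payload is hypothetical.

```javascript
// Convert an OpenRouter-style models payload into { id: { input, output } },
// with prices in USD per 1M tokens. The payload field names are assumptions.
function toPricesPerMillion(payload) {
  const result = {};
  for (const model of payload.data ?? []) {
    const input = Number(model.pricing?.prompt);
    const output = Number(model.pricing?.completion);
    // Skip entries without usable numeric pricing.
    if (!Number.isFinite(input) || !Number.isFinite(output)) continue;
    result[model.id] = {
      // toFixed trims floating-point noise from the per-token to per-million scaling.
      input: Number((input * 1e6).toFixed(6)),
      output: Number((output * 1e6).toFixed(6)),
    };
  }
  return result;
}

// Hypothetical sample shaped like the assumed response; not real API output.
const samplePayload = {
  data: [
    { id: "openai/gpt-4.1", pricing: { prompt: "0.000002", completion: "0.000008" } },
    { id: "example/no-pricing" },
  ],
};
console.log(toPricesPerMillion(samplePayload));

// With network access, the payload would come from something like (assumed URL):
//   const payload = await (await fetch("https://openrouter.ai/api/v1/models")).json();
```

A model list like this is keyed by ids such as `openai/gpt-4.1` rather than dated snapshot names, so a mapping step would still be needed for per-version cost reporting.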

antimatter15 25 July 2025

The `ccusage` npm package pulls prices and other information from LiteLLM, which has a lot of different models: https://raw.githubusercontent.com/BerriAI/litellm/main/model...

StratusBen 25 July 2025

The http://ec2instances.info/ of the LLM era ;)

eugene3306 25 July 2025

What's the point of comparing token prices? Especially for thinking models.

Just now I was testing the new Qwen3-thinking model. I've run the same prompt five times. The costs I got, sorted: 0.0143, 0.0288, 0.0321, 0.0389, 0.048. And this is for a single model.
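To put a number on that spread, here is a small sketch that summarizes the five costs quoted above. The helper name is hypothetical, and the figures are taken as given, with no assumption about currency or units.

```javascript
// The five per-run costs quoted in the comment above.
const runCosts = [0.0143, 0.0288, 0.0321, 0.0389, 0.048];

// Summarize the spread of repeated runs of the same prompt.
function summarizeCosts(costs) {
  if (costs.length === 0) throw new RangeError("need at least one cost");
  const sorted = [...costs].sort((a, b) => a - b);
  const mean = sorted.reduce((sum, c) => sum + c, 0) / sorted.length;
  const mid = Math.floor(sorted.length / 2);
  const median =
    sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const min = sorted[0];
  const max = sorted[sorted.length - 1];
  return { min, max, mean, median, maxOverMin: max / min };
}

console.log(summarizeCosts(runCosts));
```

The most expensive run costs more than three times the cheapest, so a single sample says little about what a prompt will cost on a thinking model.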

Also, in my experience, sonnet-4 is cheaper than gemini-2.5-pro, despite token costs being higher.

jacob019 25 July 2025

Love it! It's going on my toolbar. I face the same problem, constantly trying to hunt down the latest pricing which is often changing. I think it's great that you want to add more models and features, but maybe keep the landing page simple with a default filter that just shows the current content.

manishsharan 25 July 2025

Is there a reason why you have not added DeepSeek, Qwen, and Meta?

You should also aggregate prices from Vertex and AWS Bedrock.

iambateman 25 July 2025

This is cool! Two requests:

- Filter by model "power" or price class. I want to compare the mini models, the medium models, etc.

- I'd like to see a "blended" cost which does 80% input + 20% output, so I can quickly compare the overall cost.

Great work on this!
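A minimal sketch of the "blended" number described above, with the input share as a parameter so the 80/20 split can be changed. The two price entries reuse per-1M figures listed elsewhere in this thread. The function name and the choice to weight by token share are assumptions.

```javascript
// Blend input and output prices (USD per 1M tokens) by the share of tokens
// that are input. inputShare = 0.8 means 80% input, 20% output.
function blendedPrice(price, inputShare = 0.8) {
  if (inputShare < 0 || inputShare > 1) {
    throw new RangeError("inputShare must be between 0 and 1");
  }
  return inputShare * price.input + (1 - inputShare) * price.output;
}

// Per-1M prices as listed elsewhere in this thread.
const threadPrices = {
  "anthropic/claude-sonnet-4": { input: 3.0, output: 15.0 },
  "google/gemini-2.5-pro": { input: 1.25, output: 10.0 },
};

for (const [model, price] of Object.entries(threadPrices)) {
  const at80 = blendedPrice(price).toFixed(2);
  const at50 = blendedPrice(price, 0.5).toFixed(2);
  console.log(`${model}: $${at80} at 80/20, $${at50} at 50/50`);
}
```

Because output tokens are priced several times higher than input tokens, the blended figure moves a lot with the mix, so a fixed 80/20 split is only a starting point.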

antoineMoPa 25 July 2025

It would be fun to compare with inference providers (groq/vertex ai, etc.).

ssalka 25 July 2025

I'd love to see this data joined with common benchmarks, in order to see which models get you the most "bang for your buck", i.e. benchmark score / token cost
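As a rough sketch of that ratio, the snippet below ranks models by benchmark score per dollar. The model names, scores, and prices are hypothetical placeholders rather than real measurements, and using a single price per model is a simplifying assumption.

```javascript
// Rank models by score per unit price ("bang for your buck"), highest first.
// `pricePerMillion` is a single USD-per-1M-token figure, e.g. a blended price.
function rankByValue(models) {
  return models
    .filter((m) => m.pricePerMillion > 0)
    .map((m) => ({ ...m, scorePerDollar: m.score / m.pricePerMillion }))
    .sort((a, b) => b.scorePerDollar - a.scorePerDollar);
}

// Hypothetical placeholder data; not real benchmark results or prices.
const hypotheticalModels = [
  { name: "model-a", score: 80, pricePerMillion: 10 },
  { name: "model-b", score: 60, pricePerMillion: 2 },
  { name: "model-c", score: 70, pricePerMillion: 5 },
];

for (const m of rankByValue(hypotheticalModels)) {
  console.log(`${m.name}: ${m.scorePerDollar.toFixed(1)} points per dollar`);
}
```

A plain ratio favors cheap models even when their score is too low for the task, so a minimum-score filter would probably be needed in practice.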

alienbaby 25 July 2025

I'd like to be able to compare prices to determine things like:

Should I use copilot pro in agent mode with sonnet 4, or is it cheaper to use claude with sonnet 4 directly?

OutOfHere 25 July 2025

It doesn't even list the price for GPT-4.1 (full model). This means it's not thorough and it doesn't try. What an immediate disappointment.

generalizations 25 July 2025

This is awesome! I wonder how possible it is to incorporate benchmarks - maybe as a filter? Since not all tokens are as useful as others. Heh.

d4rkp4ttern 27 July 2025

There’s also this: https://models.dev/

julianozen 25 July 2025

Keeping this up to date would be a good use for an agent. Companies might even pay for something like this

Fanofilm 25 July 2025

They should add grok. I use grok.

forrestthewoods 25 July 2025

Neat. Would love to see this plotted on a Pareto curve to show quality of said tokens.

dgrin91 25 July 2025

Cool site. Would be interesting to add a time dimension to track prices over time

ieuanking 27 July 2025

my friend and I built something similar https://app.ubik.studio/all-models

intellectronica 25 July 2025

Should read "Up to date prices for Closed American LLM APIs"

hagope 25 July 2025

this is great, I've always wanted something like this, do you think you can add other model metadata, like api name (`gemini-2.5-pro`), context length, modalities, etc

cahaya 25 July 2025

Nice! Missing a cost calculator with input and output fields.

cchance 27 July 2025

Seems odd to not have r1, qwen etc, groq, etc

jimbo808 25 July 2025

Are we really at a point already where we're treating tokens as a commodity? I certainly would not consider a token generated by Claude or Gemini to be of similar value to a token by Copilot, for example.

sshah_24 25 July 2025

Can we not just self-host, expose things through a VPN, and, for anything that needs sharing with the world, tunnel through some cloud server to keep the internal servers secure?

I am new to this hobby, but would like to know more about what experienced people think and do.

amelius 25 July 2025

Ok, that's price per token, but tells me nothing about the IQ of the models.

BartjeD 25 July 2025

Mistral is missing

peterspath 25 July 2025

I am missing Grok

krashidov 25 July 2025

Where is Claude 3.5 Sonnet? Arguably the best model still lol

DrJid 25 July 2025

This is actually really awesome to see. Opened my eyes a bit. Ignore the haters.

dust42 25 July 2025

tldr; low effort website that only contains 26 Google, OpenAI and Anthropic models and only input and output prices but no info about prompt cache and prompt cache prices. For a list of 473 models of 60+ providers with input, output, context, prompt caching and usage: https://openrouter.ai/models (no affiliation)