// LLM cost calculator

LLM cost calculator

Estimate your monthly and annual LLM API bill across OpenAI, Anthropic (Claude), Google Gemini and Azure OpenAI. Set input and output tokens per request, requests per month and the pricing mode, and see the cost broken down by tokens. Free, no signup, and it runs entirely in your browser.

Provider
Model
Usage per request
Pricing mode
// Estimated cost
$0 / month
$0 per year
$0 per 1,000 requests

Breakdown

3 quick wins to cut this

A cheaper model could save up to $0 / month.

  • Right-size the model Route simple calls to a smaller, cheaper model and keep the frontier model only for the hard ones. Often the single biggest saving.
  • Prompt caching Cache the stable part of your prompt (system prompt, context, few-shot examples). Cached input is read at a fraction of the input price.
  • Batch & trim tokens Move non-interactive work to the batch API for ~50% off, and cut output tokens — they cost several times more than input.

Want the real number, and a plan to cut it?

This is a ballpark. Send me your actual prompts and traffic and you’ll get a precise figure plus a concrete plan to bring it down, from model routing to caching. First call is 30 minutes, no charge.

Email me →

Assumptions. Prices are public list prices per million tokens. Anthropic (Claude) figures are current; OpenAI, Google and Azure figures are representative for the common model tiers and are estimates only. Your real bill depends on the exact model, caching, batch and committed-use discounts, and image or tool tokens. Rates last updated 2026-06-26.

Also running on Kubernetes? Try the cloud cost calculator

Frequently asked questions

How is LLM / token cost calculated?

Almost every API bills per token, with separate prices for input (your prompt) and output (the model’s reply). Cost per request = (input tokens ÷ 1,000,000 × input price) + (output tokens ÷ 1,000,000 × output price). Multiply by requests per month for the monthly bill. Output tokens usually cost several times more than input, so long replies dominate.

Why is my OpenAI / Claude / Gemini bill higher than this estimate?

Real bills add things a ballpark can’t: system prompts and context resent on every call, retries, tool-call round trips, image and audio tokens, and reasoning tokens on thinking models. Long shared context that you resend each turn is the usual surprise. Treat this as a floor, and enable prompt caching to bring resent context down.

How do I reduce LLM API costs?

In order of impact: route easy requests to a smaller model and keep the frontier model for the hard ones; cache the stable prompt prefix so resent context is read cheaply; trim output tokens (they are the expensive side); and move non-interactive work to the batch API for roughly half price. Together these often cut a bill by 50% or more.

Which LLM provider is cheapest?

It depends on the model tier and your input/output mix, not the provider name. Small models from any provider are cheap; frontier models are far pricier and differ in how much output they generate for the same task. Because output is the expensive side, a model that answers concisely can beat a nominally cheaper one that rambles. Switch providers and models in the calculator above to compare your own workload.

Is this calculator accurate?

It’s a good directional estimate, not a quote. Claude prices are current; the other providers use representative list prices for the common tiers, refreshed periodically. For an exact number tied to your prompts, traffic and caching, the fastest path is a short call.