AI Architect Academy

For engineers and architects sizing an AI build

What a Claude agent actually costs (and how to estimate yours)

Short answer

The per-token price is the same on the Anthropic API and AWS Bedrock — so the platform you pick is not the cost decision. The real driver is how a multi-turn agent works: it re-sends a growing context on every turn, so input tokens accumulate and dominate the bill.

Estimate from the loop, not the rate card. Count the input tokens you re-send across turns, add the output you generate, and apply the per-model rates. Then pull the three levers that actually move it: prompt caching, model routing, and a cap on turns.

The per-model price

Claude's token rates are identical on the Anthropic API and AWS Bedrock. Rates below are per 1M tokens, input / output, verified 23 Jun 2026.

ModelInput ($/1M)Output ($/1M)Good for
Claude Haiku 4.5$1$5Cheap, high-volume sub-tasks: classification, extraction, routing
Claude Sonnet 4.6$3$15The everyday workhorse: most tool-use and reasoning at a balanced cost
Claude Opus 4.8$5$25The hardest reasoning and planning steps where quality justifies the rate

Prices change. These reflect public pricing verified 23 Jun 2026 and must be re-checked against Anthropic's pricing page before you rely on them.

How to estimate a multi-turn agent

An agent is a loop. On each turn it sends the conversation so far — system prompt, prior messages, tool results — plus the new step, and gets back a model response. The catch is that the context grows: turn 5 re-sends almost everything from turns 1–4. So a 10-turn agent doesn't cost 10x a single call; the input side compounds.

The estimate, in words:

total cost ≈ (sum of input tokens sent across all turns × input rate) + (sum of output tokens generated across all turns × output rate)

Because the same prefix rides along on every turn, the first term usually dwarfs the second. That is why input tokens, not output, are where you look first.

An illustrative walk-through (numbers illustrative, not a quote)
Say a Sonnet agent carries a 5,000-token prefix (system prompt + tools + instructions) and adds ~1,000 tokens of new context per turn, generating ~500 output tokens per turn, over 8 turns. The input you re-send climbs each turn (5k, then 6k, then 7k, and so on), so the input total lands in the tens of thousands of tokens while output stays near 4,000. At Sonnet's $3 / $15 per 1M, the input term dominates — and caching the static 5k prefix is what collapses it. Treat these figures as illustrative only; plug in your own measured token counts.

Caching and routing both reduce this materially — see the levers below — but the starting point is always: measure the tokens your loop actually sends, then apply the rates.

What each platform adds on top of tokens

Anthropic API / Claude Managed Agents

Raw API calls are just token cost. Claude Managed Agents add roughly $0.08 per active session-hour on top of token costs — you pay for the time the managed runtime is running, in addition to the tokens it consumes.

AWS Bedrock / AgentCore

Token rates match the direct API. AgentCore Runtime bills for active consumption per second — I/O wait is free — at roughly $0.0895 per vCPU-hour + $0.00945 per GB-hour. Other AgentCore services (memory, gateway, tools) are priced separately.

Cloudflare Workers AI

Cloudflare runs its own model catalogue with its own metering, roughly $0.10–$1.40 per 1M input tokens depending on the model. To run Claude specifically at the edge, route through the Cloudflare AI Gateway to Anthropic — you get edge proximity plus first-party Claude, with caching and cost observability in the gateway.

The three levers that actually move the bill

1. Prompt caching — cache the static prefix

The system prompt, tool definitions, and any fixed context are the same on every turn. Cache them. Cache reads bill at ~0.1x the input rate, and a cache write costs ~1.25x the input rate once. For a multi-turn loop that re-sends a large static prefix, this is the single biggest lever: you pay the write premium once and read at a tenth of the rate thereafter.

2. Model routing — Opus → Sonnet → Haiku

Not every step needs the strongest model. Route the hard planning/reasoning steps to Opus, the everyday tool-use to Sonnet, and the cheap, high-volume sub-tasks (classification, extraction) to Haiku. Given the 5x spread between Haiku and Opus, pushing sub-tasks down a tier compounds quickly.

3. Turn and tool-call caps

An agent with no ceiling can loop. Cap the number of turns and tool calls, and set explicit done-conditions, so a stuck agent stops instead of billing indefinitely. This is a correctness control as much as a cost one.

The bill is a control-plane problem
Inference is usually cheap; uncontrolled loops are what bite. The scary line item is almost never the per-token rate — it's an agent that re-sends a bloated context for 40 turns because nobody capped it or cached the prefix. Treat cost as a property of the harness you build around the model, not a property of the model.
Sources & provenance
  • Anthropic — Claude API pricing and prompt-caching mechanics (platform.claude.com/docs pricing). Per-model rates and the ~0.1x read / ~1.25x write cache multipliers; Managed Agents add ~$0.08/active session-hour.
  • AWS — Amazon Bedrock and Bedrock AgentCore pricing pages. Claude token rates match the direct API; AgentCore Runtime bills active consumption per second (~$0.0895/vCPU-hour + ~$0.00945/GB-hour), other services priced separately.
  • Cloudflare — Workers AI pricing (its own per-model catalogue, roughly $0.10–$1.40 / 1M input tokens; route to Claude via the AI Gateway).
  • Course material: AI Architect Academy — cost-modeling, prompt caching, model routing, and turn budgets.

Pricing changes. The figures here were verified against the providers' pricing pages on 23 Jun 2026 — re-check the source pages before relying on them. Corrections: hello@aiarch.dev.

Learn to cost-model and route agents like an architect.

AI Architect Academy teaches cost-modeling, prompt caching, and model routing as first-class skills — across Anthropic, AWS, and Cloudflare, with the levers that actually move the bill.