For engineers and architects sizing an AI build
What a Claude agent actually costs (and how to estimate yours)
The per-token price is the same on the Anthropic API and AWS Bedrock — so the platform you pick is not the cost decision. The real driver is how a multi-turn agent works: it re-sends a growing context on every turn, so input tokens accumulate and dominate the bill.
Estimate from the loop, not the rate card. Count the input tokens you re-send across turns, add the output you generate, and apply the per-model rates. Then pull the three levers that actually move it: prompt caching, model routing, and a cap on turns.
The per-model price
Claude's token rates are identical on the Anthropic API and AWS Bedrock. Rates below are per 1M tokens, input / output, verified 23 Jun 2026.
| Model | Input ($/1M) | Output ($/1M) | Good for |
|---|---|---|---|
| Claude Haiku 4.5 | $1 | $5 | Cheap, high-volume sub-tasks: classification, extraction, routing |
| Claude Sonnet 4.6 | $3 | $15 | The everyday workhorse: most tool-use and reasoning at a balanced cost |
| Claude Opus 4.8 | $5 | $25 | The hardest reasoning and planning steps where quality justifies the rate |
Prices change. These reflect public pricing verified 23 Jun 2026 and must be re-checked against Anthropic's pricing page before you rely on them.
How to estimate a multi-turn agent
An agent is a loop. On each turn it sends the conversation so far — system prompt, prior messages, tool results — plus the new step, and gets back a model response. The catch is that the context grows: turn 5 re-sends almost everything from turns 1–4. So a 10-turn agent doesn't cost 10x a single call; the input side compounds.
The estimate, in words:
total cost ≈ (sum of input tokens sent across all turns × input rate) + (sum of output tokens generated across all turns × output rate)
Because the same prefix rides along on every turn, the first term usually dwarfs the second. That is why input tokens, not output, are where you look first.
Caching and routing both reduce this materially — see the levers below — but the starting point is always: measure the tokens your loop actually sends, then apply the rates.
What each platform adds on top of tokens
Anthropic API / Claude Managed Agents
Raw API calls are just token cost. Claude Managed Agents add roughly $0.08 per active session-hour on top of token costs — you pay for the time the managed runtime is running, in addition to the tokens it consumes.
AWS Bedrock / AgentCore
Token rates match the direct API. AgentCore Runtime bills for active consumption per second — I/O wait is free — at roughly $0.0895 per vCPU-hour + $0.00945 per GB-hour. Other AgentCore services (memory, gateway, tools) are priced separately.
Cloudflare Workers AI
Cloudflare runs its own model catalogue with its own metering, roughly $0.10–$1.40 per 1M input tokens depending on the model. To run Claude specifically at the edge, route through the Cloudflare AI Gateway to Anthropic — you get edge proximity plus first-party Claude, with caching and cost observability in the gateway.
The three levers that actually move the bill
1. Prompt caching — cache the static prefix
The system prompt, tool definitions, and any fixed context are the same on every turn. Cache them. Cache reads bill at ~0.1x the input rate, and a cache write costs ~1.25x the input rate once. For a multi-turn loop that re-sends a large static prefix, this is the single biggest lever: you pay the write premium once and read at a tenth of the rate thereafter.
2. Model routing — Opus → Sonnet → Haiku
Not every step needs the strongest model. Route the hard planning/reasoning steps to Opus, the everyday tool-use to Sonnet, and the cheap, high-volume sub-tasks (classification, extraction) to Haiku. Given the 5x spread between Haiku and Opus, pushing sub-tasks down a tier compounds quickly.
3. Turn and tool-call caps
An agent with no ceiling can loop. Cap the number of turns and tool calls, and set explicit done-conditions, so a stuck agent stops instead of billing indefinitely. This is a correctness control as much as a cost one.
- Anthropic — Claude API pricing and prompt-caching mechanics (platform.claude.com/docs pricing). Per-model rates and the ~0.1x read / ~1.25x write cache multipliers; Managed Agents add ~$0.08/active session-hour.
- AWS — Amazon Bedrock and Bedrock AgentCore pricing pages. Claude token rates match the direct API; AgentCore Runtime bills active consumption per second (~$0.0895/vCPU-hour + ~$0.00945/GB-hour), other services priced separately.
- Cloudflare — Workers AI pricing (its own per-model catalogue, roughly $0.10–$1.40 / 1M input tokens; route to Claude via the AI Gateway).
- Course material: AI Architect Academy — cost-modeling, prompt caching, model routing, and turn budgets.
Pricing changes. The figures here were verified against the providers' pricing pages on 23 Jun 2026 — re-check the source pages before relying on them. Corrections: hello@aiarch.dev.
Learn to cost-model and route agents like an architect.
AI Architect Academy teaches cost-modeling, prompt caching, and model routing as first-class skills — across Anthropic, AWS, and Cloudflare, with the levers that actually move the bill.
Get notified when new tracks ship.