Cross-platform decision guide · kept current

Where to run your Claude agent: API vs Bedrock vs Cloudflare

Name: Claude agent deployment cost comparison: Anthropic API vs AWS Bedrock vs Cloudflare
Creator: aiArch

By Wibo · Amsterdam Published 13 Jun 2026 Last updated 26 Jun 2026 ~7 min read

Short answer

It's not a price decision — Claude's per-token cost is identical on the Anthropic API and AWS Bedrock. It's a decision about where you want the abstraction boundary to sit: do you want Anthropic to manage the agent runtime, do you want it inside your own AWS account next to your IAM and CloudWatch, or do you want it at the edge close to users?

Rough rule: Anthropic API / Managed Agents for the fastest path and newest features; Bedrock when you're already on AWS and want consolidated IAM/billing/governance; Cloudflare for low-latency, globally distributed, edge-native workloads. Details and current pricing below.

The decision in one table

Option	Choose it when	Cost shape	Watch out for
Anthropic API / Claude Managed Agents	You want the simplest path, the newest models first, and Anthropic to own the agent harness/session lifecycle	Standard Claude token rates; Managed Agents add ~$0.08 per active session-hour	Less native integration with your existing cloud IAM/observability
AWS Bedrock / AgentCore	You already run on AWS (IAM, VPC, CloudWatch, consolidated billing) and want governance in one place	Identical Claude token rates; AgentCore Runtime bills for active consumption per second, I/O wait free (~$0.0895/vCPU-hour + ~$0.00945/GB-hour); other AgentCore services priced separately	New Claude models/features tend to land on the direct API first, with Bedrock catching up within weeks
Cloudflare Workers AI	You need global edge inference, low latency for user-facing apps, and data-residency by default	Its own model catalogue and metering (per-model rates, roughly $0.10–$1.40 / 1M input tokens depending on model)	A different model catalogue than first-party Claude; use the AI Gateway to route to Claude where needed

Prices change. These reflect public pricing as of June 2026 and must be re-checked against each provider's pricing page before you rely on them.

Claude API vs Bedrock: which should you pick?

Pick the Anthropic API for the fastest path and the newest models first; pick AWS Bedrock when you already run on AWS and want Claude inside your existing IAM, VPC, CloudWatch, and consolidated billing. Claude's per-token price is the same on both, so the decision is governance and operational fit, not cost. New models and features usually land on the direct API first and reach Bedrock within weeks — if being first matters, that favors the API; if single-pane AWS governance matters more, that favors Bedrock.

Two recent additions worth knowing (June 2026)

AWS Lambda MicroVMs. If you run on AWS, this gives each agent session its own Firecracker-isolated micro-VM — snapshot-based fast resume with state preserved across a session — purpose-built to run untrusted or model-generated code. AWS documents it explicitly as a sandbox for Claude Managed Agents: Anthropic still hosts the agent loop and the model, and the MicroVM is where the agent's bash/file tool calls actually execute. It's the cleanest answer to "where do my agent's tools run" when you want hard per-session isolation inside your own AWS account.

Claude in Microsoft Foundry. Claude is now also reachable through Microsoft Foundry on Azure — Azure-native endpoints and billing, with inference still running on Anthropic's own infrastructure. It is in public preview. This sits outside the AWS + Cloudflare scope this guide compares; it's noted here only so the picture is complete.

Claude Platform on AWS. A third way to reach Claude from an AWS account, distinct from Bedrock: Claude Platform on AWS has Anthropic — not AWS — operate the inference stack, while AWS handles authentication (SigV4 or API key), IAM-based access control, and billing through AWS Marketplace. Because Anthropic runs it, it carries the full platform surface (Agent Skills, code execution, beta features) with typically same-day feature access, closing the weeks-long lag Bedrock usually has behind the direct API. The tradeoff: inference may route outside AWS, so it doesn't fit if data residency inside AWS is a hard requirement — Bedrock keeps inference in-account.

Agent runtimes compared: Cloudflare vs AWS AgentCore

The tables above answer "whose token bill, whose IAM." The other half of the decision is where the agent process actually runs and keeps its state between turns — the runtime. Two managed answers dominate once you've left a plain request/response Lambda behind: Cloudflare's Durable Objects + Agents SDK at the edge, and AWS Bedrock AgentCore Runtime inside your account. They make opposite bets about locality and session length.

On Cloudflare, each agent is a Durable Object — a single-threaded, globally-unique instance with its own transactional SQL storage. The Agents SDK builds on that to give every agent persistent state and WebSocket connections, and WebSocket Hibernation lets an idle agent evict from memory between turns while its connections stay open — so you don't pay duration charges for the wait. (In-memory state is discarded on hibernation; persisted state survives, which is the whole point of the SQL store.) AgentCore Runtime is the opposite shape: a managed, serverless, framework- and model-agnostic runtime that isolates each session in its own microVM, supports sessions up to 8 hours, and bills active consumption per second — CPU is free during I/O wait, so a session that's mostly waiting on a tool or the model costs little.

Dimension	Cloudflare Workers AI + Durable Objects	AWS Bedrock AgentCore Runtime
State model	Per-agent Durable Object: single-threaded, transactional SQL storage; WebSocket Hibernation keeps connections while evicting memory between turns. Agents SDK manages the persisted state for you.	Managed serverless runtime; per-session microVM isolation, sessions up to 8h. Session state is ephemeral — durable cross-session memory is a separate service (AgentCore Memory).
Billing model	Requests + duration (GB-s); idle/hibernating objects accrue no duration charge. Qualitative shape — verify per-unit rates on the pricing page.	Active consumption per second (~$0.0895/vCPU-hour + ~$0.00945/GB-hour); CPU not billed during I/O wait, memory billed on peak while the session is held.
Latency & locality	Runs at the edge across Cloudflare's global network, close to users; data residency by default.	Runs regionally inside your AWS account, next to your IAM, VPC, and CloudWatch.
Ecosystem fit	Cloudflare-native (Workers, R2, KV, Vectorize). Claude is not in the Workers AI catalogue — route to it through the AI Gateway to Anthropic.	AWS-native and composable with AgentCore Memory, Gateway, Identity, and built-in Tools (Code Interpreter, Browser); framework- and model-agnostic.
Best when	Low-latency, globally distributed, many small per-user agents that idle between bursts.	AWS-native estates and longer, compute-heavy sessions that want governance and durable memory in one place.

Don't conflate "where it runs" with "what it costs"

The runtime decision is about locality and session shape, not the model bill — Claude's token price is unchanged either way. For the full per-agent cost math (tokens + runtime + caching) see the cost breakdown; for the AgentCore service suite in depth see the AgentCore explainer.

Token price is not the variable

Because Claude's per-token pricing is identical on the Anthropic API and Bedrock, the platform choice rarely turns on the model bill. It turns on operational fit: where your identity, secrets, networking, logging, and billing already live, and how much of the agent runtime you want to operate yourself versus hand to the provider. That framing — the abstraction boundary — is the spine of the decision.

Model tiers, for cost-modeling

Across providers, the Claude tiers price (input/output per million tokens) roughly as: Haiku for cheap high-volume work, Sonnet mid-tier, Opus for the hardest reasoning. A model router (Opus → Sonnet → Haiku) plus prompt caching of the static prefix is usually a bigger cost lever than the platform choice. Verify current per-token rates on the provider pages before modeling.

A note on lock-in

The cleanest hedge is to keep the agent logic and tool layer portable and put a routing layer (such as an AI Gateway) between your code and the provider, so switching where inference runs is a config change, not a rewrite. That's also good cost-observability hygiene. Don't over-engineer it before you have a reason to switch — but design the boundary so you can. Which layers you are actually placing when you pick a runtime — orchestration, memory, tools, retrieval, the operational plane — is laid out in agentic AI architecture, including the same three platforms mapped layer by layer.

Frequently asked

Is Claude cheaper on Bedrock or on the Anthropic API?

Per-token pricing for Claude is the same on both as of 2026. Choose on operational fit (where your IAM/observability/billing live and how new you need the model versions), not on token price.

What do Claude Managed Agents cost on top of tokens?

Roughly $0.08 per active session-hour in addition to normal token rates — you pay for the time the managed agent runtime is running. Verify on Anthropic's current pricing page.

Can I run the same Claude model on Cloudflare?

Cloudflare Workers AI has its own model catalogue. To use Claude specifically at the edge, route through the Cloudflare AI Gateway to Anthropic — you get edge proximity plus first-party Claude, with caching and cost observability in the gateway.

Cloudflare or AWS AgentCore for running an agent?

It's a runtime-shape choice, not a price one. Cloudflare's Durable Objects + Agents SDK put a stateful per-agent instance at the edge with hibernation between turns — good for low-latency, globally distributed, many-small-agent workloads. AWS Bedrock AgentCore Runtime is a managed serverless runtime that isolates each session in its own microVM, runs sessions up to 8 hours, and bills active consumption per second (I/O wait free) — good for AWS-native estates and longer, compute-heavy sessions. Both reach Claude at the same token price.

Which should I learn first?

Learn the portable layer first — the agentic loop, tools, and evals — because it transfers across all three. Then learn the deployment specifics of whichever platform your target employers use. The integrated path teaches all three deliberately.

Sources & provenance

Anthropic — Claude API pricing; Managed Agents add ~$0.08/active session-hour (public beta since Apr 2026).
AWS — Amazon Bedrock AgentCore pricing (Runtime $0.0895/vCPU-hour + $0.00945/GB-hour, active-consumption billing) and Bedrock model pricing (Claude token rates matching the direct API).
Cloudflare — Workers AI pricing (its own per-model catalogue; current text models run roughly $0.10–$1.40 / 1M input tokens). Abstraction-boundary framing also draws on cross-platform comparisons (DEV.to AWS Builders, CloudZero, Anchor Sprint, 2026).
AWS — Lambda MicroVMs (announced 22 Jun 2026) and the MicroVMs sandbox for Claude Managed Agents integration guide.
Anthropic — Claude in Microsoft Foundry (Azure deployment target, public preview) and Claude Platform on AWS (Anthropic-operated inference, AWS Marketplace billing/IAM).
Runtime comparison — Cloudflare Durable Objects, WebSocket Hibernation, and the Agents SDK (per-agent state, hibernation, no duration charge while idle); AWS Bedrock AgentCore Runtime (managed serverless, per-session microVM, sessions up to 8h) and AgentCore pricing (active consumption per second, I/O wait free).

The Anthropic, AgentCore Runtime, Managed Agents, and Workers AI figures were verified against the providers' own pricing pages on 16 Jun 2026; the Lambda MicroVMs and Microsoft Foundry facts were verified against AWS and Anthropic docs on 25 Jun 2026; the Cloudflare Durable Objects / Agents SDK and AWS AgentCore Runtime capabilities were verified against the providers' docs on 26 Jun 2026; the Claude Platform on AWS facts were verified against Anthropic's docs on 15 Jul 2026. Pricing is volatile — re-check the provider pages before relying on these. Corrections: hello@aiarch.dev.

Learn to deploy on all three — and justify the choice.

aiArch teaches the Anthropic, AWS, and Cloudflare deployment surfaces as one cross-platform architect competency — including the cost-modeling and routing that actually move the bill.

Try a sample lesson free → Browse the curriculum

See how aiArch helps senior engineers become AI-native, or compare Professional Membership pricing.

Free sample — no signup · every claim cited · full curriculum is waitlist-only