AI Architect Academy

The product guide

Cloudflare AI Gateway: features, pricing, and setup

Short answer

Cloudflare AI Gateway is a managed proxy that sits at Cloudflare's edge between your application and AI providers, adding caching, rate limiting, observability, dynamic routing, guardrails, and cost control to every LLM call behind one unified API. The core features — analytics, caching, and rate limiting — are free; you pay only for optional extras like persistent logs beyond the plan quota, Logpush, guardrails inference, and the 5% Unified Billing fee. You point your existing OpenAI- or Anthropic-compatible client at the gateway URL and keep your code.

This is the Cloudflare product page. For the vendor-neutral concept — what an AI gateway is and how it compares to Vercel AI Gateway, OpenRouter, and LiteLLM — see what is an AI gateway. Below: what it is, its features, pricing, a quick setup, and when it fits.

What is Cloudflare AI Gateway

Cloudflare AI Gateway is one of the products in Cloudflare's AI platform (alongside Workers AI, Vectorize, and AI Search). It is an AI gateway — an operational proxy in front of model providers — that runs on Cloudflare's global edge network, so the control plane sits close to your users rather than in a single region. You send requests to a gateway endpoint instead of calling Anthropic, OpenAI, or Google directly, and Cloudflare applies caching, logging, rate limits, routing, and guardrails on the way through.

Since the May 2026 REST API release, the gateway exposes a single set of endpoints on api.cloudflare.com that work across providers: a universal /ai/run, an OpenAI-compatible /ai/v1/chat/completions, an OpenAI Responses-compatible /ai/v1/responses, and an Anthropic-compatible /ai/v1/messages. That means the SDK you already use keeps working — you change the base URL, not the request shape. A default gateway is created automatically on your first request, so there is no mandatory setup step before you can route a call.

Disclosure, not a recommendation: this site runs on Cloudflare AI Gateway, routing through to OpenRouter for the coach, so the notes here are partly first-hand. The comparison stays neutral; the right gateway depends on where your app runs.

Cloudflare AI Gateway features

The feature set is what you would expect of a production AI gateway, with a few Cloudflare-specific additions (DLP, BYOK key storage, GraphQL analytics). Verify exact behaviour and limits against the live docs before building — these move fast.

FeatureWhat it doesNotes
Unified APIOne OpenAI- or Anthropic-compatible endpoint across many providers; switch model with a config change.REST API on api.cloudflare.com
CachingServes identical requests from Cloudflare's edge cache, cutting latency and repeat spend.Text/image responses; exact-match for now
Rate limitingCaps request volume with sliding- or fixed-window limits to protect budget and upstream quota.Per gateway
Dynamic routingBuilds request flows by user segment, geography, content, or A/B test; routes to the chosen model.Includes fallback across providers
GuardrailsReal-time content moderation that detects and blocks harmful content in prompts and responses.Billed as Workers AI inference
AnalyticsTracks requests, tokens, cost, errors, and latency across providers; GraphQL API access.Free, in the dashboard
LoggingPer-request logs of prompt, response, tokens, cost, duration, and status, with configurable retention.Export via Logpush
Spend limitsCost-based budgets tracking cumulative dollar spend by model, provider, or metadata.Backstop against runaway loops
DLP & BYOKScans for PII/sensitive data; stores encrypted provider keys so apps don't hold them.DLP free with limited profiles

Two of these matter most for agents. Caching and spend limits are the difference between a loop you can run in production and one that quietly runs up a five-figure bill — the same cost discipline covered in LLM cost optimization. Observability (analytics plus per-request logs) is how you debug an agent that misbehaves, because you get the actual prompts, tool calls, and cost per step in one place.

Cloudflare AI Gateway pricing

The headline is that the gateway itself is mostly free: there is no per-call gateway fee on top of your Cloudflare plan, and the core features — dashboard analytics, caching, and rate limiting — cost nothing. What you pay for is a handful of optional extras:

  • Persistent logs — free up to a plan quota. The Workers Free tier includes 100,000 stored logs across all gateways; Workers Paid raises this to 10,000,000 logs per gateway. When the storage limit is hit, new logs stop being saved until you delete old ones.
  • Logpush — streaming logs to your own storage is on the paid plan: 10 million requests per month included, then $0.05 per additional million.
  • Guardrails — content moderation is billed as Workers AI token-based inference, so cost scales with the length of the prompts and responses being evaluated.
  • Unified Billing — if you let Cloudflare bill third-party model usage (so you don't manage separate provider keys), provider inference rates pass through with no markup, plus a flat 5% fee on credits purchased.

So the practical model is: turn on the gateway for free, and your only AI Gateway-specific spend is logs beyond the quota, Logpush overage, guardrails inference, and the optional 5% billing fee. Provider token costs are separate and unchanged — the gateway is a control plane, not a model. Confirm current figures on the pricing page before you commit; these are the rates verified for this guide.

How to set it up (overview)

Setup is deliberately light — there is no infrastructure to provision. The flow:

  • 1. Get a gateway endpoint. Use the auto-created default gateway, or create a named one in the dashboard (AI → AI Gateway) when you want isolated logs, limits, or routing config.
  • 2. Repoint your client. Change your SDK's base URL to the gateway endpoint and keep the same request shape — the OpenAI-compatible /ai/v1/chat/completions or the Anthropic-compatible /ai/v1/messages. Specify the model as {provider}/{model}, e.g. anthropic/claude-sonnet-4-5.
  • 3. Authenticate. Pass a Cloudflare API token (and, for an authenticated gateway, the cf-aig-authorization header) so logs can't be inflated by unauthorized traffic.
  • 4. Send a request and watch the dashboard. Logs, analytics, and cost appear immediately; caching, rate limiting, and guardrails are then configured per gateway or per request via headers.

Because the gateway is provider-neutral, the same isolation discipline applies as with any AI gateway: keep it behind a thin internal client so swapping the gateway — or the provider behind it — is a one-file change, not a rewrite. That boundary is the architecture skill, covered in where to run Claude agents.

When Cloudflare AI Gateway fits

It is the natural pick in two situations. First, if you already run on Cloudflare — Workers, Durable Objects, AI Search — the gateway is part of the same platform, the analytics live next to your other dashboards, and edge caching is a genuine latency win for a global audience. Second, if you want a provider-neutral proxy with a generous free tier: you can route across more than a dozen providers (Anthropic, OpenAI, Google, Groq, Mistral, xAI, DeepSeek, Workers AI, and more) through one API without paying a per-call gateway fee.

It is a weaker fit if your stack is centred elsewhere and you want billing and routing native to that platform — for example, a Vercel-hosted app may prefer Vercel AI Gateway, or a team standardizing on an aggregator may prefer OpenRouter's unified credits. Feature parity across the main gateways is high; the deciding factors are where your app runs, how you want to bill, and whether you need edge caching. The neutral side-by-side is on the AI gateway concept page, and how the gateway slots into a full system is in agentic AI architecture.

Frequently asked questions

What is Cloudflare AI Gateway?

Cloudflare AI Gateway is a managed proxy that runs on Cloudflare's edge network between your application and AI model providers. It adds caching, rate limiting, observability, dynamic routing, guardrails, and cost control to every LLM call, exposed through a single OpenAI- or Anthropic-compatible API. It is the operational plane for LLM traffic, factored out of your application code.

Is Cloudflare AI Gateway free?

Largely yes. The core features — dashboard analytics, caching, and rate limiting — are free with no per-call gateway fee beyond your Cloudflare plan. You pay only for optional extras: persistent logs above the plan quota (100,000 logs on Workers Free, 10 million per gateway on Workers Paid), Logpush overage beyond 10 million requests at $0.05 per million, guardrails billed as Workers AI inference, and a 5% fee on credits if you use Unified Billing. Provider token costs are separate.

What providers does Cloudflare AI Gateway support?

More than a dozen, through one unified API: Anthropic, OpenAI, Google AI Studio and Vertex AI, Groq, Mistral, Cohere, Perplexity, xAI, DeepSeek, Cerebras, Baseten, Parallel, and Cloudflare's own Workers AI. You select a model as {provider}/{model}, so switching provider is a parameter change rather than a code change.

How do you set up Cloudflare AI Gateway?

Use the default gateway that Cloudflare creates automatically on your first request, or create a named gateway in the dashboard. Then point your SDK's base URL at the gateway endpoint, keep your existing OpenAI- or Anthropic-compatible request shape, specify the model as {provider}/{model}, and authenticate with a Cloudflare API token. Logs and analytics appear immediately; caching, rate limits, and guardrails are configured per gateway or per request.

Does Cloudflare AI Gateway cache responses?

Yes. It can serve identical requests directly from Cloudflare's edge cache instead of paying for another provider call, which cuts both latency and spend on repeats. Caching currently covers text and image responses and applies to exact-match requests; Cloudflare has said semantic caching is planned to improve hit rates. Caching is a free core feature.

Cloudflare AI Gateway vs Vercel AI Gateway?

Both are managed AI gateways with a unified API, routing, fallback, observability, and per-key or per-budget cost control, and feature parity is high. The practical differences: Cloudflare runs on its global edge with free caching and rate limiting and is the natural fit if you already use Cloudflare; Vercel AI Gateway is tightly integrated with Vercel deployments and the AI SDK. Choose by where your app runs and how you want to bill — see the neutral comparison for the full field.

Sources & provenance
  • Cloudflare AI Gateway features (caching, rate limiting, dynamic routing, guardrails, DLP, analytics, logging, spend limits, BYOK, custom costs): developers.cloudflare.com/ai-gateway/features/.
  • Unified REST API + supported providers (OpenAI/Anthropic-compatible endpoints; default gateway auto-created): developers.cloudflare.com/ai-gateway/usage/chat-completion/ and the AI Gateway changelog (REST API 2026-05-21; auto-default 2026-03-02).
  • Pricing and limits (core features free; log quotas 100k free / 10M paid per gateway; Logpush 10M + $0.05/M; guardrails as Workers AI inference; 5% Unified Billing fee): developers.cloudflare.com/ai-gateway/reference/pricing/ and /reference/limits/.
  • Caching scope (text/image, exact-match, semantic planned): developers.cloudflare.com/ai-gateway/features/caching/.

Feature existence and figures verified against Cloudflare docs on 26 Jun 2026; limits and pricing change — confirm against live docs before building. Disclosure: aiarch.dev runs on Cloudflare AI Gateway → OpenRouter. Corrections: hello@aiarch.dev.

Learn to run the LLM control plane — by building on one.

AI Architect Academy teaches the operational plane of production AI systems — routing, caching, observability, and cost control — as first-class skills, on a platform that is itself a production agentic system running through Cloudflare AI Gateway. The build is the curriculum.

Free sample — no signup · every claim cited · cancel anytime

Or get notified when new tracks ship.