The LLM control plane
What is an AI gateway? The LLM control plane, explained
An AI gateway is a proxy — a control plane — that sits between your application and one or more model providers. It gives you a single unified API plus the operational controls every production LLM app needs: routing and fallback between models, response caching, observability and logging, cost control, and rate limiting. "AI gateway" and "LLM gateway" are the same thing. Instead of each service calling OpenAI, Anthropic, or Google directly, calls flow through the gateway, where those controls are applied in one place.
For agents and anything in production, this is the layer that turns scattered direct API calls into something you can route, observe, and budget. Below: what it does, why you need one, how it differs from an LLM router, build vs buy, and the main options compared neutrally.
What an AI gateway does
An AI gateway is one HTTP endpoint your code talks to instead of talking to each provider. It speaks a normalized API — usually OpenAI-compatible chat-completions — and translates to whichever provider actually serves the request. Around that single entry point it layers the controls you would otherwise reimplement in every service:
- Unified API — one request shape and one set of credentials across many providers, so swapping a model is a config change, not a code change.
- Routing and fallback — send a request to a chosen model, and fail over to another when a provider errors, rate-limits, or times out.
- Caching — return a stored response for an identical request instead of paying for the call again.
- Observability — logs, traces, token counts, latency, and cost per request, in one dashboard rather than scattered across provider consoles.
- Cost control — spend tracking, budgets, and limits so a runaway loop can't quietly run up a five-figure bill.
- Rate limiting — cap request volume per key, user, or model to protect both your budget and the upstream provider quota.
None of these are model capabilities. They are operational capabilities — the same plumbing an API gateway gives a microservice, specialized for LLM traffic. That is why the gateway is best understood as the operational plane of an agentic AI architecture, factored out of the application.
Why you need one
A single script calling one model directly does not need a gateway. A production system — and especially an agent that calls a model in a loop — almost always does, for five reasons that map onto the five controls above.
- Routing — real systems use more than one model: a cheap, fast one for classification and a stronger one for hard reasoning. A gateway makes "which model" a routing decision rather than branching code in every call site. See how to choose between Claude models.
- Fallback — providers have outages and rate limits. Without fallback, a provider's bad afternoon is your outage; with it, traffic shifts to a backup model and the user never notices.
- Caching — agent and RAG workloads repeat near-identical calls constantly. Caching cuts both latency and spend on the repeats, which at scale is most of them.
- Observability — when an agent misbehaves you need the actual prompts, tool calls, tokens, and cost per step. A gateway gives you that trace centrally instead of bolting logging onto every service.
- Cost control — an unbounded loop calling a frontier model every turn is expensive by default. Budgets and spend limits at the gateway are a backstop the application can't accidentally remove. See LLM cost optimization.
The through-line: a gateway moves cross-cutting concerns out of application code and into one enforceable layer. That is exactly the boundary you want when you run agents — see where to run Claude agents for how this fits the deployment decision.
AI gateway vs LLM router
The terms overlap, and vendors use them loosely, so it helps to separate the function from the product.
An LLM router is a function: given a request, decide which model or provider should handle it — by cost, latency, capability, or current availability. Routing is one feature.
An AI gateway is a product: a proxy that includes routing but adds caching, observability, cost control, rate limiting, and a unified API around it. Routing is the part of a gateway that picks the destination; the gateway is everything else that wraps the call.
So every gateway contains a router, but a standalone router is not a gateway. If all you need is "pick the cheapest capable model per request," a router is enough. The moment you also need to log, cache, budget, and fail over, you are describing a gateway.
Build vs buy
You can build a gateway: it is, fundamentally, a proxy with middleware. For one team and one provider, a thin internal wrapper around the provider SDK is often the right, simple call — don't add a dependency you don't need.
Buy (or adopt an open-source proxy) when the operational surface grows: multiple providers, fallback, central observability, per-user budgets, caching at the edge. Reimplementing all of that well is real work, and it is undifferentiated — it is plumbing, not your product. The honest framing is a boundary decision: keep the gateway behind a thin internal interface so the choice (DIY proxy, managed service, or self-hosted open source) stays a one-file swap rather than a rewrite. That isolation is itself an architecture skill, not a detail.
A middle path many teams take: a managed or open-source gateway in front, but accessed through your own small client wrapper, so you get the features without hard-coupling every service to one vendor's SDK.
The main AI gateways
Four options cover most of the field: two fully managed services, one managed model aggregator, and one self-hosted open-source proxy. They converge on the same feature set; the real differences are where they run, whether they are open source, and how they bill. Verify exact behavior against each vendor's live docs before building — these move fast.
| Gateway | Model | Unified API | Routing / fallback | Caching | Observability | Cost control | Open source |
|---|---|---|---|---|---|---|---|
| Cloudflare AI Gateway | Managed (edge) | Yes | Yes (dynamic routing, retries) | Yes | Yes | Yes (spend limits, rate limiting) | No |
| Vercel AI Gateway | Managed | Yes | Yes (load-balance, failover) | Not a headline feature | Yes | Yes (per-key budgets) | No |
| OpenRouter | Managed (aggregator) | Yes | Yes (provider routing, fallback) | Yes (prompt caching) | Yes (activity logs) | Yes (prepaid credits) | No |
| LiteLLM | Self-hosted | Yes (100+ providers) | Yes (router, retries, fallback) | Yes | Yes (logging callbacks) | Yes (virtual-key budgets) | Yes (MIT) |
Read the table as "what category each tool is in," not a scoreboard — feature parity is high, and the right pick depends on where your app runs and how you want to bill. As a disclosure, not a recommendation: this site itself runs on Cloudflare AI Gateway routing through to OpenRouter, so the cross-platform notes here are partly first-hand. The comparison stays neutral; pick by fit.
Frequently asked questions
What is an AI gateway?
An AI gateway is a proxy — a control plane — between your application and one or more model providers. It exposes a single unified API and applies routing, fallback, caching, observability, cost control, and rate limiting to every request in one place, instead of each service calling providers directly. "AI gateway" and "LLM gateway" refer to the same thing.
What is the difference between an AI gateway and an LLM gateway?
There is no meaningful difference — they are two names for the same component. "LLM gateway" emphasizes that the traffic is language-model calls; "AI gateway" is the broader, more common label and is what most vendors now use. Both describe a proxy that unifies provider access and adds operational controls.
Why use an AI gateway?
To move cross-cutting concerns out of application code into one enforceable layer. A gateway gives you routing across multiple models, fallback when a provider fails, caching to cut cost and latency on repeated calls, central observability of prompts and spend, and budgets that cap runaway cost. For agents and any production workload, these stop being optional.
What is an LLM router?
An LLM router is the function that decides which model or provider should handle a given request — by cost, latency, capability, or availability. Routing is one feature of a gateway. Every gateway contains a router, but a standalone router that only picks a destination is not a full gateway, because it lacks the caching, observability, and cost controls that wrap the call.
Do I need an AI gateway?
Not for a single script calling one model — a thin wrapper is simpler, and you should not add a dependency you don't need. You do need one once you have multiple providers or models, want fallback and central observability, run agents in a loop, or need per-user budgets and caching. The trigger is operational surface area, not project size on its own.
What are the best AI gateways?
It depends on where your app runs and how you want to bill, not on a single winner — feature parity across the main options is high. Cloudflare AI Gateway and Vercel AI Gateway are managed services; OpenRouter is a managed aggregator with unified billing across many providers; LiteLLM is a self-hosted, MIT-licensed open-source proxy. Verify current features against each vendor's live docs before choosing.
- Cloudflare AI Gateway — features (caching, rate limiting, dynamic routing, observability), unified REST API, and spend limits:
developers.cloudflare.com/ai-gateway/features/and the AI Gateway changelog (REST API 2026-05-21; spend limits 2026-06-05). - Vercel AI Gateway — unified API, load-balancing/failover, observability, per-key budgets, BYOK:
vercel.com/docs/ai-gateway. - OpenRouter — provider routing and fallback, prompt caching, credit billing:
openrouter.ai/docs(provider routing under/docs/guides/routing/provider-selection). - LiteLLM — open-source (MIT) OpenAI-compatible proxy: router, fallback, budgets, logging callbacks, caching:
docs.litellm.ai/docs/simple_proxy.
Feature existence verified against vendor docs on 26 Jun 2026; depth, limits, and pricing change — confirm against live docs before building. Disclosure: aiarch.dev runs on Cloudflare AI Gateway → OpenRouter. Corrections: hello@aiarch.dev.
Learn to design the LLM control plane — by building one.
AI Architect Academy teaches the operational plane of production AI systems — routing, caching, observability, and cost control — as first-class skills, on a platform that is itself a production agentic system running through a real AI gateway. The build is the curriculum.
Free sample — no signup · every claim cited · cancel anytime
Or get notified when new tracks ship.