The reference architecture

Agentic AI architecture: the components and layers of a production agent system

Q: What is agentic AI architecture?

It is the structure of an agentic AI system: the components and layers that surround a model so it can pursue a goal in a loop - a reasoning core, an orchestrator, memory and state, a tools/action layer, retrieval, and an operational plane for guardrails, observability, and evals. It describes how the parts connect, not what agentic AI is or which pattern to use.

By Wibo · Amsterdam Published 26 Jun 2026 Last updated 26 Jun 2026 ~10 min read

Short answer

An agentic AI architecture is the set of components a system uses to let a model pursue a goal in a loop: a reasoning core (the model and its planning), an orchestrator that controls flow, memory and state, a tools/action layer, a retrieval/knowledge layer, and an operational plane for guardrails, observability, and evals. The model decides; the architecture is everything around it that makes those decisions safe, grounded, affordable, and operable in production.

This page is the architecture — the static structure and how the parts connect. For which control-flow pattern to compose them into, see agentic AI design patterns; for what agentic AI is, see what is agentic AI.

Agentic AI architecture at a glance

Strip away vendor branding and almost every production agent system is built from the same six layers. The reasoning core sits in the middle; the other five exist to ground it, bound it, and run it.

Layer / component	What it does	Failure it prevents
Reasoning core	The model plans, decides, and emits tool calls toward a goal.	—
Orchestration	Runs the loop, routes between steps and sub-agents, decides when to stop.	Runaway loops, unbounded cost
Memory & state	Holds working context now and durable facts across runs.	Amnesia, lost progress
Tools / action layer	Lets the agent act — APIs, functions, MCP servers — under least privilege.	Inability to act; over-broad access
Retrieval / knowledge	Grounds answers in authoritative data (agentic RAG).	Hallucination, stale answers
Operational plane	Guardrails, observability, cost control, and evals around the whole system.	Unsafe actions, silent regressions

The rest of this page takes each layer in turn, then maps all six onto Anthropic, AWS, and Cloudflare — the part no vendor explainer covers, because each only maps to its own stack.

The reasoning core: model, planning, and tool-calling

At the centre is the model. In an agentic system it does three things a single completion never does: it plans (breaks a goal into steps), it decides (chooses the next action), and it emits tool calls the orchestrator executes and feeds back. The architectural decision here is model selection per step — a cheap, fast model for routing and a stronger one for hard reasoning — because that choice drives both quality and cost-at-scale. See how to choose between Claude models.

The core is one component, not the whole system. Treating "the agent" as just the model is the most common architecture mistake; the next five layers are what turn a clever completion into something you can run.

Orchestration and control flow: single-agent vs multi-agent topology

The orchestrator owns the loop: send a request, check the stop reason, run any requested tool, feed the result back, repeat until done — bounded by a turn and tool-call budget. How you wire the agents is the topology:

Single agent — one loop with a set of tools. Simplest; the right default until it isn't.
Multi-agent — an orchestrator delegating to specialised sub-agents (each with its own context and tools), or agents arranged in a pipeline. Adds capability and isolation at the cost of coordination.

Topology is a structural choice — where orchestration sits and how agents connect. Which control-flow pattern to run inside it (prompt chaining, routing, orchestrator-workers, evaluator-optimizer) is a separate decision covered in agentic AI design patterns. For the loop itself in depth — and why it must be bounded — see why the coach runs a bounded agentic loop.

Memory and state

Agents need two kinds of memory, and conflating them is an architecture smell:

Short-term (working) memory — the context window: the current task, recent tool results, and scratch reasoning. Bounded by tokens, so it's a budget to manage, not free space.
Long-term (durable) memory — state that outlives a single run: facts, user history, task progress, stored in a database or vector store and retrieved when relevant.

The architectural question is what gets promoted from short-term to long-term, and how durable state is keyed and isolated per user or session. Treating the context window as long-term memory is how agents lose progress or leak context between users.

Tools and integration: the action layer (and MCP)

Tools are what let an agent do things — call an API, run a query, write a file. Architecturally, the action layer is defined by two properties: a tool contract (clear schemas the model can call reliably) and least privilege (each tool exposes the minimum capability needed, so a confused or hijacked agent can't reach further than intended).

The Model Context Protocol (MCP) standardises this layer — a common way to expose tools, resources, and prompts to any agent, so integrations are reusable rather than bespoke per project. MCP is the deep-dive on this layer; here it's enough to know the action layer is where most of the security surface lives, which is why it's governed by the operational plane below.

Retrieval and knowledge: agentic RAG

Retrieval grounds the agent in authoritative data so it answers from facts rather than from the model's parametric memory. In an agentic RAG architecture, retrieval isn't a fixed pre-step — the agent decides when to retrieve, what to query, and whether the results are sufficient, sometimes searching iteratively. That makes retrieval a tool the orchestrator can invoke, not a pipeline stage bolted on front.

The architectural decisions: what is the authoritative source, how is freshness and provenance handled, and where do embeddings and the vector index live relative to the agent. Provenance matters as much as recall — an answer you can't trace is one you can't ship.

Guardrails, observability, and evals: the operational plane

The operational plane wraps every other layer and is what separates a demo from a production system:

Guardrails — bounded loops (turn and tool-call budgets with escalation), least-privilege tools, input/output filtering, and human-in-the-loop on high-risk actions. This is the OWASP-LLM threat surface, designed for rather than patched on.
Observability — tracing each loop iteration, tool call, token spend, and latency, so failures are diagnosable.
Cost control — budgets and routing, because a loop that calls a strong model every turn is expensive by default. See LLM cost optimization.
Evals — the correctness harness. Because the same input can yield different outputs, you pin quality with evals and LLM-as-judge, not manual spot-checks. See how to evaluate an LLM agent.

A cross-platform reference: mapping the layers to Anthropic, AWS, and Cloudflare

The same six layers map onto each major platform's primitives. Knowing the mapping is what lets an architect justify a platform choice instead of defaulting to one — and it's the view no single-vendor explainer gives you.

Layer	Anthropic	AWS	Cloudflare
Reasoning core	Claude (Messages API, tool use)	Bedrock foundation models	Models via AI Gateway / Workers AI
Orchestration	Claude Agent SDK	Bedrock AgentCore	Workers + Durable Objects / Agents SDK
Memory & state	App-managed context	Agent memory / DynamoDB	Durable Objects (SQLite)
Tools / action	Tool use + MCP	Action groups / MCP	Workers bindings + MCP
Retrieval	App-side RAG	Bedrock Knowledge Bases	AI Search / Vectorize + Workers AI embeddings
Operational plane	Usage + tool limits	CloudWatch / Guardrails	AI Gateway (observability, routing, caching)

This site is itself an instance of that mapping — built on Anthropic, AWS, and Cloudflare, in public. The real decisions and tradeoffs are in the architecture notes.

From architecture to patterns

Architecture tells you what the system is built from. The next decision is how to compose those components for a given problem — single call, prompt chain, routing, orchestrator-workers, or a full multi-agent flow. That's the patterns layer:

Agentic AI design patterns — which control-flow pattern to use, and when.
The bounded agentic loop — the orchestration loop in depth, with the budget-and-escalation guardrail.
The AI architect role — who owns these decisions, and how to grow into it.

Frequently asked questions

What is agentic AI architecture?

It's the structure of an agentic AI system: the components and layers that surround a model so it can pursue a goal in a loop — a reasoning core, an orchestrator, memory and state, a tools/action layer, retrieval, and an operational plane for guardrails, observability, and evals. It describes how the parts connect, not what agentic AI is or which pattern to use.

What are the components of an agentic AI system?

Six recur in almost every production system: the reasoning core (the model and its planning), orchestration (the loop and control flow), memory and state (short-term context plus durable storage), the tools/action layer (often via MCP), retrieval/knowledge (agentic RAG), and the operational plane (guardrails, observability, cost control, evals).

What is a reference architecture for agentic AI?

A reference architecture is a reusable template of those six layers and how they connect, independent of any one vendor. You instantiate it by mapping each layer to concrete primitives — for example orchestration to Bedrock AgentCore on AWS or to Durable Objects on Cloudflare — so platform choices become explicit and justifiable.

What does an agentic AI architecture diagram look like?

A clear one is layered: the reasoning core in the centre, the orchestrator driving the loop around it, memory and retrieval feeding context in, the tools/action layer reaching out to external systems, and the operational plane wrapping everything. The six-layer table near the top of this page is that diagram in tabular form.

What's the difference between single-agent and multi-agent architecture?

A single-agent architecture is one loop with one model and a set of tools — simplest and the right default. A multi-agent architecture has an orchestrator delegating to specialised sub-agents, each with its own context and tools, or agents in a pipeline. Multi-agent adds capability and isolation but costs coordination and complexity; use it when one agent's context or toolset genuinely won't stretch.

What is agentic RAG architecture?

Agentic RAG makes retrieval a decision the agent controls rather than a fixed pre-step. The agent chooses when to search, what to query, and whether results are sufficient — sometimes retrieving iteratively. Architecturally, retrieval becomes a tool the orchestrator can invoke, grounded in an authoritative source with provenance, instead of a one-shot lookup bolted onto the front of the prompt.

How do you architect agentic AI on AWS and Cloudflare?

Map the six layers to each platform's primitives. On AWS: Bedrock for the model, AgentCore for orchestration, Knowledge Bases for retrieval, CloudWatch and Guardrails for the operational plane. On Cloudflare: Workers and Durable Objects for orchestration and state, AI Search/Vectorize for retrieval, and AI Gateway for observability, routing, and caching. The model layer can be Claude in both cases.

How is an agentic AI architecture different from a design pattern?

The architecture is the static structure — the components and how they connect. A design pattern is a dynamic composition choice — which control flow to run those components in for a given task (routing, orchestrator-workers, and so on). You decide the architecture once for the system and pick patterns per problem. See the design patterns guide for the pattern-level decisions.

Sources & provenance

Component and layer model synthesized from AI Architect Academy's curriculum (Track B, agentic systems) and the platform's own build (docs/DESIGN.md, src/lib/coach.ts).
Bounded-loop and OWASP-LLM guardrail framing: Anthropic guidance on building agents and tool use (platform docs); see the build note.
Cross-platform mapping reflects current Anthropic (Claude Agent SDK, tool use), AWS (Bedrock, AgentCore, Knowledge Bases), and Cloudflare (Workers, Durable Objects, AI Search, AI Gateway) primitives — verify exact API shapes against each vendor's live docs before building.

Platform primitives and API shapes change; treat the mapping as a design template, not a guaranteed signature. Corrections: hello@aiarch.dev.

Learn to architect agentic systems by building one.

AI Architect Academy teaches each of these layers — orchestration, memory, tools, retrieval, and the operational plane — as first-class skills, on a platform that is itself a production agentic system built across Anthropic, AWS, and Cloudflare. The build is the curriculum.

Try a sample lesson free → Browse the curriculum

Free sample — no signup · every claim cited · cancel anytime