For engineers designing agent systems
Agentic AI design patterns (and when to use each)
Most problems people call "agent" problems are solved by a simple workflow pattern, not a fully autonomous agent. Anthropic's Building Effective Agents draws the line clearly: workflows orchestrate LLMs and tools through predefined code paths, while agents let the model direct its own process and tool use dynamically.
The discipline is to start with the simplest thing that works and add agency only when the task genuinely needs it. More autonomy buys flexibility at the cost of latency, spend, and new failure modes. This guide covers the building block, the workflow vs agent distinction, the five workflow patterns, and the one case where you actually reach for an autonomous loop.
The augmented LLM
Every pattern is built from one unit: the augmented LLM, a model extended with retrieval, tools, and memory. Retrieval pulls in relevant context, tools let the model take actions or read external state, and memory carries information across steps. On its own this is already enough for many tasks. The patterns below are different ways of composing one or more augmented LLMs, so it pays to get this layer right first: clear tool definitions, scoped access, and a documented interface the model can actually use.
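As a rough illustration of that building block, the sketch below bundles a model call with retrieval, tools, and memory. Everything here is hypothetical and for illustration only: `AugmentedLLM`, `fake_llm`, and the lambda retriever and tool are stand-ins, not any provider's API.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types for illustration; no real provider SDK is used.
LLM = Callable[[str], str]              # prompt in, text out
Tool = Callable[[str], str]             # argument in, result out
Retriever = Callable[[str], list[str]]  # query in, context snippets out


@dataclass
class AugmentedLLM:
    llm: LLM
    retriever: Retriever
    tools: dict[str, Tool] = field(default_factory=dict)
    memory: list[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        # Retrieval pulls in context; memory carries earlier steps forward.
        context = "\n".join(self.retriever(task))
        history = "\n".join(self.memory)
        tool_names = ", ".join(sorted(self.tools))
        return (
            f"Context:\n{context}\n\n"
            f"Memory:\n{history}\n\n"
            f"Available tools: {tool_names}\n\n"
            f"Task: {task}"
        )

    def call_tool(self, name: str, arg: str) -> str:
        # Scoped access: only tools registered on this unit can be called.
        if name not in self.tools:
            raise PermissionError(f"tool not allowed: {name}")
        return self.tools[name](arg)

    def run(self, task: str) -> str:
        answer = self.llm(self.build_prompt(task))
        self.memory.append(f"{task} -> {answer}")
        return answer


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"answered with {len(prompt)} chars of prompt"


unit = AugmentedLLM(
    llm=fake_llm,
    retriever=lambda q: ["doc about " + q],
    tools={"search": lambda arg: "results for " + arg},
)
```

In a real system the model would decide when to call a tool; this sketch only shows where the three extensions plug in.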
Workflows vs agents
The single most useful distinction is between workflows and agents. In a workflow, LLMs and tools are orchestrated through predefined code paths that you, the engineer, lay out in advance. The control flow is yours; the model fills in the steps. In an agent, the LLM directs its own process: it decides which tools to call and in what order, and it keeps going until it judges the task done. The control flow belongs to the model.
Workflows give you predictability, lower cost, and easier debugging. Agents give you flexibility for open-ended tasks where you cannot enumerate the steps ahead of time. Start with a workflow. Only move to an agent when the task's path genuinely cannot be predefined.
The five patterns
1. Prompt chaining
What it is: decompose a task into a fixed sequence of steps, where each LLM call works on the output of the previous one. You can add programmatic checks ("gates") between steps to catch errors early.
When to use: the task splits cleanly into subtasks that always run in the same order, such as draft then translate, or outline then expand.
Tradeoff: each added step adds latency in exchange for higher accuracy, because each call handles a narrower subtask. If the steps do not have a fixed order, chaining is the wrong shape.
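A minimal sketch of a chain with gates, assuming the model is any callable from prompt to text. `run_chain`, `fake_llm`, and the prompt templates are hypothetical names chosen for this example.

```python
from typing import Callable, Optional

LLM = Callable[[str], str]    # hypothetical: prompt in, text out
Gate = Callable[[str], bool]  # programmatic check on a step's output


def run_chain(llm: LLM, steps: list[tuple[str, Optional[Gate]]], task: str) -> str:
    """Run prompt templates in a fixed order; each step sees the previous output."""
    current = task
    for template, gate in steps:
        current = llm(template.format(input=current))
        # Fail fast if the gate rejects this step's output.
        if gate is not None and not gate(current):
            raise ValueError(f"gate failed after step: {template!r}")
    return current


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    if prompt.startswith("Outline:"):
        return "1. intro 2. body"
    if prompt.startswith("Expand:"):
        return "Expanded: " + prompt[len("Expand:"):].strip()
    return ""


steps = [
    ("Outline: {input}", lambda out: out.startswith("1.")),  # gate: must look like a list
    ("Expand: {input}", None),
]
result = run_chain(fake_llm, steps, "agent patterns")
```

The order of `steps` is fixed in code, which is what makes this a workflow rather than an agent.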
2. Routing
What it is: classify the input, then send it down a specialized path. A first LLM call (or classifier) decides the category; each category has its own prompt or tool set tuned for that kind of work.
When to use: inputs fall into distinct classes that are better handled separately, such as routing support tickets, or sending easy queries to a small model and hard ones to a large model.
Tradeoff: a wrong classification sends the input down the wrong path, so the router's accuracy caps the whole system.
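Routing can be sketched as a classifier call followed by a dictionary lookup. The `route` function, the keyword-based `fake_classifier`, and the handler names are assumptions made for this example; the fallback path is one possible way to limit the damage from an unrecognized label.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def route(classifier: LLM, handlers: dict[str, LLM], fallback: LLM, text: str) -> str:
    """Classify the input, then send it to the matching specialized handler."""
    label = classifier(text).strip().lower()
    # Unknown labels go to the fallback instead of a wrong specialist.
    handler = handlers.get(label, fallback)
    return handler(text)


def fake_classifier(text: str) -> str:
    # Stand-in for an LLM or classifier call.
    lowered = text.lower()
    if "invoice" in lowered or "refund" in lowered:
        return "Billing"
    if "error" in lowered or "crash" in lowered:
        return "technical"
    return "unsure"


handlers = {
    "billing": lambda t: "billing team: " + t,
    "technical": lambda t: "tech support: " + t,
}


def fallback(text: str) -> str:
    return "human triage: " + text


billing_reply = route(fake_classifier, handlers, fallback, "Refund my invoice")
tech_reply = route(fake_classifier, handlers, fallback, "App crash on login")
other_reply = route(fake_classifier, handlers, fallback, "hello")
```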
3. Parallelization
What it is: run multiple LLM calls at once and aggregate. Two flavors: sectioning splits a task into independent subtasks run in parallel, and voting runs the same task several times to get diverse outputs you then combine.
When to use: subtasks are independent and can run concurrently for speed (sectioning), separate concerns are better handled by focused calls (sectioning, e.g. one call for content and one for a safety check), or you want several attempts at the same task for higher confidence (voting).
Tradeoff: more calls mean more spend, and you need a sound aggregation rule to combine results.
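Both flavors can be sketched with a thread pool from the standard library. `section`, `vote`, and the stub models are hypothetical, and majority vote is just one possible aggregation rule. The attempt tag appended to the prompt is a device to make the stub return varied answers; a real model would typically vary through sampling.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def section(llm: LLM, prompts: list[str]) -> list[str]:
    """Sectioning: run independent subtasks concurrently, results in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(llm, prompts))


def vote(llm: LLM, prompt: str, n: int = 3) -> str:
    """Voting: run the same task n times and return the most common answer."""
    attempts = [f"{prompt}\n[attempt {i}]" for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(llm, attempts))
    return Counter(answers).most_common(1)[0][0]


def echo_llm(prompt: str) -> str:
    # Stand-in model for sectioning.
    return prompt.upper()


def flaky_llm(prompt: str) -> str:
    # Stand-in model for voting: one attempt disagrees with the others.
    return "flag" if "[attempt 1]" in prompt else "pass"


sections = section(echo_llm, ["content", "safety"])
verdict = vote(flaky_llm, "Is this reply safe to send?", n=3)
```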
4. Orchestrator-workers
What it is: a lead LLM dynamically breaks the task into subtasks, delegates each to a worker LLM, and synthesizes their results. Unlike parallelization, the subtasks are not fixed in advance; the orchestrator decides them based on the input.
When to use: complex tasks where you cannot predict the subtasks ahead of time, such as making coordinated edits across many files, or research that fans out into varying numbers of subquestions.
Tradeoff: the dynamic decomposition adds cost and a coordination layer that can itself fail; it is more involved than a fixed parallel split.
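A sketch of the shape, assuming three separate callables for planning, working, and synthesizing. All names are hypothetical, and `max_subtasks` is an added assumption to cap the fan-out. The point is that the number of subtasks comes from the planner's output, not from the code.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, text out


def orchestrate(planner: LLM, worker: LLM, synthesizer: LLM,
                task: str, max_subtasks: int = 5) -> str:
    """Let the planner choose subtasks, run a worker on each, then synthesize."""
    plan = planner(f"List subtasks, one per line, for: {task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]
    subtasks = subtasks[:max_subtasks]  # cap the fan-out to bound cost
    results = [worker(sub) for sub in subtasks]
    joined = "\n".join(f"{s}: {r}" for s, r in zip(subtasks, results))
    return synthesizer(f"Task: {task}\nResults:\n{joined}")


def fake_planner(prompt: str) -> str:
    # Stand-in: one subtask per " and "-separated topic in the task.
    topic = prompt.split("for: ", 1)[1]
    return "\n".join(f"research {part}" for part in topic.split(" and "))


def fake_worker(subtask: str) -> str:
    return f"done({subtask})"


def fake_synthesizer(prompt: str) -> str:
    return prompt.split("Results:\n", 1)[1].replace("\n", "; ")


summary = orchestrate(fake_planner, fake_worker, fake_synthesizer,
                      "pricing and latency")
```

Workers run sequentially here for clarity; they could also run concurrently once the planner has returned.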
5. Evaluator-optimizer
What it is: one LLM generates a response while a second LLM evaluates it and gives feedback, in a loop, until the output meets the bar or a limit is reached.
When to use: you have clear evaluation criteria and iterative refinement adds real value, such as literary translation or research where a critique loop measurably improves quality.
Tradeoff: the loop multiplies calls and latency; it only pays off when the evaluator's feedback is reliable and the criteria are explicit.
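The loop can be sketched as follows, with a round limit so it always terminates. `refine`, `fake_generate`, and `fake_evaluate` are hypothetical names, and the evaluator's pass criterion is a toy assumption.

```python
from typing import Callable, Optional

# Hypothetical signatures for illustration.
Generator = Callable[[str, Optional[str]], str]  # (task, feedback) -> draft
Evaluator = Callable[[str], tuple[bool, str]]    # draft -> (passed, feedback)


def refine(generate: Generator, evaluate: Evaluator,
           task: str, max_rounds: int = 3) -> tuple[str, bool]:
    """Generate, evaluate, and retry with feedback until pass or limit."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)
        passed, feedback = evaluate(draft)
        if passed:
            return draft, True
    # Limit reached: return the last draft and report that it did not pass.
    return draft, False


def fake_generate(task: str, feedback: Optional[str]) -> str:
    # Stand-in generator: improves only once it receives feedback.
    return "short" if feedback is None else "short, now with detail"


def fake_evaluate(draft: str) -> tuple[bool, str]:
    # Stand-in evaluator with an explicit criterion.
    return ("detail" in draft, "add detail")


final_draft, ok = refine(fake_generate, fake_evaluate, "describe routing")
```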
When you actually need an autonomous agent
An autonomous agent is the right tool only when the path cannot be predefined and the number of steps is genuinely open-ended. The agent runs a loop: act (call a tool or take a step), observe the tool results from the environment, then decide the next step, repeating until a stop condition is met. Critically, that loop must be bounded. Set explicit budgets on turns and tool calls, define a clear done-condition, and give the agent ground truth from the environment (tool outputs, test results) so it can self-correct rather than drift.
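A minimal sketch of such a bounded loop, assuming the model is wrapped as a policy that maps the observations so far to the next action. `Action`, `run_agent`, `make_tools`, and the toy coding scenario are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    tool: Optional[str]  # None means the policy declares the task done
    arg: str = ""


Policy = Callable[[list[str]], Action]  # observations so far -> next action
Tool = Callable[[str], str]


def run_agent(policy: Policy, tools: dict[str, Tool],
              goal: str, max_turns: int = 10) -> tuple[str, list[str]]:
    """Act, observe, decide; stop on the done-condition or when the budget runs out."""
    observations = [f"goal: {goal}"]
    for _ in range(max_turns):
        action = policy(observations)
        if action.tool is None:
            return "done", observations
        tool = tools.get(action.tool)
        if tool is None:
            observations.append(f"error: unknown tool {action.tool}")
            continue
        # Ground truth from the environment feeds the next decision.
        observations.append(tool(action.arg))
    return "budget_exhausted", observations


def make_tools() -> dict[str, Tool]:
    # Toy environment: tests fail until an edit has been made.
    state = {"fixed": False}

    def edit(arg: str) -> str:
        state["fixed"] = True
        return "edited"

    def run_tests(arg: str) -> str:
        return "tests: pass" if state["fixed"] else "tests: fail"

    return {"edit": edit, "run_tests": run_tests}


def fake_policy(observations: list[str]) -> Action:
    # Stand-in for the model's decision step.
    last = observations[-1]
    if last == "tests: pass":
        return Action(tool=None)
    if last == "tests: fail":
        return Action(tool="edit", arg="patch")
    return Action(tool="run_tests")


status, trace = run_agent(fake_policy, make_tools(), "fix the bug")
```

Returning a distinct `budget_exhausted` status lets the caller tell a finished task apart from one that ran out of turns.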
Agents shine on hard, open-ended problems where you trust the model's decision-making and can verify outcomes, the classic example being a coding agent that edits files, runs tests, and iterates. They are powerful precisely because they decide their own steps, which is also why they are harder to predict, cost more per task, and need guardrails you would not bother with for a workflow.
| Pattern | What it does | Use when |
|---|---|---|
| Prompt chaining | Decomposes a task into a fixed sequence of LLM calls, each building on the last, with optional gates between steps | The task always runs as the same ordered subtasks |
| Routing | Classifies the input and sends it to a specialized path or model | Inputs fall into distinct classes best handled separately |
| Parallelization | Runs calls concurrently and aggregates: sectioning (independent subtasks) or voting (repeated attempts) | Subtasks are independent, or you want multiple attempts for confidence |
| Orchestrator-workers | A lead LLM dynamically splits the task, delegates to workers, and synthesizes results | You cannot predict the subtasks in advance |
| Evaluator-optimizer | One LLM generates, another critiques, looping until the output meets the bar | You have clear criteria and iteration measurably improves quality |
- Anthropic — Building Effective Agents (the workflow vs agent distinction, the augmented LLM, and the five workflow patterns).
- Course material: AI Architect Academy Track B (Agentic Systems) — the bounded agentic loop, budgets, and stop conditions.
This is a conceptual overview. Specific API shapes change, so verify against current provider docs before implementing. Corrections: hello@aiarch.dev.