Context engineering: the discipline that succeeded prompt engineering
Context engineering is the discipline of deciding what information enters a model's limited context window at each step — the instructions, retrieved knowledge, memory, and tool results — and managing that budget so the model has just enough to act well, and no more. Prompt engineering was about wording one message; context engineering is about assembling the whole working set the model sees on every turn of an agent.
This page covers the discipline itself: what enters the window and how you manage it. For the system that surrounds the model (orchestration, memory store, tools, retrieval), see agentic AI architecture, where memory is one layer of the whole.
What is context engineering?
A large language model has no memory of earlier interactions and no access to your data beyond what you put in front of it. Everything it knows about the task at a given step lives in its context window: a fixed token budget that holds the system instructions, the conversation so far, retrieved documents, and the results of any tools it has called. Context engineering is the practice of curating that window: choosing what goes in, in what form, in what order, and what to leave out.
The term went mainstream in mid-2025. Shopify's Tobi Lütke framed it as "the art of providing all the context for the task to be plausibly solvable by the LLM," and Andrej Karpathy sharpened it to "the delicate art and science of filling the context window with just the right information for the next step." Anthropic later formalized it as the set of strategies for curating and maintaining the optimal set of tokens during inference. The common thread: in any industrial-strength LLM app, the prompt is a small fraction of what the model actually reads.
Context engineering vs prompt engineering
Prompt engineering is not dead — it has been subsumed. Wording a single instruction well still matters, but in a production agent the static prompt is a sliver of the context; the rest is assembled at runtime. The shift is from authoring one message to designing the whole input that reaches the model on every turn.
| Dimension | Prompt engineering | Context engineering |
|---|---|---|
| Unit of work | A single message or template. | The entire working set the model sees each step. |
| When it's set | Authored once, mostly static. | Assembled dynamically at runtime, per turn. |
| What it includes | Instructions and phrasing. | Instructions + retrieval + memory + tool results. |
| Main constraint | Clarity of wording. | The token budget of the context window. |
| Where it lives | One-shot chat use. | Agentic, multi-step systems. |
| Relationship | A subset. | The superset that contains it. |
Put plainly: prompt engineering optimizes a string; context engineering optimizes a budget. Everything below is about that budget — what fills it, and how to keep it from overflowing.
What goes into the context window
On any given step, the window is a composition of distinct sources, each of which the engineer decides to include or omit:
- Instructions — the system prompt: role, constraints, output contract. The stable spine.
- Conversation / task state — what's happened so far in this run, including the user's goal and prior steps.
- Retrieved knowledge — documents pulled from an authoritative source so the model answers from facts, not parametric memory.
- Memory — durable facts recalled from earlier sessions: preferences, history, prior decisions.
- Tool definitions and results — the schemas the model can call, and the outputs it gets back to reason over.
The engineering is in the curation: too little and the model can't act; too much and it loses the thread, costs more, and degrades as the relevant tokens get buried. The goal is the smallest set that makes the next step solvable.
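The composition described above can be sketched as a small selection routine. This is a minimal illustration, not a reference implementation: `estimate_tokens` is a crude stand-in for a real tokenizer, and the `Source` type, its `priority` field, and `assemble_context` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assume roughly 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class Source:
    name: str      # which kind of context this is, e.g. "instructions" or "memory"
    text: str
    priority: int  # hypothetical scheme: lower number = more important


def assemble_context(sources: list[Source], budget: int) -> list[Source]:
    """Greedy selection: walk sources in priority order and keep each one that still fits."""
    chosen: list[Source] = []
    used = 0
    for src in sorted(sources, key=lambda s: s.priority):
        cost = estimate_tokens(src.text)
        if used + cost <= budget:
            chosen.append(src)
            used += cost
    return chosen


sources = [
    Source("instructions", "You are a support agent. Answer only from the provided documents.", 0),
    Source("task_state", "User goal: find out whether order 1234 can still be refunded.", 1),
    Source("retrieved", "Refund policy: orders can be refunded within 30 days of delivery.", 2),
    Source("memory", "User prefers short answers.", 3),
    Source("tool_results", "x" * 4000, 4),  # an oversized tool dump that will not fit
]

window = assemble_context(sources, budget=100)
print([s.name for s in window])
# ['instructions', 'task_state', 'retrieved', 'memory']
```

The oversized tool result is left out rather than crowding the window. A real system would more likely trim or summarize it than drop it whole; the point of the sketch is only that inclusion is an explicit decision made against a budget.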
Memory: short-term vs long-term
Two kinds of memory feed the window, and conflating them is a common mistake:
- Short-term (working) memory — the context window itself: the current task, recent tool results, and scratch reasoning. It is bounded by tokens, so it is a budget to manage, not free space. It resets when the run ends.
- Long-term (durable) memory — state that outlives a run: facts, user history, and task progress stored in a database or vector store and recalled into the window when relevant.
Agent memory is the bridge: a mechanism that decides what to persist out of short-term memory and what to retrieve back in later. The architectural questions are what gets promoted from working to durable memory, and how durable state is keyed and isolated per user or session — treating the context window itself as long-term memory is how agents lose progress or leak context between users. The store and retrieval layer live in the system architecture; this page is about what reaches the window.
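To make the promote-and-recall bridge concrete, here is a minimal sketch of a durable store keyed per user. It assumes an in-memory dictionary in place of a real database or vector store, and word overlap in place of real relevance scoring; `DurableMemory`, `promote`, and `recall` are hypothetical names for illustration.

```python
from collections import defaultdict


class DurableMemory:
    """In-memory stand-in for a database or vector store, keyed per user."""

    def __init__(self) -> None:
        self._facts: dict[str, list[str]] = defaultdict(list)

    def promote(self, user_id: str, fact: str) -> None:
        """Persist a fact out of working memory, skipping exact duplicates."""
        if fact not in self._facts[user_id]:
            self._facts[user_id].append(fact)

    def recall(self, user_id: str, query: str, limit: int = 3) -> list[str]:
        """Return this user's facts that share at least one word with the query."""
        words = set(query.lower().split())
        scored = []
        for fact in self._facts.get(user_id, []):
            overlap = len(words & set(fact.lower().split()))
            if overlap:
                scored.append((overlap, fact))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [fact for _, fact in scored[:limit]]


memory = DurableMemory()
memory.promote("alice", "prefers metric units")
memory.promote("bob", "billing address is in Canada")

print(memory.recall("alice", "which units should the report use"))
# ['prefers metric units']
print(memory.recall("alice", "what is the billing address"))
# []
```

The second lookup returns nothing even though a matching fact exists, because that fact belongs to another user. Keying every read and write by user is what keeps durable state from leaking between sessions.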
Managing the context budget: compaction and retrieval
The window is finite, and longer is not better — accuracy degrades as the relevant fact competes with noise. So context engineering is mostly subtraction:
- Retrieval over stuffing — pull only the passages relevant to the current step instead of dumping a whole corpus into the prompt. In an agentic setup the model decides when to retrieve and what to query, making retrieval a tool rather than a fixed pre-step.
- Compaction — when a long run approaches the limit, summarize older turns into a compact form and carry the summary forward, freeing tokens while preserving the thread. This is how long agent sessions stay within budget.
- Selection and ordering — keep the most decision-relevant material, and place it where the model attends to it rather than burying it mid-context.
- Isolation — give sub-agents their own scoped windows so one agent's context doesn't bloat another's.
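As one illustration of the compaction step, the sketch below folds older turns into a single summary once the history exceeds a budget and keeps the most recent turns verbatim. The `summarize` function is a placeholder that keeps only the first sentence of each turn; in practice that step would usually be a model call. `estimate_tokens`, `compact`, and `keep_recent` are hypothetical names chosen for this example.

```python
def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: assume roughly 4 characters per token."""
    return max(1, len(text) // 4)


def summarize(turns: list[str]) -> str:
    """Placeholder summarizer: keep the first sentence of each turn."""
    firsts = [turn.split(". ")[0].rstrip(".") for turn in turns]
    return "Summary of earlier turns: " + "; ".join(firsts) + "."


def compact(turns: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """If the history is over budget, replace all but the latest turns with one summary.

    This does not guarantee the result fits the budget; it only shrinks the older part.
    """
    total = sum(estimate_tokens(turn) for turn in turns)
    if total <= budget or len(turns) <= keep_recent:
        return turns
    split = len(turns) - keep_recent
    older, recent = turns[:split], turns[split:]
    return [summarize(older)] + recent


turns = [
    "User asked for a refund on order 1234. " + "Details follow. " * 40,
    "Agent looked up the order. " + "Raw lookup output. " * 40,
    "User confirmed the shipping address.",
    "Agent is about to issue the refund.",
]

compacted = compact(turns, budget=100)
print(compacted[0])
# Summary of earlier turns: User asked for a refund on order 1234; Agent looked up the order.
```

The last two turns survive untouched, so the agent keeps the detail it needs for the next step while the bulk of the earlier exchange shrinks to one line.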
Every token in the window has a cost in money and in attention. Managing the budget is therefore also a cost-optimization problem: input tokens are billed on every call, so a context that doubles in size doubles the input cost of each call, and that doubling repeats at every iteration of an agent loop.
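The cost arithmetic can be worked through with a few lines of code. The sketch assumes each call sends the whole window again, that the window grows by a fixed amount per step, and a made-up price per million input tokens; none of these figures come from a vendor, and `loop_input_tokens` is a hypothetical helper.

```python
from typing import Optional

PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical price in dollars, not a vendor quote


def loop_input_tokens(base: int, added_per_step: int, steps: int,
                      cap: Optional[int] = None) -> int:
    """Total input tokens across a loop in which every call sends the whole window.

    Call i sends base + added_per_step * i tokens. If cap is given, it models
    compaction holding the window at a ceiling.
    """
    total = 0
    for i in range(steps):
        window = base + added_per_step * i
        if cap is not None:
            window = min(window, cap)
        total += window
    return total


def input_cost(tokens: int) -> float:
    """Dollar cost of the given number of input tokens at the assumed price."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


uncompacted = loop_input_tokens(base=2_000, added_per_step=1_000, steps=10)
doubled = loop_input_tokens(base=4_000, added_per_step=2_000, steps=10)
capped = loop_input_tokens(base=2_000, added_per_step=1_000, steps=10, cap=5_000)

print(uncompacted, doubled, capped)
# 65000 130000 44000
print(f"${input_cost(uncompacted):.3f} vs ${input_cost(capped):.3f}")
```

Doubling every window doubles the total, as expected. Holding the window at a 5,000-token ceiling cuts the ten-step total from 65,000 to 44,000 input tokens, and the gap widens as the loop runs longer.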
Context engineering for agents
Agentic systems are where context engineering becomes unavoidable. A single chat turn is forgiving; an agent runs a loop — plan, call a tool, read the result, decide again — and the context window is rebuilt on every iteration. That makes the discipline a continuous, programmatic concern rather than a one-time authoring task.
In a multi-step run the engineer designs how state accumulates: which tool results stay in the window and which get summarized away, when to recall durable memory, when to retrieve, and how to keep the window coherent over dozens of turns. This is also where tool design matters — well-scoped tools (often via the Model Context Protocol) return compact, relevant results instead of flooding the window. How you evaluate whether the context you assembled actually produced better answers is an evaluation problem; how the loop is bounded and orchestrated is covered in the architecture.
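A stripped-down loop shows what "rebuilt on every iteration" can look like in code. This is a sketch under stated assumptions: `fake_decide` stands in for a model call, `lookup_order` is an invented tool, and `MAX_RESULT_CHARS` is an arbitrary cap. No real model API or the Model Context Protocol is involved.

```python
from typing import Callable

MAX_RESULT_CHARS = 200  # hypothetical cap on how much of a tool result stays in the window


def build_window(instructions: str, goal: str, notes: list[str]) -> str:
    """Rebuild the full model input from current state; called on every iteration."""
    return "\n".join([instructions, f"Goal: {goal}", *notes])


def run_agent(goal: str,
              decide: Callable[[str], tuple[str, str]],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> tuple[str, list[str]]:
    """Plan, call a tool, read the result, decide again, within a bounded number of steps."""
    instructions = "Use tools when needed, then answer."
    notes: list[str] = []
    for _ in range(max_steps):
        window = build_window(instructions, goal, notes)
        action, arg = decide(window)
        if action == "answer":
            return arg, notes
        result = tools[action](arg)
        # Keep only a bounded slice of the tool output so it cannot flood the window.
        notes.append(f"{action}({arg}) -> {result[:MAX_RESULT_CHARS]}")
    return "stopped: step limit reached", notes


def fake_decide(window: str) -> tuple[str, str]:
    """Stand-in for a model call: look the order up once, then answer."""
    if "lookup_order" not in window:
        return "lookup_order", "1234"
    return "answer", "Order 1234 is eligible for a refund."


tools = {"lookup_order": lambda order_id: "status=delivered; " + "log line; " * 100}

answer, notes = run_agent("Can order 1234 be refunded?", fake_decide, tools)
print(answer)
# Order 1234 is eligible for a refund.
```

Truncation is the bluntest way to bound a tool result. A well-scoped tool would return only the relevant fields in the first place, which is the design point the paragraph above makes.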
Frequently asked questions
What is context engineering?
Context engineering is the discipline of deciding what information enters a model's limited context window at each step — instructions, retrieved knowledge, memory, and tool results — and managing that token budget so the model has just enough to act well. It is the runtime assembly of the whole input the model sees, not just the wording of one message.
How is context engineering different from prompt engineering?
Prompt engineering optimizes a single message or template; context engineering optimizes the entire working set the model sees on every step, including retrieval, memory, and tool results that are assembled dynamically at runtime. Prompt engineering is a subset — the wording still matters, but in a production agent it is a small fraction of the total context.
Is prompt engineering dead?
No, but it has been subsumed. Crafting a clear instruction still matters, and it remains part of the system prompt. What changed is scope: in agentic systems the static prompt is a sliver of what the model reads, so the broader discipline of curating the whole context window has absorbed prompt engineering rather than replaced it outright.
What is agent memory?
Agent memory is the mechanism by which an agent persists information beyond a single run and recalls it later. It splits into short-term (working) memory — the context window for the current task, bounded by tokens — and long-term (durable) memory — facts, history, and progress stored externally and retrieved into the window when relevant. The bridge between them, deciding what to promote and what to recall, is the engineering problem.
What is the context window?
The context window is the fixed number of tokens a model can read at once: its entire working memory for a given step. It holds the system instructions, the conversation so far, any retrieved documents, and tool results. Because it is finite and accuracy degrades as it fills with noise, deciding what occupies it is the core of context engineering.
How do you manage context for an agent?
Treat the window as a budget, not free space. Retrieve only the passages relevant to the current step instead of stuffing the prompt, compact older turns into summaries as a long run approaches the limit, order the most decision-relevant material where the model attends to it, and give sub-agents isolated windows. The aim is the smallest context that makes the next step solvable, which also keeps per-call cost down.
- Term history: Tobi Lütke (Shopify) and Andrej Karpathy popularized "context engineering" in June 2025 — Karpathy's "delicate art and science of filling the context window" definition is from his 25 Jun 2025 post.
- Formalization as curating the optimal set of tokens during inference: Anthropic engineering guidance on context engineering for agents (2025); see also the Prompt Engineering Guide context-engineering guide.
- Memory model, budget, compaction, and agentic-retrieval framing synthesized from AI Architect Academy's curriculum (Track B, agentic systems) and the platform's own build (docs/DESIGN.md, src/lib/coach.ts).
The discipline and the tooling around it are evolving quickly; treat specifics as design intent and verify token limits and APIs against each vendor's live docs. Corrections: hello@aiarch.dev.