Architecture notes · build-in-public

Why the coach runs a bounded agentic loop

By Wibo · Amsterdam · Published 13 Jun 2026 · ~6 min read

In short

The course's coach is a real agentic loop, not a chat wrapper: send a turn, check the model's stop_reason, run any requested tool, feed the result back, repeat, stop. The loop is bounded by an explicit turn budget and tool-call budget with an escalation path when either is hit.

That bound is the same safety control the curriculum teaches in the agentic-loop module — OWASP LLM06, "Excessive Agency" — applied to our own coach. The build is the curriculum, so the safeguard the lesson preaches is the safeguard the code enforces.

The decision: build the loop, then bound it

The design spec set this direction before any code was written: build the coach "as a real agentic loop, not a chat wrapper," with the loop shape send → check stop_reason → if tool_use: run tool + append result → loop → end_turn. The load-bearing instruction comes next: "Bound it: turn/tool budget + done-condition + escalation (this is the module's lesson, applied)." That sentence is in the design doc's coach section, and the curriculum module it refers to (trackB.m2.agentic-loop) is the canonical reference content the rest of the catalog is authored against.

So the loop wasn't bounded as an afterthought once it misbehaved. The bound was a design requirement, because the whole product premise is that you learn the discipline by building the thing that needs the discipline.

What the bound actually is

Two hard caps live at the top of the coach loop as named constants, with the rationale in the comment directly above them:

// Bounded agency (OWASP LLM06): hard caps on model turns and tool calls so the
// loop can never run away. These are the safeguard the module itself teaches.
export const MAX_TURNS = 6;
export const MAX_TOOL_CALLS = 8;

The loop runs while (turns < MAX_TURNS). On each turn it streams a model response, collects any tool_use blocks, and inspects the stop reason. If the model stopped to call tools, the loop dispatches each one — but first checks the running tool count against MAX_TOOL_CALLS. Cross that line and the loop emits an explicit annotation and stops rather than continuing:

// inside the tool-dispatch path, before running a tool:
if (toolCalls >= MAX_TOOL_CALLS) {
  yield { type: 'annotation',
    text: 'Tool-call budget reached — escalating to a human/stopping ' +
          '(bounded agency, OWASP LLM06).' };
  yield { type: 'done', stop_reason: 'tool_budget' };
  return;
}

If instead the model stops on its own (end_turn), the loop yields done and returns — the normal exit. And if the turn budget runs out before the model is satisfied, the loop emits a max_turns annotation and stops "to avoid a runaway loop." Three exits, all explicit: the model is done, the tool budget is spent, or the turn budget is spent. There is no fourth path where the loop just keeps going.
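To make the three exits concrete, here is a minimal sketch of a loop with that shape. It is not the code in src/lib/coach.ts: the type names, the `ModelFn` and `RunTool` signatures, and the shortened annotation texts are assumptions made for illustration, and the model call is a plain promise rather than a stream.

```typescript
// Sketch only: the types, ModelFn, and RunTool are illustrative assumptions,
// not the signatures used in src/lib/coach.ts.
const MAX_TURNS = 6;
const MAX_TOOL_CALLS = 8;

type ToolUse = { id: string; name: string; input: unknown };
type ModelTurn = {
  text: string;
  toolUses: ToolUse[];
  stop_reason: 'end_turn' | 'tool_use';
};
type ToolResult = { tool_use_id: string; content: string };
type CoachEvent =
  | { type: 'text'; text: string }
  | { type: 'annotation'; text: string }
  | { type: 'done'; stop_reason: 'end_turn' | 'tool_budget' | 'max_turns' };

// Injected dependencies: one model call per turn, and a tool runner.
type ModelFn = (toolResults: ToolResult[]) => Promise<ModelTurn>;
type RunTool = (call: ToolUse) => Promise<string>;

async function* runBoundedLoop(
  callModel: ModelFn,
  runTool: RunTool,
): AsyncGenerator<CoachEvent> {
  let turns = 0;
  let toolCalls = 0;
  let pending: ToolResult[] = [];

  while (turns < MAX_TURNS) {
    turns++;
    const turn = await callModel(pending);
    if (turn.text) yield { type: 'text', text: turn.text };

    // Exit 1: the model finished on its own.
    if (turn.stop_reason === 'end_turn') {
      yield { type: 'done', stop_reason: 'end_turn' };
      return;
    }

    pending = [];
    for (const call of turn.toolUses) {
      // Exit 2: tool budget spent, checked before each tool runs.
      if (toolCalls >= MAX_TOOL_CALLS) {
        yield { type: 'annotation', text: 'Tool-call budget reached.' };
        yield { type: 'done', stop_reason: 'tool_budget' };
        return;
      }
      toolCalls++;
      pending.push({ tool_use_id: call.id, content: await runTool(call) });
    }
  }

  // Exit 3: turn budget spent before the model reached end_turn.
  yield { type: 'annotation', text: 'Turn budget reached.' };
  yield { type: 'done', stop_reason: 'max_turns' };
}
```

Every path out of the function yields a `done` event with a named reason, so a caller can tell a normal finish from a budget stop without inspecting the text.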

Why a chat wrapper wouldn't have taught this

A naive "stream the model and render tokens" coach never has to confront stop_reason, never assembles the assistant turn (text blocks first, then tool-use blocks) the way the API expects, and never has to decide what "done" means or what happens when the model won't stop. Writing those branches by hand is where the agentic-loop lesson actually lands.
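As a rough illustration of what "assembling the assistant turn" involves, the sketch below builds the two messages that follow a tool-using turn. The block shapes (`text`, `tool_use`, `tool_result`, `tool_use_id`) follow the Anthropic Messages API; the helper names `assistantTurn` and `toolResultsTurn` are hypothetical and not taken from the repo.

```typescript
// Block shapes follow the Anthropic Messages API; helper names are hypothetical.
type TextBlock = { type: 'text'; text: string };
type ToolUseBlock = { type: 'tool_use'; id: string; name: string; input: unknown };
type ToolResultBlock = { type: 'tool_result'; tool_use_id: string; content: string };
type ChatMessage =
  | { role: 'assistant'; content: Array<TextBlock | ToolUseBlock> }
  | { role: 'user'; content: string | ToolResultBlock[] };

// Rebuild the assistant turn from what was streamed: text first, then tool calls.
function assistantTurn(text: string, toolUses: ToolUseBlock[]): ChatMessage {
  const content: Array<TextBlock | ToolUseBlock> = [];
  if (text.length > 0) content.push({ type: 'text', text });
  content.push(...toolUses);
  return { role: 'assistant', content };
}

// Tool outputs go back to the model as a user turn, keyed by the tool_use id.
function toolResultsTurn(results: Array<{ id: string; output: string }>): ChatMessage {
  return {
    role: 'user',
    content: results.map((r): ToolResultBlock => ({
      type: 'tool_result',
      tool_use_id: r.id,
      content: r.output,
    })),
  };
}
```

A token-rendering wrapper needs neither helper, which is the point: the message bookkeeping only exists once the loop has to feed tool results back.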

The tools are least-privilege on purpose

Bounding the loop is only half of "excessive agency." The other half is what the agent is allowed to touch. The coach's tools are deliberately narrow: it can read, grade, and schedule a review, and there is no tool that lets it write content, change billing, or mutate anything else. The tool surface, as documented at the top of the coach module, is: fetch the learner's due review items, retrieve lesson context for grounding, grade an attempt the learner actually made, schedule a review, and read mastery state. The source comment states the constraint directly: "LEAST PRIVILEGE — coach has NO write access to content/billing."
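One way to enforce a tool surface like this is a closed allowlist that the dispatcher consults before running anything. The sketch below assumes that approach; only `grade_attempt` is a tool name taken from the repo, while the other names, the handler bodies, and `dispatchTool` itself are hypothetical placeholders.

```typescript
// Only grade_attempt is named in this article; the other tool names and every
// handler body are hypothetical placeholders.
type ToolHandler = (input: unknown) => Promise<string>;

const COACH_TOOLS: ReadonlyMap<string, ToolHandler> = new Map<string, ToolHandler>([
  ['get_due_reviews', async () => '[]'],
  ['get_lesson_context', async () => 'lesson excerpt'],
  ['grade_attempt', async () => '{"correct":false}'],
  ['schedule_review', async () => 'scheduled'],
  ['get_mastery', async () => '{}'],
]);

async function dispatchTool(name: string, input: unknown): Promise<string> {
  const handler = COACH_TOOLS.get(name);
  if (!handler) {
    // Nothing outside the allowlist can run; the refusal is returned as text
    // so it can be fed back to the model as a tool result.
    return `Unknown tool "${name}": not on the coach's allowlist.`;
  }
  return handler(input);
}
```

With this shape, least privilege is a property of the map: a capability that has no entry cannot be reached, whatever the model asks for.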

Grading is the sharp edge here. The coach does not get a private path to answer keys: when it grades an attempt, it calls the same shared grader the public attempt endpoint uses (src/lib/grade.ts). There's one grader, and answer keys are only ever revealed after a learner has answered — a rule the system prompt also enforces in words ("never hand over an exam answer key verbatim").

One subtlety the loop has to respect

A bounded loop that stops on a budget can produce a turn that is only an annotation. The repo's worker-code notes flag the consequence: the coach route must always persist a non-empty assistant turn, because an annotation-only budget turn would otherwise leave two consecutive user roles in the stored history — which the live Anthropic API rejects with a 400 on the next call. So the bound isn't just a safety feature in the abstract; it forced a concrete invariant on how conversation history is recorded. That's the kind of detail you only find by building the loop for real.
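A small sketch of how that invariant could be enforced at the point where the route records the turn. The helper names, the `StoredTurn` shape, and the fallback text are assumptions for illustration, not the repo's code.

```typescript
// Hypothetical helpers: names, StoredTurn shape, and fallback text are assumptions.
type StoredTurn = { role: 'user' | 'assistant'; content: string };

const BUDGET_FALLBACK = '(The coach stopped here because its budget was reached.)';

// Always returns an assistant turn with non-empty content, even when the loop
// produced only annotations (or nothing at all) before stopping.
function assistantTurnToPersist(streamedText: string, annotations: string[]): StoredTurn {
  if (streamedText.trim().length > 0) {
    return { role: 'assistant', content: streamedText };
  }
  const note = annotations.join(' ').trim();
  return { role: 'assistant', content: note.length > 0 ? note : BUDGET_FALLBACK };
}

// Cheap check that stored history never has two consecutive turns with the same role.
function rolesAlternate(turns: StoredTurn[]): boolean {
  let prev: StoredTurn['role'] | undefined;
  for (const t of turns) {
    if (t.role === prev) return false;
    prev = t.role;
  }
  return true;
}
```

Running a check like `rolesAlternate` over the stored history before the next model call would surface a broken invariant locally instead of as a 400 from the API.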

Where the state lives

The conversation itself is stored in a SQLite-backed Durable Object, one per learner session (CoachSession). Deliberately, the loop runs in the route, not inside the Durable Object — the DO is "purely the durable conversation store" so the server-sent-events streaming path stays simple. Choosing a SQLite Durable Object also kept this on Cloudflare's free tier, which mattered for shipping the slice before paying for Vectorize. Separating "where the loop runs" from "where the history lives" is a small architecture call, but it's why the streaming code didn't have to reach into storage internals mid-stream.
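The split can be sketched without any Cloudflare APIs by treating the store as a narrow interface. Everything below is a stand-in under stated assumptions: `ConversationStore`, `InMemoryStore`, and `handleCoachTurn` are hypothetical names, and the in-memory class only plays the role that the SQLite-backed CoachSession Durable Object plays in the real system.

```typescript
// Stand-in sketch: ConversationStore, InMemoryStore, and handleCoachTurn are
// hypothetical. The real store is the CoachSession Durable Object, not shown here.
type SessionTurn = { role: 'user' | 'assistant'; content: string };

interface ConversationStore {
  load(): Promise<SessionTurn[]>;
  append(turn: SessionTurn): Promise<void>;
}

class InMemoryStore implements ConversationStore {
  private turns: SessionTurn[] = [];
  async load(): Promise<SessionTurn[]> {
    return [...this.turns];
  }
  async append(turn: SessionTurn): Promise<void> {
    this.turns.push(turn);
  }
}

// The "route": it owns the loop and the streaming; the store only loads and appends.
async function handleCoachTurn(
  store: ConversationStore,
  userText: string,
  runLoop: (turns: SessionTurn[]) => AsyncIterable<string>,
  emit: (chunk: string) => void,
): Promise<void> {
  const prior = await store.load();
  const userTurn: SessionTurn = { role: 'user', content: userText };
  let reply = '';
  for await (const chunk of runLoop([...prior, userTurn])) {
    emit(chunk); // stream to the client without touching storage
    reply += chunk;
  }
  await store.append(userTurn);
  // Keep the persisted assistant turn non-empty, per the invariant described earlier.
  await store.append({ role: 'assistant', content: reply.length > 0 ? reply : '(no text produced)' });
}
```

Storage is touched once before streaming starts and once after it ends, which is what keeps the streaming path free of storage concerns.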

The honest part: it's evaluated, not asserted

A bounded loop that's never tested is a comment, not a guarantee. The loop ships with a dependency-injection seam — the stream function can be swapped — so a test can inject a model that always asks for another tool and prove the loop still terminates within its budget. That termination test, plus the offline round-trip, is what turns "the loop is bounded" from intention into something CI checks. The coach also carries a small eval set (corrective, grounded/cited, misconception-targeting, in-scope, terminates) that runs on every prompt change, because evals are both the safeguard and the single most hireable skill the course is trying to teach.
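Here is a compact sketch of what such a termination check could look like. It assumes a simplified loop (`runLoop`) with the model call injected as `stream`; those names, the result shape, and the stub are illustrative and do not reproduce the real `runCoachLoop` signature or the repo's test.

```typescript
// Sketch only: StreamFn, LoopResult, and runLoop are illustrative, not the real
// runCoachLoop signature. The budgets match the constants quoted in this article.
const MAX_TURNS = 6;
const MAX_TOOL_CALLS = 8;

type StreamResult = { stop_reason: 'end_turn' | 'tool_use'; toolNames: string[] };
type StreamFn = () => Promise<StreamResult>;
type LoopResult = {
  stop_reason: 'end_turn' | 'tool_budget' | 'max_turns';
  turns: number;
  toolCalls: number;
};

// Compact bounded loop with the model call injected through `stream`.
async function runLoop(stream: StreamFn): Promise<LoopResult> {
  let turns = 0;
  const ran: string[] = [];
  while (turns < MAX_TURNS) {
    turns++;
    const res = await stream();
    if (res.stop_reason === 'end_turn') {
      return { stop_reason: 'end_turn', turns, toolCalls: ran.length };
    }
    for (const name of res.toolNames) {
      if (ran.length >= MAX_TOOL_CALLS) {
        return { stop_reason: 'tool_budget', turns, toolCalls: ran.length };
      }
      ran.push(name); // a real loop would execute the tool here
    }
  }
  return { stop_reason: 'max_turns', turns, toolCalls: ran.length };
}

// Adversarial stub: a "model" that never stops asking for tools.
const greedyStream: StreamFn = async () => ({
  stop_reason: 'tool_use',
  toolNames: ['grade_attempt', 'grade_attempt'],
});

// The check itself: the loop must stop on a budget and stay within both caps.
async function terminationCheck(): Promise<LoopResult> {
  const result = await runLoop(greedyStream);
  if (result.stop_reason === 'end_turn') throw new Error('stub never ends a turn');
  if (result.turns > MAX_TURNS || result.toolCalls > MAX_TOOL_CALLS) {
    throw new Error('loop exceeded its budget');
  }
  return result;
}
```

Because the stub never returns `end_turn`, the only way `terminationCheck` can resolve is through one of the two budget exits, which is the property the test is meant to pin down.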

Learn the loop by building the loop.

The bounded agentic loop, least-privilege tools, and evals are the core of the course — taught across Anthropic, AWS, and Cloudflare, the same stack this coach runs on.


Provenance — drawn entirely from this repo

src/lib/coach.ts — runCoachLoop, MAX_TURNS = 6 / MAX_TOOL_CALLS = 8, the OWASP LLM06 comment, the three loop exits (end_turn / tool_budget / max_turns), the least-privilege tool list, and the SYSTEM_PROMPT integrity guardrails.
src/coach/CoachSession.ts — the SQLite Durable Object session store, "purely the durable conversation store," loop runs in the route.
src/lib/grade.ts — the single shared grader reused by the attempt route and the coach's grade_attempt tool.
docs/DESIGN.md §4 — "real agentic loop, not a chat wrapper," the loop shape, "Bound it: turn/tool budget + done-condition + escalation," least-privilege tools, evals.
src/CLAUDE.md — the "always persist a non-empty assistant turn" invariant (two consecutive user roles → live Anthropic 400).
Commits: 2610fbe "feat(coach): bounded agentic loop with tools + guardrails"; 7a49e64 "test(coach): cover MAX_TOOL_CALLS / tool_budget escalation branch"; 1110f40 "feat(coach): SQLite Durable Object session store"; 0a994cc "feat(evals): coach eval harness".

Build-in-public note, grounded entirely in this repository. Spot a mistake? hello@aiarch.dev.