The skills map, in order

The AI engineer roadmap: a stage-by-stage skills map

By Wibo · Amsterdam · Published 26 Jun 2026 · Last updated 26 Jun 2026 · ~8 min read

Short answer

The AI engineer roadmap is a sequence of nine stages: foundations and probabilistic thinking, prompting and context, RAG and retrieval, agents and tool use, evals, cost and routing, safety and guardrails, deployment and ops, and a portfolio. Each stage has a concrete skill to learn and an artifact to produce. Learn them roughly in order: each stage assumes the one before it, and every stage ends in something runnable. You do not need a degree or a machine learning background; this is systems engineering on top of models that already exist.

This page is the structured map — the checklist of what to learn and what to ship at each stage. If you want the narrative version (what transfers, how long it takes, the mindset shift), read how to become an AI engineer. Use them together: that one is the story, this one is the route.

How to use this roadmap

This is a map for engineers who already ship software, not a beginner syllabus. Three rules make it work:

Skip what you already know. If you live in production observability, the deployment-and-ops stage is revision, not new ground — spend the time you save on evals, which almost nobody arrives with. The order matters more than the duration of any single stage.
Don't read your way through it — build. Every stage names an outcome: an artifact you can run and show. Reading about agents teaches you nothing the first broken tool-call won't teach you faster. Treat each stage as a thing to ship.
Carry one project the whole way. The fastest route is a single agent you grow stage by stage — add retrieval to it, add evals to it, cost-model it, harden it, deploy it. By the end the project is your portfolio, and the layers prove you understand how they fit.

The stages cluster into three arcs: foundations (stages 1–3), building agents (stage 4), and production (stages 5–8), with the portfolio (stage 9) as the proof that ties them together. The table is the whole map at a glance; the sections after it go a level deeper.

The roadmap, stage by stage

Read it top to bottom. The middle column is the skill to learn; the right column is the artifact that proves you learned it — the thing a hiring engineer can actually look at.

Stage	What to learn	What you produce
1. Foundations & probabilistic thinking	How a token-metered, non-deterministic model differs from the deterministic systems you know; prompts as spec.	A working mental model and a first scripted model call you can reason about.
2. Prompting & context	Structured prompting, context windows, system vs user roles, and prompt caching to control cost.	A reusable prompt template with the context assembly that feeds it.
3. RAG & retrieval	Embeddings, vector search, chunking, and where retrieval earns its place versus where it adds noise.	A retrieval step that grounds answers in your own data, with provenance.
4. Agents & tool use	The agentic loop, tool and MCP design, sub-agents, and how to bound a loop so it can't run away.	A tool-calling agent that does real work end to end — not a notebook demo.
5. Evals	Defining correctness for variable output; eval harnesses and LLM-as-judge; gating on a score.	An eval suite that says, with a number, whether your agent is getting better or worse.
6. Cost & routing	Token economics, model selection per step, routing cheap-vs-strong, and blended cost at volume.	A cost model and a routing decision you can justify in writing.
7. Safety & guardrails	The trust boundary: prompt injection, data exfiltration, unsafe tool use, least-privilege tools.	A documented threat model and guardrails wired into the agent, not bolted on.
8. Deployment & ops	Standing the system up on a real platform (Anthropic, AWS Bedrock, Cloudflare) and observing it.	A deployed system with logging, tracing, and cost visibility in production.
9. Portfolio	Packaging the above as evidence: a running system plus the design rationale behind it.	One shipped, explained system that proves you can engineer production AI.

This map mirrors how the AI Architect Academy curriculum is sequenced — backward-designed from the job, so the order reflects what production AI actually demands rather than what's easiest to teach first.

Foundations: stages 1–3

The opening arc is about changing how you think before you build anything ambitious. Stage 1 is the shift from deterministic to probabilistic: the same input can return different output, the core component is metered per token, and your job is to build a reliable system around an unreliable part. Internalise that and the rest of the roadmap stops feeling strange. Nothing here requires machine learning expertise, and that holds for the whole route: you are building on models, not training them.

Stage 2 is prompting and context done properly: not prompt-whispering tricks, but treating the prompt as a spec, controlling what goes into the context window, and using prompt caching to keep cost down. Stage 3 adds retrieval — embeddings and vector search — but the real skill is judgement about when RAG helps. A lot of systems reach for retrieval where a better prompt or a tool call would do. Get these three right and you have a grounded, controllable single-shot system; the next stage puts it in a loop.
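
To make stages 2 and 3 concrete, here is a minimal sketch of a retrieval step that feeds a prompt template and keeps provenance attached. Everything in it is an assumption for illustration: the `Chunk` type, the file names, the word-count budget, and above all `embed`, which is a toy bag-of-words stand-in for a real embedding model. The shape is what matters: score, filter out weak matches, and label every piece of context with its source.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source: str  # provenance: the document this text came from
    text: str


def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list, k: int = 2, min_score: float = 0.1) -> list:
    # Drop weak matches: retrieval that adds noise is worse than none.
    q = embed(query)
    scored = [(cosine(q, embed(c.text)), c) for c in chunks]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in relevant[:k]]


def build_prompt(question: str, context: list, max_words: int = 150) -> str:
    lines, used = [], 0
    for chunk in context:
        words = len(chunk.text.split())
        if used + words > max_words:
            break  # stay inside the (hypothetical) context budget
        lines.append(f"[{chunk.source}] {chunk.text}")
        used += words
    context_block = "\n".join(lines) or "(no relevant context found)"
    return (
        "Answer using only the context below and cite the source in brackets.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}"
    )


# Hypothetical documents, for illustration only.
docs = [
    Chunk("docs/refunds.md", "Refunds are issued within 14 days of purchase."),
    Chunk("docs/shipping.md", "Orders ship within two business days."),
    Chunk("docs/careers.md", "We are hiring platform engineers."),
]
question = "How long do refunds take after purchase?"
print(build_prompt(question, retrieve(question, docs)))
```

In a real system the count vectors would be replaced by model embeddings and a vector index, but the `min_score` cut-off and the source labels are the parts worth keeping: they are what lets you show where an answer came from, and notice when retrieval found nothing useful.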

Building agents: stage 4

This is the hinge of the whole roadmap. Stage 4 turns a single model call into an agent — a model running in a loop, deciding when to call tools, working toward a goal instead of answering one prompt. The skills are tool and MCP design (least-privilege, well-described tools the model can actually use), orchestration across sub-agents, and — the part beginners skip — bounding the loop so it can't burn tokens forever or take an unsafe action. Build one that does real work, with the loop and tool layer visible.
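
A bounded loop can be sketched in a few lines. The version below is a minimal illustration, not a reference implementation: `Step`, `LoopResult`, `run_agent`, and the scripted stand-in models are all hypothetical names, and a real agent would call a model API where the stubs are. It shows the three stopping conditions (a final answer, a step cap, a token budget) and an allowlist so the model can only call tools it was explicitly given.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    tool: Optional[str]  # None means "this is the final answer"
    argument: str        # tool input, or the answer text
    tokens_used: int


@dataclass(frozen=True)
class LoopResult:
    answer: Optional[str]
    stopped_because: str  # "done", "max_steps", or "token_budget"
    steps: int
    tokens: int


def run_agent(model: Callable[[list], Step],
              tools: dict,
              goal: str,
              max_steps: int = 5,
              token_budget: int = 2000) -> LoopResult:
    transcript = [f"goal: {goal}"]
    tokens = 0
    for step_no in range(1, max_steps + 1):
        step = model(transcript)
        tokens += step.tokens_used
        if tokens > token_budget:
            return LoopResult(None, "token_budget", step_no, tokens)
        if step.tool is None:
            return LoopResult(step.argument, "done", step_no, tokens)
        if step.tool not in tools:
            # Least privilege: anything outside the allowlist is refused.
            transcript.append(f"error: tool {step.tool!r} is not allowed")
            continue
        transcript.append(f"{step.tool} -> {tools[step.tool](step.argument)}")
    return LoopResult(None, "max_steps", max_steps, tokens)


# Scripted stand-ins for a model, for illustration only.
def scripted_model(transcript: list) -> Step:
    if any(line.startswith("lookup ->") for line in transcript):
        return Step(None, "The order ships on Friday.", 40)
    return Step("lookup", "order 17", 50)


def runaway_model(transcript: list) -> Step:
    return Step("lookup", "again", 100)  # never produces a final answer


tools = {"lookup": lambda arg: f"{arg}: ships Friday"}

print(run_agent(scripted_model, tools, "When does order 17 ship?"))
print(run_agent(runaway_model, tools, "loop forever", max_steps=3))
```

The runaway case is the one to test first: the loop should end with a reason you can log, not with an exhausted budget you discover on the invoice.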

This is also where structure starts to matter: how the orchestrator, tools, retrieval, and stopping conditions fit together is an architecture decision, not an implementation detail. The patterns behind that — and why a bounded loop is the default — are laid out in agentic AI architecture. Stage 4 is where many people stop and call themselves done. The roadmap doesn't, because an agent that works in a demo is a long way from one you'd run in production.

Production: evals, cost, safety, deploy

Stages 5–8 are what separate an AI engineer from someone who got a demo working once. They're the moat, because they're the least fun and the most valued:

Stage 5 — Evals. The single most underrated skill in the field. When output varies, vibes don't tell you whether a change helped; an eval suite does. Learn to define correctness, build a harness, use LLM-as-judge where exact-match won't work, and gate changes on a score. Our guide to evaluating LLM systems is the deep dive for this stage.
Stage 6 — Cost & routing. Token economics is the new latency-and-throughput. Learn to pick a model per step — a cheap one for routing and extraction, a strong one for hard reasoning — and to compute the blended cost at production volume so the bill doesn't ambush you.
Stage 7 — Safety & guardrails. The trust boundary for a system that takes untrusted text and can act on the world: prompt injection, data exfiltration, unsafe tool use. Least-privilege tools and an explicit threat model belong in the design, not in the post-incident review.
Stage 8 — Deployment & ops. Stand the whole thing up on a real platform — Anthropic, AWS Bedrock, or Cloudflare — with the observability, tracing, and cost visibility you'd demand of any production service. The model in the loop doesn't excuse you from operating it well; it raises the bar.
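
The eval gate from stage 5 can be sketched as a small harness. This is an illustration under stated assumptions: `Case`, `run_evals`, `gate`, and the stub agent are hypothetical, and `contains_expected` is a deliberately crude lenient grader occupying the slot where an LLM-as-judge call would go when exact match is too strict.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    return output.strip().lower() == expected.strip().lower()


def contains_expected(output: str, expected: str) -> bool:
    # Lenient grader: a placeholder for the slot an LLM-as-judge would fill.
    return expected.lower() in output.lower()


def run_evals(agent: Callable[[str], str], cases: list,
              grader: Callable[[str, str], bool]) -> float:
    if not cases:
        raise ValueError("an eval suite needs at least one case")
    passed = sum(1 for c in cases if grader(agent(c.prompt), c.expected))
    return passed / len(cases)


def gate(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    # A change ships only if the score has not dropped below the baseline.
    return score >= baseline - tolerance


# Hypothetical cases and a stub agent, for illustration only.
cases = [
    Case("capital of France?", "Paris"),
    Case("2+2?", "4"),
    Case("capital of Japan?", "Tokyo"),
    Case("colour of a clear sky?", "blue"),
]
canned = {
    "capital of France?": "Paris",
    "2+2?": "The answer is 4.",
    "capital of Japan?": "Kyoto",
    "colour of a clear sky?": "Blue",
}


def stub_agent(prompt: str) -> str:
    return canned[prompt]


strict = run_evals(stub_agent, cases, exact_match)         # 2 of 4 pass
lenient = run_evals(stub_agent, cases, contains_expected)  # 3 of 4 pass
print(strict, lenient, gate(lenient, baseline=0.7))
```

The two graders disagree on the same outputs, which is the practical lesson: choosing how correctness is defined is part of the eval, and the gate is only as meaningful as that choice.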

Done in order, each of these adds rigour to the agent you built at stage 4 rather than sending you back to a blank page. That's the point of carrying one project: by stage 8 you have a system that's correct, affordable, safe, and operable — the four things production actually asks for.
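
For the "affordable" part, the blended-cost arithmetic from stage 6 is short enough to write down. The sketch below uses made-up prices and traffic figures purely for illustration (they are not any provider's real rates), and the names `ModelPrice`, `request_cost`, and `blended_cost` are hypothetical. It assumes a router sends a fixed share of requests to the strong model and the rest to the cheap one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


def request_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    return (input_tokens * price.input_per_mtok
            + output_tokens * price.output_per_mtok) / 1_000_000


def blended_cost(cheap: ModelPrice, strong: ModelPrice, strong_share: float,
                 input_tokens: int, output_tokens: int) -> float:
    # Expected cost per request when strong_share of traffic is routed
    # to the strong model and the remainder to the cheap one.
    if not 0.0 <= strong_share <= 1.0:
        raise ValueError("strong_share must be between 0 and 1")
    return ((1 - strong_share) * request_cost(cheap, input_tokens, output_tokens)
            + strong_share * request_cost(strong, input_tokens, output_tokens))


# Illustrative prices only, not real rates.
CHEAP = ModelPrice(input_per_mtok=1.0, output_per_mtok=5.0)
STRONG = ModelPrice(input_per_mtok=10.0, output_per_mtok=50.0)

per_request = blended_cost(CHEAP, STRONG, strong_share=0.2,
                           input_tokens=2000, output_tokens=500)
all_strong = request_cost(STRONG, 2000, 500)
monthly_requests = 1_000_000

print(f"blended:    ${per_request * monthly_requests:,.0f} per month")
print(f"all strong: ${all_strong * monthly_requests:,.0f} per month")
```

With these assumed numbers, routing 20% of traffic to the strong model costs about $12,600 a month against $45,000 for sending everything to it. The real inputs (prices, token counts, the share that truly needs the strong model) have to come from your own measurements; the eval suite is what tells you whether the cheaper route still passes.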

Portfolio: prove it

Stage 9 isn't more building — it's packaging the build as evidence. AI hiring leans hard on demonstrated work because the field moves faster than credentials can keep up. The artifact that lands a role is one shipped system, well explained: a running agent with the tool layer and loop visible, the eval suite that proves it works, the cost and model-selection writeup, evidence you handled the trust boundary, and a short design rationale tying the decisions together.

One well-shipped, well-explained system beats five half-finished demos every time. If you want a worked example of what shipped-and-explained looks like, the architecture notes behind this platform are exactly that artifact — a production AI system with its decisions written down. That writeup, more than any certificate, is what tells a hiring engineer you can do the job.

Frequently asked questions

What is the AI engineer roadmap?

It's a sequence of nine stages that takes an experienced engineer from LLM fundamentals to a deployed, evaluated, cost-controlled AI system: foundations and probabilistic thinking, prompting and context, RAG and retrieval, agents and tool use, evals, cost and routing, safety and guardrails, deployment and ops, and a portfolio. Each stage pairs a skill to learn with an artifact to produce, and the stages build on each other in order.

What should I learn first?

Start with foundations: the shift from deterministic to probabilistic systems, prompts as spec, and what changes when your core component is non-deterministic and metered per token. Then prompting and context, then retrieval. Those first three stages give you a grounded, controllable single-shot system before you put a model in a loop at the agent stage. Resist jumping straight to agents — they make a lot more sense once the foundations are solid.

How long does the roadmap take?

For someone already working as an engineer, it's focused weeks rather than years, and the timeline depends mostly on how much you build rather than read. Foundations and a first working agent are a matter of days; the production stages — evals, cost, safety, deployment — are the part that takes deliberate weeks, because doing them properly is the point. Skipping stages you already know (say, deployment and ops) compresses it further.

Do I need to learn machine learning?

No. Training models and the maths behind them are a different profession (ML engineering and research). Every stage on this roadmap is about building systems on top of models that already exist — calling, orchestrating, evaluating, and operating them. The work is much closer to senior software engineering than to a research lab, which is why an experienced engineer can move quickly through it.

What should be in my portfolio?

Shipped systems, not snippets: a working agent you can show running with the tool layer and loop visible, an eval suite that defines and measures correctness, a cost and model-selection writeup, evidence you handled the trust boundary, and a short design rationale for the whole thing. The roadmap is designed so that carrying one project through every stage produces exactly this artifact by the end.

What's the fastest path for an experienced engineer?

Skip the stages you already own and carry a single project the whole way. Most senior engineers can move fast through foundations, prompting, and deployment because those lean on skills they already have; the time is best spent on agents, evals, and the trust boundary, which are genuinely new. Build one real agent, add each production layer to it in order, and write the rationale as you go.

Sources & provenance

The stage sequence and outcomes are synthesized from AI Architect Academy's backward-designed curriculum (docs/CURRICULUM.md, docs/PLAN.md) — built from the job backward, every claim cited.
Agentic-system practices (the bounded loop, tool and MCP design, evaluation, the trust boundary) follow Anthropic's published guidance on building effective agents and agentic system design.
This is an experience-based roadmap for engineers transitioning into AI work; directional claims about ordering, timelines, and hiring reflect the field's portfolio-and-evidence norms, not a specific survey.

Roadmaps are maps, not guarantees — order and pace vary by background and by what a given role demands. The field moves quickly; verify specifics against current sources before relying on them. Corrections: hello@aiarch.dev.
