AI Architect Academy

For engineers picking a model

Claude Opus vs Sonnet vs Haiku: how to choose

Short answer

Sonnet is the sensible default. It carries the best cost/capability balance for the bulk of real work, so start there. Route down to Haiku for cheap, high-volume, latency-sensitive tasks, and reach up to Opus only for the hardest reasoning and complex multi-step planning where answer quality clearly dominates cost.

Model choice is a cost-control lever, not a one-time setting. The skill is matching each task to the cheapest model that still gets it right, then routing between them inside one system.

The three models at a glance

ModelInput $/1MOutput $/1MChoose it when
Claude Haiku 4.5 $1 $5 High-volume, latency-sensitive work: simple classification, extraction, and cheap tool calls where speed and unit cost matter most
Claude Sonnet 4.6 $3 $15 The default workhorse: the best cost/capability balance for most coding, drafting, summarization, and routine agentic tasks
Claude Opus 4.8 $5 $25 The hardest reasoning and complex multi-step agentic planning, where the quality of the answer dominates the cost of producing it

Prices are per million tokens (input / output) and reflect public pricing verified mid-2026. Re-check the provider's pricing page before you rely on them.

How to choose

Don't start from "which model is best" — start from the task. Most decisions fall out of three questions:

1. How hard is the reasoning?

If the task is pattern-shaped — classify this, extract these fields, call this tool with these arguments — a small model handles it well, and you should not pay more for headroom you don't use. Haiku is built for exactly this. As the task needs more chained reasoning, sustained context, or judgment, move up to Sonnet. Reserve Opus for problems where a wrong answer is expensive and the path to a right one is genuinely hard.

2. How sensitive is it to latency and volume?

High-throughput, user-facing, or per-request work magnifies both speed and unit cost. There, the cheaper, faster model is often the better engineering choice even if a larger model is marginally more capable. Batch and background work tolerate a slower, stronger model.

3. Does a stronger model change the outcome?

This is the deciding question for Opus. If Sonnet already reaches the correct outcome reliably, Opus buys you nothing but a larger bill. Spend the extra only where it measurably moves quality on the tasks that matter — and confirm that with evals, not vibes.

Routing and cost

The strongest single lever is a routing pattern: use Opus only where it changes the outcome, and route sub-tasks down to Sonnet and Haiku. A planner step might run on Opus to decompose a hard problem, while the many smaller execution and extraction steps it spawns run on Sonnet or Haiku. You get the quality where it counts without paying Opus rates for the whole pipeline.

The cost spread makes this worth doing. On input, the ratio across the three models is 5:3:1 — Opus to Sonnet to Haiku. Put plainly, Haiku's input is about five times cheaper than Opus and three times cheaper than Sonnet; the output spread runs the same way ($25 / $15 / $5). In a multi-turn agent that re-sends a growing context every iteration, those input multiples compound fast, so pushing routine sub-tasks down a tier is one of the largest savings available to you.

The common mistake
Two errors dominate. The first is defaulting to Opus everywhere "to be safe" — you pay up to five times the input rate on work Sonnet or Haiku would have handled identically, and the bill balloons with no quality gain. The second is the mirror image: forcing Haiku onto genuinely hard reasoning to save money, then paying for it in wrong answers, retries, and debugging. Match the model to the task; don't anchor on one tier for everything.
Sources & provenance
  • Anthropic — models and pricing documentation (platform.claude.com): per-million-token input/output rates for Claude Haiku 4.5 ($1 / $5), Sonnet 4.6 ($3 / $15), and Opus 4.8 ($5 / $25).
  • Anthropic — guidance on model selection and routing (matching task difficulty to model tier; routing down for sub-tasks).

Prices were verified mid-2026 against Anthropic's pricing page; model pricing changes — verify the current rates before relying on them. Corrections: hello@aiarch.dev.

Learn to model cost and route models like an architect.

AI Architect Academy teaches model selection, routing, prompt caching, and cost-modeling as first-class skills — the levers that actually move the bill in production Claude systems.