04 · Note 02 — Model Selection Matrix
Status: Outline. Body fills in Week 2. Voice: principal-level, BFSI-threaded, Apic-calibrated.
What this file is. The decision matrix for picking between Opus 4.7, Sonnet 4.6, and Haiku 4.5 — by use case, by cost shape, by latency budget, by reasoning depth. The 30-second answer for the customer call, with the depth behind it.
What this file is NOT. A benchmark recital. Not a marketing-page summary.
The current lineup (as of build)
| Model | Model ID | Best for | Cost shape | Latency shape |
|---|---|---|---|---|
| Opus 4.7 | claude-opus-4-7 |
Hardest reasoning, agentic orchestration, hard-to-eval reasoning chains | Highest | Highest |
| Sonnet 4.6 | claude-sonnet-4-6 |
Default workhorse — agent loops, RAG, structured drafting | Mid | Mid |
| Haiku 4.5 | claude-haiku-4-5-20251001 |
High-volume classify/route/extract, judge models, low-latency UX | Lowest | Lowest |
→ Knowledge cutoff for Claude in this repo: per Apic docs at the time of writing.
The decision matrix
Three axes — capability requirement × volume × latency budget. Pick the model where all three constraints are simultaneously satisfied.
Axis 1 — Capability requirement
- High-stakes reasoning, multi-step plans, hard-to-eval correctness: Opus.
- Mid-complexity reasoning, RAG over structured docs, agentic tool loops with bounded depth: Sonnet.
- Bounded classification, extraction with clear schema, fast triage: Haiku.
Axis 2 — Volume
- <1K calls/day: model choice is dominated by capability; cost is rounding error.
- 10K–100K/day: model choice is dominated by per-call cost. Sonnet or Haiku unless capability requires Opus.
- >1M/day: model choice is constrained by both per-call cost and rate-limit headroom. Haiku for the bulk; Opus only on selected paths.
Axis 3 — Latency budget
- >30s tolerable: any model.
- 5–30s: any model with streaming.
- <5s P95: Sonnet or Haiku, with caching, with parallel tool calls.
- <2s P95: Haiku, with prompt caching, ideally with prefetch.
The "tiered architecture" pattern
Most BFSI use cases benefit from a tiered model architecture — different requests routed to different models based on complexity. The pattern:
User request
│
▼
Haiku — triage classifier
│
├── Simple request → Haiku response (fast, cheap)
├── Standard request → Sonnet (RAG, drafting, agent loop)
└── Hard request → Opus (multi-step reasoning, supervision)
The triage classifier is itself an eval-gated component — too aggressive on routing-up costs the customer money; too aggressive on routing-down costs accuracy.
Six BFSI use cases × model selection (preview)
| Use case | Hot path | Triage / supervision |
|---|---|---|
| Support agent assist | Sonnet 4.6 | Haiku for triage classifier |
| Internal SOP search | Sonnet 4.6 | Haiku for query rewriter |
| RM copilot | Sonnet 4.6 | Opus for plan-and-execute occasionally |
| Compliance doc review | Opus 4.7 (hot path) | Sonnet for first-pass draft |
| Developer productivity (Claude Code) | Sonnet 4.6 | Opus for hardest tasks |
| Exec analytics | Sonnet 4.6 | Haiku for query routing |
→ Full SystemDesigns: Module 05 SystemDesigns.
The 30-second answer (interview-room version)
"Three models, three slots. Opus for the hardest reasoning — orchestration, multi-step plans, hard-to-eval correctness. Sonnet is the workhorse — agent loops, RAG, drafting, the bulk of production traffic. Haiku is high-volume — classify, route, extract, and as the LLM-as-judge in eval frameworks. The pattern that compounds is tiered architecture: Haiku triages, Sonnet handles standard, Opus handles hard. For the BFSI support-agent-assist case I worked on, that pattern dropped per-decision cost ~70% vs naive-Opus routing."
Anti-patterns
Anti-pattern 1 — "Always use Opus to be safe"
Reads as not having done the math. Lean No. Opus on bulk traffic is a budget hole, and capability-wise often unnecessary.
Anti-pattern 2 — "Always use Haiku for cost"
Strong No on use cases where reasoning depth matters. Compliance review on Haiku is a deployment failure waiting to happen.
Anti-pattern 3 — One-model architectures
Every request routed to the same model. Wastes capability on simple requests and starves on hard ones. Lean No.
Anti-pattern 4 — Selection without evals
"We picked Sonnet." Why? "It seemed about right." Lean No. Selection without an eval scorecard is opinion, not architecture.
Anti-pattern 5 — Locking selection in production code
Hard-coded model IDs across the codebase. Strong No. Selection is a config decision; the eval framework should make substitution a change-config-and-rerun-evals operation.
Cross-references
- Sibling: Note 03 — Prompt Caching — caching changes the cost shape and shifts the matrix.
- Sibling: Note 05 — Deployment Surfaces — model availability per surface differs.
- Module 06: Eval Taxonomy — selection without evals is opinion.
- Drill: 01 — 30-Second Model Selection.
Strong-Hire bar for this file
- Three-model matrix internalized; can be spoken in 30 sec.
- Tiered-architecture pattern is reflex.
- Selection is always paired with eval framework — never opinion.
- The six BFSI use case selections defended cold.