04 · Drill 01 — 30-Second Model Selection
Status: Outline. Body fills in Week 2. Voice: principal-level, BFSI-threaded, Apic-calibrated.
What this drill is. Cold delivery of the Claude lineup decision matrix in 30 seconds — pick the right model for a given use case description, defended on capability + cost + latency. Ten use-case cards, randomized.
What this drill is NOT. A capability comparison drill. Not benchmark recital.
Prompt
Partner picks one of the 10 use-case cards (below) and reads it aloud. You have 30 seconds to: (a) name the model, (b) name the tier (Haiku / Sonnet / Opus), (c) name the one trade-off you accepted.
Time box
- Per card: 30 sec.
- Drill: all 10 cards = ~6 min.
- Hard cap: 35 sec — beyond is Lean No on this drill (the matrix isn't reflex).
The 10 use-case cards
- "Triage classifier in front of a support agent assist — 200K calls/day, sub-200ms latency budget, 5-class output."
- "Compliance document review for a public-sector regulator submission. Failure = retract submission."
- "RM copilot at a wealth-management desk — multi-turn agentic, customer 360, occasional plan-and-execute."
- "Internal SOP search RAG over a 50K-document corpus. ~10K queries/day."
- "LLM-as-judge for an eval harness scoring faithfulness on 5K candidate responses per run."
- "Exec analytics natural-language interface over Snowflake — 100 queries/day from C-suite users."
- "Developer productivity copilot inside an internal IDE plugin — streaming, sub-3s latency budget."
- "Customer-facing chatbot on a public site for product Q&A — 50K calls/day."
- "Multi-document compliance gap analysis across 200-page regulatory bundles. Run weekly, ~100 runs."
- "Tool-use planner that decides which of 12 tools to call on a Salesforce-integrated workflow."
Strong Hire answers (placeholder — fill in Week 2)
| Card | Answer | Trade-off accepted |
|---|---|---|
| 1 | Haiku 4.5, hot path | Capability ceiling — escalate complex cases to Sonnet |
| 2 | Opus 4.7 with eval gates | Cost — but the failure mode justifies it |
| 3 | Sonnet 4.6 default, Opus on hard | Two-model stack adds eval complexity |
| 4 | Sonnet 4.6 with caching | Cache pollution risk on doc updates — design for it |
| 5 | Haiku 4.5 as judge | Judge calibration must be tighter on a smaller model |
| 6 | Sonnet 4.6 + Haiku for query routing | Latency on the routing hop |
| 7 | Sonnet 4.6 streaming | Cost vs Haiku — but capability wins for code |
| 8 | Sonnet 4.6 with caching | Per-call cost; volume justifies caching investment |
| 9 | Opus 4.7 batch | Latency — but it's a batch run |
| 10 | Sonnet 4.6 for the planner, called tools per their needs | Planner-vs-executor split adds an eval layer |
Rubric
Strong Hire
- Model named in <10 sec.
- One specific trade-off named (not "it's a balance").
- Tier-architecture pattern surfaces when the use case justifies it.
- No "always Opus" or "always Haiku" defaults.
Hire
- Model named, but trade-off is generic.
- Two-model stack mentioned only when prompted.
Lean No
- "It depends" without the deciding axis named.
- Model picked without naming a trade-off.
Strong No
- Wrong model ID (e.g. "claude-3-opus" — old).
- Single-model hammer for every card.
Common Lean No traps
Trap 1 — Always picking Opus
Conservative, but reads as "I haven't done the math."
Trap 2 — Marketing-speak
"Sonnet is a great workhorse" — without the why (capability fit, cost shape, latency).
Trap 3 — Skipping the trade-off
The rubric requires a trade-off named. Skipping it = Lean No.
Trap 4 — Wrong/stale model IDs
Using claude-3-opus, claude-2, etc. Strong No.
Trap 5 — Ignoring the latency or volume signal
Card 1 says "sub-200ms latency, 200K calls/day." Picking Sonnet ignores both signals.
How to run this drill
- Partner shuffles cards. You get them cold.
- Timer starts when card is read.
- 30 sec to answer. You get scored on model + trade-off.
- Log to Drill Tracker.
- Application trigger: 9 of 10 cards at Strong Hire across 2 sessions.
Cross-references
- Note: 02 — Model Selection Matrix.
- Note: 03 — Prompt Caching.
- Module 06: Eval Taxonomy.
- Drill Tracker.
Strong-Hire bar for this drill
- 9/10 cards delivered in <30 sec.
- Trade-off named on every card.
- No stale model IDs.
- Tier-architecture pattern reflex when justified.