10 · Drill 01 — Estimate Tokens from Business Volume
Status: Outline. Body fills in Week 4.
What this file is. Back-of-envelope token cost estimation from a business volume description — the skill of deriving a monthly cost figure with explicit assumptions in under 3 minutes.
What this file is NOT. A calculator replacement — the goal is judgment about order of magnitude and the ability to name the top cost lever without a spreadsheet.
Prompt
You are in a pre-sales meeting. A BFSI CDO says: "We're considering Claude for five use cases. Before we talk architecture, give me a rough monthly cost estimate for use case 1 — our contact centre, 200,000 agent assist calls per day. Average conversation is about 8 turns." Produce your estimate on a whiteboard in 3 minutes. State your assumptions explicitly. Name the single highest-impact cost lever.
Time box: 3 minutes. No calculator. Write the math.
Scenario cards
Scenario 1 (Worked — see Strong Hire below): Contact centre agent assist. 200K calls/day. Average 8-turn conversation. Customer queries are account/transaction related (short). Agent responses are 1–3 sentences (short).
Scenario 2: Internal SOP search. 15K employee queries/day. Employees search policy documents. Avg query: 50 words. System retrieves 3 policy chunks (avg 400 tokens each). Response: 2–4 paragraphs.
Scenario 3: Compliance document review. 5,000 documents/month. Avg document: 12 pages / 8,000 tokens. Review output: structured JSON, ~600 tokens. Nightly batch.
Scenario 4: RM copilot. 3,000 RM sessions/day. Each session: 5 interactions. Avg interaction: 150-token query + 400-token CRM context + 500-token response. Model is Sonnet (relationship complexity).
Scenario 5: Executive analytics assistant. 500 queries/day. Each query: 200-token question + 2,000-token data context. Response: 800 tokens. Low volume, high reasoning — Sonnet.
Strong Hire answer: Scenario 1
Step 1: Establish token counts per call (state assumptions):
Assumption A: System prompt (role, guardrails, tool defs) = 3,000 tokens.
This is cacheable and stable.
Assumption B: Conversation context per call = last 4 turns × avg 200 tokens/turn
= 800 tokens. (8-turn conversation; send half as context)
Assumption C: Current user query = 80 tokens (short, account-related)
Assumption D: Model output = 250 tokens (1–3 sentence agent suggestion)
Assumption E: Model = Haiku 4.5 (customer-facing, latency-sensitive,
short extractive generation)
Assumption F: Cache hit rate on system prompt = 85%
Step 2: Per-call cost:
Input (uncached portion): 800 + 80 = 880 tokens
→ 880 × $0.25/M = $0.00022
Input (cache read, system prompt): 3,000 × $0.03/M × 0.85 = $0.0000765
Input (cache write, misses): 3,000 × $0.30/M × 0.15 = $0.000135
Output: 250 × $1.25/M = $0.0003125
Total per call: ~$0.00083
Step 3: Monthly cost:
200,000 calls/day × 30 days = 6,000,000 calls/month
6,000,000 × $0.00083 = $4,980/month ≈ $5,000/month
Step 4: Top cost lever:
Model tier selection. If we used Sonnet instead of Haiku on this use case:
Sonnet input rate: $3/M vs Haiku $0.25/M (12× more expensive)
Sonnet output rate: $15/M vs Haiku $1.25/M (12× more expensive)
Monthly cost at Sonnet: ~$60,000/month vs $5,000/month
The top lever is model tier. Haiku is appropriate for this use case (short, extractive, latency-sensitive). Sonnet would be a $55K/month mistake.
The assumption log (speak this aloud):
"I'm assuming Haiku is the right tier — short, extractive, no complex reasoning. I'm assuming 85% cache hit rate; if the system prompt changes frequently, that drops and cost rises. I'm assuming we send the last 4 turns of context, not the full 8 — if the full conversation is always sent, input tokens double and cost goes up ~30%. I'll validate all three with the engineering team."
Rubric
Strong Hire
- Derives cost from first principles (formula), not rate card lookup
- States ≥5 explicit assumptions before calculating
- Gets the model tier decision right (Haiku, not Sonnet) and explains why
- Names the top cost lever correctly (model tier) and quantifies the delta
- Completes in 3 minutes with legible whiteboard math
- Proactively flags what could make the estimate wrong (cache hit rate, context window size)
Hire
- Correct method and correct model tier, but:
- Assumptions are vague (e.g., "typical input" without a number)
- Or: names caching as the top lever instead of model tier (partially right, wrong magnitude)
- Or: takes 5+ minutes
- Fixable with one pass of the formula and more explicit assumption discipline
Lean No
- Uses a round number without deriving it ("probably around $10K/month")
- Cannot name the model tier or justify why Haiku vs. Sonnet
- Does not state assumptions, or states them after the estimate rather than before
- Names "batching" as the top lever (only applicable to offline workloads — wrong use case)
- Cannot articulate what would change the estimate
Strong No
- Quotes rates incorrectly (e.g., confuses input and output rates by a factor of 5)
- Recommends Opus for a 200K/day customer-facing use case
- Does not know what TTFT or cache hit rate means
- Claims the estimate is "exact" without acknowledging the assumption dependency
Common traps
Trap 1: Forgetting output tokens cost 5× input. Most candidates correctly estimate input cost and underestimate output cost. For a generation-heavy use case (RM copilot, compliance review), output tokens can be 40–60% of total cost.
Trap 2: Assuming 100% cache hit rate. It's a common optimism. Real cache hit rates for well-designed prompts: 75–90%. For poorly designed (prompt changes frequently): 30–50%. The estimate should have cache hit rate as a sensitivity variable.
Trap 3: Using Sonnet for everything because "it's better." Better at what? For extractive, short-output, customer-facing tasks, Haiku is architecturally appropriate. Defaulting to Sonnet without justification is a cost architecture failure.
Trap 4: Not stating assumptions before calculating. In a pre-sales meeting, assumptions stated upfront = professional. Assumptions stated after a number = backpedaling. Train the habit: "Here are my assumptions, then the math."
Trap 5: Forgetting to mention the sensitivity. The estimate is only as good as the assumptions. A strong candidate closes with: "If volume is 2× higher, cost doubles. If cache hit rate drops to 60%, add 15%. If we switch to Sonnet for any use case, multiply by 12." This is what the CDO actually needs to budget.
Personal anchor
Practice this drill with Scenario 2 (SOP search) and Scenario 4 (RM copilot) — the harder cases because SOP search has retrieval context to account for and RM copilot is Sonnet-tier. Time yourself. The 3-minute constraint is real — Apic interviewers will stop you.