Drill 03 — Customer Architecture Story
The drill that the project deep-dive round is built on. 5 minutes of you presenting an architecture; in the real loop, 40 more minutes of skeptical Q&A. Pick the project you can defend for 40 minutes — not the one with the prettiest outcome metric.
The prompt
"Walk me through a complex AI architecture you designed."
Variants: "Tell me about your most challenging AI engagement." / "Pick a recent project — walk me through the architecture." / "Describe a system you designed end-to-end."
Time budget: 5 minutes for the standalone walkthrough. In the project deep-dive round, this becomes a 20-minute presentation followed by 40 minutes of pointed Q&A.
Why this drill matters
This is the round of the loop, per public Apic interview intel. The project deep-dive in onsite Loop 2 is "the most-feared and most-decisive part of the loop." The interviewer becomes a skeptical senior engineer in a design review. They probe every choice. They are not adversarial — they are testing whether you can defend choices honestly under sustained scrutiny.
The 5-minute version (this drill) is the foundation. If you can't tell the story crisply in 5 minutes, you cannot defend it for 40 minutes. The structure has to be load-bearing.
Project selection — the most important step
Don't pick the project with the best outcome metric. Pick the project where you can:
- State the customer + business problem precisely (not "we improved CSAT")
- Name the constraints that shaped the architecture
- Defend at least three architectural choices, naming the alternative you rejected
- Admit one failure mode honestly — what almost broke, what you caught in eval, what slipped to production
- State what you'd change today, with reasoning
- Surface the safety / governance lens unprompted
- Translate the value to executive language
If you can do these seven things for a project, it's defensible for 40 minutes. If you can only do four of them, the round will go badly. Pick a project where you can do all seven.
Personal anchor
From your résumé, the candidate projects ranked by defensibility for the Apic loop:
| Project | Defensibility | Why |
|---|---|---|
| Agentic Hybrid GraphRAG (GenAI services firm) | Highest | Real architecture choices (graph + vector + agentic), real trade-offs (Neo4j vs alternatives, sync vs async, RAGAs gating), production deployment, multi-million-dollar engagement. The interviewer will love it. |
| Healthcare Multi-Agent Platform (healthcare AI consultancy) | High | HIPAA-aligned, recent, Claude-aware, governance-first by design. The strongest Apic-narrative fit — but slightly less mature in your delivery story than the GraphRAG work. |
| Voice AI on Vertex (GenAI services firm, 40% CX lift) | Medium-high | Production-scale, GCP-native, agent-state across sessions. Outcome metric is sharp. The risk: Voice AI architecture is less Apic-fit (latency vs reasoning trade-offs read differently). |
| Equity-trading surveillance (global investment bank) | Medium | Production AI in regulated FSI; great safety-as-throughput angle; older (2021–22) so the interviewer may probe how you'd architect it differently today. |
Recommended primary: Agentic Hybrid GraphRAG — strongest architecture story. Recommended backup if asked for a second project: Healthcare Multi-Agent Platform — strongest Apic-narrative fit. Save for the values round, not the architecture round: equity-trading surveillance.
Run the GraphRAG story end-to-end with this drill first. The Healthcare Multi-Agent one second.
The Strong Hire structure (5-minute version)
Six beats, ~50 seconds each:
Beat 1 — Customer + business problem (~50 sec)
Specifically: who the customer was (industry, scale, regulatory posture), what the actual business problem was (not the requested AI solution — the underlying problem), and why a previous approach hadn't worked.
Pattern:
"The customer was [industry, scale, regulatory posture]. Their underlying business problem was [specific], which they were currently solving by [current approach] — that approach had [specific limitation]. The ask we got was [requested AI solution] — but the discovery showed the real problem was [refined understanding]."
Beat 2 — Constraints that shaped the architecture (~50 sec)
Three to five hard constraints. Not "scalable" or "performant" — concrete: data residency, latency p95, RBAC at retrieval layer, hallucination tolerance, integration with named existing systems, compliance regime.
Pattern:
"The hard constraints were: [constraint 1, with the specific number or rule], [constraint 2], [constraint 3]. The constraints we negotiated were [softer ones]. The constraints we explicitly de-scoped were [things we said no to]."
Beat 3 — Architecture choices + trade-offs (~70 sec)
The longest beat. Walk through 3 major choices, each with the alternative you rejected and why.
Pattern:
"At the retrieval layer, we chose [X] over [alternative Y] because [reason — usually tied to a constraint from Beat 2]. At the orchestration layer, we chose [X] over [Y] because [reason]. At the eval layer, we chose [X] over [Y] because [reason]. The trade-off we accepted was [specific trade-off — e.g., 'higher latency in exchange for synchronous accuracy because the downstream workflow couldn't tolerate eventual consistency']."
Beat 4 — Failure mode + what you caught (~50 sec)
The honesty beat. One specific failure mode you encountered — caught in eval ideally, or in early production — and how the architecture responded.
Pattern:
"The thing that almost broke us was [specific failure mode]. We caught it because [specific eval / observability mechanism]. We responded by [specific architectural change]. The lesson was [what you'd take to the next project]."
Beat 5 — Safety / governance lens (~40 sec)
Surface this unprompted. The architectural choice that wasn't the most-discussed but was the most-important from a safety perspective.
Pattern:
"From a safety lens, the under-discussed but most-important architectural choice was [specific — e.g., RBAC at the retrieval layer, or the human-in-the-loop approval gate, or the refusal-pattern tuning]. The model can't [leak / act on / hallucinate about] what the architecture doesn't let it see / do."
Beat 6 — Executive value + what you'd change today (~40 sec)
Close with the business framing and the honest "I'd do differently now" beat.
Pattern:
"For the executive: the value was [measurable outcome] tied to [downstream business KPI]. What I'd change today: [specific architectural change], because [reason — usually 'the platform / model has matured since' or 'I underestimated [X]']."
What gets each grade
Strong No
- Walks through a project chronologically ("first we built X, then we built Y")
- No customer detail, no constraints, no trade-offs
- Outcome-only ("we improved CSAT 40%")
- Cannot defend a choice when probed
- Pretends nothing went wrong
Lean No
- Has a clear architecture but is structured as a feature list
- Names some technologies but can't defend the choices
- No failure mode admitted
- No safety lens surfaced
- No "I'd change this today" beat
Hire
- Six-beat structure mostly intact
- Trade-offs named, choices defensible
- Failure mode admitted
- Missing one of: safety lens, executive translation, "I'd change" beat
Strong Hire
- Six beats land in 5 minutes with steady cadence
- Each architectural choice is named with the alternative rejected
- Failure mode is honest and specific
- Safety lens is surfaced unprompted
- Executive value is translated cleanly
- "I'd change this today" is grounded in specific reasoning, not vague humility
- Can transition into the 40-minute Q&A by saying "happy to go deeper on any of those" without panic
What the 40-min Q&A will probe
The probes are predictable. Have prepared positions:
- "Why X over Y at the [retrieval / orchestration / eval] layer?" — name your reason; if pushed, name the trade-off; if pushed further, admit "the alternative would have worked too — here's why I chose what I did."
- "What if traffic 10x'd?" — talk about which layer breaks first, what you'd horizontally scale, what you'd cache, where the cost ceiling lives.
- "What if the customer asked for on-prem?" — talk about what changes architecturally (model deployment surface, data flow, audit logging, eval harness portability).
- "What's the worst-case hallucination scenario?" — name the worst case honestly, then describe the mitigations you architected (refusal patterns, citation requirements, human-in-the-loop, eval gates).
- "How would you have evaluated this differently?" — your eval framework is a first-class architecture artifact; defend it like one.
- "What was the commercial outcome?" — translate to business KPI, contract size, expansion potential.
- "What surprised you?" — have a real surprise ready. Saying "nothing surprised me" is a Lean No.
Common Lean No traps
- Outcome-first. Leading with "we improved CSAT 40%" instead of the business problem signals you're optimizing for impressing, not for clarity.
- Technology name-dropping. "We used Neo4j, FAISS, LangChain, LangSmith, RAGAs, Vertex AI..." without choices defended is just a tool list.
- Pretending nothing went wrong. A project that ran perfectly is either fictional or shallow. Admit a real failure mode.
- Skipping the safety lens. The Apic interview will eventually ask. Surfacing it unprompted is the move.
- Missing the executive translation. Without the business value beat, the story reads as engineering-internal.
- No "I'd change." Saying "I wouldn't change anything" reads as low-self-awareness or rehearsed.
Practice protocol
- Pick the project. Per the personal note above, default to the GenAI services firm GraphRAG.
- Write the 5-minute version using the six beats. Do not bullet-point — write it as you'd say it.
- Time it aloud. Cut to ≤300 sec.
- Bullet-point the structure. Six beats → talking points only.
- Speak from bullets. Record.
- Have a colleague (or yourself in 24 hours) ask 3 of the 7 probable Q&A probes above. Practice the responses.
- Iterate until the 5-minute version is steady AND you have positions on at least 5 of the 7 probable probes.
- Compress to 90 seconds for use in the recruiter or HM screen — the structure stays the same; the depth shrinks.
Personal anchor
Recommended choice: the GenAI services firm Agentic Hybrid GraphRAG
Why this one: The strongest defensibility profile across all 7 of the criteria above. You have:
- Real customer + business problem
- Hard constraints (likely residency, RBAC, latency)
- At least 3 defensible architectural choices (graph + vector hybrid; ReAct + self-reflection orchestration; RAGAs as gating)
- At least one real failure mode (you mentioned RAGAs catching things in eval)
- Production deployment with measurable outcome
- Safety lens (RBAC at retrieval; eval as gate not metric)
- Multi-million-dollar contract value as executive frame
Six-beat skeleton for the GenAI services firm GraphRAG (fill in the customer-specific details)
- Customer + business problem — "The customer was [industry, regulatory posture]. Their underlying problem was [specific operational problem]. The current solution was [previous approach]; its limitation was [specific]."
- Constraints — "Hard constraints: data residency [specific region requirement], RBAC at retrieval layer (no bypass through query rewriting), p95 retrieval latency under [number] ms, audit trail per query."
- Architecture choices — "Hybrid retrieval: graph for [reason], vector for [reason], chosen because pure-vector missed [X] and pure-graph missed [Y]. Orchestration: ReAct + self-reflection over single-pass because [reason]. Eval: RAGAs Faithfulness as a deployment gate, not just a monitoring metric, because [reason]."
- Failure mode — "The thing that almost broke us was [specific — recommend recalling the actual one from the GenAI services firm]. Caught in eval at the Faithfulness threshold because we'd built it as a gate. Architectural fix: [specific change]."
- Safety lens — "From a safety lens, the under-discussed but most-important choice was RBAC at the retrieval layer — the LLM can't leak what the architecture doesn't let it see. The other was the eval-as-gate posture; production was contingent on a Faithfulness threshold being met."
- Executive value + what you'd change — "For the executive: [specific operational KPI improvement] tied to [downstream business outcome]. What I'd change today: I'd push harder for the customer to invest in the eval set up front; we underbuilt it because of timeline pressure. Today I'd treat the eval set as a contractual deliverable, not a delivery internal artifact."
Replace the bracketed sections with the actual specifics from your engagement (you have the real numbers; this skeleton is the structure).
What to watch for in the 40-min Q&A
Three probes you should expect given this story:
- "Why hybrid? Couldn't pure-vector with better embeddings have worked?" — Have a precise answer. If your answer is "graph captured relationships vector embeddings flatten" — ground it in a specific query type the customer ran.
- "How did you size the eval set?" — Have a number and methodology. "We started with [N] examples curated by [SME role], grew to [M] over the engagement, balanced across [categories]."
- "What model did you use? Why not Claude?" — Be honest. Whatever model you used, you can frame it as "the model selection was constrained by [customer choice / available platform], and the architecture was model-agnostic at the inference layer — moving to Claude today would be straightforward and would benefit from [specific Claude advantage]."
Your turn
When you're ready, write your raw 5-minute architecture story below — pick a project, work through the six beats, write it as you'd speak it. Aim for ≤500 seconds spoken (about 1000–1100 words written).
I'll grade on the rubric above, write the upgraded version, anticipate the most-likely Q&A probes, and log the result in Drill Tracker.