04 · CodeLab 02 — Tool Use
Status: Outline. Body fills in Week 2. Voice: principal-level, hands-on, BFSI-themed.
What this lab is. A working tool-use loop: define a tool schema, handle the tool_use → tool_result loop, run multiple tools in parallel, surface the eval-relevant fields. The second commit of the GitHub artifact.
What this lab is NOT. A LangChain/LangGraph wrapper. Not a multi-agent crew.
Goal
By end of lab:
- A Python module that:
- Defines 2–3 tools with proper JSON Schema.
- Implements the tool_use loop end-to-end.
- Handles parallel tool calls correctly.
- Records tool_use blocks + tool_result blocks for eval consumption.
- Demonstrates an idempotent vs non-idempotent tool side by side.
Prerequisites
- CodeLab 01 complete.
- Familiarity with the response shape (content blocks, stop_reason).
Step-by-step outline
Step 1 — Define the tool schema for a BFSI customer-balance lookup
- Tool name:
get_customer_balance. - Input schema:
{account_id: str (regex: account-format), as_of_date: str (ISO date)}. - Description: precise, behavioral. "Returns the available account balance as of the given date. Read-only. Does not initiate any transactions."
Step 2 — Define a second tool: a deterministic FX rate lookup
- Tool name:
get_fx_rate. - Schema:
{base: str (3-letter code), quote: str (3-letter code), as_of_date: str}. - Description: read-only, deterministic.
Step 3 — Define a third tool: a non-idempotent operation (for the contrast)
- Tool name:
submit_alert_to_compliance(mutation). - Schema includes a
client_request_idfield for idempotency-key passing. - Description explicitly notes it's mutation and idempotent-by-key.
Step 4 — Implement the tool_use loop
- Start a conversation with
tools=[...]. - On
stop_reason == "tool_use", parsetool_useblocks. - Run the tools (mock implementations are fine for the lab).
- Construct
tool_resultblocks with the outputs. - Continue the conversation; loop until
stop_reason == "end_turn".
Step 5 — Parallel tool calls
- Test a prompt that invites two independent tools (e.g. "what's customer X's USD balance in INR as of today?").
- Verify Claude returns multiple
tool_useblocks in one response. - Run them in parallel (asyncio.gather or threadpool).
- Return both
tool_resultblocks in one user-message turn.
Step 6 — Error handling
- One tool deliberately raises (invalid
account_id). - Return the error as a
tool_resultwithis_error: trueand a clear message. - Verify Claude reasons over the error rather than papering over it.
Step 7 — Logging hooks
- Each tool call (request + response) goes to
logs/tool_calls.jsonl. - Used in CodeLab 05 to score tool-call correctness.
Code skeleton (placeholder — fill in Week 2)
TOOLS = [
{
"name": "get_customer_balance",
"description": "Returns the available account balance as of the given date. Read-only.",
"input_schema": {
"type": "object",
"properties": {
"account_id": {"type": "string", "pattern": "^acct-[0-9]{8}$"},
"as_of_date": {"type": "string", "format": "date"},
},
"required": ["account_id", "as_of_date"],
},
},
# ... fx_rate, submit_alert_to_compliance
]
def run_tool_use_loop(initial_prompt: str):
messages = [{"role": "user", "content": initial_prompt}]
while True:
resp = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=2048,
tools=TOOLS,
messages=messages,
)
messages.append({"role": "assistant", "content": resp.content})
if resp.stop_reason == "end_turn":
return resp
tool_results = handle_tool_uses(resp.content)
messages.append({"role": "user", "content": tool_results})
Acceptance criteria
- [ ] Three tools defined with clean schemas.
- [ ] Tool-use loop runs end-to-end.
- [ ] Parallel tool calls demonstrated and verified.
- [ ] Error case demonstrated; Claude reasons over the error.
- [ ] Idempotency-key pattern visible on the mutation tool.
- [ ] Structured logging captures tool_use + tool_result pairs.
Stretch goals
- Wrap the tool loop as an
mcpserver (preview of CodeLab integration with MCP). - Add a "tool budget" — fail the conversation if Claude calls >N tools.
Cross-references
- Note: 04 — Tool Use and Structured Outputs.
- Module 08: Tool Schema Design Doctrine.
- Sibling: 03 — Structured Output.
- Sibling: 05 — Custom Eval Harness.
Strong-Hire bar for this lab
- Tool schemas are behavioral contracts, not field lists.
- Parallel tool calls work first try; not added as an afterthought.
- Error path doesn't paper over failure.
- Mutation tool wears its idempotency key on its sleeve.