04 · CodeLab 03 — Structured Output
Status: Outline. Body fills in Week 2. Voice: principal-level, hands-on, BFSI-themed.
What this lab is. Extend the artifact from CodeLab 02 to produce typed, validated structured responses — using Pydantic models and the tool-schema trick — building a compliance gap analysis schema for the BFSI compliance document review use case.
What this lab is NOT. A JSON prettifier. Not a schema-validation library wrapper. The goal is a repeatable pattern for converting unstructured Claude responses into strongly-typed Python objects that downstream systems can consume without defensive parsing.
Goal
By end of lab:
- A Python module that:
  - Defines a Pydantic schema for ComplianceGapAnalysis.
  - Uses the tool-schema trick to coerce Claude into returning structured JSON.
  - Deserializes the response into the Pydantic model with validation.
  - Handles partial / malformed outputs gracefully (retry with repair prompt).
  - Logs the typed result alongside the raw response for eval consumption.
The BFSI framing: you are reviewing a draft circular from the compliance team against RBI Master Direction on KYC. Claude must identify each gap, its severity, and a remediation action — all in a typed schema.
Prerequisites
- CodeLab 01 complete (SDK initialized, JSONL logging in place).
- CodeLab 02 complete (tool_use loop understood).
- pydantic>=2.0 added to pyproject.toml.
Background: the two structured-output paths
Path A — response_format (not yet on Claude; OpenAI-only as of April 2026)
The response_format: {type: "json_schema", ...} parameter is an OpenAI-ism. Apic's SDK does not expose this on messages.create. Do not use it. This is a common candidate trap in pre-sales demos.
Path B — Tool-schema trick (the correct path for Claude)
Define a single tool whose input_schema is the exact Pydantic model structure you want. Set tool_choice={"type": "tool", "name": "submit_gap_analysis"}. Claude is forced to produce a tool call with your typed payload as its input. Deserialize with model.model_validate(tool_input).
This is not a hack — it is the documented approach and the one Apic recommends for structured extraction with validation guarantees.
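The mechanics of Path B can be sketched with the standard library alone, so the snippet runs without an API key. The SimpleNamespace object stands in for an SDK response block, the helper names build_forced_tool_request and extract_tool_input are hypothetical, and the tiny hand-written schema is only a placeholder for the Pydantic-derived one built later in this lab.

```python
from types import SimpleNamespace
from typing import Any, Optional

TOOL_NAME = "submit_gap_analysis"


def build_forced_tool_request(input_schema: dict[str, Any]) -> dict[str, Any]:
    """Return the tool-related keyword arguments for a forced tool call."""
    return {
        "tools": [
            {
                "name": TOOL_NAME,
                "description": "Submit the completed compliance gap analysis.",
                "input_schema": input_schema,
            }
        ],
        # Forcing the named tool removes the option of a free-text answer.
        "tool_choice": {"type": "tool", "name": TOOL_NAME},
    }


def extract_tool_input(content_blocks: list[Any]) -> Optional[dict[str, Any]]:
    """Return the input payload of the first matching tool_use block, or None."""
    for block in content_blocks:
        if getattr(block, "type", None) == "tool_use" and block.name == TOOL_NAME:
            return block.input
    return None


# Stand-in for resp.content (assumption: real responses carry SDK block objects).
fake_content = [
    SimpleNamespace(
        type="tool_use",
        name=TOOL_NAME,
        id="toolu_example",
        input={"document_title": "Customer Identification Policy v2.3"},
    )
]

request_kwargs = build_forced_tool_request({"type": "object", "properties": {}})
payload = extract_tool_input(fake_content)
```

The payload returned by extract_tool_input is what would be handed to model_validate; the request kwargs would be merged into the messages.create call alongside model, system, and messages.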
Step-by-step outline
Step 1 — Define the Pydantic schema
# src/sun_claude_artifact/schemas/compliance_gap.py
from enum import Enum
from typing import Optional

from pydantic import BaseModel, Field


class GapSeverity(str, Enum):
    CRITICAL = "critical"  # regulatory penalty likely if not remediated
    HIGH = "high"          # material control gap
    MEDIUM = "medium"      # process deviation, reportable internally
    LOW = "low"            # cosmetic / documentation gap


class ComplianceGap(BaseModel):
    clause_reference: str = Field(
        description="The specific clause or section of the governing regulation "
                    "(e.g. 'RBI Master Direction KYC 2016, Section 18.3')."
    )
    gap_description: str = Field(
        description="Precise statement of the gap: what the regulation requires "
                    "versus what the document states or omits."
    )
    severity: GapSeverity
    remediation_action: str = Field(
        description="Concrete action required to close the gap. "
                    "Start with an imperative verb (Add, Remove, Amend, Escalate)."
    )
    owner_function: Optional[str] = Field(
        default=None,
        description="The BFSI business function responsible for remediation "
                    "(e.g. Compliance, Legal, Operations, IT)."
    )
    timeline_days: Optional[int] = Field(
        default=None,
        description="Suggested remediation timeline in calendar days."
    )


class ComplianceGapAnalysis(BaseModel):
    document_title: str
    governing_regulation: str = Field(
        description="Primary regulation being assessed against."
    )
    review_summary: str = Field(
        description="2–3 sentence executive summary of the overall compliance posture."
    )
    gaps: list[ComplianceGap]
    overall_risk_rating: GapSeverity = Field(
        description="Highest severity gap found, rolled up to document level."
    )
    reviewer_confidence: float = Field(
        ge=0.0, le=1.0,
        description="Model's self-reported confidence in the analysis (0–1). "
                    "Low values (<0.7) should trigger human review."
    )
Why these fields matter in a BFSI context:
- clause_reference: makes the output auditable — auditors need the precise reg cite, not paraphrases.
- reviewer_confidence: explicitly flags uncertainty for HITL routing. A score < 0.7 must go to a human compliance officer before acting.
- timeline_days: forces the model to produce actionable output, not just diagnosis.
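Two of these fields lend themselves to a cheap cross-check on the caller's side: overall_risk_rating should equal the highest severity in gaps, and reviewer_confidence below 0.7 should route to a human. The sketch below is one way to express both, using only the standard library. The names SEVERITY_RANK, rollup_risk, and needs_human_review are hypothetical, and returning LOW for an empty gap list is an assumption, since the schema defines no "no gaps" rating.

```python
from enum import Enum


class GapSeverity(str, Enum):
    CRITICAL = "critical"
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


# Explicit ranking: comparing the raw string values would sort alphabetically.
SEVERITY_RANK = {
    GapSeverity.LOW: 0,
    GapSeverity.MEDIUM: 1,
    GapSeverity.HIGH: 2,
    GapSeverity.CRITICAL: 3,
}

# Threshold taken from the reviewer_confidence field description.
HITL_THRESHOLD = 0.7


def rollup_risk(severities: list[GapSeverity]) -> GapSeverity:
    """Return the highest severity present; LOW for an empty list (assumption)."""
    if not severities:
        return GapSeverity.LOW
    return max(severities, key=SEVERITY_RANK.__getitem__)


def needs_human_review(reviewer_confidence: float) -> bool:
    """True when the self-reported confidence falls below the HITL threshold."""
    return reviewer_confidence < HITL_THRESHOLD
```

A caller could compare rollup_risk([g.severity for g in result.gaps]) against result.overall_risk_rating and treat a mismatch as one more reason to send the analysis to a reviewer.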
Step 2 — Generate the tool definition from the schema
# src/sun_claude_artifact/structured_call.py
import json

from apic import Apic
from pydantic import ValidationError

from .schemas.compliance_gap import ComplianceGapAnalysis

client = Apic()

# Derive the tool JSON Schema from the Pydantic model.
# Pydantic v2: model_json_schema() returns a JSON Schema dict.
GAP_ANALYSIS_TOOL = {
    "name": "submit_gap_analysis",
    "description": (
        "Submit the completed compliance gap analysis. "
        "Call this exactly once with the full analysis result. "
        "Do not call any other tools."
    ),
    "input_schema": ComplianceGapAnalysis.model_json_schema(),
}
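Important: model_json_schema() produces valid JSON Schema, but Pydantic v2 does not inline nested models or enums by default. ComplianceGap and GapSeverity appear as $ref pointers into a top-level $defs block. Before shipping, confirm that $defs is still present in the input_schema you send and that every $ref resolves to an entry in it.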
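One way to run that check is a small walker over the schema dict that collects every $ref target and compares it with the keys under $defs. This is a sketch under stated assumptions: collect_refs and unresolved_refs are hypothetical helpers, only local "#/$defs/..." references are considered, and the example schema is hand-written to resemble a Pydantic-generated one with one definition deliberately left out.

```python
from typing import Any


def collect_refs(node: Any) -> set[str]:
    """Walk a JSON Schema fragment and collect every "$ref" target string."""
    refs: set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                refs.add(value)
            else:
                refs |= collect_refs(value)
    elif isinstance(node, list):
        for item in node:
            refs |= collect_refs(item)
    return refs


def unresolved_refs(schema: dict[str, Any]) -> set[str]:
    """Return "$ref" targets with no matching entry in the schema's own "$defs"."""
    defined = {f"#/$defs/{name}" for name in schema.get("$defs", {})}
    return collect_refs(schema) - defined


# Hand-written example; GapSeverity is deliberately missing from "$defs".
example_schema = {
    "type": "object",
    "properties": {
        "gaps": {"type": "array", "items": {"$ref": "#/$defs/ComplianceGap"}},
        "overall_risk_rating": {"$ref": "#/$defs/GapSeverity"},
    },
    "$defs": {
        "ComplianceGap": {
            "type": "object",
            "properties": {"severity": {"$ref": "#/$defs/GapSeverity"}},
        },
    },
}

missing = unresolved_refs(example_schema)
```

In the lab module, the same check could run once at import time against GAP_ANALYSIS_TOOL["input_schema"] and fail fast if the returned set is non-empty.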
Step 3 — The parse helper
def analyze_compliance_document(
    document_text: str,
    regulation_name: str,
    regulation_text: str,
    model: str = "claude-opus-4-7",  # Opus on hot path: near-zero hallucination tolerance
    max_retries: int = 2,
) -> ComplianceGapAnalysis:
    """
    Run a compliance gap analysis against a regulation.
    Returns a validated ComplianceGapAnalysis or raises after max_retries.
    """
    system = f"""You are a senior compliance analyst reviewing BFSI documents
against regulatory requirements. You must identify every material gap with
precision. You are NOT generating a summary — you are producing a structured
audit artifact that will be reviewed by a licensed compliance officer before
any action is taken.
Governing regulation:
<regulation>
{regulation_text}
</regulation>
Be conservative: if uncertain whether something is a gap, flag it at LOW severity
rather than omitting it. Set reviewer_confidence below 0.7 if the document is
ambiguous or the regulation is unclear."""
    messages = [
        {
            "role": "user",
            "content": (
                f"Analyze the following document against {regulation_name}.\n\n"
                f"<document>\n{document_text}\n</document>\n\n"
                "Submit your findings using the submit_gap_analysis tool."
            ),
        }
    ]
    for attempt in range(max_retries + 1):
        resp = client.messages.create(
            model=model,
            max_tokens=4096,
            system=system,
            tools=[GAP_ANALYSIS_TOOL],
            tool_choice={"type": "tool", "name": "submit_gap_analysis"},
            messages=messages,
        )
        # Extract the tool_use block
        tool_use_block = next(
            (b for b in resp.content if b.type == "tool_use"), None
        )
        if tool_use_block is None:
            raise RuntimeError(f"No tool_use block in response (attempt {attempt})")
        try:
            return ComplianceGapAnalysis.model_validate(tool_use_block.input)
        except ValidationError as e:
            if attempt == max_retries:
                raise
            # Repair: echo the assistant turn, then answer its tool_use block with
            # an error tool_result. A tool_use turn must be followed by a matching
            # tool_result, so a plain-text user message is not enough here.
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({
                "role": "user",
                "content": [
                    {
                        "type": "tool_result",
                        "tool_use_id": tool_use_block.id,
                        "is_error": True,
                        "content": (
                            f"Your previous call failed schema validation:\n{e}\n\n"
                            "Fix the errors and call submit_gap_analysis again "
                            "with a valid payload."
                        ),
                    }
                ],
            })
Why claude-opus-4-7 here? Compliance document review is the one use case in the BFSI stack with near-zero hallucination tolerance. The cost premium is justified. Sonnet 4.6 is appropriate for the RM copilot hot path; Opus is appropriate here.
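On the repair turn in Step 3: the text fed back to the model is most useful when it names each failing field. Pydantic v2's ValidationError.errors() returns a list of dicts with loc, msg, and type keys, and a small formatter can turn that into one line per field. The sketch below works on that list-of-dicts shape directly so it runs without Pydantic installed. format_validation_errors is a hypothetical helper and the sample errors are hand-written, not captured from a real run.

```python
from typing import Any


def format_validation_errors(errors: list[dict[str, Any]]) -> str:
    """Render one line per error as '- <dotted.path>: <message>'."""
    lines = []
    for err in errors:
        # "loc" is a tuple of field names and list indexes, e.g. ("gaps", 0, "severity").
        path = ".".join(str(part) for part in err.get("loc", ())) or "<root>"
        lines.append(f"- {path}: {err.get('msg', 'invalid value')}")
    return "\n".join(lines)


# Hand-written sample in the shape returned by ValidationError.errors().
sample_errors = [
    {
        "type": "enum",
        "loc": ("gaps", 0, "severity"),
        "msg": "Input should be 'critical', 'high', 'medium' or 'low'",
    },
    {"type": "missing", "loc": ("review_summary",), "msg": "Field required"},
]

repair_text = format_validation_errors(sample_errors)
```

Inside the except branch, format_validation_errors(e.errors()) could replace the raw {e} interpolation, which keeps the repair prompt short and free of documentation URLs.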
Step 4 — Logging the typed result
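# Extend the existing JSONL logging from CodeLab 01
# Assumes ComplianceGapAnalysis is already imported in this module,
# as it is in structured_call.py.
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Optional

LOG_DIR = Path("logs")
LOG_DIR.mkdir(exist_ok=True)
STRUCTURED_LOG = LOG_DIR / "structured_outputs.jsonl"


def log_gap_analysis(
    document_title: str,
    result: ComplianceGapAnalysis,
    usage,
    model: str,
    raw_tool_input: Optional[dict[str, Any]] = None,
) -> None:
    """Append one JSONL record with the typed result and, if given, the raw payload."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "document_title": document_title,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "gap_count": len(result.gaps),
        "overall_risk": result.overall_risk_rating.value,
        "reviewer_confidence": result.reviewer_confidence,
        # mode="json" converts enums to plain strings before serialization.
        "result": result.model_dump(mode="json"),
        # Unvalidated tool input, kept alongside the typed result for the eval harness.
        "raw_tool_input": raw_tool_input,
    }
    with STRUCTURED_LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")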
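To make the low-confidence path observable, the log can be read back and filtered on reviewer_confidence. The following is a minimal sketch, assuming each JSONL line carries a reviewer_confidence key as in the entry above. low_confidence_entries is a hypothetical helper, and the two demo records are made up and written to a temporary file rather than the real log.

```python
import json
import tempfile
from pathlib import Path

HITL_THRESHOLD = 0.7  # same cut-off as the reviewer_confidence field description


def low_confidence_entries(log_path: Path, threshold: float = HITL_THRESHOLD) -> list[dict]:
    """Return log entries whose reviewer_confidence is below the threshold."""
    flagged = []
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines in the log
            entry = json.loads(line)
            if entry["reviewer_confidence"] < threshold:
                flagged.append(entry)
    return flagged


# Demo against a throwaway file with two made-up entries.
with tempfile.TemporaryDirectory() as tmp:
    demo_log = Path(tmp) / "structured_outputs.jsonl"
    demo_log.write_text(
        json.dumps({"document_title": "Policy A", "reviewer_confidence": 0.91}) + "\n"
        + json.dumps({"document_title": "Policy B", "reviewer_confidence": 0.55}) + "\n",
        encoding="utf-8",
    )
    flagged_titles = [e["document_title"] for e in low_confidence_entries(demo_log)]
```

Pointing the helper at logs/structured_outputs.jsonl gives the list of analyses that would be routed to a compliance officer once HITL is wired in.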
Step 5 — End-to-end smoke test
# scripts/run_compliance_review.py
from src.sun_claude_artifact.structured_call import analyze_compliance_document
SAMPLE_DOC = """
Customer Identification Policy v2.3 (Internal Draft)
...
Section 4: Account Opening
New accounts require submission of PAN card and Aadhaar.
Video KYC may be used for remote onboarding.
Re-KYC is required every 10 years for low-risk customers.
...
"""
SAMPLE_REGULATION = """
RBI Master Direction — Know Your Customer (KYC) Direction, 2016
(Updated Oct 2023)
...
Section 18: Periodic Updation of KYC
18.1 Low-risk customers: re-KYC every 10 years.
18.2 Medium-risk customers: re-KYC every 8 years.
18.3 High-risk customers: re-KYC every 2 years.
18.4 Re-KYC must capture updated Aadhaar + OTP confirmation.
...
"""
result = analyze_compliance_document(
    document_text=SAMPLE_DOC,
    regulation_name="RBI Master Direction KYC 2016",
    regulation_text=SAMPLE_REGULATION,
)
# Use .value so the output shows "critical" rather than the enum member name,
# regardless of how the running Python version formats str-based enums.
print(
    f"Risk: {result.overall_risk_rating.value} | Gaps: {len(result.gaps)} | "
    f"Confidence: {result.reviewer_confidence:.2f}"
)
for gap in result.gaps:
    print(f"  [{gap.severity.value}] {gap.clause_reference}: {gap.gap_description[:80]}...")
Common traps (interview-room traps too)
| Trap | Why it fails | Fix |
|---|---|---|
| Using response_format: json_schema | Not supported on Claude as of April 2026 | Use tool-schema trick |
| Asking Claude to "return JSON in your response" | Output is a string; parse errors are silent | Force via tool_choice |
| Using tool_choice: "auto" | Claude may choose not to call the tool | Use tool_choice: {type: tool, name: ...} |
| Deriving schema with Pydantic v1 .schema() | Returns different shape than v2 | Use model_json_schema() (Pydantic v2) |
| Putting 30K-token policy bundle in the user turn | Token cost balloons every call | Cache the regulation text (→ CodeLab 04) |
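The second row is easy to reproduce offline. When the model is asked for JSON in free text, the reply often arrives wrapped in prose, and json.loads rejects it. In the sketch below the reply string is made up for illustration and try_parse is a hypothetical helper. A forced tool call avoids the problem because the payload arrives as an already-parsed dict.

```python
import json
from typing import Any, Optional

# Made-up example of a free-text reply that wraps JSON in prose.
free_text_reply = (
    "Here is the analysis:\n"
    '{"gaps": [], "overall_risk_rating": "low"}\n'
    "Let me know if you need more detail."
)


def try_parse(text: str) -> tuple[Optional[Any], Optional[str]]:
    """Return (parsed, error_message); exactly one of the two is None."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, str(exc)


parsed, error = try_parse(free_text_reply)

# With a forced tool call the payload is already a dict, so no string parsing is needed.
tool_style_input = {"gaps": [], "overall_risk_rating": "low"}
```

Here parsed is None and error holds the decoder's message, which is the failure that downstream code would otherwise have to guard against with defensive parsing.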
Acceptance criteria
- [ ] ComplianceGapAnalysis Pydantic schema defined with field descriptions.
- [ ] Tool definition derived programmatically from the schema (no hand-written JSON Schema duplication).
- [ ] analyze_compliance_document() function returns a validated Pydantic object or raises.
- [ ] Retry-with-repair loop handles one ValidationError.
- [ ] Structured log captures result, usage, and confidence.
- [ ] Smoke test runs end-to-end on the sample document.
- [ ] reviewer_confidence < 0.7 path is observable in the logs (even if not wired to HITL yet).
Stretch goals
- Add a second schema: PolicyChangeImpactAssessment for comparing two versions of a circular.
- Wire reviewer_confidence < 0.7 to a stub HITL notification (print + log entry with hitl_required: true).
- Benchmark Opus vs. Sonnet on the same document; record gap count and confidence delta.
- Cache the regulation text with cache_control (→ CodeLab 04, the next lab).
What this lab feeds
- CodeLab 04 (caching): the regulation text in system is the prime caching candidate.
- CodeLab 05 (eval harness): structured_outputs.jsonl is a second input stream for the judge.
- Module 05 SystemDesign 04 (Compliance Document Review): this schema is the typed payload in the production design.
Cross-references
- Note: 04 — Tool Use and Structured Outputs
- Note: 03 — Prompt Caching Architecture
- Sibling: 02 — Tool Use
- Sibling: 04 — Prompt Caching
- Sibling: 05 — Custom Eval Harness
- System Design: Compliance Document Review
Strong-Hire bar for this lab
- Schema is derived from Pydantic — not hand-written JSON Schema duplication.
- tool_choice is forced, so there is no ambiguity about whether Claude will call the tool.
- reviewer_confidence field is present and wired to a HITL signal (even if stub).
- Retry-with-repair works; the error message fed back is precise, not generic.
- The Opus choice is justified in a comment, not arbitrary.