How D1 and D2 Failures Interact
Scenario 1: Vague Descriptions → Loop Calls the Wrong Tool
```python
# D2 problem: poor tool descriptions
# (input_schema omitted in this "bad" example for brevity)
tools = [
    {
        "name": "get_customer",
        "description": "Gets customer data",  # ← Too vague
    },
    {
        "name": "get_order",
        "description": "Gets order data",  # ← Too vague, same pattern
    },
]

# D1 consequence: the loop calls the wrong tool
# Claude sees "get_customer" and "get_order", both described only as "gets ... data"
# For "look up order #12345", Claude may call get_customer with the order ID
# Result: the call fails (e.g., a 400 error), the loop retries the same call,
# it fails again, and after 3 retries the loop gives up

# The fix is D2 (better descriptions), not D1 (loop control):
tools_fixed = [
    {
        "name": "get_customer",
        "description": """Retrieves customer profile by customer ID (format C-XXXXXX).
Use when you need: name, email, account tier, contact history.
Do NOT use for order information → use get_order instead.""",
        # Parameter names in input_schema are illustrative
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "get_order",
        "description": """Retrieves order details by order ID (format ORD-XXXXXXXX).
Use when you need: order status, items, amounts, shipping info.
Do NOT use for customer profile → use get_customer instead.""",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
]
```
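Better descriptions are the primary fix. As a complementary D2 safeguard, a tool can also reject an ID that doesn't match its documented format and return a structured error that names the right tool, so the loop gets an actionable signal instead of a bare 400. This is a minimal sketch assuming each `X` in the documented formats is a digit; `check_id_for_tool` and the `validation` category are hypothetical names, not part of any API.

```python
import re
from typing import Optional

# Assumption: each "X" in the documented formats (C-XXXXXX, ORD-XXXXXXXX) is a digit.
CUSTOMER_ID = re.compile(r"C-\d{6}")
ORDER_ID = re.compile(r"ORD-\d{8}")

# The ID pattern each tool accepts.
EXPECTED = {"get_customer": CUSTOMER_ID, "get_order": ORDER_ID}

# If a tool receives the *other* tool's ID format, point the model at the right tool.
SWAP_HINT = {
    "get_customer": (ORDER_ID, "This looks like an order ID; use get_order instead."),
    "get_order": (CUSTOMER_ID, "This looks like a customer ID; use get_customer instead."),
}


def check_id_for_tool(tool_name: str, raw_id: str) -> Optional[dict]:
    """Return None if raw_id fits the tool's documented format, else a structured error."""
    if EXPECTED[tool_name].fullmatch(raw_id):
        return None
    other_pattern, hint = SWAP_HINT[tool_name]
    message = f"{raw_id!r} is not a valid ID for {tool_name}."
    if other_pattern.fullmatch(raw_id):
        message += " " + hint
    return {
        "error": message,
        "error_category": "validation",  # hypothetical category name
        "isRetryable": False,  # retrying the identical call cannot succeed
    }


# Usage: a correct call, and the wrong-tool call from Scenario 1.
ok = check_id_for_tool("get_customer", "C-123456")
wrong_tool = check_id_for_tool("get_customer", "ORD-00012345")
```

Marking the error as non-retryable matters here: the loop stops repeating the same bad call, and the hint gives the model what it needs to switch tools on its next turn.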
Scenario 2: Missing Error Category → Loop Can’t Recover
```python
# D2 problem: the tool error doesn't include an error category
# (db stands in for whatever data-access client the tool wraps)
def get_customer_bad(customer_id: str) -> dict:
    try:
        return db.query(customer_id)
    except Exception as e:
        return {"error": str(e)}  # ← No category, no isRetryable


# D1 consequence: the loop can't decide whether to retry
async def handle_tool_result(tool_result: dict) -> str:
    if "error" in tool_result:
        # The D1 loop doesn't know whether this failure is retryable,
        # so it either always retries (wasting retries on permanent failures)
        # or never retries (giving up on transient failures)
        return "error"  # No intelligent decision possible
    return "ok"


# The fix is D2 (structured errors), which enables D1 (intelligent retry):
def get_customer_good(customer_id: str) -> dict:
    try:
        return db.query(customer_id)
    except TimeoutError:
        return {
            "error": "Database timeout",
            "error_category": "transient",
            "isRetryable": True,
            "retry_after_seconds": 2,
        }
    except PermissionError:
        return {
            "error": "Insufficient permissions",
            "error_category": "permission",
            "isRetryable": False,
        }
    except Exception as e:
        # Conservative default for anything unexpected: report it, don't retry
        # (the "unknown" category name is illustrative)
        return {
            "error": str(e),
            "error_category": "unknown",
            "isRetryable": False,
        }
```
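With structured errors in place, the D1 side can make the decision that `handle_tool_result` could not. A minimal sketch, assuming the `isRetryable` and `retry_after_seconds` fields shown in `get_customer_good`; `call_with_retry`, the cap of three attempts, and the injectable `sleep` (so the example runs without waiting) are illustrative choices, not a prescribed implementation.

```python
import time
from typing import Callable


def call_with_retry(
    tool: Callable[[], dict],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call tool, retrying only while it returns a retryable structured error."""
    result = tool()
    attempts = 1
    while (
        "error" in result
        and result.get("isRetryable", False)  # missing flag → treat as permanent
        and attempts < max_attempts
    ):
        sleep(result.get("retry_after_seconds", 1))  # honor the tool's backoff hint
        result = tool()
        attempts += 1
    return result


# Usage: a hypothetical lookup that times out twice, then succeeds.
state = {"calls": 0}


def flaky_lookup() -> dict:
    state["calls"] += 1
    if state["calls"] < 3:
        return {"error": "Database timeout", "error_category": "transient",
                "isRetryable": True, "retry_after_seconds": 0}
    return {"customer_id": "C-000001"}


recovered = call_with_retry(flaky_lookup, sleep=lambda _: None)

# A permanent failure comes back immediately instead of burning retries.
denied = call_with_retry(
    lambda: {"error": "Insufficient permissions", "error_category": "permission",
             "isRetryable": False},
    sleep=lambda _: None,
)
```

Treating a missing `isRetryable` flag as permanent is a deliberate choice: it is safer for the loop to surface an ambiguous failure than to repeat an operation that may never succeed.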
Scenario 3: Too Many Tools → Tool Selection Degrades
```python
# D2 problem: 20 tools given to one agent
COORDINATOR_TOOLS = all_20_tools  # D2 violation: too many tools (all_20_tools = the full list)

# D1 consequence: tool selection inside the loop degrades
# With ~20 tools, wrong-tool picks become much more frequent
# (a figure like ~25% is illustrative, not a measured benchmark)
# The loop has to retry and escalate more often
# Overall reliability drops significantly

# The fix combines D2 (scoped tools) with D1 (specialized subagents):
# D2: each agent type gets 3-5 tools for its role
INTAKE_TOOLS = [get_customer, get_order, create_ticket]  # 3 tools
BILLING_TOOLS = [get_customer, get_payment_history, process_refund]  # 3 tools
TECHNICAL_TOOLS = [get_customer, check_service_status, reset_auth]  # 3 tools

# D1: the coordinator routes each request to the appropriate specialized subagent
# Each subagent loop is more reliable because it chooses among a few well-chosen tools
```
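The routing step can be sketched as a mapping from subagent role to its scoped toolset. In the design described above the coordinator model makes the routing decision; here simple keyword matching stands in for that decision purely to show the shape, so `route`, `tools_for`, and the keyword lists are hypothetical. Tool names are strings mirroring the lists above.

```python
# Tool names mirror INTAKE_TOOLS / BILLING_TOOLS / TECHNICAL_TOOLS above.
SUBAGENT_TOOLS = {
    "intake": ["get_customer", "get_order", "create_ticket"],
    "billing": ["get_customer", "get_payment_history", "process_refund"],
    "technical": ["get_customer", "check_service_status", "reset_auth"],
}

# Hypothetical keyword rules standing in for the coordinator model's routing decision.
ROUTING_KEYWORDS = {
    "billing": ("refund", "charge", "payment", "invoice"),
    "technical": ("login", "password", "outage", "reset"),
}

MAX_TOOLS_PER_AGENT = 5  # upper end of the 3-5 tool budget


def route(request: str) -> str:
    """Pick a subagent role; fall back to intake when no rule matches."""
    text = request.lower()
    for role, keywords in ROUTING_KEYWORDS.items():
        if any(word in text for word in keywords):
            return role
    return "intake"


def tools_for(request: str) -> list:
    """Return only the scoped toolset of the chosen subagent."""
    tools = SUBAGENT_TOOLS[route(request)]
    if len(tools) > MAX_TOOLS_PER_AGENT:
        raise ValueError("toolset exceeds the per-agent tool budget")
    return tools


billing_tools = tools_for("Please refund my last payment")
```

The point of the structure is that each subagent loop only ever sees three tools, so its selection problem stays small regardless of how many tools the system as a whole has.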
The Combined Reliability Equation
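System reliability =
    Loop control correctness (D1)
  × Tool description quality (D2)
  × Tool error structure quality (D2)
  × Tool scoping, i.e. few tools per agent (D2)
  × Tool result normalization (D2)
Treat each factor as a success rate between 0 and 1. Because the factors multiply, the weakest one caps the whole system: if any factor is near zero, system reliability is near zero.
D1 and D2 must BOTH be done well.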
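A worked example makes the multiplication concrete. The factor values below are hypothetical, chosen only to illustrate the arithmetic, and the model assumes the factors are independent so their success rates multiply.

```python
import math

# Hypothetical success rates (0..1) for each factor; illustrative, not measured.
factors = {
    "loop_control": 0.95,          # D1
    "description_quality": 0.90,   # D2
    "error_structure": 0.90,       # D2
    "tool_scoping": 0.80,          # D2
    "result_normalization": 0.95,  # D2
}


def system_reliability(f: dict) -> float:
    # Assumes independent factors, so the success rates multiply.
    return math.prod(f.values())


baseline = system_reliability(factors)
weak_errors = system_reliability({**factors, "error_structure": 0.10})

print(f"baseline:        {baseline:.3f}")     # baseline:        0.585
print(f"weak error data: {weak_errors:.3f}")  # weak error data: 0.065
```

Even with every factor at 80% or better, the product is only about 58%; dropping a single factor (error structure) to 0.10 pulls the whole system down to about 6.5%, which is the "near zero" case described above.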
Key Takeaways
- Loop + tool failures compound — neither domain is sufficient alone
- Wrong tool selection is a D2 description problem, not a D1 loop problem
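- Retries that can't tell transient from permanent failures trace to D2 error structure, not D1 loop logic
- Tool-selection accuracy degrades as tool count grows; scoping tools per agent (D2) is what makes D1 subagent loops reliable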
- Exam scenarios showing loop failures often trace to D2 tool design issues