What Lives in the Context Window
Every token Claude processes in a single API call:
┌─────────────────────────────────────────────────────┐
│ CONTEXT WINDOW (200k tokens for claude-sonnet-4-6) │
│ │
│ System prompt ~2,000 tokens │
│ CLAUDE.md content ~1,000 tokens │
│ Conversation history ~50,000 tokens (long run) │
│ Tool results ~30,000 tokens (unmanaged) │
│ Current message ~500 tokens │
│ Claude's response ~4,000 tokens │
│ ────────────── │
│ Total used ~87,500 tokens (44%) │
└─────────────────────────────────────────────────────┘
At 44% this looks fine. But after 2 more hours of tool calls and conversation, you’re at 180k — and the critical system instructions from the beginning are now buried in the middle.
The Lost-in-the-Middle Effect
Research consistently shows Claude’s attention is non-uniform across long contexts:
Attention level across context position:
High ▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░▓▓▓▓▓▓▓
│ │
START MIDDLE END
(System (Tool results, (Recent
prompt, history) message)
key rules)
Low ░░░░░░░░░░░░░░░░░░▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░░
Important instructions buried in the middle of 100k tokens of tool results get deprioritized. This is why:
- Critical rules belong in the system prompt (start)
- Key context belongs near the current message (end)
- Verbose intermediate results belong summarized or external
What Fills Context Fastest
# Rank by context consumption per operation:
1. Database query results (full rows): 5,000–50,000 tokens
2. File contents (large files): 2,000–20,000 tokens
3. API responses (verbose JSON): 1,000–10,000 tokens
4. Web search results (full pages): 2,000–8,000 tokens
5. Prior conversation turns: 100–500 tokens each
6. Tool descriptions: 50–200 tokens each
Monitoring Context Usage
def estimate_context_usage(messages: list, tools: list) -> dict:
"""Rough token estimate before API call."""
import anthropic
# Use the tokenizer to count
client = anthropic.Anthropic()
# Count message tokens
total = sum(
len(msg["content"]) // 4 # ~4 chars per token approximation
for msg in messages
if isinstance(msg.get("content"), str)
)
# Count tool definition tokens
tool_tokens = sum(len(str(t)) // 4 for t in tools)
usage_pct = (total + tool_tokens) / 200000 * 100
return {
"estimated_tokens": total + tool_tokens,
"usage_pct": round(usage_pct, 1),
"warning": usage_pct > 60,
"critical": usage_pct > 80
}
Key Takeaways
- Context window = everything Claude sees — system prompt, history, tool results, message
- Lost-in-the-middle — instructions buried in large contexts get deprioritized
- Tool results fill fastest — trim aggressively before appending
- Stay under 60% as working limit — not 100%
- Monitor usage proactively — not reactively after production issues