
    Navigating Claude Code: The Context Window Tax and How to Stop Paying It

    Praveen Jha · May 17, 2026 · 11 min read
    Quick Answer

    Claude Code's context window fills with tool definitions, MCP server listings, file contents, and conversation history — all of which persist across every message in a session. The "context window tax" is the compounding cost of this accumulation: past 50% context usage, output quality degrades and token costs per response increase. Key management tactics: use /context to monitor usage, run /compact at 60% (not 95%), use /clear between unrelated tasks, prefer CLI tools over MCP servers, and write specific prompts that avoid broad file scanning.

    The news angle: Anthropic removed the long-context pricing premium for Claude Opus 4.6 and Sonnet 4.6 in 2026 — the 1M context window is now GA at standard rates. At the same time, detailed guides on managing Claude Code token costs have become some of the most-read developer content this year. The pattern: teams are using Claude Code heavily, burning through quota fast, and discovering that cost is only half the problem. Quality degrades as context fills.

    What changed: Claude Code has matured from a novelty to infrastructure. Teams running it daily are now optimizing it the same way they optimize database queries — systematically, with instrumentation and discipline.

    Why it matters: A Claude Code session at 80% context does not produce the same quality output as the same session at 30% context. This is not a pricing complaint — it is an engineering reality. Managing the context window is not about being cheap; it is about maintaining output reliability.

    What Is in Your Context Window Right Now

    Run /context in any Claude Code session and you will see a breakdown like this:

    Context Window Usage: 70%
    
    ├── System prompt:         8,200 tokens  (4%)
    ├── Tool definitions:     18,400 tokens  (9%)
    ├── MCP servers:          22,100 tokens (11%)
    ├── Memory files:          4,800 tokens  (2%)
    └── Conversation history: 85,600 tokens (43%)
    
    Total: 139,100 / 200,000 tokens
    

    Most people look at the conversation history number and think "that is the problem." It is not. The hidden cost is MCP servers and tool definitions — items that load on every message, before you type a single character, and that many users never think about.

    Six MCP servers with 15–20 tools each: 22,000+ tokens of overhead, per message, whether you use any of those tools or not.
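
    The arithmetic is easy to sanity-check yourself. The per-tool figure below is an assumption for illustration; real schemas vary with how verbose their parameter descriptions are.

    # Rough check (illustrative figures): 6 servers x ~18 tools each x ~200 tokens per tool schema
    echo $(( 6 * 18 * 200 ))   # => 21600 tokens resent with every single message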

    The 50% Quality Threshold

    Claude attends to the whole window on every message, but that attention does not stretch evenly as the window grows. When context is sparse (under 50%), the model gives roughly equal weight to everything in it. Past 50%, earlier conversation history starts to carry noticeably less weight in the responses you get.

    What degrades past 50%:

    • Earlier architectural decisions established in the session get deprioritized
    • Code patterns you specified early in the session drift in later-generated code
    • The model starts "forgetting" constraints you set at the beginning
    • Responses become less consistent with your established coding style

    What does not degrade: The system prompt, tool definitions, and recent messages stay well-weighted. The casualty is the middle of long sessions.

    The Three Commands You Need to Know

    /context — Monitor Before It Is Too Late

    /context
    

    Run this after every 5–6 messages in a complex session. It takes 2 seconds and tells you exactly where you are. The number to watch: 50%. When you cross it, you have entered the degradation zone.

    The actionable breakpoints (a small helper sketch follows the list):

    • <30%: Normal. No action needed.
    • 30–50%: Healthy. Finish the current subtask cleanly.
    • 50–70%: Start planning a compact. Do not start a new complex subtask.
    • >70%: Run /compact at the next natural breakpoint. Quality is degrading.
    • >90%: Auto-compact will trigger. You lost control of the summary quality.
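
    If you find yourself copying the percentage /context reports into notes anyway, a throwaway script keeps the thresholds mechanical. Everything here is a sketch: the filename is made up and the advice strings simply restate the list above.

    #!/usr/bin/env sh
    # ctx-advice.sh: map a context-usage percentage to the recommended action
    # Usage: sh ctx-advice.sh 63
    pct="$1"
    if   [ "$pct" -lt 30 ]; then echo "Normal. No action needed."
    elif [ "$pct" -lt 50 ]; then echo "Healthy. Finish the current subtask cleanly."
    elif [ "$pct" -lt 70 ]; then echo "Plan a /compact. Do not start a new complex subtask."
    elif [ "$pct" -lt 90 ]; then echo "Run /compact at the next natural breakpoint."
    else echo "Auto-compact is imminent; the summary is out of your hands."
    fi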

    /compact — Compress at the Right Moment

    /compact Focus on the API design decisions, the authentication approach we agreed on, and the file structure we established
    

    The secret: guide the summary. Without guidance, /compact produces a generic summary that may omit the specific decisions you need to preserve. With guidance, it front-weights the information that matters for continuing the work.

    When to run it: At natural subtask boundaries. After you finish extracting a service class, before you start writing tests. After you establish the database schema, before you start writing the API layer. Never mid-task — the summary will be incoherent.

    What it preserves by default: Recent code changes, key architectural decisions, error patterns encountered. What it may drop: verbose tool outputs, intermediate reasoning traces, redundant file readings.
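
    For example, at the schema-to-API boundary described above, the guidance might look like the following; the specifics are illustrative, so name the decisions your session actually made.

    /compact Keep the final table definitions, the naming conventions we agreed on, and the
    decision to handle soft deletes in the service layer; drop the intermediate migration
    attempts and the raw file dumps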

    /clear — The Fresh Start

    /clear
    

    Different task, different context. Stale context from a morning spent on the authentication module wastes tokens when you switch to working on the reporting module. The previous context adds noise to every message you send.

    Rule of thumb: If you would not brief a new team member on the previous task context before asking them to start the new task, run /clear.

    The 10 Habits That Cut Context Costs 60%

    1. Disable MCP servers you are not using

    // .mcp.json (project-scoped MCP config): list only what this project needs
    {
      "mcpServers": {
        "postgres": { "command": "..." }
      }
    }

    Salesforce, GitHub, and Slack MCP servers are deliberately left out; each one would add roughly 5,000–8,000 tokens of tool definitions to every message.
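
    To see what is actually configured, and to drop a server without editing files by hand, the claude mcp subcommands cover it; the server name below is just an example.

    claude mcp list              # every configured MCP server
    claude mcp remove salesforce # stop paying its per-message overhead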
    

    2. Use CLI tools instead of MCP tools when possible

    # Instead of MCP GitHub tool (adds ~6,000 tokens to every message):
    gh pr view 123
    
    # Instead of MCP AWS tool:
    aws s3 ls s3://my-bucket --recursive
    
    # CLI calls only add their output to the conversation when you use them; MCP tool definitions load on every message
    

    3. Write specific prompts — not exploratory ones

    # Expensive (triggers broad file scanning):
    "Review the codebase and find anything that could be improved"
    
    # Efficient (targeted, reads minimal files):
    "Add input validation to the createUser function in src/services/user.service.ts —
    validate email format, check name is non-empty, ensure age is between 18 and 120"
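
    Targeted prompts also pair well with one-shot runs. For a task this specific there is often no need to build up an interactive session at all; print mode runs the prompt and exits, so nothing accumulates. A sketch, reusing the prompt above:

    claude -p "Add input validation to the createUser function in src/services/user.service.ts: \
    validate email format, check name is non-empty, ensure age is between 18 and 120"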
    

    4. Use the right model for the task

    # Use cheaper models for simple tasks; Claude Code lets you pick the model per session
    # (or set a default in settings.json):
    claude --model claude-sonnet-4-6  # Much cheaper than Opus for simple tasks
    claude --model claude-haiku-4-5   # Fastest for formatting, simple generation
    claude --model claude-opus-4-7    # Reserve for hard architectural work
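
    You can also switch mid-session instead of restarting. The /model command changes the model for subsequent turns while keeping the current context; the short aliases below assume your Claude Code version accepts an argument, otherwise /model opens a picker.

    /model sonnet   # drop down for routine edits
    /model opus     # back up for the hard architectural call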
    

    5. Keep CLAUDE.md focused

    Every line in CLAUDE.md loads into every session. A 500-line CLAUDE.md is 4,000+ tokens of overhead per session. Prune it to the 80–100 lines that are actually consulted regularly:

    # CLAUDE.md — keep under 100 lines
    ## Stack
    - React 18 + TypeScript + Tailwind + shadcn/ui
    - Supabase (PostgreSQL + Auth + Storage)
    - Vite + Vitest + Playwright
    
    ## Coding Standards
    - No "any" types — explicit TypeScript everywhere
    - Prefer const → arrow functions for utilities
    - Component files: default export at bottom
    - Test coverage required for any new service function
    
    ## Do Not Do
    - Do not add console.log to production code
    - Do not use class components
    - Do not modify .env files
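
    To see what your current CLAUDE.md costs, a word count is close enough; the factor below assumes roughly 1.3 tokens per English word, a rule of thumb rather than an exact figure.

    # Rough token estimate: words x 4 / 3
    echo $(( $(wc -w < CLAUDE.md) * 4 / 3 ))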
    

    6. Compact proactively at 60%, not reactively at 95%

    At 60%, there is enough context headroom for a quality summary. At 95%, auto-compact runs and the summary is often compressed to the point of losing detail. Set a reminder or check /context regularly.

    7. Use fresh sessions or sub-agents to keep unrelated work separate

    # Instead of one long session, split unrelated work into separate sessions
    # Session 1: auth work only
    claude "Extract the auth service from src/services/main.service.ts"

    # Session 2: payment work, with no auth context bleeding in
    claude "Add Stripe webhook handling to src/services/payment.service.ts"
    

    8. Checkpoint after major changes

    After Claude completes a significant change (a service extraction, a major refactor), read the diff and summarize it yourself in the conversation:

    "Good — we've extracted the CustomerOrderService with interface.
    The key decisions: we kept @Transactional on the impl class,
    used constructor injection, and defined the checked exception
    as CustomerOrderException. Now let's add tests."
    

    This explicit summary becomes a quality anchor in the context — it survives compression better than the implicit history of tool calls that led there.

    9. Turn off extended thinking for non-reasoning tasks

    /config thinking off   # Saves 15,000–40,000 tokens per response
                           # for tasks that don't need multi-step reasoning
    

    Code formatting, simple refactors, documentation generation — these do not benefit from extended thinking. Disable it for these tasks.

    10. Review your settings.json token budget

    {
      "tokenBudget": {
        "maxTokensPerSession": 500000,   // Hard stop per session
        "warningThresholdPercent": 60,    // Alert at 60% context
        "autoCompactAtPercent": 75        // Auto-compact before quality degrades
      }
    }
    

    The Business Case for Context Discipline

    For a team of 5 developers using Claude Code 6 hours/day:

    Behavior                                                          | Daily tokens     | Monthly cost (est.)
    Undisciplined (MCP overload, exploratory prompts, no compacting)  | ~15M tokens/dev  | $2,250/dev
    Disciplined (targeted prompts, right model, proactive compact)    | ~6M tokens/dev   | $900/dev
    Saving                                                            | 60% reduction    | $6,750/month for 5 devs
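
    The table leaves its assumptions implicit; the figures work out if you assume roughly 20 working days a month and a blended rate near $7.50 per million tokens, so treat the sketch below as a template for plugging in your own numbers rather than a quote.

    # 15M tokens/day x 20 days x $7.50/M ≈ $2,250/dev/month; 6M x 20 x $7.50/M ≈ $900/dev/month
    echo $(( (2250 - 900) * 5 ))   # => 6750: monthly saving across a five-developer team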

    The techniques above reliably produce 50–65% cost reduction without any sacrifice in task completion quality — because you are eliminating waste, not capability.

    Context discipline is also the difference between a Claude Code session that maintains consistent code quality for 3 hours and one that drifts into generic, inconsistent output after 45 minutes.


    Ortem Technologies uses Claude Code for AI agent development and custom software delivery — with context management discipline embedded in our engineering workflow. Our teams run 40–60% faster than traditional development with costs well within budget. Talk to our engineering team → | AI development services → | View how we work →


