
    Navigating Claude Code: The Context Window Tax and How to Stop Paying It

    Praveen Jha · May 17, 2026 · 11 min read
    Quick Answer

    Claude Code's context window fills with tool definitions, MCP server listings, file contents, and conversation history — all of which persist across every message in a session. The "context window tax" is the compounding cost of this accumulation: past 50% context usage, output quality degrades and token costs per response increase. Key management tactics: use /context to monitor usage, run /compact at 60% (not 95%), use /clear between unrelated tasks, prefer CLI tools over MCP servers, and write specific prompts that avoid broad file scanning.

    The news angle: Anthropic removed the long-context pricing premium for Claude Opus 4.6 and Sonnet 4.6 in 2026 — the 1M context window is now GA at standard rates. At the same time, detailed guides on managing Claude Code token costs have become some of the most-read developer content this year. The pattern: teams are using Claude Code heavily, burning through quota fast, and discovering that cost is only half the problem. Quality degrades as context fills.

    What changed: Claude Code has matured from a novelty to infrastructure. Teams running it daily are now optimizing it the same way they optimize database queries — systematically, with instrumentation and discipline.

    Why it matters: A Claude Code session at 80% context does not produce the same quality output as the same session at 30% context. This is not a pricing complaint — it is an engineering reality. Managing the context window is not about being cheap; it is about maintaining output reliability.

    What Is in Your Context Window Right Now

    Run /context in any Claude Code session and you will see a breakdown like this:

    Context Window Usage: 70%
    
    ├── System prompt:         8,200 tokens  (4%)
    ├── Tool definitions:     18,400 tokens  (9%)
    ├── MCP servers:          22,100 tokens (11%)
    ├── Memory files:          4,800 tokens  (2%)
    └── Conversation history: 85,600 tokens (43%)
    
    Total: 139,100 / 200,000 tokens
    

    Most people look at the conversation history number and think "that is the problem." It is not. The hidden cost is MCP servers and tool definitions — items that load on every message, before you type a single character, and that many users never think about.

    Six MCP servers with 15–20 tools each: 22,000+ tokens of overhead, per message, whether you use any of those tools or not.
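
    The arithmetic is easy to sanity-check yourself. The per-tool figure below is an assumption for illustration; real schemas vary with how verbose their parameter descriptions are.

    # Rough check (illustrative figures): 6 servers x ~18 tools each x ~200 tokens per tool schema
    echo $(( 6 * 18 * 200 ))   # => 21600 tokens resent with every single message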

    The 50% Quality Threshold

    Claude attends to the whole window on every message, but that attention does not stretch evenly as the window grows. When context is sparse (under 50%), the model gives roughly equal weight to everything in it. Past 50%, earlier conversation history starts to carry noticeably less weight in the responses you get.

    What degrades past 50%:

    • Earlier architectural decisions established in the session get deprioritized
    • Code patterns you specified early in the session drift in later-generated code
    • The model starts "forgetting" constraints you set at the beginning
    • Responses become less consistent with your established coding style

    What does not degrade: The system prompt, tool definitions, and recent messages stay well-weighted. The casualty is the middle of long sessions.

    The Three Commands You Need to Know

    /context — Monitor Before It Is Too Late

    /context
    

    Run this after every 5–6 messages in a complex session. It takes 2 seconds and tells you exactly where you are. The number to watch: 50%. When you cross it, you have entered the degradation zone.

    The actionable breakpoints (a small helper sketch follows the list):

    • <30%: Normal. No action needed.
    • 30–50%: Healthy. Finish the current subtask cleanly.
    • 50–70%: Start planning a compact. Do not start a new complex subtask.
    • >70%: Run /compact at the next natural breakpoint. Quality is degrading.
    • >90%: Auto-compact will trigger. You lost control of the summary quality.
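
    If you find yourself copying the percentage /context reports into notes anyway, a throwaway script keeps the thresholds mechanical. Everything here is a sketch: the filename is made up and the advice strings simply restate the list above.

    #!/usr/bin/env sh
    # ctx-advice.sh: map a context-usage percentage to the recommended action
    # Usage: sh ctx-advice.sh 63
    pct="$1"
    if   [ "$pct" -lt 30 ]; then echo "Normal. No action needed."
    elif [ "$pct" -lt 50 ]; then echo "Healthy. Finish the current subtask cleanly."
    elif [ "$pct" -lt 70 ]; then echo "Plan a /compact. Do not start a new complex subtask."
    elif [ "$pct" -lt 90 ]; then echo "Run /compact at the next natural breakpoint."
    else echo "Auto-compact is imminent; the summary is out of your hands."
    fi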

    /compact — Compress at the Right Moment

    /compact Focus on the API design decisions, the authentication approach we agreed on, and the file structure we established
    

    The secret: guide the summary. Without guidance, /compact produces a generic summary that may omit the specific decisions you need to preserve. With guidance, it front-weights the information that matters for continuing the work.

    When to run it: At natural subtask boundaries. After you finish extracting a service class, before you start writing tests. After you establish the database schema, before you start writing the API layer. Never mid-task — the summary will be incoherent.

    What it preserves by default: Recent code changes, key architectural decisions, error patterns encountered. What it may drop: verbose tool outputs, intermediate reasoning traces, redundant file readings.
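
    For example, at the schema-to-API boundary described above, the guidance might look like the following; the specifics are illustrative, so name the decisions your session actually made.

    /compact Keep the final table definitions, the naming conventions we agreed on, and the
    decision to handle soft deletes in the service layer; drop the intermediate migration
    attempts and the raw file dumps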

    /clear — The Fresh Start

    /clear
    

    Different task, different context. Stale context from a morning spent on the authentication module wastes tokens when you switch to working on the reporting module. The previous context adds noise to every message you send.

    Rule of thumb: If you would not brief a new team member on the previous task context before asking them to start the new task, run /clear.

    The 10 Habits That Cut Context Costs 60%

    1. Disable MCP servers you are not using

    // .mcp.json (project-scoped MCP config): list only what this project needs
    {
      "mcpServers": {
        "postgres": { "command": "..." }
      }
    }

    Salesforce, GitHub, and Slack MCP servers are deliberately left out; each one would add roughly 5,000–8,000 tokens of tool definitions to every message.
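
    To see what is actually configured, and to drop a server without editing files by hand, the claude mcp subcommands cover it; the server name below is just an example.

    claude mcp list              # every configured MCP server
    claude mcp remove salesforce # stop paying its per-message overhead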
    

    2. Use CLI tools instead of MCP tools when possible

    # Instead of MCP GitHub tool (adds ~6,000 tokens to every message):
    gh pr view 123
    
    # Instead of MCP AWS tool:
    aws s3 ls s3://my-bucket --recursive
    
    # CLI calls only add their output to the conversation when you use them; MCP tool definitions load on every message
    

    3. Write specific prompts — not exploratory ones

    # Expensive (triggers broad file scanning):
    "Review the codebase and find anything that could be improved"
    
    # Efficient (targeted, reads minimal files):
    "Add input validation to the createUser function in src/services/user.service.ts —
    validate email format, check name is non-empty, ensure age is between 18 and 120"
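
    Targeted prompts also pair well with one-shot runs. For a task this specific there is often no need to build up an interactive session at all; print mode runs the prompt and exits, so nothing accumulates. A sketch, reusing the prompt above:

    claude -p "Add input validation to the createUser function in src/services/user.service.ts: \
    validate email format, check name is non-empty, ensure age is between 18 and 120"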
    

    4. Use the right model for the task

    # Use cheaper models for simple tasks; Claude Code lets you pick the model per session
    # (or set a default in settings.json):
    claude --model claude-sonnet-4-6  # Much cheaper than Opus for simple tasks
    claude --model claude-haiku-4-5   # Fastest for formatting, simple generation
    claude --model claude-opus-4-7    # Reserve for hard architectural work
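
    You can also switch mid-session instead of restarting. The /model command changes the model for subsequent turns while keeping the current context; the short aliases below assume your Claude Code version accepts an argument, otherwise /model opens a picker.

    /model sonnet   # drop down for routine edits
    /model opus     # back up for the hard architectural call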
    

    5. Keep CLAUDE.md focused

    Every line in CLAUDE.md loads into every session. A 500-line CLAUDE.md is 4,000+ tokens of overhead per session. Prune it to the 80–100 lines that are actually consulted regularly:

    # CLAUDE.md — keep under 100 lines
    ## Stack
    - React 18 + TypeScript + Tailwind + shadcn/ui
    - Supabase (PostgreSQL + Auth + Storage)
    - Vite + Vitest + Playwright
    
    ## Coding Standards
    - No "any" types — explicit TypeScript everywhere
    - Prefer const → arrow functions for utilities
    - Component files: default export at bottom
    - Test coverage required for any new service function
    
    ## Do Not Do
    - Do not add console.log to production code
    - Do not use class components
    - Do not modify .env files
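
    To see what your current CLAUDE.md costs, a word count is close enough; the factor below assumes roughly 1.3 tokens per English word, a rule of thumb rather than an exact figure.

    # Rough token estimate: words x 4 / 3
    echo $(( $(wc -w < CLAUDE.md) * 4 / 3 ))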
    

    6. Compact proactively at 60%, not reactively at 95%

    At 60%, there is enough context headroom for a quality summary. At 95%, auto-compact runs and the summary is often compressed to the point of losing detail. Set a reminder or check /context regularly.

    7. Use fresh sessions or sub-agents to keep unrelated work separate

    # Instead of one long session, split unrelated work into separate sessions
    # Session 1: auth work only
    claude "Extract the auth service from src/services/main.service.ts"

    # Session 2: payment work, with no auth context bleeding in
    claude "Add Stripe webhook handling to src/services/payment.service.ts"
    

    8. Checkpoint after major changes

    After Claude completes a significant change (a service extraction, a major refactor), read the diff and summarize it yourself in the conversation:

    "Good — we've extracted the CustomerOrderService with interface.
    The key decisions: we kept @Transactional on the impl class,
    used constructor injection, and defined the checked exception
    as CustomerOrderException. Now let's add tests."
    

    This explicit summary becomes a quality anchor in the context — it survives compression better than the implicit history of tool calls that led there.

    9. Turn off extended thinking for non-reasoning tasks

    /config thinking off   # Saves 15,000–40,000 tokens per response
                           # for tasks that don't need multi-step reasoning
    

    Code formatting, simple refactors, documentation generation — these do not benefit from extended thinking. Disable it for these tasks.

    10. Review your settings.json token budget

    {
      "tokenBudget": {
        "maxTokensPerSession": 500000,   // Hard stop per session
        "warningThresholdPercent": 60,    // Alert at 60% context
        "autoCompactAtPercent": 75        // Auto-compact before quality degrades
      }
    }
    

    The Business Case for Context Discipline

    For a team of 5 developers using Claude Code 6 hours/day:

    Behavior                                                          | Daily tokens     | Monthly cost (est.)
    Undisciplined (MCP overload, exploratory prompts, no compacting)  | ~15M tokens/dev  | $2,250/dev
    Disciplined (targeted prompts, right model, proactive compact)    | ~6M tokens/dev   | $900/dev
    Saving                                                            | 60% reduction    | $6,750/month for 5 devs
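
    The table leaves its assumptions implicit; the figures work out if you assume roughly 20 working days a month and a blended rate near $7.50 per million tokens, so treat the sketch below as a template for plugging in your own numbers rather than a quote.

    # 15M tokens/day x 20 days x $7.50/M ≈ $2,250/dev/month; 6M x 20 x $7.50/M ≈ $900/dev/month
    echo $(( (2250 - 900) * 5 ))   # => 6750: monthly saving across a five-developer team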

    The techniques above reliably produce 50–65% cost reduction without any sacrifice in task completion quality — because you are eliminating waste, not capability.

    Context discipline is also the difference between a Claude Code session that maintains consistent code quality for 3 hours and one that drifts into generic, inconsistent output after 45 minutes.


    Ortem Technologies uses Claude Code for AI agent development and custom software delivery — with context management discipline embedded in our engineering workflow. Our teams run 40–60% faster than traditional development with costs well within budget. Talk to our engineering team → | AI development services → | View how we work →


