Claude Code Context Management Guide

Claude Code sessions that run too long silently degrade in quality, cost, and speed. Context management is the difference between productive multi-hour workflows and expensive, unreliable babysitting. Most developers working with Claude Code through the CLI or VS Code don't monitor their context window until outputs start contradicting earlier decisions or hallucinating file paths that don't exist. This article provides a systematic framework for diagnosing, preventing, and recovering from context saturation, drawing on the specific mechanisms Claude Code exposes for controlling session state.

The Hidden Cost of Long-Running Claude Code Sessions
Monitoring Your Context Usage
The /compact Command: Your Primary Context Tool
The /clear Command: When to Start Fresh
Session Persistence with CLAUDE.md
Memory Optimization for Large Codebases
Session Management Cheat Sheet
Putting It All Together: A Real Workflow
Key Takeaways

The Hidden Cost of Long-Running Claude Code Sessions

What Happens at 200k Tokens

Claude Code operates against the context window of whatever model it's running. Claude 3.5 Sonnet and Claude 3 Opus currently offer 200,000-token context windows (verify model names and limits against current Anthropic documentation for your API tier and version). The system prompt, conversation history, file contents, and tool results all share that 200k budget.

What researchers call "Lost in the Middle" syndrome drives the critical failure mode. Research including Liu et al. (2023) "Lost in the Middle" (arxiv.org/abs/2307.03172) demonstrates that accuracy drops for middle-positioned information in retrieval tasks; analogous effects appear in Claude Code sessions, though no published study has measured them independently. In practice, this means the model loses track of architectural decisions made 30 minutes ago while perfectly recalling the file you just asked it to open.

You'll recognize the symptoms. Claude generates code that duplicates functions already written earlier in the session. It forgets coding conventions established at the start. It suggests one approach after already implementing another. It hallucinates file paths that don't exist in the project. These aren't random failures. They follow predictably from context window saturation and uneven attention distribution across long token sequences.

Why "Just Keep Going" Is the Most Expensive Strategy

Every turn in a Claude Code session reprices the entire context. The model processes all accumulated input tokens on each interaction. To put a number on it: at Sonnet's published input price of $3 per million tokens, a single turn at 150k context costs roughly $0.45 in input alone, versus $0.09 at 30k tokens, a 5x difference for the same one-line prompt. This creates a compounding cost curve where the last hour of a marathon session can dwarf the first several hours combined (actual ratios depend on cache hit rates and output length, but the scaling is real).

Every turn in a Claude Code session reprices the entire context. The model processes all accumulated input tokens on each interaction.

Latency tracks the same curve. We haven't benchmarked this precisely, but expect per-turn latency to scale with input size; a turn that feels instant at 20k tokens will drag noticeably at 150k. Beyond cost and speed, there's the operational problem: sessions with fragile, bloated state require constant human supervision. You can't walk away and come back, because the model's grasp on project state becomes unreliable enough that unsupervised work produces errors that take longer to fix than to have done manually.

Monitoring Your Context Usage

Checking Token Count Mid-Session

Claude Code surfaces context usage directly. Run /cost for a snapshot of the current session's token consumption and associated costs.

Note: Output format below is illustrative. Actual field names and available metrics depend on your installed CLI version. Run /cost in your environment to confirm current output format.

> /cost

 Session Cost Summary
 ─────────────────────────────────
 Input tokens:    87,432
 Output tokens:   12,891
 Cache read:      61,204
 Cache write:     26,228
 Total cost:      $1.47
 Context window:  43.7% utilized

Context utilization percentage is the most actionable number. It shows how much of the effective 200k window is currently occupied. Input token count reflects cumulative conversation history plus any file contents loaded into context. Cache read and cache write lines indicate how much of the input Anthropic's prompt caching served versus freshly processed, which directly affects cost. (Prompt caching applies when repeated content, such as CLAUDE.md or system prompts, exceeds minimum token thresholds; see Anthropic caching documentation for eligibility details.)

Recognizing When Context Degrades

Beyond the numbers, behavioral signals flag problems before metrics hit critical thresholds. When Claude starts asking for information already provided earlier in the session, that's a retrieval failure from the middle of the context. When generated code conflicts with earlier architectural decisions, or when the model loses track of the file structure it was navigating ten minutes ago, context saturation is the likely culprit.

A practical heuristic: intervene at 60% context utilization. This threshold is author preference based on observed quality dropoff, not an officially documented number, and your mileage will vary depending on how much of your context is code versus conversation. Around 80-90% utilization, auto-compaction triggers (exact threshold varies by CLI version; consult release notes), but auto-compaction operates without specific guidance about what to preserve, making it less precise than manual intervention.

The /compact Command: Your Primary Context Tool

How /compact Works Under the Hood

Running /compact triggers a summarization pass. Claude reads the full conversation history, produces a condensed summary, and replaces the existing conversation with that summary as the new starting context. This frees up token space while preserving the essential thread of the session. Anthropic bills this call against the full current context, so factor compaction cost into your session economics.

Manual versus automatic compaction matters here. Auto-compaction fires when context approaches the window limit, but at that point the model is already operating in a degraded state, and the summarization itself happens under constrained conditions. Manual compaction, invoked proactively, runs while there's still headroom. The model still has clear recall of the full conversation, so it produces better summaries.

The summary keeps high-level decisions, recent code changes, and the general trajectory of the work. It drops verbatim earlier exchanges, nuanced reasoning chains, and specific phrasing from earlier prompts. Summarization is lossy and non-deterministic, varying between invocations. Treat it accordingly.

Using Custom Compact Prompts for Precision

An underused capability: /compact accepts a custom prompt that directs the summarization focus. This turns generic compression into a targeted context curation step. (Syntax shown is illustrative; verify custom prompt support and syntax in your installed CLI version.)

# Custom compact prompts — verified safe characters only.
# If your prompt contains quotes or special characters, test interactively first.

/compact Focus on the database schema changes and migration plan. Discard UI discussion.

/compact Retain all API endpoint signatures and auth flow decisions. Summarize everything else.

/compact We are starting the testing phase. Keep only implementation details relevant to writing tests.

# Note: Do not use shell special characters (quotes, ampersands, pipes) in compact
# prompts until you have verified your CLI version handles them correctly.
# Run /compact --help or consult release notes for argument parsing behavior.

Custom prompts are most valuable at phase transitions: finishing a refactor and moving to testing, completing one module and starting another, or shifting from design discussion to implementation. They let you explicitly control what survives the compression, rather than relying on the model's generic judgment about what matters.

Strategic Compaction Timing

Compact after completing a logical unit of work, not in the middle of an active implementation. Mid-task compaction risks losing the detailed reasoning thread that the model needs to finish the current piece of work.

Target 60% context utilization, well before the ~80% threshold where auto-compact activates. As a time-based heuristic, every 30 to 45 minutes of active work or after every major milestone (completing a function, finishing a module, resolving a bug) is a reasonable cadence. Developers who compact proactively report more consistent output quality across long work sessions compared to those who let auto-compaction handle everything.

The /clear Command: When to Start Fresh

/clear vs. /compact: Choosing the Right Reset

Warning: /clear permanently deletes all session history with no recovery option. Save any needed context to CLAUDE.md before running this command.

Running /clear wipes the conversation entirely. No summary, no preserved history. It's a hard reset to a blank session (though CLAUDE.md files still load automatically).

Choosing between /clear and /compact comes down to whether prior context has positive or negative value:

# /compact vs /clear decision guide
# (Pseudocode — not valid shell. Do not execute.)
#
# IF no prior context needed:
#     use: /clear
#
# ELSE IF prior context is mostly relevant:
#     use: /compact
#
# ELSE (some context valuable, some is noise):
#     use: /compact <focused prompt>
#          example: /compact Keep only auth flow decisions. Discard UI discussion.

Use /clear when the context is poisoned with incorrect assumptions the model keeps reverting to, when the task pivot is total (switching from backend work to an unrelated frontend feature), or when starting a genuinely new workstream. Use /compact when continuity matters and prior decisions need to survive. The hybrid path, /compact with a targeted custom prompt, handles the common middle case where some context is valuable and some is noise.

Session Persistence with CLAUDE.md

What Is CLAUDE.md and Why It Matters for Long Sessions

Claude Code automatically loads a markdown file called CLAUDE.md into context at the start of every session. CLAUDE.md at the repository root loads when Claude Code is invoked from within that directory. Verify loading behavior for your specific invocation mode (CLI, VS Code extension, headless API). It functions as durable memory that survives /clear, survives session termination, and survives machine restarts. This is the mechanism that makes multi-session workflows viable.

Three tiers exist: a project-level CLAUDE.md at the repository root, subdirectory-level files for module-specific context, and a user-level file at ~/.claude/CLAUDE.md for personal preferences that apply across all projects. (Confirm current path and loading hierarchy in official Claude Code documentation; paths may differ by OS or CLI version.) Each tier loads automatically when relevant.

Structuring CLAUDE.md for Context Efficiency

Every line in CLAUDE.md costs tokens on every API call in the session, since it's injected into the system prompt and stays there. Conciseness is not optional.

# Project: Inventory API

Express.js REST API with PostgreSQL. TypeScript strict mode. Node 20 LTS.

## Conventions
- Use Zod for request validation, never manual checks
- Repository pattern for all DB access (see src/repos/)
- Error responses follow RFC 7807 Problem Details format
- Tests use Vitest, not Jest. Run: `npm test`

## Architecture Decisions
- Auth: JWT with refresh token rotation (src/middleware/auth.ts)
- DB migrations: Drizzle ORM, migration files in drizzle/
- Rate limiting at API gateway level, not in application code

## Active Task Context

> **Maintenance note (for human readers):** Remove completed items from this
> section at end of each session to avoid stale context accumulating in the
> system prompt. This block is injected verbatim into every API call.

- CURRENT: Building /inventory/transfer endpoint
- COMPLETED: Schema for transfer_logs table, migration applied
- BLOCKED: Waiting on decision about idempotency key strategy
- TODO: Write integration tests for transfer flow

## Useful Commands

> Verify these script names exist in package.json before committing this file.
> Run `npm run` (no args) to list all available scripts for this project.

- `npm run dev` — start with hot reload (verify script name in package.json)
- `npm run db:migrate` — run pending migrations (verify; may be `drizzle-kit migrate`)
- `npm run typecheck` — type-check without emit (verify; may be `tsc --noEmit` directly)
- `npm test` — run test suite (Vitest)

Avoid dumping entire READMEs, API specs, or design documents into CLAUDE.md. That burns context budget on every turn for information that's only occasionally relevant. Stick to conventions, active state, and high-signal architectural decisions.

The "Session Handoff" Pattern

For multi-session work, the explicit handoff is the strongest pattern: before ending a session, ask Claude to persist the current state to CLAUDE.md. Note that updating CLAUDE.md with session state means the file will carry those assumptions into future sessions. Review it periodically to remove stale or incorrect entries.

# End-of-session ritual
# ─────────────────────────────────────────────────────────────────
# Step 1: Run inside Claude Code CLI (interactive prompt, NOT a shell command)
#
#   > Update CLAUDE.md with our current progress, outstanding TODOs,
#     and key decisions made today.
#
# Step 2: Verify CLAUDE.md was updated before clearing:
#   (In your shell)
git diff CLAUDE.md
#   Confirm the diff shows expected new content. If empty, re-run Step 1.
#
# Step 3: Only after confirming CLAUDE.md is updated, run inside Claude Code CLI:
#
#   > /clear
#
# ─────────────────────────────────────────────────────────────────
# Next session — inside Claude Code CLI:
#
#   > Read CLAUDE.md and summarize where we left off.

This creates a clean loop: work accumulates in the session, gets persisted to the file, the session resets, and the next session picks up from the file. CLAUDE.md becomes a living project log that's both human-readable and machine-consumable.

This creates a clean loop: work accumulates in the session, gets persisted to the file, the session resets, and the next session picks up from the file.

Memory Optimization for Large Codebases

Controlling What Claude Reads with .claudeignore

The .claudeignore file (verify availability in your CLI version via claude --help) works similarly to .gitignore to prevent Claude from reading specified paths. For projects with more than 500 files or large amounts of generated code, .claudeignore prevents Claude from reading thousands of irrelevant files during exploration, burning context tokens on dependency code and build output.

# .claudeignore
# Directories (trailing slash = directory only, consistent with .gitignore)
node_modules/
dist/
build/
coverage/
.next/
__pycache__/

# File patterns
*.log
*.map

# Generated files — adjust pattern to match your project's naming convention.
# Double-extension globs (*.generated.*) are non-standard; prefer explicit extensions:
*.generated.ts
*.generated.js
*.pb.go
# Or use a generated/ subdirectory and ignore it:
# generated/

This matters most for monorepos, projects with large vendor directories, generated code, and build artifacts.

Scoping Work to Reduce Context Burn

Beyond file-level filtering, prompt discipline matters. Use focused prompts that reference specific files rather than open-ended requests like "fix everything in the auth module." Narrow file scopes prevent Claude from pulling unnecessary content into context during exploration.

The "Divide and Conquer" Session Strategy

Break features into discrete sessions rather than marathoning through an entire feature in one sitting.

Start with architecture and interfaces in your first session. Define types, plan module boundaries, and document decisions in CLAUDE.md. Then move to implementing module A in a second session, starting from CLAUDE.md and working within a focused scope before handing off again.

Your third session tackles module B following the same pattern, with independent context. Finally, bring it together: a fourth session for integration and testing, where CLAUDE.md carries forward all interface contracts and decisions from prior sessions.

Each session starts clean, with CLAUDE.md providing continuity. No session accumulates enough context to trigger quality loss.

Session Management Cheat Sheet

Scenario	Command	When to Use	What's Preserved
Routine maintenance	`/compact`	Every 30-45 min	Summarized history
Focused continuation	`/compact [prompt]`	Task phase change	Directed summary
Total reset	`/clear`	Poisoned context	Nothing from session history; only content previously written to CLAUDE.md persists
Session end	Handoff to CLAUDE.md	End of workday	Persistent file state
New session start	Read CLAUDE.md	Session open	Project + task state

Quick-reference rules: compact at 60% context, not 90%. Limit sessions to work expected to consume under 120k input tokens. Always handoff before closing.

Token consumption grows cumulatively: each turn's input includes all prior history. A rough ceiling estimate:

# Token ceiling estimate (rough):
#
#   remaining_capacity = context_window_limit - current_context_tokens
#
#   Example: 200,000 - 87,432 = 112,568 tokens remaining
#
#   To estimate turns remaining at current average input size:
#   turns_remaining ≈ remaining_capacity / avg_tokens_per_turn
#
#   Use /cost readings rather than manual calculation for precision.
#   Actual cost depends on output length and cache hit rate.

If remaining capacity is low, plan compaction intervals accordingly. Proactive compaction at regular intervals keeps each turn's input token cost lower, reducing cumulative input-token spend over a full workday compared to a single unmanaged marathon session.

Putting It All Together: A Real Workflow

Example: Building an API Feature Across Multiple Sessions

Consider building a /inventory/transfer REST endpoint across multiple sessions.

Session 1 (Design and scaffold): Define the endpoint contract, create the route stub, design the database schema, and run the migration. Compact once after the schema is finalized. Handoff to CLAUDE.md with the schema, endpoint signature, and outstanding implementation tasks.

Session 2 (Implementation): Start from CLAUDE.md, implement the transfer logic and repository layer. Compact with /compact Retain the transfer endpoint implementation details and validation rules. Discard schema design discussion. after finishing the core logic. Compact again with a testing-focused prompt before handoff. Update CLAUDE.md with implementation status.

Session 3 (Tests and cleanup): Start fresh from CLAUDE.md, which now contains the full implementation context. Write integration tests, handle edge cases, clean up. No compaction needed if the session stays short and focused. Final CLAUDE.md update marks the feature complete.

Each technique from this article slots into a specific point in the workflow. Target no single session exceeding 60% context utilization. Total cost is lower than a single marathon session would have been, and output quality stays consistent throughout.

Each session starts clean, with CLAUDE.md providing continuity. No session accumulates enough context to trigger quality loss.

Key Takeaways

Context management is the skill gap between Claude Code beginners and power users. Proactive compaction at 60% utilization with custom prompts, CLAUDE.md handoffs between sessions, and scoped sessions that never try to do everything at once. The cheat sheet and estimation guidance above serve as ongoing references for building these habits into daily practice.

Context Management for Long-Running Claude Code Sessions

Table of Contents