Claude Code 2.1 introduces two capabilities that reshape how senior developers handle complex refactoring, multi-file migrations, and production-grade code generation: the xhigh effort tier and a native auto-verification loop that operates as a first-class feature rather than an afterthought.

Important: This article describes anticipated features of Claude Code 2.1. All configuration keys, model identifiers, CLI flags, environment variables, and behavioral details described below are unverified against official Anthropic documentation at time of writing. Verify every setting against the official Anthropic docs, CLI help output, and release notes before use in development or CI/CD environments.

What Changed in Claude Code 2.1 and Why xHigh Matters

The Effort Tier System Explained (low, medium, high, xhigh)

Claude Code's effort tier system maps directly to the depth of reasoning the model applies to any given task. The four tiers (low, medium, high, and xhigh) control how much chain-of-thought processing the model performs before producing output. Responses at low are fast and shallow, suitable for completions and simple lookups. Medium adds basic planning, and high introduces multi-step reasoning with moderate context use.

The xhigh tier unlocks what the lower tiers cannot: extended chain-of-thought that spans multiple reasoning passes, multi-pass planning where the model revisits and refines its approach before generating code, and deeper use of the available context window. For tasks involving inter-file dependency resolution, architectural reasoning across module boundaries, or migration logic that must account for cascading side effects, xhigh is often the only tier that produces reliable results on the first attempt.

That said, xhigh is overkill for boilerplate generation, simple CRUD operations, or any task where the correct output is structurally obvious. Applying it indiscriminately leads to unnecessary cost and slower response times with no meaningful quality improvement.

Auto-Verification as a First-Class Feature

In earlier Claude Code versions, verification required manual invocation and custom scripting to close the feedback loop. Claude Code 2.1 promotes auto-verification to a native workflow: generate code, run lint and type checks, execute tests, evaluate the output against pass criteria, then either self-correct and retry or confirm the result.

This verification loop architecture is built into the runtime rather than bolted on. Claude Code 2.1 uses heuristic detection of project tooling to select appropriate verification steps automatically, though developers can override this with explicit configuration. Verification can now be configured as a persistent project-level default rather than requiring per-invocation flags. Note that autoVerify defaults to false; you must explicitly enable it to activate the persistent verification loop.

The minimum viable configuration to activate both features:

{
  "effort": "xhigh",
  "autoVerify": true
}

This .claude/settings.json snippet enables xhigh reasoning and the auto-verification loop for every task in the project scope.
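Before committing the file, a small check can catch typos in these two keys. The following Python sketch assumes the key names and tier values described in this article (`effort`, `autoVerify`, and the four tiers), all of which remain unverified. `check_settings` is a hypothetical helper written for illustration, not part of Claude Code.

```python
import json

# Tier names as described in this article; unverified against official docs.
KNOWN_TIERS = ("low", "medium", "high", "xhigh")


def check_settings(raw_json: str) -> list[str]:
    """Return a list of problems found in a settings.json string (empty if none)."""
    try:
        settings = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(settings, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    effort = settings.get("effort")
    if effort not in KNOWN_TIERS:
        problems.append(f"effort must be one of {KNOWN_TIERS}, got {effort!r}")
    # JSON true/false map to Python bool; a quoted "true" is a common typo.
    if not isinstance(settings.get("autoVerify"), bool):
        problems.append("autoVerify must be a JSON boolean (true or false)")
    return problems


if __name__ == "__main__":
    print(check_settings('{"effort": "xhigh", "autoVerify": true}'))
```

A check like this could run as a pre-commit hook so that a misspelled tier name is caught before it reaches teammates.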

Configuring xHigh Effort Mode

Global vs. Project-Level Configuration

Note: The claude config set subcommand, all environment variables, and all CLI flags described in this section are unverified. Confirm availability via claude config --help and claude --help before use.

xHigh can be set at three levels depending on the desired scope. For global activation across all projects:

claude config set effort xhigh

For project-scoped overrides, the .claude/settings.json file in the project root takes precedence over global settings. For CI/CD pipelines where configuration files may not be present, the environment variable approach works:

export CLAUDE_CODE_EFFORT=xhigh

Project-level configuration is the recommended approach for teams, since it commits the effort tier alongside the codebase and ensures consistent behavior across developer machines.
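The three scopes can be pictured as a layered lookup. The Python sketch below is illustrative only: the article states that project settings override global ones, but it does not say where the environment variable sits in that order, so checking it first is an assumption made here. `resolve_effort` is a hypothetical helper.

```python
from typing import Mapping, Optional

DEFAULT_EFFORT = "medium"  # default reported (unverified) in this article


def resolve_effort(
    env: Mapping[str, str],
    project_settings: Optional[Mapping[str, object]],
    global_settings: Optional[Mapping[str, object]],
) -> str:
    """Pick the effective effort tier from the three scopes.

    Assumed order (not confirmed by any documentation):
    environment variable, then project file, then global config.
    """
    from_env = env.get("CLAUDE_CODE_EFFORT")
    if from_env:
        return from_env
    for layer in (project_settings, global_settings):
        if layer and "effort" in layer:
            return str(layer["effort"])
    return DEFAULT_EFFORT


if __name__ == "__main__":
    # The project file overrides the global setting when no env var is present.
    print(resolve_effort({}, {"effort": "xhigh"}, {"effort": "high"}))
```

If your installed version resolves the scopes in a different order, only the order of the checks inside the function needs to change.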

Combining xHigh with Model Selection

xHigh produces its best results when pinned explicitly to a specific model. Relying on default model routing can result in a lower-tier model receiving the xhigh instruction, which produces different and often worse results. The model's reasoning architecture must support the extended chain-of-thought that xhigh demands; sending xhigh to a model that lacks that depth leads to longer processing times without corresponding quality gains.

Important: The model identifier used below is a placeholder. Before configuring, confirm the correct model identifier by calling the Anthropic models API (GET /v1/models) or consulting the official model list. Known Anthropic model identifiers typically follow a date-versioned format (e.g., claude-3-opus-20240229).

The full project configuration with model pinning:

{
  "model": "YOUR_MODEL_ID_HERE",
  "effort": "xhigh",
  "autoVerify": true,
  "verification": {
    "commands": [
      "npm run typecheck",
      "npm run test:affected",
      "eslint . --cache --cache-location .eslintcache"
    ],
    "maxRetries": 3
  }
}

For ad-hoc sessions from the command line:

claude --model YOUR_MODEL_ID_HERE --effort xhigh

For CI integration via environment variables (verify variable names against claude --help or official docs before CI use):

export CLAUDE_CODE_MODEL=YOUR_MODEL_ID_HERE
export CLAUDE_CODE_EFFORT=xhigh
export CLAUDE_CODE_AUTO_VERIFY=1

Session-Level Effort Overrides

Within an active session, developers can escalate effort for a specific complex task using the /effort xhigh slash command (confirm availability in your installed version), then drop back to high or medium for routine follow-ups. This avoids paying the xhigh token premium for every interaction while preserving access to deep reasoning when needed. Session-level overrides do not persist beyond the current session.

Implementing Auto-Verification Workflows

How the Verification Loop Works Internally

The auto-verification loop follows a fixed sequence of steps (individual outputs remain non-deterministic): code generation, lint and type-check execution, test execution against affected files, output evaluation against expected pass criteria, and finally either self-correction with retry or confirmation of the result.

You configure the maximum retry count with verification.maxRetries. When retries exhaust without all verification steps passing, Claude Code 2.1 bails out and presents the last attempt along with the failing verification output. This prevents infinite loops on fundamentally broken approaches.

Claude Code 2.1 selects which verification steps to run through heuristic detection of project tooling. It inspects package.json scripts, configuration files for common linters and type checkers, and test runner configurations (this heuristic behavior is unverified -- confirm against your installed version). When heuristics fall short, or when the project uses non-standard tooling, explicit configuration is necessary.
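The control flow described in this section can be sketched in a few lines of Python. This is a model of the described behavior, not Claude Code's implementation. The names `run_shell`, `first_failure`, and `verify_with_retries` are hypothetical, the `generate` callback stands in for the model's code generation step, and two details are assumptions: that a run stops at the first failing command, and that `maxRetries` counts retries after one initial attempt.

```python
import subprocess
from typing import Callable, Optional, Sequence


def run_shell(command: str) -> bool:
    """Run one verification command; True means it exited with status 0."""
    return subprocess.run(command, shell=True).returncode == 0


def first_failure(
    commands: Sequence[str], run: Callable[[str], bool] = run_shell
) -> Optional[str]:
    """Run commands in order and return the first failing one, or None."""
    for command in commands:
        if not run(command):
            return command  # assumption: stop here rather than run the rest
    return None


def verify_with_retries(
    generate: Callable[[Optional[str]], None],
    commands: Sequence[str],
    max_retries: int,
    run: Callable[[str], bool] = run_shell,
) -> tuple[bool, int, Optional[str]]:
    """Generate, verify, and retry up to max_retries times.

    Returns (passed, attempts_made, last_failing_command).
    """
    failed: Optional[str] = None
    attempts = 0
    # One initial attempt plus max_retries corrections.
    for attempts in range(1, max_retries + 2):
        generate(failed)  # the previous failure is fed back for self-correction
        failed = first_failure(commands, run)
        if failed is None:
            return True, attempts, None
    return False, attempts, failed
```

The `run` parameter is injectable so the loop can be exercised with a fake runner instead of real tooling. The final return mirrors the bail-out case: the last attempt is surfaced together with the command that still fails.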

Defining Custom Verification Commands

Specify custom verification commands in .claude/settings.json under the verification.commands array. Each entry runs in sequence, and a failure in any step triggers the retry logic:

{
  "model": "YOUR_MODEL_ID_HERE",
  "effort": "xhigh",
  "autoVerify": true,
  "verification": {
    "commands": [
      "npm run typecheck",
      "npm run test:affected",
      "eslint . --cache --cache-location .eslintcache"
    ],
    "maxRetries": 3
  }
}

Warning: Do not use eslint --fix as a verification command. The --fix flag mutates source files on disk, meaning every verification retry will silently modify your code, corrupting change history and producing unreliable test results. Use eslint . --cache --cache-location .eslintcache (read-only, exits non-zero on errors) for verification. If auto-fix is desired, run it as a separate, explicit step outside the verification loop.

Note: npm run test:affected assumes your project has a corresponding script defined in package.json (e.g., via Nx, Turborepo, or Jest with --onlyChanged or --changedSince). Jest's --onlyFailures flag is not a substitute: it reruns only the tests that failed in the previous run, regardless of which files changed. Similarly, npm run typecheck must be defined. Verify these scripts exist before configuring. If test:affected is not defined, substitute a safe default such as npm test.
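The script check in the note above is easy to automate. The Python sketch below reads a package.json string and reports any `npm run` script that the verification commands reference but the file does not define. `missing_npm_scripts` is a hypothetical helper, and it only recognizes the plain `npm run <name>` form.

```python
import json
import re

# Matches "npm run <script>" at the start of a command and captures the name.
NPM_RUN = re.compile(r"^\s*npm\s+run\s+([\w:.-]+)")


def missing_npm_scripts(package_json_text: str, commands: list[str]) -> list[str]:
    """Return script names used via `npm run` that package.json does not define."""
    scripts = json.loads(package_json_text).get("scripts", {})
    missing = []
    for command in commands:
        match = NPM_RUN.match(command)
        if match and match.group(1) not in scripts:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    # Example package.json content, invented for illustration.
    pkg = '{"scripts": {"typecheck": "tsc --noEmit", "test": "jest"}}'
    cmds = [
        "npm run typecheck",
        "npm run test:affected",
        "eslint . --cache --cache-location .eslintcache",
    ]
    print(missing_npm_scripts(pkg, cmds))
```

Commands that do not go through `npm run`, such as the direct eslint call, are ignored rather than flagged.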

Ordering matters. Type checking before test execution catches compilation errors early, avoiding wasted test runner cycles. For monorepo setups with multiple verification targets, commands can reference workspace-specific scripts or use tooling like Turborepo or Nx to scope verification to affected packages. For example, in an Nx monorepo: "npx nx affected --target=test".

Auto-Verification in Headless Mode

For batch operations and CI integration, headless mode combines xhigh effort with auto-verification and structured output:

timeout 600 claude \
  --headless \
  --effort xhigh \
  --auto-verify \
  --output-format json \
  "Refactor the auth module to use dependency injection" \
  > verification-report.json
EXIT_CODE=$?

if [[ $EXIT_CODE -eq 124 ]]; then
  echo "ERROR: claude invocation timed out after 600s" >&2
  exit 1
elif [[ $EXIT_CODE -ne 0 ]]; then
  echo "ERROR: claude exited with code ${EXIT_CODE}" >&2
  exit "${EXIT_CODE}"
fi

if [[ ! -s verification-report.json ]]; then
  echo "ERROR: verification-report.json is empty or missing" >&2
  exit 1
fi

Warning: Without explicit error handling after the claude invocation, a failed run will produce an empty or partial JSON file while the shell may report a successful exit code. Always check the exit code and validate that the output file is non-empty.

Safety note: Before running headless refactoring on a production codebase, ensure you are operating on a clean git branch so that all changes can be reviewed and reverted if necessary. Consider running on a small subset of files first to validate behavior.

The JSON report captures each verification step's pass/fail status, retry count, and the final generated output. This structured data integrates directly into CI pipelines.
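A CI step could then turn that report into a pass/fail decision. The report's actual schema is not documented in this article, so the field names in the Python sketch below (`steps`, `command`, `passed`, `retries`) are invented for illustration and must be replaced with whatever your installed version emits. `summarize_report` is a hypothetical helper.

```python
import json


def summarize_report(report_text: str) -> dict:
    """Summarize a verification report using a HYPOTHETICAL schema.

    Assumed shape (not documented anywhere):
    {"retries": 1, "steps": [{"command": "...", "passed": true}, ...]}
    """
    report = json.loads(report_text)
    steps = report.get("steps", [])
    failed = [
        step.get("command", "<unknown>")
        for step in steps
        if not step.get("passed", False)
    ]
    return {
        "retries": report.get("retries", 0),
        "total_steps": len(steps),
        "failed_commands": failed,
        # An empty step list is treated as a failure, not a silent pass.
        "ok": bool(steps) and not failed,
    }
```

Treating a report with no steps as a failure is a deliberate choice: a run that verified nothing should not be allowed to turn the pipeline green.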

A GitHub Actions workflow running Claude Code in headless auto-verify mode on pull request events:

name: Claude Code Auto-Verify
on:
  pull_request:
    types: [opened, synchronize]

jobs:
  claude-verify:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - uses: actions/checkout@v4

      - name: Set up Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'

      - name: Install project dependencies
        run: npm ci

      - name: Install Claude Code
        run: |
          set -euo pipefail
          npm install -g @anthropic-ai/claude-code@2.1.0
        # NOTE: Verify the exact npm package name and version via Anthropic's
        # official install documentation or `npm view @anthropic-ai/claude-code`.
        # If the package name or version is incorrect, this step will fail with a 404.

      - name: Run Claude Code Verification
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          CLAUDE_CODE_MODEL: ${{ secrets.CLAUDE_CODE_MODEL }}
          CLAUDE_CODE_EFFORT: xhigh
          CLAUDE_CODE_AUTO_VERIFY: "1"
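        run: |
          set -euo pipefail
          # Capture the exit code with `||` so that `set -e` does not abort
          # the script before the checks below can run.
          EXIT_CODE=0
          timeout 600 claude \
            --headless \
            --auto-verify \
            --output-format json \
            "Review and verify all changes in this PR" \
            > report.json || EXIT_CODE=$?
          if [[ $EXIT_CODE -eq 124 ]]; then
            echo "ERROR: claude timed out" >&2; exit 1
          elif [[ $EXIT_CODE -ne 0 ]]; then
            echo "ERROR: claude exited with code ${EXIT_CODE}" >&2; exit "${EXIT_CODE}"
          fi
          if [[ ! -s report.json ]]; then
            echo "ERROR: report.json is empty" >&2; exit 1
          fi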

      - name: Upload Verification Report (success)
        if: success()
        uses: actions/upload-artifact@v4
        with:
          name: claude-verification-report-success
          path: report.json

      - name: Upload Verification Report (failure)
        if: failure()
        uses: actions/upload-artifact@v4
        with:
          name: claude-verification-report-failure
          path: report.json

Note: The ANTHROPIC_API_KEY and CLAUDE_CODE_MODEL are injected via GitHub Actions secrets, which masks them in logs. Never hard-code API keys or model identifiers in workflow files.

Cost Optimization Strategies for xHigh

Understanding the Token Economics

The xhigh tier carries a higher token cost multiplier compared to high due to extended chain-of-thought reasoning that consumes more input and output tokens. Anthropic has not published an exact multiplier; monitor your first few xhigh sessions via --usage to establish your own baseline. Token costs vary by model and usage; consult Anthropic's pricing page for current per-token rates.

Auto-verification retries compound this cost. Each retry cycle regenerates code and re-executes verification, so a task that takes multiple retries before passing will cost proportionally more than a single-pass completion. The --usage summary output at the end of each session (verify this flag exists via claude --help) provides a breakdown of tokens consumed and estimated cost, which should be monitored regularly during initial adoption.
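The compounding effect is simple arithmetic. The Python sketch below makes it concrete under a simplifying assumption: every attempt costs roughly the same, whereas in practice later attempts may carry more context and cost more. The token counts and per-million-token prices in the usage example are made-up placeholders, not Anthropic's rates, and both function names are hypothetical.

```python
def attempt_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_mtok_in: float,
    usd_per_mtok_out: float,
) -> float:
    """Cost in USD of one generate-and-verify attempt."""
    total = input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out
    return total / 1_000_000


def worst_case_cost(per_attempt_usd: float, max_retries: int) -> float:
    """Upper bound when every retry is used: the first attempt plus max_retries more."""
    return per_attempt_usd * (1 + max_retries)


if __name__ == "__main__":
    # Placeholder numbers only; substitute figures from your own usage data.
    one_attempt = attempt_cost(200_000, 50_000, 10.0, 40.0)
    print(one_attempt, worst_case_cost(one_attempt, max_retries=3))
```

With a retry cap of 3, the worst case is four times the single-pass cost, which is why the retry cap and the per-task budget should be chosen together.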

Selective Escalation Patterns

The most cost-effective approach uses high as the default effort tier and reserves xhigh for specific task types where deeper reasoning demonstrably improves outcomes: complex refactors, architectural migrations, multi-file dependency resolution, and tasks where inter-module reasoning is critical.

Prompt structure also influences effective reasoning depth. Well-structured prompts that clearly define scope, constraints, and expected outputs help Claude Code pick the right reasoning depth even at the high tier. Set verification.maxRetries to a cap of 2 to 3 to prevent runaway loops from consuming budget on fundamentally misguided approaches.
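For teams that script their invocations, the escalation rule can live in one small function. The Python sketch below encodes the policy from this section: high by default, xhigh only for the listed task types, and retries capped at 3. The task-type labels and both function names are hypothetical, chosen only for illustration.

```python
# Task types this section reserves for xhigh; the labels are hypothetical.
XHIGH_TASKS = frozenset({
    "complex-refactor",
    "architectural-migration",
    "multi-file-dependency-resolution",
})


def pick_effort(task_type: str, default: str = "high") -> str:
    """Return "xhigh" for task types that warrant it, otherwise the default tier."""
    return "xhigh" if task_type in XHIGH_TASKS else default


def capped_retries(requested: int, cap: int = 3) -> int:
    """Clamp a requested retry count into the range 0..cap."""
    return max(0, min(requested, cap))


if __name__ == "__main__":
    print(pick_effort("architectural-migration"), pick_effort("bugfix"))
    print(capped_retries(10))
```

Keeping the list of escalated task types in one place makes it easy to review and to shrink if usage data shows that a category does not benefit from xhigh.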

Budget Guards and Spend Alerts

You set cost guardrails directly in Claude Code 2.1's configuration:

{
  "model": "YOUR_MODEL_ID_HERE",
  "effort": "xhigh",
  "autoVerify": true,
  "maxCostPerSession": 25.00,
  "maxCostPerTask": 8.00,
  "verification": {
    "maxRetries": 3
  }
}

Warning: The maxCostPerSession and maxCostPerTask keys are unverified. Confirm these keys are implemented in your installed version before relying on them for cost protection. To test, set a threshold below the cost of a known task and observe whether execution halts. Use Anthropic dashboard organization-level limits as your primary cost guardrail regardless.

API-level budget controls are also available through the Anthropic dashboard for organization-wide limits. Teams should set per-developer or per-project caps at the API level.
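Because the in-tool cost keys are unverified, a wrapper script can enforce the same two limits on its own. The Python sketch below is a client-side tracker, not a Claude Code feature. `BudgetGuard` and `BudgetExceeded` are hypothetical names, and the cost figures passed to `record` would have to come from whatever usage reporting your installed version actually provides.

```python
class BudgetExceeded(RuntimeError):
    """Raised when recorded spend passes a configured limit."""


class BudgetGuard:
    """Client-side spend tracker with per-task and per-session limits."""

    def __init__(self, max_per_session: float, max_per_task: float) -> None:
        self.max_per_session = max_per_session
        self.max_per_task = max_per_task
        self.session_spend = 0.0
        self.task_spend = 0.0

    def start_task(self) -> None:
        """Reset the per-task counter; session spend keeps accumulating."""
        self.task_spend = 0.0

    def record(self, usd: float) -> None:
        """Add a cost observation and raise if either limit is exceeded."""
        self.session_spend += usd
        self.task_spend += usd
        if self.task_spend > self.max_per_task:
            raise BudgetExceeded(f"task spend ${self.task_spend:.2f} exceeds limit")
        if self.session_spend > self.max_per_session:
            raise BudgetExceeded(f"session spend ${self.session_spend:.2f} exceeds limit")
```

A wrapper would call `start_task()` before each invocation and `record()` after it, stopping the batch when `BudgetExceeded` is raised. This complements the organization-level limits in the Anthropic dashboard and does not replace them.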

Production Patterns and Real-World Workflows

Pattern 1: Large-Scale Refactoring with xHigh + Auto-Verify

Consider migrating a 200-file Express.js codebase from CommonJS to ESM. This task involves rewriting require and module.exports statements, updating file extensions, resolving inter-file import paths, handling circular dependencies, and ensuring all existing tests continue to pass.

The prompt structure for this type of multi-file refactoring task must be explicit about scope and constraints:

Migrate the entire src/ directory from CommonJS to ESM modules.

Requirements:
- Convert all require() calls to import statements
- Convert all module.exports to named or default exports
- Update relative import paths to include .js extensions
- Resolve circular dependencies by restructuring where necessary
- Ensure all files in test/ pass after migration
- Do not modify any external dependency imports
- Process files in dependency order, starting from leaf modules

Verification criteria: npm run typecheck && npm run test must both pass.

xHigh's multi-pass planning handles inter-file dependency resolution that lower tiers miss. The auto-verification loop catches broken imports, circular dependency regressions, and test failures, retrying with corrected approaches before presenting the final result.
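The instruction to process files in dependency order, starting from leaf modules, is a topological sort. The Python sketch below uses the standard library's `graphlib` to compute such an order from an import map and to surface circular dependencies before the migration starts. The file names are hypothetical, and building the import map from real source files is left out.

```python
from graphlib import CycleError, TopologicalSorter


def migration_order(imports: dict[str, set[str]]) -> list[str]:
    """Order files so that each file comes after everything it imports.

    `imports` maps a file to the set of local files it requires.
    Raises CycleError when files import each other in a loop.
    """
    # TopologicalSorter treats the mapped values as predecessors, so
    # imported files (the leaf modules) are emitted first.
    return list(TopologicalSorter(imports).static_order())


if __name__ == "__main__":
    # Hypothetical three-file project.
    deps = {
        "app.js": {"routes.js", "db.js"},
        "routes.js": {"db.js"},
        "db.js": set(),
    }
    print(migration_order(deps))

    try:
        migration_order({"a.js": {"b.js"}, "b.js": {"a.js"}})
    except CycleError as exc:
        # The cycle must be restructured before either file can be ordered.
        print("circular dependency:", exc.args[1])
```

Listing the cycles up front gives the prompt a concrete set of files to restructure, which is narrower than asking the model to find circular dependencies on its own.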

Pattern 2: Test Generation and Validation Loop

Generating integration tests for an untested API layer benefits from xhigh reasoning. The extended chain-of-thought allows the model to reason about edge cases, failure modes, authentication boundaries, and error response shapes. For instance, high often generates happy-path tests but omits auth-boundary edge cases that xhigh's multi-pass planning catches.

Generate integration tests for all endpoints in src/routes/api/v2/.

Requirements:
- Cover happy path, authentication failure, validation errors, and 404 cases
- Use the existing test setup in test/helpers/setup.ts
- Mock external service calls using the patterns in test/mocks/
- Each test file should be independently runnable
- Target >90% branch coverage for each route handler

Verification: npm run test:integration must pass with all new tests included.

Auto-verification ensures that generated tests actually compile and pass before presenting results. Without it, test generation at any effort tier frequently produces tests that reference nonexistent fixtures or use incorrect assertion patterns.

Pattern 3: Database Migration Script Generation

Generating Prisma migration scripts with rollback verification is another strong xhigh use case. Custom verification commands can include prisma validate, which checks the schema file for errors, or prisma migrate diff, which compares two schema states and reports the difference, so migration logic can be checked without applying anything to the database. xHigh's advantage here is reasoning about data integrity constraints, foreign key relationships, and the ordering of migration steps to avoid constraint violations during execution.

Anti-Patterns to Avoid

Running xhigh on trivial tasks wastes budget without improving output quality. Boilerplate generation, simple CRUD endpoints, and configuration file edits typically come out the same at medium. Disabling auto-verification to "save time" on complex tasks defeats the purpose of the self-correcting loop and leads to outputs that compile but fail in production. Overly broad prompts ("refactor this whole project to be better") cause xhigh to over-reason and hallucinate scope, generating changes to files that should not be touched.

Implementation Checklist and Quick Reference

Prerequisites

  • Ensure Node.js is installed (check minimum version requirements in Anthropic's install docs).
  • All shell examples assume a POSIX-compatible shell (Linux/macOS). Windows users must adapt export to $env:VAR = "value" (PowerShell) or set VAR=value (cmd).
  • Package name: Verify the correct npm package name via Anthropic's official installation documentation before running npm install.
  • Confirm your Anthropic plan includes access to the target model by calling GET /v1/models with your API key and checking that the model identifier appears in the response.
  • Project tooling: Ensure npm run typecheck, npm run test:affected, and any other scripts referenced in verification.commands are defined in your package.json.

Pre-Flight Checklist

  • Confirm Claude Code version: claude --version should report 2.1.0 or later
  • Verify model access by calling GET /v1/models with your API key and confirming the target model identifier appears in the response
  • Create project-level .claude/settings.json in the repository root with verification commands matching the project's actual toolchain
  • Configure cost guardrails before starting xhigh workflows (and verify they function -- see warning above)

Configuration Quick Reference Table

Note: All configuration keys, defaults, and CLI flags below are unverified. Confirm against claude --help and official documentation.

| Setting | Location | Default | Recommended for xHigh |
|---|---|---|---|
| effort | settings.json / CLI / env | medium (unverified) | xhigh |
| model | settings.json / CLI / env | auto-routed (unverified) | Pin to a specific model ID |
| autoVerify | settings.json / CLI / env | false (unverified) | true |
| verification.commands | settings.json | auto-detected (unverified) | explicit array |
| verification.maxRetries | settings.json | unknown; verify against installed version | 3 |
| maxCostPerSession | settings.json | none (unverified) | 25.00 (verify key is functional) |
| maxCostPerTask | settings.json | none (unverified) | 8.00 (verify key is functional) |
| headless | CLI flag | false (unverified) | true (CI only) |

Decision Matrix: When to Use Each Effort Tier

| Task Type | Recommended Effort | Auto-Verify? | Notes |
|---|---|---|---|
| Simple edits / typos | low | No | Minimal reasoning needed |
| Feature implementation | medium or high | Optional | Standard development tasks |
| Multi-file refactoring | xhigh | Yes | Benefits from extended chain-of-thought |
| Codebase migration | xhigh | Yes | Inter-file dependency resolution critical |
| Architecture planning | xhigh | No | Reasoning-heavy, but the output is prose/design rather than executable code, so auto-verify has no meaningful verification target |

Note: Cost multipliers vary by model and usage. Consult Anthropic's pricing page for current per-token rates rather than relying on fixed multiplier estimates.

Complete Reference Configuration

The following .claude/settings.json incorporates all settings discussed throughout this article and serves as a copy-paste starting point:

{
  "model": "YOUR_MODEL_ID_HERE",
  "effort": "xhigh",
  "autoVerify": true,
  "maxCostPerSession": 25.00,
  "maxCostPerTask": 8.00,
  "verification": {
    "commands": [
      "npm run typecheck",
      "npm run test:affected",
      "eslint . --cache --cache-location .eslintcache"
    ],
    "maxRetries": 3
  }
}

Before using this configuration:
1. Replace YOUR_MODEL_ID_HERE with the correct model identifier from Anthropic's models API.
2. Verify that maxCostPerSession and maxCostPerTask are functional in your installed version.
3. Confirm all verification.commands scripts exist in your package.json.
4. Do not use eslint --fix in the verification commands array -- use eslint . --cache --cache-location .eslintcache for read-only checking.

Substitute the appropriate commands for projects using different test runners, linters, or type checkers. Calibrate cost thresholds based on observed usage during the first week of xhigh adoption.

SitePoint Team

© 2000 – 2026 SitePoint Pty. Ltd.