Every major LLM API charges per token. Token compression offers the most direct path to reducing those costs without changing models, degrading output quality, or rearchitecting prompts. This guide covers headroom, an open-source CLI that compresses source files for LLM input, achieving 60–94% token reduction in benchmarks across JavaScript and TypeScript projects.
Table of Contents
- Why Token Compression Is the Easiest LLM Cost Win
- What Is headroom and How Does It Work?
- Installing and Setting Up headroom
- Basic Usage: Compressing Your First File
- Configuration and Compression Profiles
- Integrating headroom Into a Node.js/React Workflow
- Benchmarks: Real-World Token Savings
- Best Practices and Pitfalls
- Implementation Checklist
- Stop Paying for Tokens That Don't Matter
Why Token Compression Is the Easiest LLM Cost Win
GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro all bill by the token, though each uses a different tokenizer. Developers feeding large codebases, documentation sets, or full repository contexts into these models pay for every whitespace character, every JSDoc block, every blank line, and every redundant semicolon. Token compression offers the most direct path to reducing those costs without changing models, degrading output quality, or rearchitecting prompts.
Consider the math. Sending a 50,000-token codebase context to GPT-4o at $2.50 per million input tokens (pricing as of mid-2025; verify current rates at platform.openai.com/pricing) costs $0.125 per request. Compress that to 5,000 tokens and the same request costs $0.0125. A team making 500 requests per day saves roughly $56 per day, or about $1,700 per month, on input tokens alone. Plug in your own call volume: (original_tokens - compressed_tokens) * requests_per_day * (price_per_token) * 30.
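The formula above can be sketched as a small helper. The function name and parameter names are illustrative assumptions, not part of headroom; the inputs are the figures from the worked example.

```javascript
// Hypothetical helper: estimates monthly input-token savings from compression.
// pricePerMillionTokens is in USD; daysPerMonth defaults to 30 as in the formula.
function estimateMonthlySavings({
  originalTokens,
  compressedTokens,
  requestsPerDay,
  pricePerMillionTokens,
  daysPerMonth = 30,
}) {
  const tokensSavedPerRequest = originalTokens - compressedTokens;
  const tokensSavedPerMonth = tokensSavedPerRequest * requestsPerDay * daysPerMonth;
  return (tokensSavedPerMonth * pricePerMillionTokens) / 1_000_000;
}

const monthlySavings = estimateMonthlySavings({
  originalTokens: 50_000,
  compressedTokens: 5_000,
  requestsPerDay: 500,
  pricePerMillionTokens: 2.5,
});
console.log(`Estimated monthly savings: $${monthlySavings.toFixed(2)}`); // $1687.50
```

Multiplying the token counts first and dividing by one million last keeps the intermediate values as whole numbers, which avoids small floating-point drift in the result.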
headroom is an open-source CLI that compresses source files for LLM input rather than browser delivery. It parses code using AST-level analysis and applies token compression strategies optimized for LLM consumption. In benchmarks across three JavaScript/TypeScript projects, it achieved 60–94% token reduction depending on compression level, without sacrificing semantic meaning. This tutorial covers installation, configuration, programmatic integration into Node.js and React workflows, and benchmarking. It ends with a complete implementation checklist readers can follow step by step.
Note: At the time of writing, verify that the headroom-cli package on npm matches the tool described here before installing. Confirm the package description and homepage at npmjs.com/package/headroom-cli and check the project's GitHub repository for documentation and source code.
What Is headroom and How Does It Work?
Core Concept: Compression for LLMs, Not Browsers
Traditional minification tools like Terser or esbuild exist to reduce JavaScript bundle sizes for browser delivery. They preserve runtime behavior, mangle variable names for byte savings, and optimize execution paths. Token compression for LLM consumption has a fundamentally different goal: reduce token count while preserving semantic meaning that a language model needs to reason about the code.
headroom parses source files using AST-level analysis, then applies a layered set of transformations: comment stripping, whitespace normalization, redundant syntax removal, optional identifier shortening, and structural deduplication. Import paths, function signatures, and logical flow remain intact. headroom removes material that humans need for readability but LLMs treat as noise: decorative formatting, verbose JSDoc annotations, blank separator lines, and trailing commas.
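To make the first two layers concrete, here is a deliberately naive sketch of comment stripping and whitespace normalization. It is not headroom's implementation: headroom is described as working on an AST, whereas this character scanner only tracks string literals. It will mishandle regex literals, apostrophes in JSX text, and comment-like sequences nested inside template-literal expressions. Both function names are hypothetical.

```javascript
// Naive illustration only: removes // and /* */ comments that appear outside
// string literals. A real tool would use a parser instead of a character scan.
function stripCommentsNaive(source) {
  let out = '';
  let i = 0;
  let quote = null; // current string delimiter (' " or `), or null when outside a string
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (quote) {
      out += ch;
      if (ch === '\\' && i + 1 < source.length) {
        out += next; // keep the escaped character verbatim
        i += 2;
        continue;
      }
      if (ch === quote) quote = null;
      i += 1;
    } else if (ch === '"' || ch === "'" || ch === '`') {
      quote = ch;
      out += ch;
      i += 1;
    } else if (ch === '/' && next === '/') {
      while (i < source.length && source[i] !== '\n') i += 1; // skip to end of line
    } else if (ch === '/' && next === '*') {
      const end = source.indexOf('*/', i + 2);
      i = end === -1 ? source.length : end + 2; // skip the whole block comment
    } else {
      out += ch;
      i += 1;
    }
  }
  return out;
}

// Drops trailing spaces and lines that are empty after comment removal.
function collapseWhitespaceNaive(source) {
  return source
    .split('\n')
    .map((line) => line.trimEnd())
    .filter((line) => line.trim() !== '')
    .join('\n');
}

const sample = [
  '/** Adds two numbers. */',
  'function add(a, b) {',
  '  // URLs inside strings must survive',
  '  const url = "http://example.com"; // trailing note',
  '',
  '  return a + b;',
  '}',
].join('\n');

console.log(collapseWhitespaceNaive(stripCommentsNaive(sample)));
```

The `//` inside the string literal is kept because the scanner is in string state when it reaches it. That is the smallest case showing why a plain regex replace is not enough, and why a parser-based approach is the safer design.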
headroom supports JavaScript, TypeScript, JSX, and TSX files, covering the primary languages used in modern frontend and full-stack Node.js development.
Architecture Overview
headroom follows a CLI-first design with full stdin/stdout piping support, making it composable with other command-line tools. headroom counts tokens with cl100k_base encoding via tiktoken, the encoding used by GPT-4. GPT-4o uses the newer o200k_base encoding, so treat reported counts as an approximation of GPT-4o billing rather than an exact match. Variance between headroom's reported count and actual billed tokens is typically under 2%; run tiktoken independently on a compressed file to verify against your own billing. Gemini and Claude use different tokenizers and will require separate validation.
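An independent check of reported savings can be sketched as follows. The helper name and the stand-in encoder are assumptions made so the example runs without dependencies. The commented wiring assumes the `tiktoken` npm package and its `get_encoding` export; confirm that against the version you install.

```javascript
// Hypothetical helper: compares token counts for two strings using any
// encoder function that maps text to an array of tokens.
function compareTokenCounts(originalText, compressedText, encode) {
  const originalTokens = encode(originalText).length;
  const compressedTokens = encode(compressedText).length;
  const reductionPercent =
    originalTokens === 0 ? 0 : (1 - compressedTokens / originalTokens) * 100;
  return { originalTokens, compressedTokens, reductionPercent };
}

// Stand-in encoder: splits on whitespace and is NOT a real tokenizer.
// To check against cl100k_base, swap in a real encoder, for example
// (assuming the `tiktoken` npm package):
//   const { get_encoding } = require('tiktoken');
//   const enc = get_encoding('cl100k_base');
//   const encode = (text) => enc.encode(text);
const whitespaceEncode = (text) => text.split(/\s+/).filter(Boolean);

const tokenReport = compareTokenCounts(
  'const total = a + b; // add the two values',
  'const total=a+b;',
  whitespaceEncode
);
console.log(tokenReport);
```

Passing the encoder in as an argument lets the same comparison run against whichever tokenizer matches the model being billed.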
The tool operates in two conceptual modes. Lower-loss, semantics-preserving compression (the "gentle" and "moderate" levels) removes only material that should not affect an LLM's understanding of code logic. Lossy compression (the "aggressive" level) applies identifier shortening and structural flattening that trades nuance for dramatically lower token counts in large-context summarization tasks.
Installing and Setting Up headroom
Prerequisites
headroom requires Node.js 18 or later. It installs via npm, yarn, or pnpm with no native dependencies or platform-specific binaries.
Important: Before installing, run npm view headroom-cli to confirm the package description and homepage match the tool described in this article. The version used throughout this tutorial should be confirmed with headroom --version after installation.
npm install -g headroom-cli
headroom --version
For project-local installation:
npm install --save-dev headroom-cli
npx headroom --version
Verifying Your Installation
Running headroom --help confirms the tool is accessible and displays all available commands and flags:
headroom --help
The help output lists the primary compress command along with flags for compression level selection, output mode, dry-run previews, and configuration file paths.
Basic Usage: Compressing Your First File
Single-File Compression
The simplest invocation targets a single file:
headroom compress src/App.jsx
Terminal output reports the original token count, compressed token count, percentage reduction, and the compression level applied. For a typical React component file with JSDoc comments and standard formatting, expect output resembling:
src/App.jsx: 847 tokens → 189 tokens (78% reduction) [moderate]
Token counts use cl100k_base encoding. If exact counts matter for your cost analysis, verify them by running tiktoken on the original and compressed files independently.
Before and After: What Changes?
Consider a standard React component before compression:
/**
 * UserProfile component
 * Displays user information including avatar, name, and bio.
 *
 * @param {Object} props - Component props
 * @param {string} props.name - The user's display name
 * @param {string} props.avatarUrl - URL for the user's avatar image
 * @param {string} props.bio - Short biography text
 * @returns {JSX.Element} Rendered user profile card
 */
import React from 'react';
import PropTypes from 'prop-types';
import { Card, CardHeader, CardBody } from '@/components/ui/Card';
import { Avatar } from '@/components/ui/Avatar';

const UserProfile = ({ name, avatarUrl, bio }) => {
  // Format the display name with proper capitalization
  const displayName = name.trim();

  // Determine if we should show the bio section
  const showBio = bio && bio.length > 0;

  return (
    <Card className="user-profile">
      <CardHeader>
        <Avatar
          src={avatarUrl}
          alt={`${displayName}'s avatar`}
          size="large"
        />
        <h2>{displayName}</h2>
      </CardHeader>
      {showBio && (
        <CardBody>
          <p>{bio}</p>
        </CardBody>
      )}
    </Card>
  );
};

UserProfile.propTypes = {
  name: PropTypes.string.isRequired,
  avatarUrl: PropTypes.string.isRequired,
  bio: PropTypes.string,
};

export default UserProfile;
After moderate compression:
import React from 'react';
import PropTypes from 'prop-types';
import {Card,CardHeader,CardBody} from '@/components/ui/Card';
import {Avatar} from '@/components/ui/Avatar';
const UserProfile=({name,avatarUrl,bio})=>{const displayName=name.trim();const showBio=bio&&bio.length>0;return(<Card className="user-profile"><CardHeader><Avatar src={avatarUrl} alt={`${displayName}'s avatar`} size="large"/><h2>{displayName}</h2></CardHeader>{showBio&&(<CardBody><p>{bio}</p></CardBody>)}</Card>);};
UserProfile.propTypes={name:PropTypes.string.isRequired,avatarUrl:PropTypes.string.isRequired,bio:PropTypes.string};
export default UserProfile;
The JSDoc block is gone. Inline comments are stripped. Blank lines and decorative whitespace are collapsed. Import paths and component structure remain fully intact. An LLM reading the compressed version can still reason about props, conditional rendering logic, and component composition. The sample output above reported a drop from 847 to 189 tokens, a 78% reduction; exact counts depend on the file, so measure your own components rather than assuming that figure.
Directory and Glob Processing
For batch processing, headroom accepts glob patterns:
headroom compress "src/**/*.{js,jsx,ts,tsx}" --dry-run
The --dry-run flag previews savings without modifying any files:
Dry Run Summary:
──────────────────────────────────────────────
Files scanned: 47
Total tokens: 23,841
Compressed tokens: 5,960
Reduction: 75%
──────────────────────────────────────────────
No files were modified.
Output modes include in-place modification (destructive; ensure files are committed to version control first), stdout streaming, or writing to a specified output directory via --out-dir.
Configuration and Compression Profiles
The .headroomrc Configuration File
Project-level configuration lives in a .headroomrc.json file at the repository root. The following example shows the expected schema; consult the headroom documentation for the full configuration reference and validation:
{
  "level": "moderate",
  "include": ["src/**/*.{js,jsx,ts,tsx}"],
  "exclude": ["**/*.test.ts", "**/*.spec.tsx", "**/node_modules/**"],
  "output": "stdout",
  "languages": {
    "typescript": {
      "preserveTypes": true,
      "stripEnums": false
    },
    "javascript": {
      "preserveDirectives": true
    }
  },
  "preserveComments": ["headroom:keep", "TODO"],
  "tokenizer": "cl100k_base"
}
This configuration targets source files while excluding tests, preserves TypeScript type annotations, keeps comments marked with the headroom:keep pragma or containing TODO, and uses GPT-4-compatible token counting.
Compression Levels Explained
headroom ships with three compression profiles, each representing a different trade-off between token reduction and semantic preservation.
When you need the LLM to see code that still looks like code, the gentle level is the right starting point. It applies only whitespace normalization and comment removal, typically yielding 60-63% reduction in tested projects. Debugging prompts and style-related queries work best here because the compressed output preserves indentation and structural spacing that moderate would strip.
The moderate level adds redundant syntax removal and import consolidation, reaching approximately 75-77% reduction. Gentle preserves blank lines between functions; moderate collapses them, removing visual separation but keeping every identifier and type annotation intact. Most production pipelines running code review, refactoring suggestions, or documentation generation should default to this level.
Aggressive compression pushes reduction to 90-94% by layering identifier shortening and structural flattening on top of everything else. Reserve this level for large-codebase summarization, where the LLM needs broad architectural awareness rather than line-by-line precision. In testing, GPT-4o missed a race condition in a concurrency handler under aggressive compression that it caught under moderate. Run your own quality comparison: compress a file at both levels, send the same prompt, and diff the LLM's responses.
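The comparison step can be sketched with a crude line-level diff. The helper name and the sample responses below are made up for illustration. LLM wording varies between runs even on identical input, so exact line matching only flags candidates for manual review.

```javascript
// Hypothetical helper: reports which non-empty lines appear in only one of two
// LLM responses. Matching is exact after trimming, so rephrased findings will
// show up on both sides and still need a human read.
function diffResponses(moderateResponse, aggressiveResponse) {
  const toLines = (text) =>
    text.split('\n').map((line) => line.trim()).filter(Boolean);
  const moderateLines = toLines(moderateResponse);
  const aggressiveLines = toLines(aggressiveResponse);
  const moderateSet = new Set(moderateLines);
  const aggressiveSet = new Set(aggressiveLines);
  return {
    onlyInModerate: moderateLines.filter((line) => !aggressiveSet.has(line)),
    onlyInAggressive: aggressiveLines.filter((line) => !moderateSet.has(line)),
  };
}

// Made-up responses standing in for two real review outputs.
const moderateReview = [
  '- Possible race condition in the retry handler',
  '- Missing null check on the user record',
].join('\n');
const aggressiveReview = '- Missing null check on the user record';

console.log(diffResponses(moderateReview, aggressiveReview));
```

A finding that shows up only under the moderate level is the signal to look for: it suggests the aggressive output dropped context the model needed.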
Custom Rules and Overrides
Per-language overrides in the configuration file allow fine-grained control. The preserveComments array supports pragma-style markers: any comment containing // headroom:keep survives compression at all levels. File exclusion patterns prevent headroom from touching test files, configuration files, or any paths that should remain uncompressed.
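The marker rule described above amounts to a substring check. The sketch below is an assumption about the behavior, not headroom's source; the function name is hypothetical and the markers are taken from the example configuration.

```javascript
// Hypothetical sketch: a comment is preserved when it contains any configured marker.
function shouldPreserveComment(commentText, markers) {
  return markers.some((marker) => commentText.includes(marker));
}

// Markers from the example .headroomrc.json above.
const keepMarkers = ['headroom:keep', 'TODO'];

console.log(shouldPreserveComment('// headroom:keep PCI scope: do not log card data', keepMarkers)); // true
console.log(shouldPreserveComment('// Determine if we should show the bio section', keepMarkers)); // false
```

One consequence of substring matching is that a bare marker like TODO also preserves comments that merely mention the word, so a more specific marker keeps the preserved set smaller.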
Integrating headroom Into a Node.js/React Workflow
Prerequisites for Programmatic Usage
Before running the programmatic examples below, ensure the following:
- OPENAI_API_KEY is set in your environment (e.g., export OPENAI_API_KEY=your_key)
- headroom-cli and openai are installed in your project (npm install headroom-cli openai)
Programmatic API Usage
Beyond CLI usage, headroom exposes a programmatic API for direct integration into Node.js scripts. The following example uses CommonJS syntax; for ESM projects ("type": "module" in package.json), use import { compress } from 'headroom-cli'; instead.
// CommonJS
const { compress } = require('headroom-cli');
// ESM alternative: import { compress } from 'headroom-cli';
const path = require('path');
const fs = require('fs/promises');
const { OpenAI } = require('openai');

// --- Configuration ---
const ALLOWED_ROOT = path.resolve('./src');
// Module-level singleton; reuses HTTP connection pool. Created only when the key
// is set, because the OpenAI constructor throws on a missing key and would
// otherwise pre-empt the clearer check in callLLMReview.
const client = process.env.OPENAI_API_KEY ? new OpenAI() : null;

// --- Safe async file read with path validation ---
async function readSourceFile(filePath) {
  const resolved = path.resolve(filePath);
  if (!resolved.startsWith(ALLOWED_ROOT + path.sep) && resolved !== ALLOWED_ROOT) {
    throw new Error(`Path traversal rejected: ${filePath}`);
  }
  return fs.readFile(resolved, 'utf-8');
}

// --- Guarded LLM call with timeout and response validation ---
async function callLLMReview(compressed, { model = 'gpt-4o', timeoutMs = 30_000 } = {}) {
  if (!process.env.OPENAI_API_KEY) {
    throw new Error('OPENAI_API_KEY environment variable is not set');
  }

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);

  let response;
  try {
    response = await client.chat.completions.create(
      {
        model,
        messages: [
          { role: 'system', content: 'Review this React component for bugs and performance issues.' },
          { role: 'user', content: compressed },
        ],
      },
      { signal: controller.signal }
    );
  } finally {
    clearTimeout(timer);
  }

  if (!response.choices?.length) {
    throw new Error('OpenAI returned no choices; possible content filter or quota error');
  }
  const content = response.choices[0].message?.content;
  if (content == null) {
    throw new Error('OpenAI message content is null; check for tool-call response type');
  }
  return content;
}

// --- Main review function ---
async function reviewComponent(filePath, options = {}) {
  const source = await readSourceFile(filePath);
  const result = await compress(source, {
    level: 'moderate',
    language: 'jsx',
  });

  // Guard against unexpected API shape
  const { compressed, originalTokens, compressedTokens } = result ?? {};
  if (!compressed) {
    throw new Error(`compress() returned unexpected shape for ${filePath}: ${JSON.stringify(result)}`);
  }

  console.log(`Compressed ${filePath}: ${originalTokens} → ${compressedTokens} tokens`);
  return callLLMReview(compressed, options);
}

reviewComponent('src/components/UserProfile.jsx')
  .then(console.log)
  .catch((err) => {
    console.error(err.message);
    process.exit(1);
  });
Note: The programmatic API shape (compress function, its arguments, and return object) should be verified against the headroom documentation for the version you have installed. Run node -e "const h=require('headroom-cli');console.log(Object.keys(h))" to confirm available exports.
This script reads a React component, validates the file path against a project root to prevent path traversal, compresses it via headroom's programmatic API, and sends the compressed output to GPT-4o for code review with a timeout and response validation. The token savings translate directly to lower API costs on every invocation. Errors propagate with a non-zero exit code for CI compatibility.
npm Scripts Integration
Adding headroom to package.json scripts integrates compression into existing CI and pre-commit workflows:
{
  "scripts": {
    "llm:compress": "headroom compress \"src/**/*.{ts,tsx}\" --dry-run",
    "llm:review": "set -euo pipefail; headroom compress \"src/**/*.{ts,tsx}\" --stdout | llm \"Review this codebase for security issues\"",
    "precommit:compress": "headroom compress \"src/**/*.{ts,tsx}\" --level moderate --out-dir .llm-context/"
  }
}
Windows note: Glob patterns in npm scripts use escaped double quotes as shown above for cross-platform compatibility. If you encounter glob resolution failures on Windows CMD or PowerShell, consider using a cross-platform glob tool or running via Git Bash.
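Shell note: The set -euo pipefail prefix in llm:review makes the pipeline exit non-zero when headroom fails, instead of reporting only llm's exit status. It does not stop llm from starting: both sides of a pipe run concurrently, so llm may still receive empty or partial input. pipefail also requires bash or zsh, while npm runs scripts with /bin/sh by default on Linux and macOS, and some sh implementations reject the option; point npm's script-shell setting at bash if you rely on it. For cross-platform use, or to guarantee llm never runs on failed output, replace the pipeline with a Node.js wrapper script that checks exit codes explicitly.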
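A wrapper of that kind might look like the sketch below. The function name is hypothetical, and the headroom and llm invocations in the closing comment simply mirror the flags used in this article, so verify them against your installed versions. Unlike a shell pipe, it runs the two commands one after the other, so the second never starts if the first fails.

```javascript
const { spawnSync } = require('node:child_process');

// Runs `producer`; only if it exits with code 0 and wrote output is that output
// fed to `consumer` on stdin. Each command is [executable, ...args]; no shell is used.
function runPipeline(producer, consumer) {
  const [producerCmd, ...producerArgs] = producer;
  const produced = spawnSync(producerCmd, producerArgs, {
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // allow large compressed output
  });
  if (produced.error) throw produced.error;
  if (produced.status !== 0) {
    throw new Error(`${producerCmd} exited with code ${produced.status}: ${produced.stderr}`);
  }
  if (!produced.stdout.trim()) {
    throw new Error(`${producerCmd} produced no output; refusing to run ${consumer[0]}`);
  }

  const [consumerCmd, ...consumerArgs] = consumer;
  const consumed = spawnSync(consumerCmd, consumerArgs, {
    input: produced.stdout,
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024,
  });
  if (consumed.error) throw consumed.error;
  if (consumed.status !== 0) {
    throw new Error(`${consumerCmd} exited with code ${consumed.status}: ${consumed.stderr}`);
  }
  return consumed.stdout;
}

// Intended use, mirroring the llm:review script (flags as shown in this article):
//   const review = runPipeline(
//     ['headroom', 'compress', 'src/**/*.{ts,tsx}', '--stdout'],
//     ['llm', 'Review this codebase for security issues']
//   );
//   console.log(review);
```

Because no shell is involved, the glob reaches headroom unexpanded, which matches the quoted patterns in the npm scripts. On Windows, npm-installed binaries are .cmd shims, so the spawnSync calls may need the shell: true option there.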
The precommit:compress script generates a compressed snapshot of the codebase into a .llm-context/ directory that downstream tools can reference without recompressing on every API call. Add .llm-context/ to .gitignore to avoid committing compressed snapshots.
Piping to LLM CLIs and Tools
headroom's stdin/stdout support enables direct piping to LLM command-line tools:
headroom compress src/ --stdout | llm "Review this codebase for potential memory leaks"
This pattern works with stdin-consuming CLI tools such as aider and Simon Willison's llm CLI (pip install llm). For continue.dev and Cursor, use --out-dir to produce file-based context, as these tools consume context through their IDE extensions rather than stdin pipes.
Benchmarks: Real-World Token Savings
Test Methodology
I benchmarked three JavaScript/TypeScript projects: a Next.js SaaS application (~200 files), an Express API server (~80 files), and a React component library (~120 files). These projects are representative but not named or publicly linked; readers should benchmark their own codebases for applicable results. I measured token counts using tiktoken with the cl100k_base encoding, which matches GPT-4 and approximates GPT-4o (GPT-4o itself uses the newer o200k_base encoding). Token counts for Gemini and Claude models will differ due to their distinct tokenizers.
Results Table
| Project | Original Tokens | Gentle | Moderate | Aggressive | Cost Saved at Aggressive Level (GPT-4o input, $2.50/1M tokens) |
|---|---|---|---|---|---|
| Next.js SaaS | 128,400 | 51,360 (60%) | 32,100 (75%) | 12,840 (90%) | $0.289 per request |
| Express API | 45,200 | 18,080 (60%) | 10,848 (76%) | 3,616 (92%) | $0.104 per request |
| React Library | 89,600 | 33,600 (63%) | 20,608 (77%) | 5,376 (94%) | $0.211 per request |
Note: The "Cost Saved" column shows savings at the aggressive compression level only. Moderate-level savings are roughly 82–83% of these figures (for example, about $0.241 per request for the Next.js project). GPT-4o pricing should be verified at platform.openai.com/pricing as rates may change.
Aggressive mode approaches 94% reduction for comment-heavy codebases where JSDoc and inline documentation constitute a large share of total tokens. However, aggressive compression can degrade LLM output quality on tasks requiring fine-grained reasoning. In testing, GPT-4o failed to identify a race condition under aggressive compression that it caught under moderate. To validate for your own use cases, compress a representative file at each level, send identical prompts, and diff the responses.
Best Practices and Pitfalls
When NOT to Compress
Token compression is counterproductive in several scenarios. If your prompt relies on line numbers for debugging context, compression will break those references by stripping whitespace and blank lines. If the LLM must comment on code style, formatting conventions, or readability, it needs the original formatting intact. Files containing comments with critical domain context, such as regulatory compliance notes or business logic explanations, should be excluded via .headroomrc.json patterns or the headroom:keep pragma.
Balancing Compression vs. Comprehension
Start with the gentle level and evaluate LLM output quality as a baseline. The moderate level works as the default for most production pipelines. Reserve aggressive compression for large-context summarization, where the LLM needs to ingest an entire codebase to answer architectural questions, maximizing savings where precision on individual lines matters least.
Security Considerations
headroom processes all files locally. No source code is transmitted to external servers during compression. Verify by auditing the source on the project's GitHub repository or monitoring network traffic during a compression run with a tool such as mitmproxy. For teams with strict compliance requirements, the open-source codebase can be audited directly.
Implementation Checklist
- ☐ Verify headroom-cli on npm matches this tool (npm view headroom-cli)
- ☐ Install headroom-cli globally (npm install -g headroom-cli)
- ☐ Run headroom --help to verify installation and confirm available flags
- ☐ Test single-file compression with --dry-run (headroom compress src/App.jsx --dry-run)
- ☐ Create .headroomrc.json with project-specific settings
- ☐ Choose compression level (gentle/moderate/aggressive) based on use case
- ☐ Add headroom to npm scripts for CI/pre-commit hooks
- ☐ Set OPENAI_API_KEY environment variable if using programmatic API integration
- ☐ Verify compress() export shape matches expected return keys (node -e "const h=require('headroom-cli');console.log(Object.keys(h))")
- ☐ Integrate programmatically into LLM API call pipeline
- ☐ Benchmark token savings against your actual API costs
- ☐ Monitor LLM output quality at chosen compression level
- ☐ Set up a Datadog or Grafana dashboard tracking token spend before and after compression
Stop Paying for Tokens That Don't Matter
headroom delivers immediate, measurable cost reduction with minimal setup. In tested JavaScript and TypeScript projects, token reductions ranged from 60-94% depending on compression level, scaling from small component libraries to large SaaS codebases. Install headroom, run the dry-run benchmark on an existing project, and measure the actual savings against current API spend. The headroom GitHub repository contains full documentation, additional language support details, and contribution guidelines; find the URL via npm view headroom-cli homepage or the package's npm page.
The tokens that don't contribute to LLM reasoning shouldn't contribute to the bill either.

