Reducing Token Usage
12 Jun 2026 - Warren Mar
I am token poor. In order to get things accomplished with my Pro subscription, I need to reduce token usage as much as possible.
Costs
| Model | Input (per MTok) | Output (per MTok) |
|---|---|---|
| Claude Opus 4.8 | $5 | $25 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Haiku 4.5 | $1 | $5 |
Token cost depends on input tokens, output tokens, and whether the input hits the cache. Per the table, Haiku 4.5 costs one third as much as Sonnet 4.6 and one fifth as much as Opus 4.8, on both input and output tokens.
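To make the ratios concrete, here is a minimal sketch that turns the table into a per-request cost. The workload (200k input tokens, 20k output tokens) and the function name `cost_usd` are made up for illustration, and caching is ignored.

```python
# Prices in USD per million tokens (MTok), copied from the table above.
PRICES = {
    "opus": {"input": 5.0, "output": 25.0},
    "sonnet": {"input": 3.0, "output": 15.0},
    "haiku": {"input": 1.0, "output": 5.0},
}


def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD for one request, ignoring caching."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload: 200k input tokens and 20k output tokens.
for model in PRICES:
    print(f"{model}: ${cost_usd(model, 200_000, 20_000):.2f}")
```

For this workload the sketch prints $1.50 for Opus, $0.90 for Sonnet, and $0.30 for Haiku, which matches the 5 : 3 : 1 ratio in the table.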
What is cached depends on the session. The model is autoregressive, so continuing a conversation only appends tokens: the earlier tokens have already been computed and are served from the cache. Beware of cache expiration: once the cache expires, the whole conversation has to be recomputed on the next turn.
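A rough way to see why the cache matters is to count how many input tokens have to be processed fresh on each turn. The sketch below is a simplified model, not a description of the actual billing: it assumes every turn appends a fixed number of tokens, that the cache is either fully warm or fully expired, and it only counts tokens because the cached-read price is not given here. The name `input_tokens_processed` is hypothetical.

```python
def input_tokens_processed(turn_sizes, cache_hit=True):
    """Return (cached, uncached) input token totals over a conversation.

    turn_sizes: tokens appended to the conversation on each turn.
    cache_hit: True if the earlier conversation is still cached on every
               turn, False if it has expired and must be recomputed.
    """
    history = 0          # tokens already in the conversation
    cached_total = 0     # tokens served from the cache
    uncached_total = 0   # tokens that must be computed fresh
    for new_tokens in turn_sizes:
        if cache_hit:
            cached_total += history
            uncached_total += new_tokens
        else:
            uncached_total += history + new_tokens
        history += new_tokens
    return cached_total, uncached_total


# Hypothetical session: 10 turns, each appending 2,000 tokens.
turns = [2_000] * 10
print("warm cache:", input_tokens_processed(turns, cache_hit=True))
print("expired cache:", input_tokens_processed(turns, cache_hit=False))
```

In this model a warm cache leaves 20,000 tokens to compute fresh across the session, while a cache that expires before every turn leaves 110,000.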
What’s in context
Run /context to see what is filling up the context window:
- System prompt — Claude Code’s built-in instructions. Fixed cost, you can’t change it.
- System tools — schemas for built-in tools (Read, Edit, Bash, Grep, etc.). Grows with how many tools are enabled.
- MCP tools — schemas for any connected MCP servers. Can add up fast with many servers.
- Memory files — CLAUDE.md, skills, and other instruction files loaded at session start.
- Custom agents — subagent definitions available via the Agent tool.
- Messages — your prompts, Claude’s responses, and tool results (file contents, command output, etc.). Usually the largest and fastest-growing chunk.
Use a smaller model
Smaller models mean lower costs. Add model: haiku to the frontmatter of simple skills.
Keep context clean
Keep CLAUDE.md and skills small since they are loaded and kept in the context. Disable unused MCP servers.
Avoid compaction
Compacting takes tokens. Do you really need to continue the session, or can you start fresh instead of spending tokens on compaction?
Use /clear between unrelated tasks.
Build MCP / CLI / scripts to reduce code generation
If the model keeps writing the same code, that is an opportunity to offload the work to an MCP server, a CLI tool, or a script that the model can call instead of regenerating the code each time.
Limit output of tools
Use grep, rg, jq, or other command-line tools to filter tool output before it lands in the context.
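The same idea can be wrapped in a small script for commands that are run often. This is a minimal sketch, assuming a helper named `filtered_output` (hypothetical) that keeps only lines matching a pattern and caps how many are returned. The demo command is a stand-in for a noisy tool.

```python
import re
import subprocess
import sys


def filtered_output(command, pattern, max_lines=20):
    """Run command and return only stdout lines matching pattern.

    At most max_lines matches are kept; if more matched, a final
    summary line reports how many were left out.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    matcher = re.compile(pattern)
    matches = [line for line in result.stdout.splitlines() if matcher.search(line)]
    kept = matches[:max_lines]
    omitted = len(matches) - len(kept)
    if omitted > 0:
        kept.append(f"... {omitted} more matching lines omitted")
    return kept


# Stand-in for a noisy tool: prints 100 lines, 10 of which are errors.
noisy = [
    sys.executable,
    "-c",
    "for i in range(100): print('ERROR' if i % 10 == 0 else 'ok', i)",
]
print("\n".join(filtered_output(noisy, r"^ERROR", max_lines=3)))
```

Here 100 lines of output shrink to four: the first three errors and a one-line summary of the rest.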
Subagents for exploration
Delegate exploration to a subagent. This keeps throwaway output out of your main context.
Batch tasks in one prompt
Put related tasks in a single prompt so you do not spend extra tokens repeating the same context and instructions.
Run an audit
The /token-audit skill will give you ideas on how to reduce token usage.
Spec / Plan
For long-running tasks, write a spec or work out a plan first, so you don’t waste tokens on corrections because you underspecified what you wanted.