Reducing Token Usage

12 Jun 2026 - Warren Mar

I am token poor. To get things done on my Pro subscription, I need to reduce token usage as much as possible.

Costs

Per Anthropic’s pricing page:

Model	Input (per MTok)	Output (per MTok)
Claude Opus 4.8	$5	$25
Claude Sonnet 4.6	$3	$15
Claude Haiku 4.5	$1	$5

Token cost depends on input, output, and cached tokens. Haiku 4.5 is 3x cheaper than Sonnet 4.6 and 5x cheaper than Opus 4.8 on both input and output tokens.
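To make the price gap concrete, here is a minimal sketch that turns the table above into a per-request cost. The dictionary keys and the request_cost helper are names I made up for illustration, and the calculation ignores caching entirely.

```python
# Prices in USD per million tokens (MTok), copied from the table above.
# The dictionary keys are illustrative labels, not official model IDs.
PRICES = {
    "opus-4.8": {"input": 5.0, "output": 25.0},
    "sonnet-4.6": {"input": 3.0, "output": 15.0},
    "haiku-4.5": {"input": 1.0, "output": 5.0},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request, ignoring caching."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Same workload on each model: 200k input tokens, 10k output tokens.
    for model in PRICES:
        print(f"{model}: ${request_cost(model, 200_000, 10_000):.2f}")
```

For that workload the sketch gives $1.25 on Opus, $0.75 on Sonnet, and $0.25 on Haiku, which matches the 5x and 3x ratios.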

What gets cached depends on the session. Because the model is autoregressive, continuing a conversation only appends tokens, so the earlier tokens hit the cache instead of being recomputed. Beware of cache expiration: once the cache expires, those tokens have to be recomputed.
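A rough sketch of why a cache hit matters on a follow-up turn. The cache_read_multiplier default of 0.1 (cached tokens billed at a tenth of the input price) is my assumption and is not taken from the table above, so check it against the pricing page. The sketch also ignores any surcharge for writing to the cache, and followup_input_cost is a made-up helper name.

```python
def followup_input_cost(
    prefix_tokens: int,
    new_tokens: int,
    input_price_per_mtok: float,
    cache_hit: bool,
    cache_read_multiplier: float = 0.1,  # assumption: verify on the pricing page
) -> float:
    """Estimate the input cost in USD of one follow-up turn.

    prefix_tokens is the conversation so far; new_tokens is the new prompt.
    On a cache hit the prefix is billed at a discounted rate; after the
    cache expires it is billed at the full input price again.
    """
    prefix_rate = input_price_per_mtok * (cache_read_multiplier if cache_hit else 1.0)
    total = prefix_tokens * prefix_rate + new_tokens * input_price_per_mtok
    return total / 1_000_000


if __name__ == "__main__":
    # 100k tokens of history plus a 2k-token prompt at $3 per MTok input.
    warm = followup_input_cost(100_000, 2_000, 3.0, cache_hit=True)
    cold = followup_input_cost(100_000, 2_000, 3.0, cache_hit=False)
    print(f"cache hit:     ${warm:.3f}")
    print(f"cache expired: ${cold:.3f}")
```

Under these assumptions, the same follow-up costs roughly $0.04 with a warm cache and roughly $0.31 after expiration.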

What’s in context

/context will show you what is filling up the context window.

System prompt — Claude Code’s built-in instructions. Fixed cost, you can’t change it.
System tools — schemas for built-in tools (Read, Edit, Bash, Grep, etc.). Grows with how many tools are enabled.
MCP tools — schemas for any connected MCP servers. Can add up fast with many servers.
Memory files — CLAUDE.md, skills, and other instruction files loaded at session start.
Custom agents — subagent definitions available via the Agent tool.
Messages — your prompts, Claude’s responses, and tool results (file contents, command output, etc.). Usually the largest and fastest-growing chunk.

Use a smaller model

Smaller models mean lower costs. Add model: haiku to the frontmatter of simple skills.

Keep context clean

Keep CLAUDE.md and skills small, since they are loaded at session start and stay in context. Disable unused MCP servers.

Avoid compaction

Compacting takes tokens. Do you really need to continue the session, or can you start fresh instead of spending tokens on compaction?

Use /clear between unrelated tasks.

Build MCP / CLI / scripts to reduce code generation

If the model is writing the same code repeatedly, that is an opportunity to offload the work to an MCP server, a CLI tool, or a script.

Limit output of tools

Use grep, rg, jq, or other command-line tools to filter tool output, so only the relevant lines end up in context.
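The same idea can be wrapped in a small script when a one-liner is not enough. This is a minimal sketch, not a replacement for grep or rg: filter_output is a hypothetical helper that keeps only matching lines and caps how many are returned.

```python
import re


def filter_output(text: str, pattern: str, max_lines: int = 20) -> str:
    """Keep only the lines that match pattern, capped at max_lines.

    If matches were cut off, a final summary line reports how many,
    so the model knows the output was truncated.
    """
    regex = re.compile(pattern)
    matches = [line for line in text.splitlines() if regex.search(line)]
    kept = matches[:max_lines]
    omitted = len(matches) - len(kept)
    if omitted > 0:
        kept.append(f"... {omitted} more matching lines omitted")
    return "\n".join(kept)


if __name__ == "__main__":
    # Fake command output: 1000 lines, one in every 50 is an error.
    log = "\n".join(
        f"step {i}: ERROR disk full" if i % 50 == 0 else f"step {i}: ok"
        for i in range(1000)
    )
    print(filter_output(log, r"ERROR", max_lines=5))
```

Here 1000 lines of output shrink to five matching lines plus a one-line note about the rest.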

Subagents for exploration

Delegate exploration to subagents so that throwaway output stays out of your main context.

Batch tasks in one prompt

Batching related tasks into one prompt avoids spending extra tokens restating the same context.

Run an audit

The /token-audit skill will give you ideas on how to reduce token usage.

Spec / Plan

Before starting a long-running task, write a spec or work out a plan, so you don’t waste tokens on corrections because you underspecified what you wanted.
