Hidden money-saving tactics in Claude Code’s source code—cut your API bill in half

Difficulty: Medium | Duration: 25 minutes | Gain: Learn 7 source-level cost-saving tactics to reduce your API bill by 30–60%

Target audience profile

Full-stack engineers who are using—or are planning to deeply use—Claude Code, especially solo developers and small teams who care about controlling API costs. If your understanding of Claude Code is still at the “install and use” stage, this article will show you how many adjustable parameters are hidden under the hood.

Core dependencies and environment

Node.js 18+
Claude Code 2.1.88
TypeScript basics (enough to read type definitions)
npm or pnpm
A valid ANTHROPIC_API_KEY

TIP

All source-code examples in this article are based on version 2.1.88. Check the version with claude --version. Details may vary slightly between versions, but the core mechanisms are the same.

Complete project folder tree

# Claude Code source structure (relevant directories)
src/
├── cost-tracker.ts              # Core cost tracking
├── costHook.ts                  # Cost output hook
├── Tool.ts                      # Tool abstraction factory
├── tools.ts                     # Tool registration and filtering
├── tools/
│   ├── BashTool/                # Shell execution tool
│   │   ├── BashTool.tsx
│   │   └── readOnlyValidation.ts # Read-only detection logic
│   ├── FileReadTool/           # File reading
│   ├── FileEditTool/           # File editing
│   └── ...
├── services/
│   ├── compact/                 # Context compression service
│   │   └── compact.ts
│   └── tools/
│       └── toolOrchestration.ts # Tool concurrency partitioning
├── skills/
│   └── loadSkillsDir.ts         # Skill loading
├── types/
│   └── permissions.ts           # Permission mode definitions
├── upstreamproxy/               # Upstream proxy (enterprise)
│   ├── upstreamproxy.ts
│   └── relay.ts
└── utils/
    └── git.ts                   # Git / Worktree utilities

One. How Claude Code bills get more expensive

Before we talk about saving money, let’s first understand where the money goes. Claude Code’s API bill is mainly driven by three things:

1. Total context tokens

In each conversation, Claude Code bundles historical messages, tool call results, and file contents to send to the API. The larger the context, the higher the price per request. In cost-tracker.ts, consumption for each session is tracked precisely:

// src/cost-tracker.ts
interface ModelUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;    // Cache hits (cheaper)
  cacheCreationInputTokens: number; // Cache creation (one-time cost)
  webSearchRequests: number;
  costUSD: number;
}

There are two key points here: cacheReadInputTokens are billed at roughly one tenth of the normal input-token price, while cacheCreationInputTokens are a one-time cost paid when a cache entry is written. This price gap directly shapes the compression strategy later in this article.
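
To make the effect of cache hits concrete, here is a minimal sketch that turns the four token counters into a dollar estimate. The price table is a hypothetical placeholder (USD per million tokens), not real pricing and not taken from cost-tracker.ts; the cache-read and cache-write ratios in it are assumptions chosen only to illustrate the gap described above.

```typescript
// Token counters, mirroring the token fields of ModelUsage shown above.
interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cacheReadInputTokens: number;
  cacheCreationInputTokens: number;
}

// Hypothetical prices in USD per million tokens (assumed, not real pricing).
interface PriceTable {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

const EXAMPLE_PRICES: PriceTable = {
  input: 3.0,
  output: 15.0,
  cacheRead: 0.3, // assumed: one tenth of the input price
  cacheWrite: 3.75, // assumed: somewhat above the input price
};

function estimateCostUSD(usage: TokenUsage, prices: PriceTable): number {
  const perToken = (pricePerMillion: number): number => pricePerMillion / 1_000_000;
  return (
    usage.inputTokens * perToken(prices.input) +
    usage.outputTokens * perToken(prices.output) +
    usage.cacheReadInputTokens * perToken(prices.cacheRead) +
    usage.cacheCreationInputTokens * perToken(prices.cacheWrite)
  );
}

// The same 100K-token prompt, sent uncached versus served from the cache.
const uncachedCost = estimateCostUSD(
  { inputTokens: 100_000, outputTokens: 1_000, cacheReadInputTokens: 0, cacheCreationInputTokens: 0 },
  EXAMPLE_PRICES,
);
const cachedCost = estimateCostUSD(
  { inputTokens: 0, outputTokens: 1_000, cacheReadInputTokens: 100_000, cacheCreationInputTokens: 0 },
  EXAMPLE_PRICES,
);
console.log(uncachedCost.toFixed(3), cachedCost.toFixed(3));
```

Under these assumed prices the cached request costs a small fraction of the uncached one, which is why keeping the prompt prefix stable matters so much.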

2. Tool call frequency

A typical Claude Code conversation can trigger dozens of tool calls. Each round of tool calls is followed by another API request that carries the tool results along with the accumulated context. More tools and more frequent calls cause the context to grow faster.

3. Agent concurrency level

When you start a sub-agent with AgentTool, or when a Coordinator mode schedules multiple Workers at the same time, each run is an independent session with its own context, so cost scales roughly with the number of agents running.

Claude Code includes cost-tracker.ts to track these costs in real time. When exiting, it persists the data into the project configuration:

// src/costHook.ts
// Automatically saves and prints a billing summary when the process exits
process.on('exit', () => {
  if (hasConsoleBillingAccess()) {
    console.log(formatTotalCost()); // Print bill with per-model breakdown
  }
  saveCurrentSessionCosts(getFpsMetrics?.());
});

WARNING

Billing tracking is isolated at the project level—session consumption from different directories won’t be merged. If you use Claude Code across multiple projects, each project keeps its own separate cost records.

Two. Money-Saving Tactic 1: Compact context compression to save 30–50%

Claude Code includes a built-in compact service that “compresses” past conversations into summaries, keeping the context within a reasonable size. The key parameters in the source code are clear at a glance:

// src/services/compact/compact.ts
const POST_COMPACT_TOKEN_BUDGET = 50_000;           // Max total tokens after compression
const POST_COMPACT_MAX_TOKENS_PER_FILE = 5_000;      // Keep at most 5K tokens per file
const POST_COMPACT_MAX_TOKENS_PER_SKILL = 5_000;      // Keep at most 5K tokens per single Skill
const POST_COMPACT_SKILLS_TOKEN_BUDGET = 25_000;      // Total Skill tokens budget: 25K
const POST_COMPACT_MAX_FILES_TO_RESTORE = 5;          // Restore at most 5 files after compression

The compression flow has three steps: trigger → API summary → re-attach key files. It first triggers pre_compact hooks, calls the compression API (this is a separate API request), then triggers post_compact hooks, and finally re-attaches recently read files.
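
How the POST_COMPACT_* limits could interact during the re-attach step can be sketched as follows. The selection order (most recently read first), the truncation behavior, and the helper name planFileRestore are assumptions for illustration; the actual logic in compact.ts may differ.

```typescript
const POST_COMPACT_TOKEN_BUDGET = 50_000;
const POST_COMPACT_MAX_TOKENS_PER_FILE = 5_000;
const POST_COMPACT_MAX_FILES_TO_RESTORE = 5;

interface ReadFileRecord {
  path: string;
  tokens: number; // estimated token count of the file content
  lastReadAt: number; // timestamp of the most recent read
}

interface RestoredFile {
  path: string;
  tokens: number; // tokens actually re-attached after capping
}

// Hypothetical helper: decide which files to re-attach after compaction.
function planFileRestore(files: ReadFileRecord[]): RestoredFile[] {
  const newestFirst = [...files].sort((a, b) => b.lastReadAt - a.lastReadAt);
  const plan: RestoredFile[] = [];
  let remaining = POST_COMPACT_TOKEN_BUDGET;
  for (const file of newestFirst) {
    if (plan.length >= POST_COMPACT_MAX_FILES_TO_RESTORE || remaining <= 0) break;
    // Cap each file at the per-file limit and at whatever budget is left.
    const capped = Math.min(file.tokens, POST_COMPACT_MAX_TOKENS_PER_FILE, remaining);
    plan.push({ path: file.path, tokens: capped });
    remaining -= capped;
  }
  return plan;
}

const restorePlan = planFileRestore([
  { path: "a.ts", tokens: 12_000, lastReadAt: 1 },
  { path: "b.ts", tokens: 800, lastReadAt: 6 },
  { path: "c.ts", tokens: 3_000, lastReadAt: 5 },
  { path: "d.ts", tokens: 9_000, lastReadAt: 4 },
  { path: "e.ts", tokens: 100, lastReadAt: 3 },
  { path: "f.ts", tokens: 2_000, lastReadAt: 2 },
]);
console.log(restorePlan);
```

In this sketch the oldest file (a.ts) is dropped and the 9,000-token file is cut to 5,000 tokens, which matches the "forgetting" you may observe after compaction.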

Hands-on tip: By adjusting the trigger threshold, you can control how often compression happens, and therefore control your costs.

// settings.json — under .claude/ in the project root
{
  "compact": {
    "autoCompactThreshold": 0.85  // Trigger compression when context reaches 85% (default)
  }
}

Lowering the threshold (e.g., 0.7) makes compression run more frequently. Each compression then summarizes a smaller context and costs less, but the summary quality may drop slightly. Raising it (e.g., 0.95) does the opposite.
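
The trade-off can be made tangible with a small simulation. This is a sketch under stated assumptions: a 200,000-token context window, a fixed 8,000 tokens added per turn, and a context that drops to 50,000 tokens after each compaction. None of these numbers come from the source, and countCompactions is a hypothetical helper.

```typescript
// Count how many compactions a session triggers for a given threshold.
function countCompactions(
  threshold: number, // fraction of the context window that triggers compaction
  turns: number,
  tokensPerTurn: number,
  contextWindow: number,
  postCompactTokens: number,
): number {
  let used = 0;
  let compactions = 0;
  for (let turn = 0; turn < turns; turn++) {
    used += tokensPerTurn;
    if (used >= threshold * contextWindow) {
      used = postCompactTokens; // context shrinks to the post-compaction size
      compactions++;
    }
  }
  return compactions;
}

const aggressive = countCompactions(0.7, 100, 8_000, 200_000, 50_000);
const relaxed = countCompactions(0.95, 100, 8_000, 200_000, 50_000);
console.log({ aggressive, relaxed });
```

With these inputs the 0.7 setting compacts 7 times and the 0.95 setting 5 times over 100 turns. Since each compaction is itself a separate API request, the cheaper setting depends on how large your per-turn context is compared with the cost of one summary call.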

TIP

If you notice Claude Code “forgets” critical context late in long conversations, it’s likely that important context was cut off by compression. In that case, add key information back manually in the conversation, or lower autoCompactThreshold so compression happens more often but with a smaller amount each time.

Three. Money-Saving Tactic 2: Use read-only tools wisely to avoid pointless consumption

Each tool in Claude Code has an isReadOnly() method that determines whether the tool has side effects. The tool scheduler uses this flag to decide whether concurrent execution is allowed:

// src/Tool.ts — Tools default to conservative behavior (fail-closed)
const TOOL_DEFAULTS = {
  isConcurrencySafe: (_input) => false,  // Default: not safe for concurrency
  isReadOnly: (_input) => false,          // Default: has side effects
  isDestructive: (_input) => false,        // Default: non-destructive
};

The key part is BashTool. It includes detailed command classification logic:

// src/tools/BashTool/BashTool.tsx
// Read-only command allowlist — these commands can run concurrently safely without modifying any state
const BASH_SEARCH_COMMANDS = ['find', 'grep', 'rg', 'ag', 'ack', 'locate', 'which', 'whereis'];
const BASH_READ_COMMANDS   = ['cat', 'head', 'tail', 'less', 'more', 'wc', 'stat', 'file', 'strings', 'jq', 'awk', 'cut', 'sort', 'uniq', 'tr'];
const BASH_LIST_COMMANDS   = ['ls', 'tree', 'du'];
const BASH_SEMANTIC_NEUTRAL_COMMANDS = ['echo', 'printf', 'true', 'false', ':'];

Hands-on tip: When you run a read-only query like ls -la, Claude Code recognizes it as read-only and batches it with other read-only operations for concurrent execution, reducing waiting time. But if you execute something destructive like rm -rf node_modules, it is recognized as a destructive operation, must run serially, and asks you to confirm every time.

Once you understand this logic, you can proactively organize your commands to trigger better concurrency optimization:

# Good practice: combine read-only operations in one command to reduce tool calls
find . -name "*.ts" | head -20 && wc -l src/**/*.ts

# Bad practice: mix destructive operations with read-only commands
find . -name "*.log" && rm -rf logs/  # This will be intercepted and run serially
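
The allowlist idea can be illustrated with a deliberately naive classifier. This is a sketch, not the logic in BashTool: the splitter ignores quoting and subshells, any ">" is treated as a write, and flags such as find -delete are not inspected, all of which a real implementation would have to handle.

```typescript
// Union of the search, read, list, and neutral command lists shown above.
const READ_ONLY_COMMANDS = new Set<string>([
  "find", "grep", "rg", "ag", "ack", "locate", "which", "whereis",
  "cat", "head", "tail", "less", "more", "wc", "stat", "file", "strings",
  "jq", "awk", "cut", "sort", "uniq", "tr",
  "ls", "tree", "du",
  "echo", "printf", "true", "false", ":",
]);

// Naive splitter: breaks a command line on ||, &&, | and ;.
// Assumption: no quoting or subshell handling.
function splitSegments(commandLine: string): string[] {
  return commandLine
    .split(/\|\||&&|[|;]/)
    .map((segment) => segment.trim())
    .filter((segment) => segment.length > 0);
}

function isReadOnlyCommandLine(commandLine: string): boolean {
  // Fail closed: any output redirection is treated as a write.
  if (commandLine.includes(">")) return false;
  const segments = splitSegments(commandLine);
  if (segments.length === 0) return false;
  // Every segment must start with an allowlisted program.
  return segments.every((segment) => {
    const program = segment.split(/\s+/)[0] ?? "";
    return READ_ONLY_COMMANDS.has(program);
  });
}

console.log(isReadOnlyCommandLine('find . -name "*.ts" | head -20 && wc -l src/**/*.ts'));
console.log(isReadOnlyCommandLine('find . -name "*.log" && rm -rf logs/'));
```

The first command line is classified as read-only because every segment starts with an allowlisted program; the second is not, because a single rm segment taints the whole line.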

Four. Money-Saving Tactic 3: Skills system reduces repeated prompts and saves Prompt Tokens

Claude Code supports a Skills system. Fundamentally, it is a set of reusable prompt fragments. Create a .claude/skills/ folder in your project root, add one directory per Skill containing a SKILL.md file, and Claude Code will load them automatically:

your-project/
└── .claude/
    └── skills/
        └── my-workflow/
            └── SKILL.md   # Directory format: skill-name/SKILL.md

Skill files include frontmatter metadata:

---
name: my-skill
description: Standard flow for refactoring React components
whenToUse: Use when you need to refactor components or split large ones
paths: ["src/**/*.tsx", "src/**/*.ts"]  # Activate only when these paths match
allowedTools: [Read, Edit, Bash]       # Limit available tools
arguments: [filePath, scope]
executionContext: inline               # inline or fork
---
# Skill content — write your standard refactoring workflow prompt here

Why does it save money? Every time you start a new conversation, Claude Code has to insert all tool descriptions, rules, and context into the system prompt, and anything you type or paste on top of that is billed as input tokens too. Skills store common workflow conventions once, so you stop paying for the instructions you would otherwise retype or copy-paste in every conversation.

The paths field is the core of conditional activation—this Skill appears in the candidate list only when it matches the specified paths. This prevents irrelevant Skills from polluting the context:

// src/skills/loadSkillsDir.ts
// Conditional activation using gitignore-style matching
activateConditionalSkillsForPaths(filePaths, cwd) {
  // After accessing src/**/*.tsx, React-related Skills are activated
}
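
Conditional activation can be sketched with a small glob matcher. This is a simplified illustration, not the loader's implementation: globToRegExp supports only "**/", "**", "*" and "?", which is a subset of gitignore-style matching, and treating a Skill without paths as always active is an assumption.

```typescript
interface SkillMeta {
  name: string;
  paths?: string[]; // glob patterns; assumed: no paths means always a candidate
}

// Minimal glob-to-RegExp conversion (simplified, for illustration only).
function globToRegExp(glob: string): RegExp {
  let source = "";
  let i = 0;
  while (i < glob.length) {
    const ch = glob.charAt(i);
    if (glob.startsWith("**/", i)) {
      source += "(?:.*/)?"; // zero or more directories
      i += 3;
    } else if (glob.startsWith("**", i)) {
      source += ".*";
      i += 2;
    } else if (ch === "*") {
      source += "[^/]*"; // anything within one path segment
      i += 1;
    } else if (ch === "?") {
      source += "[^/]"; // exactly one character within a segment
      i += 1;
    } else {
      source += ch.replace(/[.+^${}()|[\]\\]/g, "\\$&"); // escape regex metacharacters
      i += 1;
    }
  }
  return new RegExp(`^${source}$`);
}

// Return the names of Skills that should be candidates for the touched paths.
function activeSkillsForPaths(skills: SkillMeta[], touchedPaths: string[]): string[] {
  return skills
    .filter((skill) => {
      if (!skill.paths || skill.paths.length === 0) return true;
      const matchers = skill.paths.map((pattern) => globToRegExp(pattern));
      return touchedPaths.some((p) => matchers.some((re) => re.test(p)));
    })
    .map((skill) => skill.name);
}

const exampleSkills: SkillMeta[] = [
  { name: "react-refactor", paths: ["src/**/*.tsx", "src/**/*.ts"] },
  { name: "sql-migrations", paths: ["db/**/*.sql"] },
  { name: "commit-style" },
];
const activeNames = activeSkillsForPaths(exampleSkills, ["src/components/Button.tsx"]);
console.log(activeNames);
```

Only the React Skill and the unconditional one become candidates here; the SQL Skill stays out of the context until a matching file is touched.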

TIP

High-reuse Skills should live in ~/.claude/skills/ (user-level). Project-level Skills should be reserved for project-specific workflows. The higher the level, the wider the impact—and the more obvious the savings.

Five. Money-Saving Tactic 4: Concurrency partitioning reduces retries and saves Rate Limit usage

Claude Code doesn’t execute all tool calls serially. It uses an intelligent partitioner: tools that can run at the same time are grouped together, while tools that must queue are handled separately.

// src/services/tools/toolOrchestration.ts
// Read concurrency limit, default is 10
function getMaxToolUseConcurrency() {
  return parseInt(process.env.CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY ?? '10', 10);
}

// Tool partitioning: if consecutive blocks are read-only/concurrency-safe, merge them; otherwise start a new batch
function partitionToolCalls(toolUseMessages, toolUseContext): Batch[] {
  return toolUseMessages.reduce((acc, toolUse) => {
    // Excerpt: `tool` and `parsedInput` are derived from `toolUse` in code omitted here
    const isConcurrencySafe = tool.isConcurrencySafe(parsedInput.data);
    if (isConcurrencySafe && acc[acc.length-1]?.isConcurrencySafe) {
      acc[acc.length-1].blocks.push(toolUse); // Concurrency-safe: merge
    } else {
      acc.push({ isConcurrencySafe, blocks: [toolUse] }); // New batch
    }
    return acc;
  }, []);
}
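
Because the excerpt above omits the tool lookup, here is a self-contained sketch of the same batching idea. The ToolCall shape and the CONCURRENCY_SAFE_TOOLS registry are hypothetical stand-ins for the real tool objects; only the merge rule (consecutive concurrency-safe calls share a batch) follows the excerpt.

```typescript
interface ToolCall {
  id: string;
  toolName: string;
}

interface CallBatch {
  isConcurrencySafe: boolean;
  calls: ToolCall[];
}

// Hypothetical registry of tools that are safe to run concurrently.
const CONCURRENCY_SAFE_TOOLS = new Set<string>(["Read", "Grep", "Glob"]);

function partitionCalls(calls: ToolCall[]): CallBatch[] {
  const batches: CallBatch[] = [];
  for (const call of calls) {
    // Tools not in the registry default to serial execution (fail closed).
    const safe = CONCURRENCY_SAFE_TOOLS.has(call.toolName);
    const last = batches[batches.length - 1];
    if (safe && last !== undefined && last.isConcurrencySafe) {
      last.calls.push(call); // extend the current concurrent batch
    } else {
      batches.push({ isConcurrencySafe: safe, calls: [call] }); // start a new batch
    }
  }
  return batches;
}

const exampleBatches = partitionCalls([
  { id: "1", toolName: "Read" },
  { id: "2", toolName: "Grep" },
  { id: "3", toolName: "Edit" },
  { id: "4", toolName: "Read" },
  { id: "5", toolName: "Bash" },
  { id: "6", toolName: "Bash" },
]);
console.log(exampleBatches.map((batch) => batch.calls.length));
```

Note that a single unsafe call in the middle splits what could have been one concurrent batch of three reads into separate batches, so the order in which calls are issued affects how much parallelism is available.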

Hands-on tip: If your MCP tools support concurrency, you must explicitly declare isConcurrencySafe: true when defining them. Tools without that declaration default to serial execution. The tokens wasted on retries after hitting a rate limit are easy to overlook, but they add up significantly over time.

// settings.json — Declare concurrency safety for your MCP tools
{
  "mcpServers": {
    "my-mcp": {
      "command": "npx",
      "args": ["-y", "my-mcp-tool"],
      "isConcurrencySafe": true  // After declaring, it can run concurrently with other read-only tools
    }
  }
}

You can also raise the concurrency ceiling via an environment variable:

# Good for strong machine performance and sufficient API quota
export CLAUDE_CODE_MAX_TOOL_USE_CONCURRENCY=20

Six. Money-Saving Tactic 5: Use Worktree isolation instead of full clones to slim down collaboration context

Claude Code includes Worktree support. Git Worktree lets you check out multiple branches into different directories within the same repository, without cloning multiple copies. Claude Code’s findCanonicalGitRoot automatically parses Worktree paths:

// src/utils/git.ts
// Safely resolve the canonical Worktree root
function resolveCanonicalRoot(gitRoot: string): string {
  const gitDirContent = readFileSync(path.join(gitRoot, '.git'), 'utf8');
  // .git file format: gitdir: /path/to/.git/worktrees/<name>
  const commonDir = extractCommonDir(gitDirContent);
  // Safe validation: worktreeGitDir must be a subdirectory of commonDir/worktrees/
  // Prevent malicious repos from using .git to escape sandbox
  validateWorktreeSecurity(worktreeGitDir, commonDir);
  return mainRepoRoot;
}
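
The excerpt above leaves out how the .git file is parsed and validated. A minimal sketch of that idea, using only string operations on absolute POSIX paths, might look like this. All three function names are hypothetical, and requiring the worktree directory to be a direct child of commonDir/worktrees/ is an assumption that is stricter than the comment in the excerpt.

```typescript
// Parse the single-line ".git" file found in a linked worktree.
// Returns the gitdir path, or null if the content is not in the expected format.
function parseGitdirLine(gitFileContent: string): string | null {
  const match = /^gitdir:\s*(.+?)\s*$/.exec(gitFileContent.trim());
  return match && match[1] ? match[1] : null;
}

// Resolve "." and ".." segments of an absolute POSIX path without touching the filesystem.
function normalizePosix(absPath: string): string {
  const out: string[] = [];
  for (const part of absPath.split("/")) {
    if (part === "" || part === ".") continue;
    if (part === "..") out.pop();
    else out.push(part);
  }
  return "/" + out.join("/");
}

// Hypothetical check: the worktree gitdir must sit directly under <commonDir>/worktrees/.
function isValidWorktreeGitDir(worktreeGitDir: string, commonDir: string): boolean {
  const prefix = normalizePosix(commonDir) + "/worktrees/";
  const candidate = normalizePosix(worktreeGitDir);
  if (!candidate.startsWith(prefix)) return false;
  const name = candidate.slice(prefix.length);
  return name.length > 0 && !name.includes("/");
}

const parsedGitdir = parseGitdirLine("gitdir: /repo/.git/worktrees/feature-sidebar\n");
console.log(parsedGitdir, parsedGitdir !== null && isValidWorktreeGitDir(parsedGitdir, "/repo/.git"));
console.log(isValidWorktreeGitDir("/repo/.git/worktrees/../../../etc", "/repo/.git"));
```

Normalizing before comparing is the important step: a path that merely starts with the expected prefix can still climb out of it through ".." segments.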

Cost-saving principle: If your team’s workflow is to clone a fresh repository for each feature branch, Claude Code ends up including each clone’s .git directory (which can be several GB) in context scanning. In Worktree mode, there’s only one .git directory, so the scanned context volume shrinks drastically.

# Create a Worktree (instead of cloning a new copy)
git worktree add ../feature-sidebar feature/sidebar

# Claude Code will auto-detect the Worktree and share the main repo context
# Switch to another worktree by cd-ing into it and starting claude
cd ../feature-sidebar && claude

WARNING

Worktree security is important: Claude Code verifies that the path in the .git file can’t be used for sandbox escape. However, if you use Worktree with untrusted repositories, you should still be careful and double-check.

Seven. Money-Saving Tactic 6: Permission mode tuning to reduce confirmation interactions

Claude Code has a complete permission system. Six modes decide when tools need your confirmation:

// src/types/permissions.ts
type PermissionMode =
  | 'acceptEdits'   // Automatically accept edits
  | 'bypassPermissions' // Completely bypass permission checks (dangerous but fastest)
  | 'default'       // Ask according to rules
  | 'dontAsk'       // Never ask
  | 'plan'          // Plan mode
  | 'auto';         // AI automatically classifies and decides (when TRANSCRIPT_CLASSIFIER is enabled)

Mode	Behavior	Speed	Safety
`default`	Ask based on rules	Slow	Safe
`auto`	AI classifier decides automatically	Medium	Safer
`dontAsk`	Don’t ask, execute directly	Fast	Risk is on you
`bypassPermissions`	Skip checks entirely	Fastest	High risk
`plan`	Plan only, nothing is executed	Slowest	Safest

Hands-on tip: For directories you fully trust (your own projects—code already tracked in git), you can fine-tune in settings.json:

// ~/.claude/settings.json (user-level) or .claude/settings.json (project-level)
{
  "permissions": {
    "defaultMode": "auto",
    "allow": [
      "Bash(pnpm:*)",
      "Read",
      "Glob",
      "Grep"
    ],
    "deny": [
      "Bash(rm -rf /)"
    ],
    "ask": [
      "Bash(git push:*)"
    ]
  }
}

The auto mode is a good trade-off between convenience and safety. It uses the AI classifier to judge whether an operation is safe, avoiding unnecessary pop-up confirmations. Each confirmation you eliminate removes one interruption and, whenever a prompt would have led to a retry or a rephrased request, one extra round of token consumption.
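
How allow, ask, and deny lists combine into one decision can be sketched as follows. The Rule shape, the prefix matching, the deny-then-ask-then-allow precedence, and the "ask" fallback are all assumptions for illustration; they are not taken from the Claude Code source.

```typescript
type Decision = "allow" | "ask" | "deny";

interface Rule {
  tool: string;
  prefix?: string; // for Bash: matches commands that start with this text
}

interface RuleSet {
  allow: Rule[];
  ask: Rule[];
  deny: Rule[];
}

function ruleMatches(rule: Rule, tool: string, input: string): boolean {
  if (rule.tool !== tool) return false;
  return rule.prefix === undefined || input.startsWith(rule.prefix);
}

// Assumed precedence: deny wins over ask, ask wins over allow.
function decide(rules: RuleSet, tool: string, input: string): Decision {
  if (rules.deny.some((r) => ruleMatches(r, tool, input))) return "deny";
  if (rules.ask.some((r) => ruleMatches(r, tool, input))) return "ask";
  if (rules.allow.some((r) => ruleMatches(r, tool, input))) return "allow";
  return "ask"; // assumed fallback: anything unmatched needs confirmation
}

const exampleRules: RuleSet = {
  allow: [{ tool: "Bash", prefix: "pnpm" }, { tool: "Read" }],
  ask: [{ tool: "Bash", prefix: "git push" }],
  deny: [{ tool: "Bash", prefix: "rm -rf /" }],
};

console.log(decide(exampleRules, "Bash", "pnpm install"));
console.log(decide(exampleRules, "Bash", "git push origin main"));
console.log(decide(exampleRules, "Bash", "rm -rf /"));
```

Checking deny first means a broad allow rule can never accidentally unlock a command you explicitly blocked, which is the property you want from a safety net.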

WARNING

The bypassPermissions mode is like turning Claude Code into a root-permission script—any input will be executed directly. Only use it in isolated test directories, and absolutely never enable it on your main project.

Eight. Money-Saving Tactic 7: Enterprise upstream proxy caching to reduce repeated API calls

Claude Code’s upstreamproxy module provides proxy routing for Claude Code Connect (CCR, enterprise). Pay attention to the NO_PROXY_LIST value in the source code:

// src/upstreamproxy/upstreamproxy.ts
const NO_PROXY_LIST = [
  'localhost', '127.0.0.1', '::1',
  '169.254.0.0/16', '10.0.0.0/8', '172.16.0.0/12', '192.168.0.0/16',
  'anthropic.com', 'github.com', '*.github.com',
  // Package registries (npm, PyPI, crates.io, Go module proxy) are on the list and bypass the proxy
  'registry.npmjs.org', 'pypi.org', 'files.pythonhosted.org',
  'index.crates.io', 'proxy.golang.org'
];

Hands-on tip: Even if you’re not on the enterprise edition, understanding this logic can help you optimize local configuration. If you’re on a company network, Claude Code will automatically route traffic to the enterprise proxy. At the proxy layer, you can do Token caching and reuse—this is the underlying cost-saving mechanism for CCR enterprise users.

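For self-hosted MCP services, you can pass a cache setting to the server through the env block of the MCP config. Note that env only sets environment variables for the server process: a variable such as MCP_CACHE_TTL has an effect only if your own server.js reads it and implements the caching: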

// settings.json
{
  "mcpServers": {
    "local-knowledge": {
      "command": "node",
      "args": ["server.js"],
      "env": {
        "MCP_CACHE_TTL": "3600"  // Cache for 1 hour, reducing repeated API calls
      }
    }
  }
}
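
On the server side, honoring such a TTL could look like the following minimal sketch. TtlCache is a hypothetical helper you would write yourself; nothing here is part of Claude Code or the MCP SDK, and the clock is injectable only so the behavior can be demonstrated without waiting an hour.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

// Hypothetical in-memory cache with a fixed time-to-live per entry.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it so the caller recomputes
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Demonstration with a fake clock; a real server would derive ttlMs
// from the MCP_CACHE_TTL value (seconds) passed through env.
let fakeNow = 0;
const responseCache = new TtlCache<string>(3600 * 1000, () => fakeNow);
responseCache.set("docs:auth", "cached answer");
fakeNow = 1_000;
const freshHit = responseCache.get("docs:auth");
fakeNow = 3600 * 1000;
const afterExpiry = responseCache.get("docs:auth");
console.log(freshHit, afterExpiry);
```

A tool handler would check the cache before doing the expensive lookup and store the result afterwards, so repeated identical queries within the hour return the stored response.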

Nine. Common issue troubleshooting

Q1: I only changed 1 line, but my bill is still high?

A high Claude Code bill doesn’t always come from edits. It may come from “context expansion.” In long conversations, even if you only change 1 line, the pre-compression historical messages and tool results are still kept in the context. How to check: look at the formatTotalCost() output when the session ends and compare inputTokens vs outputTokens. The larger the inputTokens, the more bloated the context is.

Q2: Answer quality drops noticeably after Compact compression?

This is usually a compression granularity problem. Raise autoCompactThreshold from the default 0.85 to 0.95 to reduce how often compression runs. Also make sure key design decisions and context are stated clearly early in the conversation, so that the compressed summaries retain them.

Q3: Is bypass mode safe or not?

It’s unsafe, but controllable in isolated environments. If you must use bypass, it’s recommended to add a safety net with permissions.deny rules:

{
  "permissions": {
    "defaultMode": "bypassPermissions",
    "deny": [
      "Bash(sudo:*)",
      "Bash(curl:*)",
      "Bash(wget:*)"
    ]
  }
}

Q4: How do I monitor daily Token consumption?

Claude Code automatically prints the bill when each session exits (if you have consoleBillingAccess). A more systematic approach is to set up a tracking script at the project level:

# Add a post-checkout hook inside your project’s .git/hooks/
# But a more practical method is to directly check .claude/cost/<session-id>.json

Q5: Do Skills consume extra Tokens every time?

Yes, but it’s controllable. Skill-content Token usage is included in POST_COMPACT_SKILLS_TOKEN_BUDGET (25K limit). The more Skills you have, and the more detailed they are, the larger your context becomes. So the writing principle for Skills is: precise matching, only the minimum necessary information. Use the paths field so that a Skill is loaded only when related files are accessed.

Q6: Does Claude Code behave oddly after using Worktree?

Worktree includes security validation. If the .git file isn’t in the expected format, Claude Code will fall back to the current directory. Check that the .git file content matches the format gitdir: /absolute/path/to/.git/worktrees/<name>.

Ten. Further reading / advanced directions

1. Write your own Skills to reduce repeated engineering work

If you notice you repeatedly write prompts like “write code in this style” across multiple projects, extract it into a Skill, put it under ~/.claude/skills/, and make it apply globally. Write once, save repeatedly.

2. Use MCP to build a local toolchain instead of relying on cloud calls

Token-heavy tasks are often localizable: code search, documentation queries, and database operations. Wrap these capabilities into local MCP tools to reduce dependency on the Claude API while also improving response speed.

3. Monitor compact logs and continuously optimize your context strategy

Enable verbose mode to observe when compact triggers and what compression achieves. Keep tuning autoCompactThreshold until you find a balance point suitable for the scale of your project.

claude --verbose 2>&1 | grep -i compact

4. Understand the Permission classifier from the source level

Behind Claude Code’s auto mode is an AI classifier (Transcript Classifier). It assigns a risk level based on the operation’s content. If you want to go deeper, study the YoloClassifierResult type definition in src/types/permissions.ts.