AugmentClaude

Cost Breakdown

Analyze Claude API costs by model, session, and time period with detailed pricing metrics.

Installation

  1. Make sure Claude is on your device and in your terminal.

    Skills load from ~/.claude/skills/ when Claude Code starts up — so you need it on your machine first. If you don't have it yet, install it once with the command below, then run claude in any terminal to verify.

    One-time setup
    npm i -g @anthropic-ai/claude-code

    Already have it? Skip ahead.

  2. Paste into Claude Code or into your terminal.

    This copies the whole skill folder into ~/.claude/skills/cost-breakdown-hoangsonww/ — the SKILL.md plus any scripts, reference docs, or templates the skill ships with. Safe default: works for every skill.

    Faster alternative (instruction-only skills)

    Skips the clone and grabs only the SKILL.md file. Don't use this if the skill ships Python scripts, reference markdowns, or asset templates — they won't be downloaded and the skill will fail when it tries to load them.

    Quick install (SKILL.md only)
    Sign up to copy
  3. Restart Claude Code.

    Quit and reopen Claude Code (or any other agent that loads from ~/.claude/skills/). New skills are picked up on startup.

  4. Just ask Claude.

    Skills auto-activate when your request matches the skill's description — no slash command needed. Trigger phrases live in the skill's own frontmatter; you can read them in the “What this skill does” section above.

Prefer to read the source first? Open on GitHub.

When Claude uses it

Break down Claude Code costs using the Agent Monitor pricing engine. Shows per-model costs (input, output, cache_read, cache_write at $/Mtok rates), per-session costs, daily trends, and compaction baseline token recovery. Use when analyzing spending, comparing model costs, or planning budgets.

What this skill does

Cost Breakdown

Detailed cost analysis from the Agent Monitor's pricing engine.

Input

The user provides: $ARGUMENTS

This may be: "today", "this week", "last 30 days", a session ID, or "budget $50/week".

Data Sources

EndpointReturns
GET /api/pricing{ pricing: [{ model_pattern, display_name, input_per_mtok, output_per_mtok, cache_read_per_mtok, cache_write_per_mtok }] }
GET /api/pricing/costTotal cost: { total_cost, breakdown: [{ model, input_tokens, output_tokens, cache_read_tokens, cache_write_tokens, cost, matched_rule }] }
GET /api/pricing/cost/{sessionId}Per-session cost with same breakdown shape
GET /api/sessions?limit=200Sessions list — each includes inline cost field (bulk pricing)
GET /api/analyticsToken totals (total_input, total_output, total_cache_read, total_cache_write — baselines pre-summed), daily trends

How costs are calculated

The pricing engine matches model names against model_pattern using SQL LIKE (e.g. claude-sonnet-4-5% matches claude-sonnet-4-5-20250514). Longest pattern wins for specificity. Cost per model:

cost = (input_tokens / 1M) × input_per_mtok
     + (output_tokens / 1M) × output_per_mtok
     + (cache_read_tokens / 1M) × cache_read_per_mtok
     + (cache_write_tokens / 1M) × cache_write_per_mtok

Token counts are effective totals = current + baseline (baselines preserve pre-compaction tokens that would otherwise be lost when the transcript JSONL is rewritten).

Default pricing tiers (seeded on first run)

FamilyInput $/MtokOutput $/MtokCache Read $/MtokCache Write $/Mtok
Opus 4.5/4.6$5$25$0.50$6.25
Sonnet 4/4.5/4.6$3$15$0.30$3.75
Haiku 4.5$1$5$0.10$1.25

Report Sections

1. Cost by Model

Table from /api/pricing/cost breakdown — each model with 4 token counts + cost. Highlight which pricing rule matched.

2. Cost by Session (Top 10 Most Expensive)

From sessions list with inline cost — sort descending. Show session name, model, duration, cost.

3. Daily Cost Trend

Cross-reference daily_sessions with per-session costs to compute daily spend. Show 7/30-day trend with direction arrows.

4. Token Efficiency Analysis

  • Cache hit rate: total_cache_read / (total_cache_read + total_input) × 100 — higher = more efficient
  • Compaction baseline recovery: Tokens preserved via baseline columns (tokens not lost to compaction)
  • Output/input ratio: Balanced ratio indicates good prompt efficiency

5. Cost Optimization Opportunities

  • Sessions where cache_write >> cache_read (poor cache reuse)
  • Expensive models used for simple tasks (check subagent_type vs model)
  • Sessions with many compactions (context overflow = wasted tokens)

Output

Structured Markdown with tables. Currency as USD to 4 decimal places. Include total and per-model subtotals.

Related skills