You know your monthly API bill. You don’t know which skill is responsible for most of it. Is it the research skill that calls Opus? The chat handler using Sonnet? The background task running Haiku? Without per-skill attribution, you can’t optimize — you’re just watching a number go up.
We added cost tracking to our agent runtime in about an hour. It logs every API call with the skill that triggered it, computes the cost including cached token discounts, and generates per-skill breakdowns on demand. Here’s the pattern.
The Architecture
Three components:
- Pricing table — model names → cost per million tokens (input, output, cached input)
- Usage callback — fires after every API call with token counts and current skill context
- Cost logger — appends JSONL per month, aggregates for reporting
```
LLM API call
    ↓
Response includes usage (input/output/cached tokens)
    ↓
onUsage callback fires
    ↓
calculateCost() computes dollar amount
    ↓
costLogger.log() appends to JSONL
    ↓
costLogger.summary() aggregates for reporting
```
The Pricing Table
A simple object mapping model IDs to per-million-token prices. Include cached input as a separate rate — it’s typically 10% of the input price.
```javascript
// pricing.js
export const MODEL_PRICING = {
  // Anthropic
  'claude-sonnet-4-20250514': { input: 3.00, output: 15.00, cached_input: 0.30 },
  'claude-haiku-3-5-20241022': { input: 0.80, output: 4.00, cached_input: 0.08 },
  // Local models — free, but tracked for attribution
  'llama3.3': { input: 0, output: 0, cached_input: 0 },
  'qwen2.5': { input: 0, output: 0, cached_input: 0 },
};

export function calculateCost(model, inputTokens, outputTokens, cachedTokens = 0) {
  const pricing = MODEL_PRICING[model];
  if (!pricing) return 0;
  const uncachedInput = inputTokens - cachedTokens;
  return (
    (uncachedInput / 1_000_000) * pricing.input +
    (cachedTokens / 1_000_000) * pricing.cached_input +
    (outputTokens / 1_000_000) * pricing.output
  );
}
```
When Anthropic changes pricing or you add a new model, update one object. The cached_input field is the key detail most implementations miss — without it, you’re overestimating costs every time the prompt cache hits.
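To see what the cached discount does to a real number, here is a standalone arithmetic check using the Sonnet rates from the table above (the token counts are made up for illustration):

```javascript
// Sonnet rates from the pricing table; 10,000 input tokens, 2,000 output
// tokens, with 8,000 of the input served from the prompt cache.
const rates = { input: 3.00, output: 15.00, cached_input: 0.30 };
const inputTokens = 10_000, outputTokens = 2_000, cachedTokens = 8_000;

// With the cached discount: only 2,000 input tokens bill at the full rate.
const withDiscount =
  ((inputTokens - cachedTokens) / 1_000_000) * rates.input +
  (cachedTokens / 1_000_000) * rates.cached_input +
  (outputTokens / 1_000_000) * rates.output;

// Naive version that ignores caching bills all 10,000 at the full input rate.
const naive =
  (inputTokens / 1_000_000) * rates.input +
  (outputTokens / 1_000_000) * rates.output;

console.log(withDiscount.toFixed(4)); // "0.0384"
console.log(naive.toFixed(4));        // "0.0600"
```

With an 80% cache hit rate on input, ignoring the discount overstates this call's cost by more than half.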
The Usage Callback
After every API call, the LLM service emits usage data. Wire a callback that captures the current skill context:
```javascript
// In your LLM service (e.g., claude.js)
_emitUsage(model, response) {
  const cached = response.usage.cache_read_input_tokens || 0;
  const cacheCreated = response.usage.cache_creation_input_tokens || 0;
  console.log(
    `Claude: ${response.usage.input_tokens} in ` +
      `(${cached} cached, ${cacheCreated} cache-created) / ` +
      `${response.usage.output_tokens} out`
  );
  if (this.onUsage) {
    this.onUsage({
      model,
      // Anthropic reports cache reads separately: usage.input_tokens excludes
      // them, so add them back here; calculateCost() then subtracts
      // cachedTokens to bill only the uncached share at the full input rate.
      inputTokens: response.usage.input_tokens + cached,
      outputTokens: response.usage.output_tokens,
      cachedTokens: cached,
      // Cache-creation tokens are billed at a premium (1.25x the input rate
      // for Anthropic); they're left out here for simplicity.
    });
  }
}
```
Call _emitUsage() after every client.messages.create() response — both in your chat() and chatWithTools() methods.
Wiring the Callback
In your agent’s main initialization, connect the LLM service to the cost logger:
```javascript
// index.js — agent startup
import { CostLogger } from './utils/cost-logger.js';
import { calculateCost } from './utils/pricing.js';

const costLogger = new CostLogger('./data/costs');

claude.onUsage = (usage) => {
  costLogger.log({
    timestamp: new Date().toISOString(),
    model: usage.model,
    skill: commandRouter._currentSkill || 'chat',
    user: commandRouter._currentUser || 'unknown',
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    cachedTokens: usage.cachedTokens,
    cost: calculateCost(
      usage.model,
      usage.inputTokens,
      usage.outputTokens,
      usage.cachedTokens
    ),
    route: 'claude',
  });
};
```
The critical piece: commandRouter._currentSkill is set before the LLM call, in your command handling logic:
```javascript
// In your command router, before calling the LLM
const skillName = skill?.name || 'chat';
this._currentSkill = skillName;
this._currentUser = sender;
```
This gives every API call a skill tag. Without it, you just have a flat list of costs with no attribution.
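One detail the snippet above leaves open is when the tag gets cleared. A wrapper like the following (a hypothetical helper, not part of the original runtime) scopes the tag to the skill invocation and always resets it, so a failed skill can't leak its tag into later, unrelated calls:

```javascript
// Hypothetical helper: set the attribution context around a skill invocation
// and reset it afterward, even if the skill throws.
async function withSkillContext(router, skill, sender, fn) {
  router._currentSkill = skill?.name || 'chat';
  router._currentUser = sender;
  try {
    return await fn(); // every LLM call inside fn() gets tagged with this skill
  } finally {
    router._currentSkill = null;
    router._currentUser = null;
  }
}
```

Note that a single mutable field assumes one in-flight request at a time; if your agent runs skills concurrently, you'd need per-request context (Node's `AsyncLocalStorage`, for example) instead.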
The Cost Logger
Append-only JSONL, one file per month. Simple to write, simple to grep, simple to aggregate.
```javascript
// cost-logger.js
import { appendFileSync, readFileSync, existsSync, mkdirSync } from 'fs';
import { join } from 'path';

export class CostLogger {
  constructor(dir) {
    this.dir = dir;
    if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  }

  _filePath() {
    const month = new Date().toISOString().slice(0, 7); // "2026-04"
    return join(this.dir, `${month}.jsonl`);
  }

  log(entry) {
    appendFileSync(this._filePath(), JSON.stringify(entry) + '\n');
  }

  summary(month) {
    const file = month
      ? join(this.dir, `${month}.jsonl`)
      : this._filePath();
    if (!existsSync(file)) return { total: 0, bySkill: {}, byModel: {} };
    // filter(Boolean) guards against an empty file or trailing newline
    const lines = readFileSync(file, 'utf-8').split('\n').filter(Boolean);
    const bySkill = {};
    const byModel = {};
    let total = 0;
    for (const line of lines) {
      const entry = JSON.parse(line);
      total += entry.cost;
      bySkill[entry.skill] = (bySkill[entry.skill] || 0) + entry.cost;
      byModel[entry.model] = (byModel[entry.model] || 0) + entry.cost;
    }
    return { total, bySkill, byModel };
  }

  formatSummary(month) {
    const { total, bySkill, byModel } = this.summary(month);
    const skillLines = Object.entries(bySkill)
      .sort(([, a], [, b]) => b - a)
      .map(([skill, cost]) => `  ${skill}: $${cost.toFixed(4)}`)
      .join('\n');
    const modelLines = Object.entries(byModel)
      .sort(([, a], [, b]) => b - a)
      .map(([model, cost]) => `  ${model}: $${cost.toFixed(4)}`)
      .join('\n');
    return [
      `Total: $${total.toFixed(4)}`,
      `\nBy skill:\n${skillLines}`,
      `\nBy model:\n${modelLines}`,
    ].join('\n');
  }
}
```
Tracking Local Models
If your agent uses Ollama or another local LLM for cheap tasks, track those calls at $0. The pricing table already has entries for local models with zero cost. This lets you answer a question that matters: “What percentage of my agent’s work stays local?”
Wire the same callback pattern for your Ollama service:
```javascript
ollama.onUsage = (usage) => {
  costLogger.log({
    timestamp: new Date().toISOString(),
    model: usage.model,
    skill: commandRouter._currentSkill || 'chat',
    user: commandRouter._currentUser || 'unknown',
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    cachedTokens: 0,
    cost: 0,
    route: 'ollama',
  });
};
```
Now your summary shows: “60% of calls went to Ollama at $0, 40% went to Claude at $X.” That’s a real number for your local-vs-cloud optimization decisions.
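If you want that local-versus-cloud split computed rather than eyeballed, a small helper (hypothetical, not part of the `CostLogger` above) can count entries per route in the same JSONL file:

```javascript
// route-share.js — hypothetical helper: fraction of logged calls per route
// ('ollama', 'claude', ...) in one month's JSONL file.
import { readFileSync } from 'fs';

function routeShare(file) {
  const lines = readFileSync(file, 'utf-8').split('\n').filter(Boolean);
  const counts = {};
  for (const line of lines) {
    const { route } = JSON.parse(line);
    counts[route] = (counts[route] || 0) + 1;
  }
  // Convert raw counts to fractions of all logged calls
  return Object.fromEntries(
    Object.entries(counts).map(([route, n]) => [route, n / lines.length])
  );
}
```

So `routeShare('./data/costs/2026-04.jsonl')` might return `{ ollama: 0.6, claude: 0.4 }`. Note this is a share of call counts, not tokens; weight by `inputTokens + outputTokens` if you want volume instead.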
Surfacing Costs
Make it queryable. Add a command your agent responds to:
```javascript
if (command === '/costs') {
  return costLogger.formatSummary();
}
```
Output:
```
Total: $2.4731

By skill:
  research: $1.4200
  morning-brief: $0.3840
  chat: $0.3100
  task-manager: $0.2191
  email-draft: $0.1400

By model:
  claude-sonnet-4-20250514: $2.1031
  claude-haiku-3-5-20241022: $0.3700
  llama3.3: $0.0000
```
Now you know: the research skill is 57% of your spend. That’s where optimization effort should go — maybe use Haiku for the initial search and Sonnet only for synthesis. You couldn’t make that decision without per-skill attribution.
What a Log Entry Looks Like
```json
{"timestamp":"2026-04-04T14:23:17.042Z","model":"claude-sonnet-4-20250514","skill":"morning-brief","user":"adam","inputTokens":5200,"outputTokens":890,"cachedTokens":4000,"cost":0.0182,"route":"claude"}
```
One line. All the context. Grep for a skill, a user, a model, or a date range. Parse with jq for custom aggregations. No database needed.
Key Takeaway
Instrument cost tracking from day one. The implementation is an hour of work — a pricing table, a usage callback, and a JSONL logger. The insight it gives you is permanent: which skills are expensive, which models are overkill for which tasks, and what your agent actually costs to run. Total spend is a number. Per-skill spend is a decision-making tool.