You know your monthly API bill. You don’t know which skill is responsible for most of it. Is it the research skill that calls Opus? The chat handler using Sonnet? The background task running Haiku? Without per-skill attribution, you can’t optimize — you’re just watching a number go up.
We added cost tracking to our agent runtime in about an hour. It logs every API call with the skill that triggered it, computes the cost including cached token discounts, and generates per-skill breakdowns on demand. Here’s the pattern.
The Architecture
Three components:
- Pricing table — model names → cost per million tokens (input, output, cached input)
- Usage callback — fires after every API call with token counts and current skill context
- Cost logger — appends JSONL per month, aggregates for reporting
```
LLM API call
    ↓
Response includes usage (input/output/cached tokens)
    ↓
onUsage callback fires
    ↓
calculateCost() computes dollar amount
    ↓
costLogger.log() appends to JSONL
    ↓
costLogger.summary() aggregates for reporting
```
The Pricing Table
A simple object mapping model IDs to per-million-token prices. Include cached input as a separate rate — it’s typically 10% of the input price.
```javascript
// pricing.js
export const MODEL_PRICING = {
  // Anthropic
  'claude-sonnet-4-20250514': { input: 3.00, output: 15.00, cached_input: 0.30 },
  'claude-haiku-3-5-20241022': { input: 0.80, output: 4.00, cached_input: 0.08 },
  // Local models — free, but tracked for attribution
  'llama3.3': { input: 0, output: 0, cached_input: 0 },
  'qwen2.5': { input: 0, output: 0, cached_input: 0 },
};

export function calculateCost(model, inputTokens, outputTokens, cachedTokens = 0) {
  const pricing = MODEL_PRICING[model];
  if (!pricing) return 0;
  const uncachedInput = inputTokens - cachedTokens;
  return (
    (uncachedInput / 1_000_000) * pricing.input +
    (cachedTokens / 1_000_000) * pricing.cached_input +
    (outputTokens / 1_000_000) * pricing.output
  );
}
```
When Anthropic changes pricing or you add a new model, update one object. The cached_input field is the key detail most implementations miss — without it, you’re overestimating costs every time the prompt cache hits.
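To see what the cached discount does to a real number, here is a standalone arithmetic check using the Sonnet rates from the table above (the token counts are made up for illustration):

```javascript
// Sonnet rates from the pricing table; 10,000 input tokens, 2,000 output
// tokens, with 8,000 of the input served from the prompt cache.
const rates = { input: 3.00, output: 15.00, cached_input: 0.30 };
const inputTokens = 10_000, outputTokens = 2_000, cachedTokens = 8_000;

// With the cached discount: only 2,000 input tokens bill at the full rate.
const withDiscount =
  ((inputTokens - cachedTokens) / 1_000_000) * rates.input +
  (cachedTokens / 1_000_000) * rates.cached_input +
  (outputTokens / 1_000_000) * rates.output;

// Naive version that ignores caching bills all 10,000 at the full input rate.
const naive =
  (inputTokens / 1_000_000) * rates.input +
  (outputTokens / 1_000_000) * rates.output;

console.log(withDiscount.toFixed(4)); // "0.0384"
console.log(naive.toFixed(4));        // "0.0600"
```

With an 80% cache hit rate on input, ignoring the discount overstates this call's cost by more than half.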
The Usage Callback
After every API call, the LLM service emits usage data. Wire a callback that captures the current skill context:
```javascript
// In your LLM service (e.g., claude.js)
_emitUsage(model, response) {
  const cached = response.usage.cache_read_input_tokens || 0;
  const cacheCreated = response.usage.cache_creation_input_tokens || 0;
  console.log(
    `Claude: ${response.usage.input_tokens} in ` +
      `(${cached} cached, ${cacheCreated} cache-created) / ` +
      `${response.usage.output_tokens} out`
  );
  if (this.onUsage) {
    this.onUsage({
      model,
      // Anthropic reports cache reads separately: usage.input_tokens excludes
      // them, so add them back here; calculateCost() then subtracts
      // cachedTokens to bill only the uncached share at the full input rate.
      inputTokens: response.usage.input_tokens + cached,
      outputTokens: response.usage.output_tokens,
      cachedTokens: cached,
      // Cache-creation tokens are billed at a premium (1.25x the input rate
      // for Anthropic); they're left out here for simplicity.
    });
  }
}
```
Call _emitUsage() after every client.messages.create() response — both in your chat() and chatWithTools() methods.
Wiring the Callback
In your agent’s main initialization, connect the LLM service to the cost logger:
```javascript
// index.js — agent startup
import { CostLogger } from './utils/cost-logger.js';
import { calculateCost } from './utils/pricing.js';

const costLogger = new CostLogger('./data/costs');

claude.onUsage = (usage) => {
  costLogger.log({
    timestamp: new Date().toISOString(),
    model: usage.model,
    skill: commandRouter._currentSkill || 'chat',
    user: commandRouter._currentUser || 'unknown',
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    cachedTokens: usage.cachedTokens,
    cost: calculateCost(
      usage.model,
      usage.inputTokens,
      usage.outputTokens,
      usage.cachedTokens
    ),
    route: 'claude',
  });
};
```
The critical piece: commandRouter._currentSkill is set before the LLM call, in your command handling logic:
```javascript
// In your command router, before calling the LLM
const skillName = skill?.name || 'chat';
this._currentSkill = skillName;
this._currentUser = sender;
```
This gives every API call a skill tag. Without it, you just have a flat list of costs with no attribution.
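One detail the snippet above leaves open is when the tag gets cleared. A wrapper like the following (a hypothetical helper, not part of the original runtime) scopes the tag to the skill invocation and always resets it, so a failed skill can't leak its tag into later, unrelated calls:

```javascript
// Hypothetical helper: set the attribution context around a skill invocation
// and reset it afterward, even if the skill throws.
async function withSkillContext(router, skill, sender, fn) {
  router._currentSkill = skill?.name || 'chat';
  router._currentUser = sender;
  try {
    return await fn(); // every LLM call inside fn() gets tagged with this skill
  } finally {
    router._currentSkill = null;
    router._currentUser = null;
  }
}
```

Note that a single mutable field assumes one in-flight request at a time; if your agent runs skills concurrently, you'd need per-request context (Node's `AsyncLocalStorage`, for example) instead.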
The Cost Logger
Append-only JSONL, one file per month. Simple to write, simple to grep, simple to aggregate.
```javascript
// cost-logger.js
import { appendFileSync, readFileSync, existsSync, mkdirSync } from 'fs';
import { join } from 'path';

export class CostLogger {
  constructor(dir) {
    this.dir = dir;
    if (!existsSync(dir)) mkdirSync(dir, { recursive: true });
  }

  _filePath() {
    const month = new Date().toISOString().slice(0, 7); // "2026-04"
    return join(this.dir, `${month}.jsonl`);
  }

  log(entry) {
    appendFileSync(this._filePath(), JSON.stringify(entry) + '\n');
  }

  summary(month) {
    const file = month
      ? join(this.dir, `${month}.jsonl`)
      : this._filePath();
    if (!existsSync(file)) return { total: 0, bySkill: {}, byModel: {} };
    // filter(Boolean) guards against an empty file or trailing newline
    const lines = readFileSync(file, 'utf-8').split('\n').filter(Boolean);
    const bySkill = {};
    const byModel = {};
    let total = 0;
    for (const line of lines) {
      const entry = JSON.parse(line);
      total += entry.cost;
      bySkill[entry.skill] = (bySkill[entry.skill] || 0) + entry.cost;
      byModel[entry.model] = (byModel[entry.model] || 0) + entry.cost;
    }
    return { total, bySkill, byModel };
  }

  formatSummary(month) {
    const { total, bySkill, byModel } = this.summary(month);
    const skillLines = Object.entries(bySkill)
      .sort(([, a], [, b]) => b - a)
      .map(([skill, cost]) => `  ${skill}: $${cost.toFixed(4)}`)
      .join('\n');
    const modelLines = Object.entries(byModel)
      .sort(([, a], [, b]) => b - a)
      .map(([model, cost]) => `  ${model}: $${cost.toFixed(4)}`)
      .join('\n');
    return [
      `Total: $${total.toFixed(4)}`,
      `\nBy skill:\n${skillLines}`,
      `\nBy model:\n${modelLines}`,
    ].join('\n');
  }
}
```
Tracking Local Models
If your agent uses Ollama or another local LLM for cheap tasks, track those calls at $0. The pricing table already has entries for local models with zero cost. This lets you answer a question that matters: “What percentage of my agent’s work stays local?”
Wire the same callback pattern for your Ollama service:
```javascript
ollama.onUsage = (usage) => {
  costLogger.log({
    timestamp: new Date().toISOString(),
    model: usage.model,
    skill: commandRouter._currentSkill || 'chat',
    user: commandRouter._currentUser || 'unknown',
    inputTokens: usage.inputTokens,
    outputTokens: usage.outputTokens,
    cachedTokens: 0,
    cost: 0,
    route: 'ollama',
  });
};
```
Now your summary shows: “60% of calls went to Ollama at $0, 40% went to Claude at $X.” That’s a real number for your local-vs-cloud optimization decisions.
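If you want that local-versus-cloud split computed rather than eyeballed, a small helper (hypothetical, not part of the `CostLogger` above) can count entries per route in the same JSONL file:

```javascript
// route-share.js — hypothetical helper: fraction of logged calls per route
// ('ollama', 'claude', ...) in one month's JSONL file.
import { readFileSync } from 'fs';

function routeShare(file) {
  const lines = readFileSync(file, 'utf-8').split('\n').filter(Boolean);
  const counts = {};
  for (const line of lines) {
    const { route } = JSON.parse(line);
    counts[route] = (counts[route] || 0) + 1;
  }
  // Convert raw counts to fractions of all logged calls
  return Object.fromEntries(
    Object.entries(counts).map(([route, n]) => [route, n / lines.length])
  );
}
```

So `routeShare('./data/costs/2026-04.jsonl')` might return `{ ollama: 0.6, claude: 0.4 }`. Note this is a share of call counts, not tokens; weight by `inputTokens + outputTokens` if you want volume instead.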
Surfacing Costs
Make it queryable. Add a command your agent responds to:
```javascript
if (command === '/costs') {
  return costLogger.formatSummary();
}
```
Output:
```
Total: $2.4731

By skill:
  research: $1.4200
  morning-brief: $0.3840
  chat: $0.3100
  task-manager: $0.2191
  email-draft: $0.1400

By model:
  claude-sonnet-4-20250514: $2.1031
  claude-haiku-3-5-20241022: $0.3700
  llama3.3: $0.0000
```
Now you know: the research skill is 57% of your spend. That’s where optimization effort should go — maybe use Haiku for the initial search and Sonnet only for synthesis. You couldn’t make that decision without per-skill attribution.
What a Log Entry Looks Like
```json
{"timestamp":"2026-04-04T14:23:17.042Z","model":"claude-sonnet-4-20250514","skill":"morning-brief","user":"adam","inputTokens":5200,"outputTokens":890,"cachedTokens":4000,"cost":0.0182,"route":"claude"}
```
One line. All the context. Grep for a skill, a user, a model, or a date range. Parse with jq for custom aggregations. No database needed.
Key Takeaway
Instrument cost tracking from day one. The implementation is an hour of work — a pricing table, a usage callback, and a JSONL logger. The insight it gives you is permanent: which skills are expensive, which models are overkill for which tasks, and what your agent actually costs to run. Total spend is a number. Per-skill spend is a decision-making tool.