Cost Management

AgenFleet gives you visibility and control over AI costs at every level — per agent, per job, and across the entire fleet. Token usage is billed directly by your AI provider (e.g., Anthropic) to your account — AgenFleet tracks consumption so you can optimize it, but does not charge for tokens itself.

This page covers how to monitor token usage, set guardrails, and reduce spend without sacrificing output quality.

How costs are calculated

Every interaction with an agent consumes tokens — the unit of measure for LLM usage. Costs are based on:

Input tokens — everything sent to the model: the system prompt (SOUL file), session history, memory results, and the current message
Output tokens — the model’s response

Each model has a different per-token price. See Models & Fallbacks for current rates.
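As a rough illustration of the calculation above, the sketch below multiplies input and output token counts by separate per-token prices. The `ModelRate` type, the `turn_cost` helper, and the rate values are hypothetical placeholders for illustration, not AgenFleet APIs or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRate:
    """Price in USD per one million tokens (placeholder values, not real rates)."""
    input_per_mtok: float
    output_per_mtok: float


def turn_cost(input_tokens: int, output_tokens: int, rate: ModelRate) -> float:
    """Estimate the cost of one turn from its input and output token counts."""
    total = input_tokens * rate.input_per_mtok + output_tokens * rate.output_per_mtok
    return total / 1_000_000


# Hypothetical rate: $1 per million input tokens, $5 per million output tokens.
example_rate = ModelRate(input_per_mtok=1.0, output_per_mtok=5.0)
cost = turn_cost(input_tokens=12_000, output_tokens=800, rate=example_rate)
print(f"Estimated turn cost: ${cost:.4f}")
```

Substitute the actual rates for your model before relying on the numbers.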

What drives high cost:

Long sessions with extensive history (high input token count per turn)
Frequent cron jobs on expensive models (Sonnet/Opus instead of Haiku)
Agents with large SOUL files or many extraPaths files loaded per turn
High topK memory search results injected per turn

Cost visibility

Per-agent cost

Every agent detail view shows:

Today’s token usage — input + output, with cost estimate
7-day chart — daily spend trend
Monthly projection — estimated full-month cost at current run rate
Cost per session — average tokens per conversation turn
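One plausible way to read the monthly projection is a simple run-rate extrapolation: average daily spend so far this month, multiplied by the number of days in the month. The formula and the `project_monthly_cost` helper below are assumptions for illustration; the exact method AgenFleet uses is not specified here.

```python
import calendar
from datetime import date


def project_monthly_cost(spend_to_date: float, today: date) -> float:
    """Extrapolate month-to-date spend to a full month at the current daily average.

    Assumes spend_to_date covers day 1 through `today` inclusive.
    """
    days_elapsed = today.day
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return spend_to_date / days_elapsed * days_in_month


# Example: $12.00 spent through June 10 projects to $36.00 for the 30-day month.
projection = project_monthly_cost(12.0, date(2024, 6, 10))
print(f"Projected monthly cost: ${projection:.2f}")
```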

Fleet-level cost

The Fleet dashboard header aggregates across all agents:

Combined daily token consumption
Estimated monthly cost
Top spenders (agents ranked by token consumption)

Per-run cost

The Activity tab logs token usage for every cron job execution and session turn. You can see exactly how much a specific briefing or report cost to generate.

Setting budgets

Budgets are configured in the agent’s limits block:

"limits": {
  "dailyTokenBudget": 300000,
  "monthlyTokenBudget": 6000000,
  "alertThreshold": 0.8
}

How limits work:

When alertThreshold is reached (e.g., 80% of daily budget), an alert is sent to your notification channel
When the daily budget is exhausted, the agent pauses — it will not process new messages or cron jobs for the rest of the day
Budget resets at midnight UTC (daily) and on the 1st of each month (monthly)
Paused agents resume automatically when the budget period resets
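The rules above can be sketched as a small state check plus a reset-time calculation. The `BudgetState` enum and both functions are hypothetical helpers, and the sketch assumes "reached" means greater than or equal to the threshold; it mirrors the documented behavior without claiming to be the platform's implementation.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class BudgetState(Enum):
    OK = "ok"
    ALERT = "alert"    # alertThreshold reached, agent still running
    PAUSED = "paused"  # budget exhausted, agent stops until reset


def daily_budget_state(tokens_used: int, daily_budget: int,
                       alert_threshold: float) -> BudgetState:
    """Classify usage against the daily budget (assumes >= comparisons)."""
    if tokens_used >= daily_budget:
        return BudgetState.PAUSED
    if tokens_used >= alert_threshold * daily_budget:
        return BudgetState.ALERT
    return BudgetState.OK


def next_daily_reset(now: datetime) -> datetime:
    """Return the next midnight UTC after `now` (which must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


# Values taken from the limits block above.
print(daily_budget_state(250_000, 300_000, 0.8))
print(next_daily_reset(datetime(2024, 6, 10, 15, 30, tzinfo=timezone.utc)))
```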

The Cost Optimizer

The Cost Optimizer is accessible from the Fleet dashboard sidebar. It analyzes your fleet and surfaces specific recommendations to reduce spend:

What it looks for

High-cost agents on expensive models — agents running Sonnet or Opus that could potentially run on Haiku for their task type. The optimizer flags these with an estimated monthly savings if downgraded.

Low-activity agents — agents with minimal task volume relative to their standing costs. Candidates for consolidation or deactivation.

Session bloat — agents with sessions that have accumulated thousands of messages. Each turn on a bloated session costs more due to larger context window usage. The optimizer shows how much you’d save by pruning them.

High topK memory settings — agents injecting 10+ memory results per turn when 3–5 is typically sufficient.
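To see why session bloat matters, the sketch below estimates input tokens for a single turn as the system prompt plus the accumulated history plus the new message. It is a simplified model: it assumes the full history is sent every turn and ignores any truncation or summarization, and the function name and token counts are made up for illustration.

```python
def turn_input_tokens(system_tokens: int, history_messages: int,
                      avg_tokens_per_message: int, message_tokens: int) -> int:
    """Estimate input tokens for one turn when the whole history is resent."""
    history_tokens = history_messages * avg_tokens_per_message
    return system_tokens + history_tokens + message_tokens


# Illustrative numbers only: a 2,000-message session versus one pruned to 20.
bloated = turn_input_tokens(1_000, 2_000, 50, 50)
pruned = turn_input_tokens(1_000, 20, 50, 50)
print(f"Bloated session: {bloated} input tokens per turn")
print(f"Pruned session:  {pruned} input tokens per turn")
```

Under these assumptions the history dominates the per-turn input cost, which is why pruning tends to show up as a large saving.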

Applying recommendations

Each recommendation in the Cost Optimizer is actionable — clicking Apply makes the change directly. Changes to model and topK take effect on the next session turn. Session pruning is immediate.

All changes are logged in the activity audit trail.

Cost optimization strategies

Match model to task complexity

This is the single highest-leverage decision. For a routine task such as a daily news briefing, Haiku produces output of comparable quality to Opus at roughly 60x lower cost.

Task type	Recommended model
Daily/weekly briefings	Haiku
Research and analysis	Sonnet
Complex multi-step reasoning	Opus
Simple monitoring/alerting	Haiku
Client-facing chat	Sonnet
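If you assign models programmatically, the table above can be kept as a simple lookup. The task-type keys and the lowercase tier labels in this sketch are hypothetical names chosen for illustration, not AgenFleet configuration values or provider model IDs.

```python
# Task type -> recommended model tier, transcribed from the table above.
RECOMMENDED_MODEL = {
    "daily_or_weekly_briefing": "haiku",
    "research_and_analysis": "sonnet",
    "complex_multi_step_reasoning": "opus",
    "simple_monitoring_or_alerting": "haiku",
    "client_facing_chat": "sonnet",
}


def recommended_model(task_type: str) -> str:
    """Return the recommended tier, or raise ValueError for an unlisted task type."""
    try:
        return RECOMMENDED_MODEL[task_type]
    except KeyError:
        raise ValueError(f"no recommendation for task type {task_type!r}") from None


print(recommended_model("research_and_analysis"))
```

Raising on unknown task types, rather than silently defaulting to a tier, keeps an unclassified agent from landing on an expensive model by accident.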

Prune sessions regularly

Set a monthly calendar reminder to prune sessions older than 30 days. This is the easiest recurring action to reduce cost and improve agent responsiveness simultaneously.
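A minimal sketch of the selection step, assuming you can export a list of sessions with a last-activity timestamp. The `Session` record and `sessions_to_prune` helper are hypothetical, and "older than 30 days" is interpreted here as no activity in the last 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Session:
    session_id: str
    last_active: datetime  # timezone-aware timestamp of the most recent turn


def sessions_to_prune(sessions: list[Session], now: datetime,
                      max_age_days: int = 30) -> list[Session]:
    """Return sessions whose last activity is older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    return [s for s in sessions if s.last_active < cutoff]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
sessions = [
    Session("old", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    Session("recent", datetime(2024, 6, 25, tzinfo=timezone.utc)),
]
stale = sessions_to_prune(sessions, now)
print([s.session_id for s in stale])
```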

Right-size memory injection

If an agent rarely needs historical context (e.g., a daily news briefing that’s always fresh), set topK to 2–3 instead of the default 5. Each injected memory result adds input tokens.
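To gauge the effect of lowering topK, you can multiply the number of results removed by an average result size and by turn volume. The helper name, the 400-token average, and the 30 turns per day below are assumptions for illustration; measure your own values from the Activity tab.

```python
def topk_tokens_saved(old_top_k: int, new_top_k: int,
                      avg_tokens_per_result: int, turns: int) -> int:
    """Estimate input tokens saved over `turns` turns by lowering topK."""
    results_removed = max(old_top_k - new_top_k, 0)
    return results_removed * avg_tokens_per_result * turns


# Dropping from the default 5 to 3, with assumed 400-token results, 30 turns/day.
saved_per_day = topk_tokens_saved(5, 3, avg_tokens_per_result=400, turns=30)
print(f"Estimated input tokens saved per day: {saved_per_day}")
```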

Shorten SOUL files

A 3,000-word SOUL file costs more per turn than an 800-word one. Move static reference material (data tables, glossaries) to extraPaths with selective injection rather than loading it every turn.

Consolidate low-activity agents

If you have two agents each running 2 cron jobs per week, consider merging them into one agent with 4 jobs. You save the standing overhead of a second container and a second set of session context.

Understanding your AI spend

Token costs are billed directly to your provider account, so charges appear on your Anthropic (or other provider) invoice, not your AgenFleet invoice. AgenFleet’s billing dashboard gives you a full usage breakdown to help you understand and attribute that spend.

The usage breakdown includes:

Token usage by agent
Model mix (Haiku / Sonnet / Opus split)
Total input vs. output tokens
Estimated cost at current provider rates

Your AgenFleet subscription covers platform access, infrastructure, fleet management, and support. See your plan details under Settings → Billing.