Cost Management
AgenFleet gives you visibility and control over AI costs at every level — per agent, per job, and across the entire fleet. Token usage is billed directly by your AI provider (e.g., Anthropic) to your account — AgenFleet tracks consumption so you can optimize it, but does not charge for tokens itself.
This page covers how to monitor token usage, set guardrails, and reduce spend without sacrificing output quality.
How costs are calculated
Every interaction with an agent consumes tokens, the unit of measure for LLM usage. Costs are based on:
- Input tokens — everything sent to the model: the system prompt (SOUL file), session history, memory results, and the current message
- Output tokens — the model’s response
Each model has a different per-token price. See Models & Fallbacks for current rates.
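As a rough sketch of this arithmetic, the cost of one turn is the input and output token counts multiplied by their respective rates. The `Pricing` type, the `turn_cost` helper, the per-million-token unit, and the rates used below are illustrative assumptions, not AgenFleet APIs or real provider prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (hypothetical values only)."""
    input_per_mtok: float
    output_per_mtok: float


def turn_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Cost of one turn: each token class is billed at its own rate."""
    return (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000


# Placeholder rates, NOT real provider pricing.
example_pricing = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)

# 12,000 input tokens (SOUL file + history + memory + message), 800 output tokens.
print(turn_cost(12_000, 800, example_pricing))
```

Because input tokens are counted on every turn, anything that is resent each time (system prompt, history, memory results) tends to dominate the total even when the per-token input rate is lower than the output rate.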
What drives high cost:
- Long sessions with extensive history (high input token count per turn)
- Frequent cron jobs on expensive models (Sonnet/Opus instead of Haiku)
- Agents with large SOUL files or many extraPaths files loaded per turn
- High topK values, which inject more memory search results per turn
Cost visibility
Per-agent cost
Every agent detail view shows:
- Today’s token usage — input + output, with cost estimate
- 7-day chart — daily spend trend
- Monthly projection — estimated full-month cost at current run rate
- Cost per session — average tokens per conversation turn
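This page does not spell out how the monthly projection is computed. One plausible reading of "current run rate" is the average daily cost observed so far, multiplied by the number of days in the month. The sketch below illustrates that assumption; `monthly_projection` is a hypothetical helper, not an AgenFleet function.

```python
import calendar
from datetime import date


def monthly_projection(daily_costs: list[float], today: date) -> float:
    """Project a full-month cost from the average of the observed daily costs.

    Assumption: "run rate" means the mean daily cost so far.
    """
    if not daily_costs:
        return 0.0
    run_rate = sum(daily_costs) / len(daily_costs)
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    return run_rate * days_in_month


# Two observed days in February 2024 (a 29-day month).
print(monthly_projection([2.0, 4.0], date(2024, 2, 10)))
```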
Fleet-level cost
The Fleet dashboard header aggregates across all agents:
- Combined daily token consumption
- Estimated monthly cost
- Top spenders (agents ranked by token consumption)
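If you export usage data yourself, the "top spenders" ranking can be approximated as below. This is a minimal sketch: the record shape `(agent, input_tokens, output_tokens)` and the `top_spenders` helper are assumptions for illustration, not an AgenFleet export format.

```python
from collections import Counter


def top_spenders(
    usage_records: list[tuple[str, int, int]], n: int = 3
) -> list[tuple[str, int]]:
    """Rank agents by total tokens (input + output), highest first."""
    totals: Counter[str] = Counter()
    for agent, input_tokens, output_tokens in usage_records:
        totals[agent] += input_tokens + output_tokens
    return totals.most_common(n)


# Hypothetical usage records for two agents.
records = [
    ("briefing", 1000, 200),
    ("research", 5000, 900),
    ("briefing", 500, 100),
]
print(top_spenders(records))
```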
Per-run cost
The Activity tab logs token usage for every cron job execution and session turn. You can see exactly how much a specific briefing or report cost to generate.
Setting budgets
Budgets are configured in the agent’s limits block:
```json
"limits": {
  "dailyTokenBudget": 300000,
  "monthlyTokenBudget": 6000000,
  "alertThreshold": 0.8
}
```
How limits work:
- When alertThreshold is reached (e.g., 80% of daily budget), an alert is sent to your notification channel
- When the daily budget is exhausted, the agent pauses: it will not process new messages or cron jobs for the rest of the day
- Budget resets at midnight UTC (daily) and on the 1st of each month (monthly)
- Paused agents resume automatically when the budget period resets
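The daily rules above can be modeled as a small state check. This is a sketch under stated assumptions: the `Limits` type and both helpers are hypothetical, the comparison operators (`>=`) are a guess at the exact boundary behavior, and the monthly budget is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Limits:
    daily_token_budget: int
    alert_threshold: float  # fraction of the budget, e.g. 0.8


def budget_status(tokens_used_today: int, limits: Limits) -> str:
    """Return "paused", "alert", or "ok" for the current daily usage."""
    if tokens_used_today >= limits.daily_token_budget:
        return "paused"
    if tokens_used_today >= limits.alert_threshold * limits.daily_token_budget:
        return "alert"
    return "ok"


def next_daily_reset(now: datetime) -> datetime:
    """Next midnight UTC after `now` (`now` must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    midnight = now_utc.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(days=1)


limits = Limits(daily_token_budget=300_000, alert_threshold=0.8)
print(budget_status(250_000, limits))
print(next_daily_reset(datetime(2025, 3, 1, 23, 30, tzinfo=timezone.utc)))
```

With the example configuration, the alert threshold corresponds to 240,000 tokens (80% of 300,000), so an agent has a 60,000-token margin between the alert and the pause.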
The Cost Optimizer
The Cost Optimizer is accessible from the Fleet dashboard sidebar. It analyzes your fleet and surfaces specific recommendations to reduce spend.
What it looks for
High-cost agents on expensive models: agents running Sonnet or Opus that could run on Haiku for their task type. The optimizer flags these with the estimated monthly savings from a downgrade.
Low-activity agents: agents with minimal task volume relative to their standing costs. These are candidates for consolidation or deactivation.
Session bloat: agents whose sessions have accumulated thousands of messages. Each turn on a bloated session costs more because a larger context is sent to the model. The optimizer shows how much you’d save by pruning them.
High topK memory settings: agents injecting 10+ memory results per turn when 3–5 is typically sufficient.
Applying recommendations
Each recommendation in the Cost Optimizer is actionable: clicking Apply makes the change directly. Changes to model and topK take effect on the next session turn. Session pruning is immediate.
All changes are logged in the activity audit trail.
Cost optimization strategies
Match model to task complexity
This is the single highest-leverage decision. For a routine task such as a daily news briefing, Haiku delivers the same output quality as Opus at roughly 60x lower cost.
| Task type | Recommended model |
|---|---|
| Daily/weekly briefings | Haiku |
| Research and analysis | Sonnet |
| Complex multi-step reasoning | Opus |
| Simple monitoring/alerting | Haiku |
| Client-facing chat | Sonnet |
Prune sessions regularly
Set a monthly calendar reminder to prune sessions older than 30 days. This is the easiest recurring action to reduce cost and improve agent responsiveness simultaneously.
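To see why pruning pays off, consider a simplified model in which every turn resends the full session history as input. The helper and the token figures below are hypothetical, and real sessions will vary, but the shape of the growth is the point: total input tokens grow roughly with the square of the number of turns.

```python
def session_input_tokens(turns: int, base_tokens: int, tokens_per_turn: int) -> int:
    """Total input tokens over a session where turn i resends i earlier turns.

    base_tokens: fixed per-turn input (system prompt, memory, current message).
    tokens_per_turn: tokens each completed turn adds to the history.
    """
    return sum(base_tokens + i * tokens_per_turn for i in range(turns))


# One unpruned 100-turn session vs. the same work split into two 50-turn sessions.
unpruned = session_input_tokens(100, base_tokens=2_000, tokens_per_turn=300)
pruned = 2 * session_input_tokens(50, base_tokens=2_000, tokens_per_turn=300)
print(unpruned, pruned)
```

Under these assumed figures, the unpruned session consumes 1,685,000 input tokens and the pruned variant 935,000, a reduction of about 45% for the same number of turns.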
Right-size memory injection
If an agent rarely needs historical context (e.g., a daily news briefing that’s always fresh), set topK to 2–3 instead of the default 5. Each injected memory result adds input tokens.
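A back-of-the-envelope estimate of the effect, assuming each memory result has a similar size: the helper name, the 400-token average, and the 50 turns per day are all hypothetical numbers chosen for illustration.

```python
def memory_tokens_per_day(
    top_k: int, avg_tokens_per_result: int, turns_per_day: int
) -> int:
    """Input tokens spent on injected memory results in one day."""
    return top_k * avg_tokens_per_result * turns_per_day


# Compare the default topK of 5 with a reduced topK of 2.
default_cost = memory_tokens_per_day(5, avg_tokens_per_result=400, turns_per_day=50)
reduced_cost = memory_tokens_per_day(2, avg_tokens_per_result=400, turns_per_day=50)
print(default_cost - reduced_cost)
```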
Shorten SOUL files
A 3,000-word SOUL file costs more per turn than an 800-word one. Move static reference material (data tables, glossaries) to extraPaths with selective injection rather than loading it every turn.
Consolidate low-activity agents
If you have two agents each running 2 cron jobs per week, consider merging them into one agent with 4 jobs. You save the standing overhead of a second container and a second set of session context.
Understanding your AI spend
Because token costs flow directly to your provider account, charges appear on your Anthropic (or other provider) invoice, not your AgenFleet invoice. AgenFleet’s billing dashboard gives you a full breakdown to help you understand and attribute that spend.
The usage breakdown includes:
- Token usage by agent
- Model mix (Haiku / Sonnet / Opus split)
- Total input vs. output tokens
- Estimated cost at current provider rates
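If you want to reproduce the model-mix figure from raw usage data, a minimal sketch follows. The record shape `(agent, model, input_tokens, output_tokens)` and the `model_mix` helper are assumptions for illustration, not an AgenFleet export format.

```python
from collections import defaultdict


def model_mix(records: list[tuple[str, str, int, int]]) -> dict[str, float]:
    """Share of total tokens per model, as fractions that sum to 1."""
    per_model: dict[str, int] = defaultdict(int)
    for _agent, model, input_tokens, output_tokens in records:
        per_model[model] += input_tokens + output_tokens
    total = sum(per_model.values())
    if total == 0:
        return {}
    return {model: tokens / total for model, tokens in per_model.items()}


# Hypothetical usage records.
usage = [
    ("briefing", "haiku", 3_000, 1_000),
    ("research", "sonnet", 5_000, 1_000),
]
print(model_mix(usage))
```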
Your AgenFleet subscription covers platform access, infrastructure, fleet management, and support. See your plan details under Settings → Billing.