EAAPL-AGT010 — AI Agent Cost Governance Architecture

Name: EAAPL Pattern Library
Creator: Enterprise AI Architecture Pattern Library
License: https://aipatterns.com.au/terms

Status: Proven
Tags: agent cost-optimisation observability llm medium-complexity
Version: 1.1
Last Updated: 2026-06-12
Author: Enterprise AI Architecture Pattern Library

1. Executive Summary

Agentic AI systems that autonomously execute multi-step workflows—calling tools, querying APIs, writing and executing code, and iterating on results—introduce a category of financial risk that traditional IT cost management does not address: runaway agent execution. A single misconfigured agent task can consume tens of thousands of dollars in LLM API tokens, tool execution costs, and third-party API calls within minutes, with no human in the loop to apply a circuit breaker. Enterprise deployments of AI agents without cost governance controls have experienced individual incidents consuming USD 10,000–100,000 in a single errant run.

This pattern provides a comprehensive cost governance architecture for AI agent deployments. It covers: per-agent token budget enforcement; per-task cost ceilings with hard stops; pre-flight cost estimation before task execution; real-time cost tracking mid-execution; cost anomaly detection with configurable thresholds; kill switch mechanisms for runaway agents; cost chargeback models to business units; cost dashboards per agent type; and cost optimisation strategies including model tier routing, tool result caching, and tool call batching. The architecture is designed to be agent-framework agnostic and is compatible with LangChain, AutoGen, CrewAI, Anthropic's Claude agent SDK, and custom agent implementations.

2. Problem Statement

Business Problem

Without cost governance, AI agent deployments create uncontrolled financial exposure. Business units deploying agents for automation tasks—document processing, research, code generation, customer service—may trigger agent runs that iterate excessively, call expensive tools repeatedly, or enter infinite loops. These incidents are invisible until the cloud billing alert fires at end-of-month. Monthly AI spend becomes unpredictable, chargebacks to business units are impossible to defend, and the CFO has no visibility into per-agent or per-department AI costs.

Technical Problem

LLM APIs price by token, with costs varying significantly by model tier. An agent that uses GPT-4o at USD 10/million output tokens instead of GPT-4o-mini at USD 0.60/million pays roughly 17x more per output token for the same task if model selection is not governed. Agent loops, in which the agent makes tool calls, receives results, reasons about next steps, and makes more tool calls, can iterate dozens or hundreds of times on a single task if no termination condition fires. Tool calls themselves (web search APIs, code execution, database queries) may have per-call pricing. Because total cost is the product of iteration count, tokens per iteration, model price, and tool calls per iteration, the aggregate cost of a complex agentic task is difficult to estimate in advance.
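The arithmetic behind this can be sketched in a few lines. This is an illustrative calculation only: the output prices are the ones quoted in this section, while the input prices, the `task_cost` helper, and the example iteration and token counts are assumptions introduced for the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Prices in USD per million tokens."""
    input_per_million: float
    output_per_million: float


# Output prices are the ones quoted in this section; input prices are assumed
# placeholders and should be replaced with current vendor pricing.
PRICES = {
    "gpt-4o": ModelPrice(input_per_million=2.50, output_per_million=10.00),
    "gpt-4o-mini": ModelPrice(input_per_million=0.15, output_per_million=0.60),
}


def llm_call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single LLM call."""
    price = PRICES[model]
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


def task_cost(model, iterations, input_tokens, output_tokens,
              tool_calls_per_iteration, tool_cost_per_call):
    """Cost in USD of an agent loop with identical iterations (a simplification)."""
    per_iteration = (llm_call_cost(model, input_tokens, output_tokens)
                     + tool_calls_per_iteration * tool_cost_per_call)
    return iterations * per_iteration


# Ratio of output-token prices between the two tiers (about 16.7).
output_price_ratio = (PRICES["gpt-4o"].output_per_million
                      / PRICES["gpt-4o-mini"].output_per_million)
```

For example, `task_cost("gpt-4o", 40, 3000, 500, 2, 0.003)` models a 40-iteration loop with two USD 0.003 tool calls per iteration. Doubling the iteration count doubles the result, which is why an unbounded loop is the dominant risk.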

Symptoms

Monthly LLM API spend is 3–5x higher than forecast with no clear explanation
Individual agent runs have occasionally consumed USD 500–5,000 in tokens before timing out
Business unit CTOs cannot explain their AI spend to finance because there is no per-use-case cost attribution
Agents are deployed using the most capable (most expensive) model for all tasks regardless of task complexity
No alerts fire when an agent run is 10x more expensive than average

Cost of Inaction

Dimension	Consequence
Financial	Uncontrolled AI spend; budget overruns; individual runaway incidents costing USD 10K–100K
Operational	Agent tasks compete for token budgets with production workloads; cascading slowdowns
Strategic	Finance refuses to expand AI budget due to lack of cost predictability; AI programme stalls
Governance	Inability to demonstrate ROI per business unit; AI investment decisions based on incomplete data

3. Context

When to Apply

Any production deployment of AI agents that make LLM API calls and/or tool calls
Multi-agent systems where agents spawn sub-agents or parallel agent workflows
Platforms offering self-service agent capabilities to multiple business units or teams
AI agents performing open-ended research, code generation, or document processing tasks where task scope can expand unpredictably

When NOT to Apply

Single-turn LLM API calls (chatbots, simple completions) — token budget enforcement is sufficient; full agent cost governance is over-engineered
Agents running exclusively on fixed-cost infrastructure (self-hosted models with no per-token pricing) — cost ceiling controls are less relevant but monitoring still applies
Development and testing environments — apply lighter-weight controls; save full governance for production

Prerequisites

Prerequisite	Description
LLM API Access with Usage Metering	API provider exposes token usage per call in response headers or logs
Agent Observability Infrastructure	Ability to instrument agent execution with per-step metrics
Cost Allocation Model	Budget owners identified per agent type or business unit
Agent Execution Platform	Agent framework through which governance controls can be injected

Industry Applicability

Industry	Key Agent Cost Risk	Governance Priority
Financial Services	Automated research agents; regulatory document processing	High — budget control + audit trail
Legal / Professional Services	Document review agents; contract analysis	High — per-matter cost attribution
Software Development	Code generation agents; automated PR review	Medium — per-repository cost tracking
Healthcare	Clinical literature review; medical coding agents	High — per-patient/per-task cost
Media and Publishing	Content generation; translation agents	Medium — per-content-item cost
E-commerce	Product description; customer service agents	Medium — per-SKU/per-session cost

4. Architecture Overview

The Agent Cost Governance Architecture implements six layers of cost control that apply at different stages of agent execution.

Layer 1: Pre-Flight Cost Estimation

Before an agent task is submitted for execution, the Pre-Flight Cost Estimator generates an estimated cost range based on: the task type (which has historical cost distributions from completed tasks); the configured model tier; the agent's tool set (each tool has an average cost-per-call from historical data); and the expected number of iterations (based on similar past tasks). The estimate is surfaced to the requesting system or user as three figures: expected cost, 90th-percentile cost, and maximum permissible cost (the ceiling). If the 90th-percentile estimate exceeds the task's configured cost ceiling, the task is rejected before execution begins, with a recommendation to adjust scope or model tier. This prevents obviously over-budget tasks from starting at all.
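A minimal sketch of the pre-flight gate, assuming the estimator has nothing more than a list of historical per-run costs for the agent type. The `preflight` function, the `PreflightEstimate` record, and the choice of the mean as the "expected" figure are illustrative assumptions, not part of the pattern's specification.

```python
import statistics
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PreflightEstimate:
    expected: float   # mean of historical costs (assumed definition)
    p90: float        # 90th-percentile historical cost
    ceiling: float    # maximum permissible cost for this task
    approved: bool    # False when the P90 estimate exceeds the ceiling


def preflight(historical_costs: Sequence[float],
              ceiling: float) -> Optional[PreflightEstimate]:
    """Return an estimate, or None when there is too little history.

    A None result corresponds to the cold-start case, where a conservative
    default ceiling would be applied instead.
    """
    if len(historical_costs) < 2:
        return None
    expected = statistics.mean(historical_costs)
    # quantiles(n=10) returns nine cut points; the last is the 90th percentile.
    p90 = statistics.quantiles(historical_costs, n=10, method="inclusive")[-1]
    return PreflightEstimate(expected, p90, ceiling, approved=p90 <= ceiling)
```

With a history of `[0.2, 0.3, 0.3, 0.4, 0.5, 0.5, 0.6, 0.8, 1.0, 2.0]`, a ceiling of USD 1.50 is approved and a ceiling of USD 1.00 is rejected, because the P90 of that history sits just above USD 1.00.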

Layer 2: Per-Agent Token Budget Enforcement

Each agent type has a configured token budget per execution: a combined input+output token limit. The Token Budget Enforcer intercepts every LLM API call made by the agent and maintains a running token counter. Before each LLM call, it checks whether the remaining budget is sufficient for the anticipated call (estimated as the prompt length plus a configurable output reserve). If the remaining budget is insufficient, the agent is halted cleanly: a summarisation prompt is injected to generate a partial result from work completed so far, and the execution is terminated with a budget-exhausted status. This prevents gradual token drift from running unchecked across a long agent execution.
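The check-then-record cycle described above might look like the following framework-neutral sketch. `TokenBudgetEnforcer`, `run_agent`, and the `call_llm` callable are hypothetical names; in a real deployment the same logic would sit in the agent framework's middleware or callback hook, and the summarisation step is only indicated by a comment.

```python
class BudgetExhausted(Exception):
    """Raised when the next LLM call would not fit in the remaining budget."""


class TokenBudgetEnforcer:
    def __init__(self, budget_tokens: int, output_reserve: int):
        self.budget_tokens = budget_tokens    # combined input+output limit
        self.output_reserve = output_reserve  # tokens held back for the reply
        self.used = 0

    @property
    def remaining(self) -> int:
        return self.budget_tokens - self.used

    def check(self, prompt_tokens: int) -> None:
        """Call before each LLM request."""
        needed = prompt_tokens + self.output_reserve
        if needed > self.remaining:
            raise BudgetExhausted(f"need {needed}, have {self.remaining}")

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Call after each LLM response with the usage the API reported."""
        self.used += input_tokens + output_tokens


def run_agent(enforcer, prompt_sizes, call_llm):
    """Run one LLM call per prompt size until done or the budget runs out."""
    completed = 0
    for prompt_tokens in prompt_sizes:
        try:
            enforcer.check(prompt_tokens)
        except BudgetExhausted:
            # A real runtime would inject the summarisation prompt here and
            # return the partial result it produces.
            return {"status": "budget-exhausted", "completed_steps": completed}
        input_tokens, output_tokens = call_llm(prompt_tokens)
        enforcer.record(input_tokens, output_tokens)
        completed += 1
    return {"status": "completed", "completed_steps": completed}
```

Checking against the prompt size plus a reserve, not against tokens already spent, is what lets the agent stop before an over-budget call is sent.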

Layer 3: Per-Task Cost Ceiling (Hard Stop)

The Per-Task Cost Ceiling operates in addition to the token budget. It tracks total monetary cost across all cost sources: LLM tokens (priced per model tier), tool calls (external API costs), code execution compute, and vector store query costs. Each task has a configurable monetary ceiling. When the real-time cost tracker detects that the cumulative cost has reached the ceiling, a hard stop is triggered: the agent's execution thread receives a termination signal, the partial result is saved, and a cost-limit-exceeded status is recorded. Unlike the token budget, which counts only LLM tokens (and can be exhausted by a single very large prompt), the monetary ceiling catches runaway costs from repeated tool calls or expensive external API calls even when LLM token usage is within budget.
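A sketch of the monetary ceiling, assuming every cost source reports into one per-task tracker. `TaskCostTracker` and the source labels are hypothetical; the exception stands in for the termination signal and leaves saving the partial result to the caller.

```python
from collections import defaultdict


class CostCeilingExceeded(Exception):
    """Stands in for the hard-stop signal sent to the agent."""


class TaskCostTracker:
    def __init__(self, ceiling_usd: float):
        self.ceiling_usd = ceiling_usd
        self.by_source = defaultdict(float)  # e.g. "llm", "tool", "compute"

    @property
    def total(self) -> float:
        return sum(self.by_source.values())

    def add(self, source: str, amount_usd: float) -> None:
        """Record a cost and raise as soon as the ceiling is reached."""
        self.by_source[source] += amount_usd
        if self.total >= self.ceiling_usd:
            raise CostCeilingExceeded(
                f"task cost {self.total:.2f} reached ceiling {self.ceiling_usd:.2f}"
            )
```

In this sketch a task with USD 0.40 of LLM spend and a USD 1.00 ceiling is stopped by its ninth USD 0.07 tool call, even though no single call is expensive and the token budget is untouched.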

Layer 4: Real-Time Cost Tracking and Anomaly Detection

The Cost Monitor aggregates all cost signals from all executing agents in real time: token usage per LLM call (from API response headers), tool call costs (from tool execution middleware), and external API costs (from billing webhooks or, where no billing feed exists, estimated pricing models). The anomaly detector compares each agent run's cumulative cost against the baseline distribution for that agent type. If the run exceeds 3x the average cost at any point in execution, an anomaly is flagged: an alert is raised to the platform team, and the agent run enters a "watchlist" state where cost is monitored more frequently. If the run reaches a configurable fraction of the ceiling (e.g., 80%), a pre-stop warning is sent to the requesting system.
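The two thresholds in this layer can be expressed as a small classification function. The `RunState` enum and `classify_run` are illustrative names, and giving the pre-stop warning priority over the watchlist state is an assumption made for the sketch.

```python
from enum import Enum


class RunState(Enum):
    NORMAL = "normal"
    WATCHLIST = "watchlist"                # above 3x the baseline average
    PRE_STOP_WARNING = "pre-stop-warning"  # at or above 80% of the ceiling


def classify_run(running_cost: float,
                 baseline_mean: float,
                 ceiling: float,
                 anomaly_multiple: float = 3.0,
                 warn_fraction: float = 0.8) -> RunState:
    """Classify a running task from its cumulative cost so far."""
    # Assumption: the pre-stop warning takes priority when both conditions hold.
    if running_cost >= warn_fraction * ceiling:
        return RunState.PRE_STOP_WARNING
    if running_cost > anomaly_multiple * baseline_mean:
        return RunState.WATCHLIST
    return RunState.NORMAL
```

For an agent type averaging USD 0.50 per run with a USD 5.00 ceiling, a run at USD 1.60 goes on the watchlist and a run at USD 4.20 triggers the pre-stop warning.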

Layer 5: Kill Switch and Emergency Stop

The Kill Switch provides human-initiated emergency stop capability for any running agent task. It is exposed as: an API endpoint accessible to platform operators and privileged users; a UI control in the agent cost dashboard; and an automated trigger when the cost anomaly detector fires at the critical threshold. When the kill switch is activated, the agent's execution context receives a graceful-shutdown signal. The agent is designed to respond to this signal by: completing the current LLM call (or aborting it if it runs past a time threshold); saving all work-in-progress to the task result store; and returning a partial result with a killed status. A hard kill (process termination) is available as a fallback if graceful shutdown does not complete within 30 seconds.
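A minimal in-process sketch of the graceful path, using a `threading.Event` as the shutdown signal. `KillSwitch`, `agent_loop`, and `stop_agent` are hypothetical names, and the `time.sleep` call is a stand-in for an LLM or tool call. Python threads cannot be forcibly terminated, so the hard-kill fallback is represented only by the return value; in practice it would be a process or container termination.

```python
import threading
import time


class KillSwitch:
    def __init__(self):
        self._event = threading.Event()
        self.initiator = None
        self.reason = None

    def activate(self, initiator: str, reason: str) -> None:
        # Initiator and reason are kept so the invocation can be audited.
        self.initiator, self.reason = initiator, reason
        self._event.set()

    def is_active(self) -> bool:
        return self._event.is_set()


def agent_loop(kill_switch, result_store, max_steps, step_seconds=0.01):
    """Check the switch between steps; save progress and stop when it is set."""
    for step in range(max_steps):
        if kill_switch.is_active():
            result_store["status"] = "killed"  # work so far stays in the store
            return
        time.sleep(step_seconds)  # stand-in for one LLM or tool call
        result_store["steps"].append(step)
    result_store["status"] = "completed"


def stop_agent(thread, kill_switch, initiator, reason, grace_seconds=30.0):
    """Request graceful shutdown; return False if a hard kill is still needed."""
    kill_switch.activate(initiator, reason)
    thread.join(timeout=grace_seconds)
    return not thread.is_alive()
```

Checking the switch only between steps means an in-flight call always completes, which matches the "complete the current LLM call" behaviour; aborting a long call mid-flight would need a timeout on the call itself.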

Layer 6: Cost Attribution and Chargeback

The Cost Attribution Engine assigns costs to business units, cost centres, teams, and individual users based on agent execution metadata (agent type, requesting user, department tag, project code). Costs are aggregated in near-real-time into a Cost Dashboard that shows: current month spend by business unit; spend by agent type; top-cost tasks; anomaly history; and budget utilisation vs allocation. Monthly cost allocation reports are generated for finance chargeback. Budget alerts are sent to business unit owners when utilisation reaches configured thresholds (50%, 80%, 100%).
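Attribution and budget alerting reduce to a group-by and a threshold-crossing check. The record fields (`business_unit`, `agent_type`, `cost_usd`) and both function names are assumptions for this sketch; the thresholds are the 50%, 80%, and 100% levels named above.

```python
from collections import defaultdict

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)  # fractions of the allocated budget


def aggregate_by(records, key):
    """Sum cost_usd over execution records, grouped by one metadata field."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["cost_usd"]
    return dict(totals)


def crossed_thresholds(previous_spend, new_spend, budget):
    """Return thresholds crossed by this update, so each alert fires only once."""
    return [t for t in ALERT_THRESHOLDS
            if previous_spend < t * budget <= new_spend]
```

Comparing the previous and new totals, not just the new total, avoids re-sending the 50% alert on every cost update after the threshold has been passed.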

5. Architecture Diagram

flowchart TD
    subgraph Request["Task Request Stage"]
        TASKDEF[Task Definition\nAgent Type + Scope + Model]
        PREFLIGHT[Pre-Flight Cost Estimator\nEstimate + Ceiling Check]
        REJECT{Estimate Exceeds\nCeiling?}
    end
    subgraph Execution["Agent Execution Stage"]
        AGENT[Agent Runtime\nLangChain / AutoGen / Custom]
        TOKENBUDGET[Token Budget Enforcer\nPer-Agent Token Limit]
        COSTTRACK[Real-Time Cost Tracker\nLLM + Tools + Compute]
        CEILING[Cost Ceiling Monitor\nHard Stop at Ceiling]
    end
    subgraph Tools["Tool Execution Layer"]
        TOOL1[Tool: Web Search\nCost: ~USD 0.003/call]
        TOOL2[Tool: Code Execution\nCost: ~USD 0.01/run]
        TOOL3[Tool: Database Query\nCost: ~USD 0.001/query]
        TOOLMID[Tool Cost Middleware\nRecord + Accumulate]
    end
    subgraph Anomaly["Anomaly Detection + Kill Switch"]
        ANOMALY[Anomaly Detector\n3x Baseline = Alert]
        WATCHLIST[Watchlist Monitoring\nIncreased Sampling]
        KILLSWITCH[Kill Switch\nManual + Automated]
        RESULT[(Partial Result Store\nSave on Stop)]
    end
    subgraph Attribution["Cost Attribution"]
        ATTRIB[Cost Attribution Engine\nTag by BU / Team / User]
        DASHBOARD[Cost Dashboard\nSpend by Agent + BU + Task]
        CHARGEBACK[Monthly Chargeback\nReport for Finance]
        BUDGETALERT[Budget Alerts\n50% / 80% / 100%]
    end
    TASKDEF --> PREFLIGHT
    PREFLIGHT --> REJECT
    REJECT -->|Yes| DENIED[Task Rejected\nAdjust Scope or Model]
    REJECT -->|No| AGENT
    AGENT --> TOKENBUDGET
    TOKENBUDGET -->|Budget OK| TOOL1
    TOKENBUDGET -->|Budget OK| TOOL2
    TOKENBUDGET -->|Budget OK| TOOL3
    TOKENBUDGET -->|Budget Exhausted| SUMMARIZE[Inject Summarisation\nReturn Partial Result]
    TOOL1 --> TOOLMID
    TOOL2 --> TOOLMID
    TOOL3 --> TOOLMID
    TOOLMID --> COSTTRACK
    AGENT --> COSTTRACK
    COSTTRACK --> CEILING
    CEILING -->|Ceiling Hit| KILLSWITCH
    COSTTRACK --> ANOMALY
    ANOMALY -->|3x Baseline| WATCHLIST
    WATCHLIST -->|Critical| KILLSWITCH
    KILLSWITCH --> RESULT
    COSTTRACK --> ATTRIB
    ATTRIB --> DASHBOARD
    ATTRIB --> CHARGEBACK
    ATTRIB --> BUDGETALERT

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Pre-Flight Cost Estimator	Processing	Estimate task cost from historical data; reject if over ceiling	Custom service; AWS Lambda; Azure Function	High
Token Budget Enforcer	Processing	Track per-agent token consumption; halt on budget exhaustion	Agent middleware (LangChain callback); custom interceptor	Critical
Real-Time Cost Tracker	Processing	Aggregate costs from LLM + tools + compute in real time	Custom aggregator; Apache Kafka; Redis + atomic counters	Critical
Cost Ceiling Monitor	Processing	Compare running cost against ceiling; trigger hard stop	Custom monitor; Lambda triggered by cost tracker	Critical
Tool Cost Middleware	Processing	Intercept all tool calls; record cost per call; accumulate	LangChain callback handler (on_tool_start / on_tool_end hooks); custom decorator; OpenTelemetry	High
Anomaly Detector	Analytics	Compare agent run cost against historical baseline; flag 3x outliers	Custom statistical model; AWS CloudWatch Anomaly Detection; Datadog	High
Kill Switch API	Operations	Accept manual or automated kill commands; trigger graceful shutdown	REST API + WebSocket notification to agent runtime	Critical
Partial Result Store	Storage	Preserve work-in-progress on kill or budget exhaustion	S3 + DynamoDB; Azure Blob + Cosmos DB; Redis	High
Cost Attribution Engine	Analytics	Tag costs by business unit, team, user, project code	Custom tagging + aggregation; Databricks; BigQuery	High
Cost Dashboard	Reporting	Real-time spend visualisation by agent type, BU, and task	Grafana; Power BI; Retool; custom React dashboard	Medium
Budget Alert System	Operations	Notify budget owners at 50/80/100% utilisation	Email + Slack; PagerDuty; SNS	Medium
Model Tier Router	Optimisation	Route tasks to lowest-cost model tier capable of the task	Custom capability-cost matrix; LiteLLM; RouteLLM	High
Tool Result Cache	Optimisation	Cache tool call results to avoid repeat API calls	Redis; DynamoDB; ElastiCache	Medium
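The Model Tier Router's capability-cost matrix can be sketched as a lookup that picks the cheapest tier meeting a required capability level. The `Tier` record, the 1 to 3 capability scale, and the capability scores assigned to each model are hypothetical values for illustration; the output prices are the ones quoted in the Technical Problem section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    capability: int                  # hypothetical scale: 1 = routine, 3 = complex
    output_price_per_million: float  # USD


# Capability scores are assumed; they would come from evaluation in practice.
TIERS = [
    Tier("gpt-4o-mini", capability=1, output_price_per_million=0.60),
    Tier("gpt-4o", capability=3, output_price_per_million=10.00),
]


def route(required_capability: int, tiers=TIERS) -> Tier:
    """Return the cheapest tier whose capability meets the requirement."""
    eligible = [t for t in tiers if t.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no tier supports capability {required_capability}")
    return min(eligible, key=lambda t: t.output_price_per_million)
```

Routine steps with `required_capability=1` resolve to the cheaper tier, while anything scored 2 or above falls through to the premium model.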

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Requesting System	Submit task with: agent type, scope, model preference, cost ceiling	Task definition record
2	Pre-Flight Estimator	Look up historical cost distribution for this agent type and scope	Estimated cost range: expected / P90 / maximum
3	Pre-Flight Gate	Compare P90 estimate against task ceiling; reject or approve	Approved: task execution begins / Rejected: reason returned
4	Model Tier Router	Assign model tier based on task complexity vs cost optimisation policy	Model assignment (e.g., gpt-4o-mini for initial reasoning; gpt-4o only for final synthesis)
5	Agent Runtime	Begin execution; Token Budget Enforcer starts counter	Execution context with budget counter
6	Tool Middleware	Each tool call recorded: tool name, parameters, cost, latency	Tool call record
7	Real-Time Cost Tracker	Aggregate: LLM token cost + tool call costs; update running total	Updated running total per task
8	Anomaly Detector	Compare running total against the baseline cost distribution for this agent type	Alert if 3x the baseline average is exceeded
9	Agent Runtime	Complete task; return result	Task result with full cost record
10	Cost Attribution Engine	Tag final cost by BU, team, user, project; write to cost store	Cost record in attribution store
11	Dashboard	Update real-time cost dashboard	Live cost metrics visible

Error Flow

Failure	Description	Detection	Recovery
Pre-Flight Estimator Cold Start	No historical data for new agent type; cannot estimate	New agent type flag	Apply conservative default ceiling (e.g., USD 5); escalate after first run to set baseline
Token Budget Exhausted	Agent runs out of token budget mid-task	Token counter reaches limit	Inject summarisation prompt; return partial result; log budget-exhausted status
Cost Ceiling Hit	Running cost reaches ceiling before task completion	Cost Monitor ceiling check	Hard stop; save partial result; notify requesting system; alert budget owner
Kill Switch API Unavailable	Cannot terminate runaway agent	Health check on Kill Switch API	Fallback: terminate agent container/process directly; alert platform team
Tool Cost Middleware Miss	Tool call cost not recorded; running total underestimates	Post-execution reconciliation against vendor billing	Reconcile; update cost record; fix middleware

8. Security Considerations

Security Controls

Domain	Control	Implementation	Notes
Authentication	Kill Switch API requires elevated authentication; only platform operators and automated monitors can invoke	OAuth 2.0 + RBAC; API key with rate limit	Prevent malicious halting of legitimate agent tasks
Authorisation	Cost attribution data accessible only to authorised roles per BU; finance has read-all	RBAC on cost dashboard and attribution store	Prevent cross-BU cost data leakage
Secrets	LLM API keys stored in secrets manager; never hardcoded in agent execution context	AWS Secrets Manager; HashiCorp Vault	Prevent API key leakage through agent logs
Auditability	All kill switch invocations logged with initiator identity, reason, and agent state at time of kill	Immutable audit log	Provides investigation trail for anomalous kills
Agent Output Security	Partial results saved on kill are access-controlled	Same security controls as full task results	Partial results may contain sensitive data

OWASP LLM Top 10 — Cost Governance Interaction

OWASP LLM Risk	Cost Governance Relevance	Control
LLM01 Prompt Injection	Attacker injects prompt causing agent to execute expensive, unnecessary tool calls	Input validation; tool call whitelist; anomaly detection catches cost spike
LLM02 Insecure Output Handling	Agent output passed to another expensive agent in a loop	Output validation; inter-agent cost tracking; circuit breaker on agent chains
LLM03 Training Data Poisoning	Not directly a cost risk	N/A for cost governance
LLM04 Model Denial of Service	Deliberate high-cost task submission to exhaust budget	Pre-flight ceiling; rate limiting on task submission; per-user budget quotas
LLM05 Supply Chain Vulnerabilities	Third-party tool or plugin executes expensive operations unexpectedly	Tool cost middleware captures all tool costs; anomaly detection catches surprises
LLM06 Sensitive Information Disclosure	Expensive vector store queries to find sensitive data exfiltration targets	Cost governance is detective; combine with DLP controls
LLM07 Insecure Plugin Design	Plugin with excessive permissions makes many expensive API calls	Tool call whitelist; per-tool call budget; tool call count limit
LLM08 Excessive Agency	Autonomous agent takes open-ended actions generating unlimited costs	Cost ceiling is the primary control; human approval gate for tasks above threshold
LLM09 Overreliance	User accepts agent output without checking quality; pays for expensive bad results	Not a cost control issue; quality monitoring is separate
LLM10 Model Theft	Not directly a cost risk	N/A for cost governance

9. Governance Considerations

Cost Governance Framework

Domain	Requirement	Owner	Cadence
Budget Allocation	Annual AI agent budget allocated to each business unit	Finance + BU heads	Annual; revised quarterly
Cost Ceiling Policy	Organisation-wide policy on per-task cost ceilings by agent type and risk tier	AI Platform team + Finance	Reviewed quarterly
Anomaly Response Playbook	Documented process for investigating and responding to cost anomalies	AI Platform team	On anomaly
Chargeback Model	Agreed methodology for allocating shared AI platform costs to BUs	Finance	Reviewed annually
Model Tier Policy	Approved model tier assignments by task type	AI Platform team	Reviewed on model price changes

Governance Artefacts

Artefact	Description	Retention
Cost Ceiling Policy	Per-agent-type monetary ceilings and token budgets	Current version + 3 years
Monthly Cost Attribution Reports	Per-BU, per-agent-type, per-user cost summaries for chargeback	7 years
Anomaly Investigation Records	Investigation and root cause for each flagged cost anomaly	3 years
Kill Switch Audit Log	All kill switch invocations with identity, reason, agent state	3 years
Budget Utilisation Reports	Monthly BU budget utilisation vs allocation	7 years

10. Operational Considerations

Monitoring and SLOs

SLO	Target	Measurement	Breach Action
Pre-Flight Estimation Accuracy	P90 estimate within 2x of actual cost	Estimate vs actual comparison after completion	Retrain estimator; adjust confidence multiplier
Runaway Agent Incidents	0 tasks exceed 5x their configured ceiling	Tasks with cost > 5x ceiling / month	Root cause analysis; tighten anomaly threshold
Kill Switch Response Latency	<5 seconds from kill command to agent termination	Kill-to-stop time metric	Investigate agent shutdown logic; fallback to process kill
Cost Dashboard Latency	Cost data visible within 60 seconds of task completion	Dashboard data freshness metric	Investigate ingestion pipeline
Budget Alert Delivery	100% of threshold breaches generate alert within 2 minutes	Alert delivery rate and latency	Investigate alerting pipeline

Disaster Recovery

Scenario	Impact	Recovery
Real-Time Cost Tracker Outage	Cost ceilings cannot be enforced; anomaly detection offline	Halt agent task submission; restore tracker; batch-reconcile costs post-restoration
Kill Switch API Outage	Cannot terminate runaway agents	Fallback: terminate agent container directly via platform API; escalate to on-call
Cost Attribution Store Outage	Costs not attributed during outage period; chargeback gap	Restore from backup; estimate attributions from execution logs

11. Cost Considerations

Cost Optimisation Strategies

Strategy	Description	Estimated Saving	Implementation
Model Tier Routing	Route simple subtasks to cheaper models (GPT-4o-mini, Haiku, Llama)	40–80% cost reduction for mixed-complexity workloads	RouteLLM; LiteLLM; custom capability-cost matrix
Tool Result Caching	Cache tool call results (web search, database queries) for reuse within session	20–50% reduction in tool call costs for repetitive tasks	Redis cache on tool middleware; TTL-based invalidation
Tool Call Batching	Batch multiple tool calls in a single API request where tool supports it	10–30% reduction in per-call overhead	Tool wrapper with batching logic
Prompt Compression	Compress verbose context before injection into LLM prompt	20–40% input token reduction	LLMLingua; selective context; summarisation of prior history
Agent Loop Limit	Hard limit on maximum iterations before forced summarisation	Prevents open-ended iteration loops consuming unlimited tokens	Agent configuration parameter; enforced by framework
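Tool result caching with TTL-based invalidation, as listed in the table above, can be sketched with an in-memory dictionary standing in for Redis. `ToolResultCache` and its `get_or_call` method are hypothetical names, and the injectable `clock` parameter exists only to make expiry testable.

```python
import json
import time


class ToolResultCache:
    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (stored_at, result)
        self.hits = 0
        self.misses = 0

    def get_or_call(self, tool_name: str, params: dict, call):
        """Return a fresh cached result, or invoke the tool and cache it."""
        # Sorting keys makes the cache key independent of parameter order.
        key = (tool_name, json.dumps(params, sort_keys=True))
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            self.hits += 1
            return entry[1]
        result = call(**params)  # only a miss incurs the per-call tool cost
        self._store[key] = (now, result)
        self.misses += 1
        return result
```

In a multi-tenant deployment the cache key would also need a user or tenant scope so that cached results are not shared across users, in line with the GDPR note in the Regulatory Considerations section.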

Indicative Cost Range

Agent Type	Typical Cost Per Run (Without Governance)	With Governance (Model Routing + Caching)	Notes
Document Summarisation	USD 0.10–2.00	USD 0.05–0.80	Significant savings from model routing
Research Agent (10 web searches)	USD 0.50–5.00	USD 0.20–2.00	Tool caching on repeat searches
Code Generation + Test	USD 1.00–10.00	USD 0.50–4.00	Model routing for planning; premium for generation
Multi-Agent Orchestration (5 agents)	USD 5.00–50.00	USD 2.00–20.00	Cumulative savings across all agents
Runaway Agent (unbounded loop)	USD 100–10,000	USD 0.50–50 (ceiling enforced)	Ceiling is the critical control

12. Trade-Off Analysis

Architecture Options

Option	Description	Pros	Cons	Recommended For
Option A: Budget Enforcer Only	Implement only token budget enforcement; no pre-flight or anomaly detection	Low complexity; fast to implement	No pre-emptive cost estimation; no real-time monetary ceiling	Early-stage agent deployments; low agent task volume
Option B: Full Cost Governance Stack	All six layers: pre-flight, token budget, monetary ceiling, anomaly detection, kill switch, attribution	Maximum cost control; complete visibility	Higher implementation complexity; marginal latency overhead	Production multi-agent platforms; significant AI spend; multiple BUs
Option C: Cloud Provider Native Controls	Use cloud provider billing alerts and budget controls only	Zero implementation cost; no architecture changes	Alerts are after-the-fact (billing cycle lag); cannot stop runaway mid-execution	Acceptable only as a supplement, not a primary control

Architectural Tensions

Tension	Trade-Off	Resolution
Cost Control vs Agent Capability	Tight token budgets and model tier routing may reduce agent task quality	Risk-tier tasks: quality-critical tasks get higher budgets and premium models; routine tasks use optimised settings
Real-Time Control vs Latency	Synchronous cost checks add latency to every LLM call	Async cost tracking with threshold polling; synchronous only at ceiling approach (80% threshold)
Granularity vs Overhead	Very granular per-tool cost tracking creates instrumentation overhead	Profile high-cost tools precisely; estimate low-cost tools with cached average
Pre-Flight Accuracy vs Estimation Speed	Highly accurate estimates require complex models; too slow for interactive tasks	Fast heuristic estimate for interactive tasks; detailed estimate for batch/async tasks

13. Failure Modes

Failure	Likelihood	Impact	Detection	Recovery
Pre-Flight Estimator Underestimates	Medium	High — tasks approved that exceed ceiling; ceiling becomes primary control	Post-completion actual vs estimate comparison	Retrain estimator; lower approval threshold temporarily
Token Budget Enforcer Bypassed	Low	High — unlimited token consumption	Post-execution cost reconciliation	Enforce budget at API gateway level as backup; architecture review
Ceiling Hit During Critical Task	Medium	Medium — partial result returned; task must be retried with higher ceiling	Cost ceiling log; requesting system receives ceiling-exceeded status	Requesting system retries with higher ceiling (manual authorisation); partial result may be usable
Anomaly Detector False Positive	Medium	Low — legitimate expensive task flagged; human review required	Human review clears anomaly	Tune anomaly threshold; add agent type-specific baselines
Runaway Agent in Multi-Agent System	Low	Critical — spawned sub-agents multiply cost	Anomaly detector catches spend spike	Kill parent agent; costs for sub-agents still attributed; perform root cause analysis

Cascading Failure Scenario

A research agent is deployed to automate competitive intelligence gathering. The agent is configured to call a web search tool and a summarisation tool. The agent prompt contains an instruction to be thorough, which the agent interprets as requiring 50+ web searches per research task (each at USD 0.003). The token budget is set in tokens only; there is no monetary ceiling and no anomaly detector. The agent runs 200 research tasks overnight in batch mode. Each task costs USD 0.50–2.00. The batch costs USD 300. This is above budget but not alarming.

Three months later, the agent's prompt is modified to add a competitor product deep-dive sub-task. The modified agent spawns a sub-agent per competitor (10 competitors) per research task. Cost per parent task: USD 25. 200 nightly batch tasks: USD 5,000 per night. Monthly spend: USD 150,000.

The first indication is the monthly cloud bill. By the time billing alerts fire, USD 150,000 has been consumed. Remediation: add monetary ceiling; add anomaly detection; require pre-flight approval for tasks estimated above USD 10.
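The figures in this scenario can be checked with a few lines of arithmetic. The task counts and per-task costs come from the scenario; the 30-night month and the USD 5 per-task ceiling used for comparison are assumptions for illustration (USD 5 echoes the conservative default ceiling suggested in the Error Flow table).

```python
TASKS_PER_NIGHT = 200
NIGHTS_PER_MONTH = 30  # assumption: the batch runs every night


def nightly_cost(cost_per_task: float, tasks: int = TASKS_PER_NIGHT) -> float:
    return cost_per_task * tasks


# Before the prompt change: USD 0.50 to 2.00 per task.
before_low = nightly_cost(0.50)   # lower bound of the nightly batch
before_high = nightly_cost(2.00)  # upper bound; USD 300 falls in this range

# After the change: 10 sub-agents per task, USD 25 per parent task.
after_nightly = nightly_cost(25.0)
after_monthly = after_nightly * NIGHTS_PER_MONTH

# With a hypothetical USD 5 per-task ceiling, each task stops at the ceiling.
HYPOTHETICAL_CEILING_USD = 5.0
capped_monthly = (nightly_cost(min(25.0, HYPOTHETICAL_CEILING_USD))
                  * NIGHTS_PER_MONTH)
```

Under these assumptions the ceiling alone would cap the month at a fifth of the uncontrolled spend, and the anomaly detector would flag the first night, since USD 25 per task is far more than 3x the earlier per-task cost.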

14. Regulatory Considerations

Regulation	Cost Governance Relevance	Architectural Control	Reference
APRA CPS230 — Material Business Services	AI agents embedded in material business services must have cost controls to prevent service disruption	Cost ceilings prevent budget exhaustion that would halt business services	CPS230 operational resilience
SOX (US Public Companies)	AI-generated financial analysis costs must be attributable and auditable for financial controls	Cost attribution with immutable audit trail	SOX Section 302/404
EU AI Act Article 9 — Risk Management	Runaway agent cost is an operational risk that must be managed	Pre-flight estimation + ceiling + kill switch = risk management controls	EU AI Act Article 9
GDPR Article 25 — Privacy by Design	Cost optimisation (prompt compression, caching) must not inadvertently increase privacy risk	Cached tool results must not expose cross-user data; prompt compression must preserve PII controls	GDPR Article 25
Financial Services Regulations (General)	AI operational costs in production financial services workflows require governance and auditability	Chargeback model; cost dashboard; monthly reports	OCC; APRA; FCA guidance on operational risk
ISO 42001 Clause 8 — Operation	AI system operational controls include cost management	Cost governance is an operational control in the AI system lifecycle	ISO/IEC 42001:2023 Clause 8

15. Reference Implementations

AWS

Component	AWS Service
Pre-Flight Cost Estimator	AWS Lambda + DynamoDB (historical cost store)
Token Budget Enforcer	LangChain CallbackHandler deployed in Lambda
Real-Time Cost Tracker	Amazon Kinesis Data Streams + Lambda consumer + ElastiCache Redis
Cost Ceiling Monitor	Lambda triggered by Kinesis; writes to SNS on ceiling approach
Kill Switch API	API Gateway + Lambda; sends SSM Run Command to agent container
Anomaly Detection	Amazon CloudWatch Anomaly Detection on cost metrics
Cost Attribution Store	Amazon DynamoDB with BU/team/user partition keys
Cost Dashboard	Amazon QuickSight; or Grafana + CloudWatch data source
Model Tier Router	Amazon Bedrock model selection + LiteLLM
Tool Result Cache	Amazon ElastiCache Redis

Azure

Component	Azure Service
Pre-Flight Cost Estimator	Azure Function + Cosmos DB
Token Budget Enforcer	Azure Function middleware in agent pipeline
Real-Time Cost Tracker	Azure Event Hubs + Azure Function consumer + Azure Cache for Redis
Kill Switch API	Azure API Management + Azure Container Apps lifecycle API
Anomaly Detection	Azure Monitor Anomaly Detection; or Datadog
Cost Attribution Store	Azure Cosmos DB
Cost Dashboard	Power BI + Azure Monitor data source
Model Tier Router	Azure AI Studio model routing + LiteLLM

GCP

Component	GCP Service
Pre-Flight Cost Estimator	Cloud Functions + Firestore
Token Budget Enforcer	Cloud Functions middleware
Real-Time Cost Tracker	Cloud Pub/Sub + Cloud Functions + Memorystore Redis
Kill Switch API	Cloud Run API + Cloud Run job lifecycle management
Anomaly Detection	Cloud Monitoring alerting policies + custom anomaly model
Cost Attribution Store	BigQuery (cost events table)
Cost Dashboard	Looker + BigQuery
Model Tier Router	Vertex AI model garden routing + LiteLLM

On-Premises / Custom Agent Platforms

Component	Technology
Pre-Flight Cost Estimator	FastAPI service + PostgreSQL historical data
Token Budget Enforcer	Python decorator on agent LLM call method; Redis counter
Real-Time Cost Tracker	Apache Kafka + Flink + Redis
Kill Switch API	FastAPI endpoint; sends SIGTERM to agent process
Anomaly Detection	Prophet or ARIMA on cost time series; custom threshold alerts
Cost Dashboard	Grafana + InfluxDB or PostgreSQL
Model Tier Router	LiteLLM proxy with cost-based routing rules

16. Related Patterns

Pattern ID	Pattern Name	Relationship	Notes
EAAPL-AGT003	Human-in-the-Loop Oversight	COMPLEMENTARY	HITL gates for high-consequence agent actions complement cost ceilings for high-cost actions
EAAPL-AGT007	Multi-Agent Orchestration	PREREQUISITE	Multi-agent orchestration patterns must include per-agent and per-orchestration cost accounting
EAAPL-PLT010	AI Developer Portal	COMPLEMENTARY	Cost dashboards and self-service cost information for developers is part of the developer portal
EAAPL-PLT007	AI Observability Platform	PREREQUISITE	Agent observability infrastructure is required before cost metrics can be collected
EAAPL-CMP002	APRA CPS234 AI Security	COMPLEMENTARY	Runaway agent cost may indicate an adversarial attack (LLM04 DoS); cost anomaly detection provides security signal
EAAPL-AGT001	Agent Execution Framework	PREREQUISITE	An agent execution framework with middleware injection capability is required to implement token budget enforcement

17. Maturity Assessment

Overall Maturity Label: Proven

Dimension	Level 1	Level 2	Level 3	Level 4	Level 5	Current Level
Token Budget Enforcement	No limits	Manual configuration per task	Automated per-agent-type budgets	Dynamic budgets based on task complexity estimation	ML-based adaptive budgets	Level 3
Monetary Ceiling	No ceiling	Manual billing alert only	Automated real-time ceiling with hard stop	Ceiling auto-adjusted based on task priority	Predictive ceiling based on task characteristics	Level 3
Anomaly Detection	No detection	Batch billing alerts	Real-time 3x baseline anomaly detection	Multi-signal anomaly (cost + iteration + latency)	Predictive anomaly before cost spike occurs	Level 3
Cost Attribution	No attribution	BU-level attribution only	Per-agent-type + user attribution	Project/feature-level attribution	Real-time chargeback API for BU budget systems	Level 3
Cost Optimisation	No optimisation	Manual model selection	Model tier routing + tool caching	Dynamic optimisation via cost-quality trade-off model	Continuous optimisation with A/B testing	Level 2–3

18. Revision History

Version	Date	Author	Changes
1.0	2025-07-01	EAAPL Working Group	Initial draft
1.1	2026-06-12	EAAPL Working Group	Added RouteLLM/LiteLLM reference implementations; cascading failure scenario; expanded cost optimisation strategies
