[EAAPL-AGT001] Single Agent Pattern

Category: Agentic AI
Sub-category: Foundational Agent Architecture
Version: 2.1
Maturity: Proven
Tags: agent-loop, tool-use, memory, state-management, autonomy, safety-constraints
Regulatory Relevance: EU AI Act (Art. 9, 14, 17), ISO 42001 §6.1, NIST AI RMF (GOVERN 1.1, MANAGE 2.2), APRA CPS 234

1. Executive Summary

The Single Agent Pattern defines the canonical architecture for a single autonomous AI agent capable of perceiving its environment, planning multi-step responses, executing tool-mediated actions, and reflecting on outcomes to improve subsequent iterations. It serves as the foundational building block upon which all multi-agent architectures are composed.

For CIO/CTO audiences: this pattern codifies how your organisation deploys an AI system that operates with limited human intervention across a bounded task domain — processing customer enquiries, executing data pipelines, triaging incidents, or drafting structured documents. It defines the safety rails, operational controls, and governance artefacts required before a single agent can be trusted in production.

The pattern addresses the three most common failure modes organisations encounter when first deploying autonomous agents: runaway execution (no termination conditions), scope creep (no tool boundaries), and unauditable decisions (no structured logging). When implemented correctly, this pattern delivers a measurable reduction in task cycle time (typically 40–70%) while maintaining compliance with enterprise risk frameworks. Every multi-agent architecture in this library extends from or composes this pattern; get this right first.

2. Problem Statement

Business Problem

Organisations need to automate complex, multi-step knowledge work that previously required human judgment (ticket resolution, document review, research synthesis, code generation) without assigning a person to every task instance. Existing RPA and rule-based automation cannot handle the variability and reasoning demands of these tasks. Generative AI models alone are stateless and cannot take actions beyond generating text.

Technical Problem

A large language model (LLM) in isolation is a stateless text-transformation function. It has no persistent memory, cannot call external APIs, cannot read or write data stores, and cannot self-correct. Connecting an LLM to tools without a structured execution loop creates an unpredictable, unauditable system with no defined termination semantics.

Symptoms of Absence

Tasks requiring tool use are handled by brittle prompt chains with no error recovery
Each tool integration is ad hoc, with no unified invocation or audit surface
Agents run indefinitely with no token budget or step limit enforcement
Errors propagate silently; the agent hallucinates success
No replay or forensic capability when something goes wrong
Compliance teams cannot produce evidence of what the agent decided or why

Cost of Inaction

Financial: Each unstructured agent deployment creates isolated technical debt; re-architecture costs grow super-linearly with scale
Risk: An agent without safety constraints can execute irreversible actions (deleting records, sending emails, initiating payments) incorrectly
Regulatory: APRA CPS 234 requires demonstrable control over automated systems that affect information assets; missing audit logs are likely to produce material audit findings
Competitive: Organisations that standardise on this pattern deploy new agent capabilities in days rather than months

3. Context

When to Apply

Automating a bounded, well-defined knowledge work task that requires ≥2 sequential tool calls
The task domain has a clear completion criterion (file produced, ticket closed, confirmation received)
A single reasoning model is sufficient; no specialised sub-agent is needed
Task duration is ≤30 minutes per invocation (see EAAPL-AGT007 for longer tasks)
The organisation is establishing its first production AI agent and needs a reference implementation

When NOT to Apply

Task requires simultaneous specialised expertise in multiple domains (use EAAPL-MAG001 Multi-Agent Orchestration)
Task duration routinely exceeds 30 minutes (use EAAPL-AGT007 Long-Running Agent)
Human approval is required at every step (use EAAPL-MAG003 Human-in-the-Loop Agent)
Task involves emergent coordination among peers (use EAAPL-MAG004 Agent Swarm)
The "agent" is purely reactive with no planning step (a simple function call suffices)

Prerequisites

A capable foundation model or fine-tuned model with function-calling support
A tool registry with at least one tool integration (see EAAPL-AGT003)
Observability infrastructure: structured logging, distributed tracing, metrics
A secrets management system (no credentials in agent context)
An identity system for the agent (see EAAPL-AGT009)

Industry Applicability

Industry	Use Case	Risk Level	Maturity
Financial Services	Credit memo drafting, regulatory query response, reconciliation triage	High	Proven
Healthcare	Clinical note summarisation, appointment scheduling, prior auth drafting	Very High	Emerging
Retail / E-commerce	Order exception handling, returns processing, product description generation	Medium	Proven
Legal / Professional Services	Contract clause extraction, due diligence research, citation verification	High	Proven
Technology / SaaS	Incident triage, code review, test generation, documentation synthesis	Medium	Proven
Government	Benefits eligibility pre-screening, FOI response drafting	High	Emerging

4. Architecture Overview

The Single Agent Pattern structures agent execution around four phases that execute in a repeating loop: Observe, Plan, Act, and Reflect. Each phase has explicit inputs, outputs, and guard conditions. The loop terminates when a completion criterion is satisfied, a safety limit is reached, or a human intervenes.

Why a loop rather than a single inference call? Real-world tasks require iterative refinement. A single LLM invocation cannot both plan a multi-step approach and execute it; the model must observe the result of each action before determining the next. The loop externalises this iteration explicitly, making it auditable and controllable.
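
The loop can be reduced to a few lines. The sketch below is illustrative only. `run_agent`, `ToolCall`, `FinalAnswer`, and the scripted planner are hypothetical names. The planner stands in for the LLM, and tools are plain Python functions with no sandbox, policy engine, or audit log.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    tool_name: str
    arguments: dict


@dataclass
class FinalAnswer:
    text: str


Plan = Union[ToolCall, FinalAnswer]


def run_agent(task: str,
              plan: Callable[[list], Plan],
              tools: dict[str, Callable[..., str]],
              max_steps: int = 20) -> tuple[str, str]:
    """Run Observe-Plan-Act-Reflect until a final answer or the step limit."""
    history: list[tuple[str, str]] = [("task", task)]  # in-context memory
    for _ in range(max_steps):
        context = list(history)          # Observe: read-only snapshot of state
        intent = plan(context)           # Plan: structured intent, no side effects
        if isinstance(intent, FinalAnswer):
            return "complete", intent.text
        result = tools[intent.tool_name](**intent.arguments)  # Act
        # Reflect (baseline): the result is fed back so the next planning
        # call can judge whether the task is complete.
        history.append(("tool_result", result))
    return "step_limit_reached", ""


def scripted_plan(context: list) -> Plan:
    """Hypothetical stand-in for the LLM Planner: call one tool, then answer."""
    if len(context) == 1:
        return ToolCall("add", {"a": 2, "b": 3})
    return FinalAnswer(f"sum is {context[-1][1]}")


print(run_agent("add 2 and 3", scripted_plan, {"add": lambda a, b: str(a + b)}))
# ('complete', 'sum is 5')
```

A planner that never emits a final answer exits with `step_limit_reached`. This is the runaway-execution guard described under Termination Logic.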

Observe Phase: The agent receives a task instruction and assembles its current context. This includes the task description, any relevant memory (in-context history, retrieved episodic memories, semantic search results from the knowledge store), and the current state of the world (tool results from the previous iteration). This phase is read-only; no mutations occur. The context window is deliberately bounded here. The Context Assembler enforces a token budget and retrieves only the most relevant memory chunks. Injecting the full history instead would exhaust the context window on long tasks.
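
A minimal sketch of budget-bounded context assembly follows. It assumes relevance scores have already been produced by semantic retrieval. Token counts are approximated by whitespace word counts; a real assembler would use the model's own tokenizer. `MemoryChunk` and `assemble_context` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class MemoryChunk:
    text: str
    relevance: float  # assumed: similarity score from semantic retrieval


def approx_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def assemble_context(task: str, chunks: list[MemoryChunk], budget: int) -> list[str]:
    """Return the task plus the most relevant chunks that still fit the token budget."""
    used = approx_tokens(task)
    if used > budget:
        raise ValueError("task alone exceeds the token budget")
    selected = [task]
    # Greedy by relevance; a chunk that does not fit is skipped, and smaller,
    # less relevant chunks may still be admitted afterwards.
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        cost = approx_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk.text)
            used += cost
    return selected


chunks = [
    MemoryChunk("old invoice dispute resolved by refund", 0.9),
    MemoryChunk("customer prefers email contact", 0.4),
    MemoryChunk("unrelated long note " * 10, 0.1),
]
print(assemble_context("resolve ticket 42", chunks, budget=12))
# ['resolve ticket 42', 'old invoice dispute resolved by refund']
```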

Plan Phase: The LLM receives the assembled context and produces a structured plan: either a tool call specification (name, arguments, rationale) or a final answer. The planner does not execute actions directly; it emits a structured intent. This separation is critical for governance, because the intent can be logged, validated against a policy engine, and subjected to human approval before execution. The plan may be a chain-of-thought reasoning trace followed by a specific tool invocation or a final synthesis.
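
The planner emits intent rather than acting, so its output can be parsed and checked before anything runs. This sketch assumes the JSON shapes listed in the data-flow table (`{tool_name, arguments, rationale}` or `{final_answer, ...}`). The class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class ToolCallIntent:
    tool_name: str
    arguments: dict
    rationale: str


@dataclass
class FinalAnswerIntent:
    final_answer: str


class MalformedIntent(ValueError):
    """Planner output is not a valid intent (the parse_failure path in the error flow)."""


def parse_intent(raw: str) -> Union[ToolCallIntent, FinalAnswerIntent]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedIntent(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise MalformedIntent("intent must be a JSON object")
    if "final_answer" in data:
        return FinalAnswerIntent(str(data["final_answer"]))
    if isinstance(data.get("tool_name"), str) and isinstance(data.get("arguments"), dict):
        return ToolCallIntent(data["tool_name"], data["arguments"], str(data.get("rationale", "")))
    raise MalformedIntent("expected tool_name + arguments, or final_answer")


print(parse_intent('{"tool_name": "lookup_order", "arguments": {"id": 7}, "rationale": "need status"}'))
```

A `MalformedIntent` would trigger the structured re-prompt described in the error-flow table, rather than any tool execution.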

Act Phase: The Tool Dispatcher receives the plan's tool call specification. It resolves the tool from the registry, validates inputs against the tool's JSON Schema, enforces access controls, and invokes the tool within an isolated sandbox (see EAAPL-AGT004). The result (a success payload or a structured error) is returned to the loop. Critically, every tool invocation is written to an immutable audit log twice: before execution (intent record) and after execution (result record). This two-phase logging enables forensic reconstruction of the agent's full action history.
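
Two-phase logging might look like the following sketch, which rests on several assumptions. The registry maps each tool to a function plus a simple `{argument: type}` map standing in for JSON Schema. The tool runs in-process rather than in a sandbox, and a Python list stands in for the immutable audit store. Records carry input and output hashes rather than payloads, in line with the auditability requirements in Section 8. `ToolDispatcher` is a hypothetical name.

```python
import hashlib
import json
import time
from typing import Any, Callable


def _digest(obj: Any) -> str:
    # Hash of the canonical JSON form; audit records store hashes, not payloads.
    return hashlib.sha256(json.dumps(obj, sort_keys=True, default=str).encode()).hexdigest()


class ToolDispatcher:
    def __init__(self, registry: dict[str, tuple[Callable[..., Any], dict[str, type]]],
                 audit_log: list[dict]):
        # registry: tool name -> (function, {argument name: expected type}).
        # The type map is a simplified stand-in for JSON Schema validation.
        self.registry = registry
        self.audit_log = audit_log  # stand-in for an append-only WORM store

    def dispatch(self, task_id: str, tool_name: str, arguments: dict) -> dict:
        if tool_name not in self.registry:
            return {"ok": False, "error": f"unknown tool: {tool_name}"}
        fn, schema = self.registry[tool_name]
        for name, expected in schema.items():
            if not isinstance(arguments.get(name), expected):
                return {"ok": False, "error": f"invalid argument: {name}"}
        # Phase 1: the intent record is written before the tool runs.
        self.audit_log.append({"task_id": task_id, "phase": "intent", "tool": tool_name,
                               "input_hash": _digest(arguments), "ts": time.time()})
        start = time.monotonic()
        try:
            result = {"ok": True, "value": fn(**arguments)}
        except Exception as exc:  # tool failure becomes a structured error for the loop
            result = {"ok": False, "error": str(exc)}
        # Phase 2: the result record is written after execution, success or failure.
        self.audit_log.append({"task_id": task_id, "phase": "result", "tool": tool_name,
                               "output_hash": _digest(result), "ok": result["ok"],
                               "latency_s": time.monotonic() - start, "ts": time.time()})
        return result


audit: list[dict] = []
dispatcher = ToolDispatcher({"add": (lambda a, b: a + b, {"a": int, "b": int})}, audit)
print(dispatcher.dispatch("t-1", "add", {"a": 2, "b": 3}))  # {'ok': True, 'value': 5}
```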

Reflect Phase: After receiving a tool result, the agent evaluates whether the result advances the task goal. For agents configured with a self-critique capability (see EAAPL-AGT006), the reflection step can invoke an additional LLM call that evaluates quality before proceeding. In the baseline single-agent pattern, reflection is implemented as a structured prompt instruction that asks the model to assess whether the task is complete or whether another iteration is required.

Termination Logic: The loop terminates on any of the following: (a) the model emits a final answer with no pending tool calls, (b) the step limit is reached (default: 20 iterations), (c) the token budget is exhausted, (d) a safety constraint is violated, or (e) a human sends a stop signal. Termination on any condition other than (a) produces a partial result with a structured status code. The calling system can then decide whether to retry, escalate, or discard.
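
The termination rules map naturally to a single check run after each iteration. In this sketch the defaults (20 steps, 50K tokens) come from this pattern. `token_budget_exhausted` appears in the error-flow table; the other status strings and the function name are hypothetical. Checking safety violations and stop signals before completion is a design choice, not a requirement of the pattern.

```python
from enum import Enum
from typing import Optional


class Status(str, Enum):
    COMPLETE = "complete"
    STEP_LIMIT = "step_limit_reached"
    TOKEN_BUDGET = "token_budget_exhausted"
    SAFETY_VIOLATION = "safety_violation"
    HUMAN_STOP = "human_stop"


def check_termination(final_answer_emitted: bool, steps: int, tokens_used: int,
                      safety_violation: bool, stop_requested: bool,
                      max_steps: int = 20, token_budget: int = 50_000) -> Optional[Status]:
    """Return a terminal status, or None if the loop should continue."""
    if safety_violation:                # (d) always wins
        return Status.SAFETY_VIOLATION
    if stop_requested:                  # (e) human stop signal
        return Status.HUMAN_STOP
    if final_answer_emitted:            # (a) normal completion
        return Status.COMPLETE
    if tokens_used >= token_budget:     # (c) partial result
        return Status.TOKEN_BUDGET
    if steps >= max_steps:              # (b) partial result
        return Status.STEP_LIMIT
    return None


print(check_termination(False, steps=20, tokens_used=12_000,
                        safety_violation=False, stop_requested=False))
```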

Memory Architecture: The agent maintains four memory tiers:
in-context memory (the current conversation and tool results, constrained to the context window);
episodic memory (a durable store of past task executions, retrieved by semantic similarity);
semantic memory (a vector store of domain knowledge and past learnings);
a procedural skill store (cached tool call patterns for known task types).
Memory consolidation (writing key learnings from a completed task back into episodic and semantic stores) runs asynchronously after task completion to avoid blocking the main loop.

State Management: The agent's execution state is serialised at each loop iteration checkpoint (see EAAPL-AGT005), enabling recovery from mid-task failures without replaying actions that have already succeeded. The state object includes the task ID, current iteration number, tool call history, partial results, and memory references.
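
A sketch of per-iteration checkpointing follows. It assumes JSON serialisation and a dictionary standing in for the durable checkpoint store described in EAAPL-AGT005. The field names follow the state object listed above; `AgentState` and `record_step` are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AgentState:
    task_id: str
    iteration: int = 0
    tool_call_history: list[dict] = field(default_factory=list)
    partial_results: list[str] = field(default_factory=list)
    memory_refs: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "AgentState":
        return cls(**json.loads(raw))


def record_step(state: AgentState, tool_name: str, arguments: dict, result: str,
                store: dict[str, str]) -> None:
    """Apply one completed iteration, then checkpoint it (store stands in for durable storage)."""
    state.tool_call_history.append({"iteration": state.iteration, "tool": tool_name,
                                    "arguments": arguments, "result": result})
    state.partial_results.append(result)
    state.iteration += 1
    store[state.task_id] = state.to_json()  # checkpoint after every iteration


store: dict[str, str] = {}
state = AgentState("t-1", memory_refs=["episodic:42"])
record_step(state, "lookup_order", {"id": 7}, "found", store)
restored = AgentState.from_json(store["t-1"])
print(restored.iteration, restored.partial_results)  # 1 ['found']
```

On recovery, a worker would load the latest checkpoint and continue from `iteration`. Results already in `tool_call_history` are reused rather than re-executed.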

5. Architecture Diagram


```mermaid
flowchart TD
    subgraph Input["Task Input"]
        A[Task Request]
        B[(Memory Store)]
    end
    subgraph Core["Observe-Plan-Act-Reflect Loop"]
        C[Context Assembler]
        D[LLM Planner]
        E[Policy Engine]
        F[Tool Dispatcher]
    end
    subgraph Output["Output Layer"]
        G[Final Output]
        H[(Audit Log)]
    end
    A --> C
    B -->|retrieve context| C
    C --> D
    D -->|structured intent| E
    E -->|approved| F
    E -->|blocked| H
    F -->|sandboxed execution| F
    F -->|tool result| D
    D -->|complete| G
    F --> H
    G -->|write learnings| B
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#fef9c3,stroke:#eab308
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#d1fae5,stroke:#10b981
    style H fill:#fef9c3,stroke:#eab308
```

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Context Assembler	Orchestration	Assembles LLM input from task, memory, and state; enforces token budget	LangChain, LlamaIndex, custom Python/TS	Critical
LLM Planner	AI Model	Generates structured plans and tool call intents from context	GPT-4o, Claude 3.5+, Gemini 1.5 Pro, Llama 3.1 70B (self-hosted)	Critical
Policy Engine	Safety/Governance	Validates tool call intent against allow-list, risk rules, and scope constraints	OPA (Rego), custom rule engine, AWS Bedrock Guardrails	Critical
Tool Dispatcher	Orchestration	Resolves tools from registry, validates inputs, manages invocation lifecycle	LangChain Tools, Semantic Kernel, custom dispatcher	Critical
Tool Registry	Service Catalogue	Stores tool definitions, capability metadata, health status, access controls	Redis, PostgreSQL, service mesh sidecar	High
Sandboxed Executor	Compute Isolation	Executes tool code in isolated environment with resource quotas	Docker + seccomp, Firecracker microVMs, AWS Lambda	Critical
Memory Store — Episodic	Persistence	Stores conversation history and past task executions for retrieval	PostgreSQL + pgvector, MongoDB, Cosmos DB	High
Memory Store — Semantic	Vector Store	Stores domain knowledge embeddings for semantic retrieval	Pinecone, Weaviate, Azure AI Search, pgvector	High
Procedural Skill Library	Knowledge Base	Cached tool-call patterns for known task classes	Object store (S3/Blob), Redis JSON	Medium
Result Handler	Orchestration	Normalises tool results, handles errors, feeds back into loop state	Custom, part of agent framework	High
Termination Controller	Safety	Enforces step limits, token budgets, time limits; triggers partial result	Custom, integrated into agent loop	Critical
Audit Log	Compliance	Immutable append-only log of all plans, tool calls, and results	AWS CloudTrail, Azure Monitor Logs, Kafka + S3 Iceberg	Critical
Observability Platform	Operations	Metrics, traces, and logs for agent execution monitoring	Datadog, OpenTelemetry + Grafana, Azure Monitor	High
Cost/Token Budget Monitor	Governance	Tracks token consumption and cost per task; triggers kill switch	Custom + LLM provider usage APIs	High

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Calling System	Submits task with task_id, instruction, context hints, priority	Task object in queue
2	Context Assembler	Retrieves episodic memories by semantic similarity to task; loads in-context history; applies token budget	Assembled context document
3	LLM Planner	Receives context; generates chain-of-thought reasoning; emits structured tool call intent or final answer	Tool call spec: `{tool_name, arguments, rationale}` or `{final_answer, ...}`
4	Policy Engine	Validates tool call against: (a) agent permission scope, (b) argument safety rules, (c) rate limits	Approved intent or `{blocked, reason, policy_id}`
5	Tool Dispatcher	Resolves tool definition from registry; validates arguments against JSON Schema; checks tool health	Validated invocation request
6	Sandboxed Executor	Executes tool in isolated container with enforced CPU/memory/time/network quotas	Raw tool result or timeout/error
7	Result Handler	Normalises result; writes to in-context window; appends to audit log	Structured result in agent state
8	Termination Controller	Evaluates completion criteria, step count, token budget, time elapsed	Continue / Complete / Abort signal
9	LLM Planner (next iter)	If continuing: receives updated context including tool result; generates next plan	Next tool call or final answer
10	Reflect + Consolidate	On completion: LLM generates memory consolidation summary; writes to episodic + semantic stores	Memory records; final output to caller

Error Flow

Error Condition	Detection Point	Recovery Action	Escalation
Tool invocation timeout	Sandboxed Executor	Retry with exponential backoff (max 3 attempts); on exhaustion return structured error to loop	Log + alert if retry exhausted
Tool returns structured error	Result Handler	Pass error as observation to LLM; allow model to try alternative tool or revise plan	Escalate if same error recurs 3 times
Policy engine blocks intent	Policy Engine	Return block reason to loop as observation; LLM may reformulate	Human escalation queue if blocked 2+ times
Token budget exhausted	Termination Controller	Emit partial result with `status: token_budget_exhausted`; log consumption	Alert cost owner; trigger budget review
LLM returns malformed tool call	Tool Dispatcher	Structured prompt to LLM requesting reformatted call; max 2 retries	Abort task with `status: parse_failure`
Memory store unavailable	Context Assembler	Proceed with in-context memory only; log degraded mode	Alert; trigger memory store recovery SLA
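
The timeout row above (exponential backoff, at most 3 attempts, then a structured error) could be implemented as below. This sketch reads "max 3 attempts" as three attempts in total. `call_with_retry` and the `tool_timeout` error string are hypothetical. `sleep` is injectable so the delays can be checked without waiting.

```python
import time
from typing import Any, Callable


def call_with_retry(fn: Callable[[], Any], max_attempts: int = 3, base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Retry on TimeoutError with delays of base, 2*base, ...; return a structured result."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            value = fn()
        except TimeoutError as exc:
            if attempt == max_attempts:
                # Exhausted: hand a structured error back to the loop instead of raising.
                return {"ok": False, "error": "tool_timeout", "detail": str(exc),
                        "attempts": attempt}
            sleep(base_delay * 2 ** (attempt - 1))
        else:
            return {"ok": True, "value": value, "attempts": attempt}
    raise AssertionError("unreachable")


delays: list[float] = []
print(call_with_retry(lambda: "ok", sleep=delays.append))  # {'ok': True, 'value': 'ok', 'attempts': 1}
```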

8. Security Considerations

Authentication and Authorisation

The agent must authenticate with a workload identity (not a human user credential) — see EAAPL-AGT009
Each agent instance is assigned a scoped service account with least-privilege permissions
Tool invocations are authorised against the agent's permission scope, not inherited from the calling user
Dynamic permission escalation requires out-of-band human approval via the approval queue

Secrets Management

Zero secrets in agent context, prompt, or tool arguments
All credentials are retrieved at invocation time from a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault)
Secrets are injected as environment variables into the sandboxed executor — never passed through the LLM
Secret rotation must not require agent redeployment; use dynamic secrets where possible

Data Classification

Task input is classified on ingestion; the classification label constrains which tools can be called and which memory tiers can store results
PII must be detected (using a PII detection service) and either masked or flagged before entering the LLM context
Agent outputs are labelled with the highest classification of any input or tool result consumed
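
Label propagation reduces to taking a maximum over an ordered scale. The four-level scheme and function names here are hypothetical; substitute your organisation's classification labels.

```python
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical ordered scheme: a higher value means more sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def output_label(consumed: list[Classification]) -> Classification:
    """An output inherits the highest classification of anything it consumed."""
    return max(consumed, default=Classification.PUBLIC)


def tool_permitted(task_label: Classification, tool_max_label: Classification) -> bool:
    """A tool may be called only if it is cleared for data at the task's classification."""
    return task_label <= tool_max_label


print(output_label([Classification.INTERNAL, Classification.CONFIDENTIAL]).name)  # CONFIDENTIAL
```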

Encryption

All data at rest (episodic store, semantic store, audit log) is encrypted with customer-managed keys (CMK)
All data in transit uses TLS 1.3 minimum
Tool result payloads held in serialised agent state (for example, checkpoints passed between workers) are encrypted whenever a session spans multiple processes

Auditability

Every LLM inference call is logged with: timestamp, model ID, token counts, full prompt hash (not the prompt itself — to protect sensitive data), intent, and result classification
All tool calls are logged pre- and post-execution with: tool ID, version, input hash, output hash, latency, and success/failure
Audit logs are immutable (WORM) and retained per regulatory schedule (minimum 7 years for financial services)
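
WORM storage (for example, S3 Object Lock) provides immutability at the storage layer. A complementary technique, not prescribed by this pattern, is to hash-chain records. This lets a log integrity monitor detect edits, reordering, or gaps. The sketch below is minimal and its function names are hypothetical. Truncation of the newest entries is detectable only if the latest hash is also anchored somewhere else.

```python
import hashlib
import json

GENESIS = "0" * 64


def _sha256(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def append_record(log: list[dict], record: dict) -> dict:
    """Append a record whose hash covers its content and the previous record's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    body = json.dumps(record, sort_keys=True)
    entry = {"seq": len(log), "record": record, "prev": prev, "hash": _sha256(prev + body)}
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """Recompute the chain; edited, reordered, or removed non-tail entries return False."""
    prev = GENESIS
    for i, entry in enumerate(log):
        body = json.dumps(entry["record"], sort_keys=True)
        if entry["seq"] != i or entry["prev"] != prev or entry["hash"] != _sha256(prev + body):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
# The inference record stores a prompt hash, never the prompt itself.
append_record(log, {"event": "inference", "model_id": "model-x",
                    "prompt_hash": _sha256("full prompt text")})
append_record(log, {"event": "tool_call", "tool": "lookup_order"})
print(verify_chain(log))  # True
```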

OWASP LLM Top 10 Mitigations (2023 edition)

OWASP LLM Risk	Applicability	Mitigation in This Pattern
LLM01 Prompt Injection	Critical	Input sanitisation layer on all user-supplied content; system prompt hardened and separated from user prompt; tool call arguments validated by JSON Schema before execution
LLM02 Insecure Output Handling	High	All LLM outputs are parsed as structured JSON before action; free-text outputs are never directly executed as code
LLM03 Training Data Poisoning	Medium	Model provenance tracked; fine-tuned models require security review of training data lineage
LLM04 Model Denial of Service	High	Token budget enforced per task; max concurrent agent instances enforced; inference cost anomaly detection alerts
LLM05 Supply Chain Vulnerabilities	High	Model artefacts verified via hash/signature; dependency SBOM for all agent framework packages; private model registry
LLM06 Sensitive Information Disclosure	Critical	PII detection on context assembly; output filtering before returning to caller; data classification enforcement
LLM07 Insecure Plugin Design	Critical	Tool registry enforces JSON Schema validation; all tools developed against hardened interface contract; no arbitrary code execution via tool arguments
LLM08 Excessive Agency	Critical	Tool allow-list per agent role; step limit (default 20) and token budget (default 50K); policy engine blocks out-of-scope actions; irreversible actions require human approval; all actions logged and alertable
LLM09 Overreliance	High	Confidence scores surfaced to callers; human review gate for high-stakes outputs; output metadata includes model uncertainty indicators
LLM10 Model Theft	Medium	Inference endpoints behind private networking; prompt content protected by access controls; no raw prompt logging in shared log stores

9. Governance Considerations

Responsible AI

The agent must have a documented purpose statement, capability boundary, and prohibited use list before production deployment
Outputs must be traceable to specific model version, tools used, and input context — enabling explanation of any decision
Bias monitoring is required for agents that influence decisions about individuals (see EU AI Act Article 9 risk assessment requirement)

Model Risk Management

The agent's foundation model is treated as a material third-party model component under model risk management frameworks
Model change management: any model version upgrade requires regression testing on a validated task benchmark before production promotion
Model performance drift is monitored via weekly quality evaluations on a held-out task sample

Human Approval Gates

Irreversible actions (deletes, financial transactions, communications to external parties) require explicit human approval via the approval queue before execution, regardless of agent confidence
When the agent's confidence score falls below the configured threshold, the task is paused and routed to the human escalation queue with full context
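
The two gates above can be expressed as a small routing function. The set of irreversible actions, the 0.8 confidence threshold, and all names are hypothetical placeholders for values taken from the agent's policy.

```python
from dataclasses import dataclass

# Hypothetical action types treated as irreversible.
IRREVERSIBLE_ACTIONS = {"delete_record", "initiate_payment", "send_external_email"}


@dataclass
class Decision:
    route: str   # "execute", "approval_queue", or "escalation_queue"
    reason: str


def route_action(action: str, confidence: float, threshold: float = 0.8) -> Decision:
    # Irreversible actions always need approval, regardless of agent confidence.
    if action in IRREVERSIBLE_ACTIONS:
        return Decision("approval_queue", "irreversible action")
    # Low confidence pauses the task and escalates it with full context.
    if confidence < threshold:
        return Decision("escalation_queue", f"confidence {confidence:.2f} < {threshold:.2f}")
    return Decision("execute", "reversible action above confidence threshold")


print(route_action("initiate_payment", 0.99).route)  # approval_queue
```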

Policy Enforcement

Agents operate within a declarative policy document (OPA Rego) that specifies: permitted tools, permitted data domains, permitted output channels, prohibited keywords, and escalation conditions
Policy updates require a change management review cycle; agents automatically reload policies without redeployment
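
The pattern expresses policy in OPA Rego. The Python sketch below only mirrors the evaluation logic so it can be read alongside the other examples. The `AgentPolicy` fields are a hypothetical policy shape, not an OPA schema. The blocked result follows the `{blocked, reason, policy_id}` shape from the data-flow table.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPolicy:
    # Hypothetical policy shape; field names are illustrative.
    policy_id: str
    permitted_tools: set[str]
    permitted_domains: set[str]
    prohibited_keywords: set[str] = field(default_factory=set)
    max_calls_per_tool: int = 10


def evaluate(policy: AgentPolicy, tool: str, domain: str, arguments: dict,
             call_counts: dict[str, int]) -> dict:
    """Return an approved intent, or {blocked, reason, policy_id}."""
    def block(reason: str) -> dict:
        return {"blocked": True, "reason": reason, "policy_id": policy.policy_id}

    if tool not in policy.permitted_tools:                 # agent permission scope
        return block(f"tool '{tool}' not in allow-list")
    if domain not in policy.permitted_domains:
        return block(f"data domain '{domain}' not permitted")
    text = " ".join(str(v) for v in arguments.values()).lower()
    for word in policy.prohibited_keywords:                # argument safety rules
        if word.lower() in text:
            return block(f"prohibited keyword '{word}' in arguments")
    if call_counts.get(tool, 0) >= policy.max_calls_per_tool:  # rate limit
        return block(f"rate limit reached for '{tool}'")
    return {"blocked": False, "tool": tool, "arguments": arguments}


policy = AgentPolicy("pol-orders-1", {"lookup_order"}, {"orders"}, {"drop table"}, 2)
print(evaluate(policy, "delete_order", "orders", {}, {}))
```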

Traceability

Every agent task produces a structured execution trace: a directed acyclic graph (DAG) of observations, plans, tool calls, and results with timestamps
Execution traces are stored in the audit log and can be replayed in a simulation environment for forensic investigation

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Agent Purpose Statement	Product Owner	At creation; on material change	Defines permitted scope, audience, and prohibited uses
Risk Assessment (EU AI Act Art. 9)	AI Risk Officer	At creation; annually	Documents risk classification, controls, and residual risk acceptance
Tool Capability Register	Platform Engineer	On each tool change	Lists all tools, their capabilities, data access, and risk classification
Execution Trace Archive	Platform Engineering	Continuous	Immutable record of all agent decisions for audit and forensics
Model Performance Report	ML Engineer	Weekly	Quality metrics on task benchmark; drift detection
Cost Consumption Report	FinOps	Weekly	Token usage, cost per task type, anomaly flags
Human Override Log	Operations	Continuous	Record of all human interventions, approvals, and overrides

10. Operational Considerations

Monitoring

Agent health is defined by four golden signals: throughput (tasks/hour), latency (p50/p95/p99 end-to-end task duration), error rate (tool failures + policy blocks + aborted tasks), and cost per task
Distributed tracing (OpenTelemetry) spans the full agent loop: each iteration of the loop is a child span of the parent task span
Structured JSON logs for every loop iteration, tool call, and memory operation — queryable in the observability platform

SLOs

SLO	Target	Measurement Window	Alert Threshold
Task completion rate	≥ 95%	24-hour rolling	< 92% triggers P2 alert
p95 task latency	≤ 30s (interactive) / ≤ 5min (async)	1-hour rolling	> 150% of target triggers P2
Policy violation rate	≤ 1% of task attempts	24-hour rolling	> 2% triggers P1 review
Audit log availability	99.99%	Monthly	Any gap triggers P0 incident
Memory retrieval latency	≤ 200ms p95	1-hour rolling	> 500ms triggers P3 alert

Logging Requirements

Log levels: INFO for each loop iteration summary; DEBUG for full prompt/response (gated by feature flag; disabled in production by default); ERROR for tool failures and policy blocks; AUDIT for all tool executions (always on)
Log correlation: every log line includes task_id, agent_id, iteration_number, and trace_id
Log retention: AUDIT logs 7 years; operational logs 90 days
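
Here is a sketch of correlated structured logging with Python's standard `logging` module. The numeric value of the AUDIT level (25) and the function names are assumptions. In practice, AUDIT records would go to a separate, always-on handler rather than sharing one with operational logs.

```python
import json
import logging

AUDIT = 25  # hypothetical custom level between INFO (20) and WARNING (30)
logging.addLevelName(AUDIT, "AUDIT")

CORRELATION_FIELDS = ("task_id", "agent_id", "iteration_number", "trace_id")


class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "message": record.getMessage()}
        for name in CORRELATION_FIELDS:
            # Every line carries all correlation fields, as null if the caller omitted one.
            payload[name] = getattr(record, name, None)
        return json.dumps(payload)


def make_logger(name: str = "agent") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)  # DEBUG (full prompts) stays off unless explicitly enabled
    return logger


agent_log = make_logger()
ctx = {"task_id": "t-1", "agent_id": "agent-7", "iteration_number": 3, "trace_id": "abc123"}
agent_log.log(AUDIT, "tool executed: lookup_order", extra=ctx)
```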

Incident Response

Incident Type	Detection	Response	Escalation Path
Agent executing prohibited action	Policy engine block + alert	Automatic task abort; human review of policy gap	AI Risk Officer within 4 hours
Runaway agent (step limit hit)	Termination Controller	Automatic abort; partial result returned	Engineering on-call; root cause within 24h
Memory store unavailable	Heartbeat check	Degrade gracefully to in-context only; alert	Infra on-call; full restoration SLA 1 hour
LLM provider outage	Task failure spike	Failover to secondary model if configured; queue tasks	Engineering on-call; business notification
Audit log gap detected	Log integrity monitor	P0 incident; suspend affected agent	CISO + Legal notification within 1 hour

Capacity Planning

Agent instances are stateless (state is externalised to stores); horizontal scaling is straightforward
Token throughput is the primary capacity constraint; plan for 2× peak observed demand in LLM quota allocation
Memory store IOPS scales with concurrent agents × memory retrievals per iteration; provision accordingly
Sandbox executor pool: minimum 2× peak concurrent tasks to absorb burst without queuing

11. Cost Considerations

Cost Drivers

Cost Driver	Description	Scaling Behaviour	Control Lever
LLM inference tokens	Input + output tokens per loop iteration × iterations per task	Linear with task complexity and iterations	Token budget per task; model tier selection
Embedding API calls	Semantic memory retrieval generates embeddings for each query	Linear with tasks + unique queries	Embedding cache; batching
Sandbox compute	CPU/memory seconds per tool execution	Linear with tool call volume and complexity	Resource quotas; warm pool management
Memory store operations	Vector search, episodic read/write per task	Linear with tasks; sublinear with caching	Retrieval cache TTL; memory consolidation batching
Observability ingestion	Log + trace + metric volume	Linear with tasks and log verbosity	Log sampling (non-audit); metric aggregation
Audit log storage	Immutable log retention per regulatory schedule	Linear with task volume; accumulates over retention period	Compression; tiered storage (hot/cold/archive)

Scaling Risks

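Token cost grows quadratically with iteration count if the context assembler does not strictly enforce the token budget. Each iteration resends the accumulated history. If every iteration adds roughly k tokens, iteration i sends about i × k tokens, and an n-iteration task consumes on the order of k × n(n+1)/2 input tokens. Any extra iterations caused by longer, harder-to-use contexts compound the cost further.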
Memory store costs compound if memory consolidation is not pruning stale or low-value entries
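
The following worked example of the quadratic effect uses hypothetical figures: a 2,000-token base prompt, 1,500 tokens of new history per iteration, and 20 iterations. Without a cap, iteration i sends 2,000 + 1,500 × i tokens. A per-call cap of 8,000 tokens bounds the growth. The function name is hypothetical.

```python
from typing import Optional


def cumulative_input_tokens(iterations: int, base: int, per_step: int,
                            budget: Optional[int] = None) -> int:
    """Total input tokens when each iteration resends the accumulated history.

    Iteration i (1-based) sends base + i * per_step tokens; a budget caps each call.
    """
    total = 0
    for i in range(1, iterations + 1):
        context = base + i * per_step
        total += min(context, budget) if budget is not None else context
    return total


print(cumulative_input_tokens(20, 2_000, 1_500))               # 355000
print(cumulative_input_tokens(20, 2_000, 1_500, budget=8_000))  # 151000
```

Doubling the task to 40 iterations raises the uncapped total from 355,000 to 1,310,000 input tokens (about 3.7×). The capped total only roughly doubles, from 151,000 to 311,000.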

Optimisations

Use a smaller, faster model for planning steps and the full model only for complex reasoning or final synthesis (model routing)
Implement a task-level response cache: if the same task instruction has been seen before, return the cached result, provided that result has previously passed human validation
Batch memory consolidation writes to reduce vector store write IOPS costs
Right-size sandbox containers per tool type: code execution needs more resources than API calls

Indicative Cost Range (USD, per 1,000 tasks)

Configuration	Model Tier	Est. Tokens/Task	Sandbox Cost/1K Tasks	Total/1K Tasks
Low complexity (≤5 tool calls)	GPT-4o-mini / Claude Haiku	~5,000	~$0.10	~$3–8
Medium complexity (5–15 tool calls)	GPT-4o / Claude Sonnet	~25,000	~$0.50	~$20–60
High complexity (15–20 tool calls)	GPT-4o / Claude Opus	~80,000	~$2.00	~$120–300

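Costs vary significantly by provider, region, and negotiated rates. Model pricing changes frequently; validate against current provider pricing. The high-complexity tier (~80,000 tokens/task) exceeds the default 50K per-task token budget, so that tier requires an explicitly raised budget.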
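
The per-1K figures follow from simple arithmetic, so they are easy to recompute as prices change. In this sketch the input/output token split (22,000 / 3,000) and the per-million-token prices ($1.00 input, $4.00 output) are hypothetical values chosen only to show the calculation; substitute current provider pricing. The function name is also hypothetical.

```python
def cost_per_1k_tasks(input_tokens: int, output_tokens: int,
                      usd_per_m_input: float, usd_per_m_output: float,
                      sandbox_usd_per_1k: float) -> float:
    """Token cost of 1,000 tasks plus sandbox cost; prices are parameters, not constants."""
    tasks = 1_000
    token_cost = tasks * (input_tokens * usd_per_m_input
                          + output_tokens * usd_per_m_output) / 1_000_000
    return round(token_cost + sandbox_usd_per_1k, 2)


# Medium-complexity shape: 25,000 tokens/task split 22,000 in / 3,000 out.
print(cost_per_1k_tasks(22_000, 3_000, 1.00, 4.00, 0.50))  # 34.5
```

With these illustrative prices the result falls inside the medium tier's indicative range. Different prices or a different input/output split move it accordingly.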

12. Trade-Off Analysis

Architecture Options

Option	Description	Pros	Cons	Best For
A: ReAct Loop (Recommended)	Alternating Reasoning + Action steps in a single-model loop with structured tool calls	Auditable, predictable, well-supported by major frameworks; clean separation of planning and execution	Multiple LLM calls per task increases latency and cost; requires structured output support from model	Production deployments where auditability and control are paramount
B: Plan-and-Execute	Single planning call generates the full multi-step plan; executor runs steps sequentially without re-planning	Fewer LLM calls; lower latency for well-defined tasks; plan is fully auditable upfront	Cannot adapt to intermediate results; brittle when tool calls fail or return unexpected data	Tasks with highly predictable structure and reliable tools
C: Code-as-Plan (LLM generates executable code)	LLM generates Python/JS code that implements the full task; code is executed in a sandbox	Maximum flexibility; LLM's reasoning is embedded in code structure	Extremely high security risk if sandbox is not hardened; debugging is difficult; audit is harder	Research / experimental contexts with very strong sandboxing
D: Minimal Stateless Agent	No persistent memory; no episodic store; single-shot with tool calls	Simplest to implement and reason about; lowest cost; easiest to secure	Cannot learn from past tasks; poor performance on tasks that benefit from prior context	Commodity, repetitive tasks with no personalisation requirement

Architectural Tensions

Tension	Left Pole	Right Pole	Recommended Balance
Autonomy vs. Control	Fully autonomous — no human gates; maximum throughput	Human approval on every action; maximum safety	Risk-tiered gates: irreversible actions always require approval; reversible low-risk actions are autonomous
Context richness vs. Cost	Inject maximum context for best model performance	Minimal context for lowest token cost	Semantic retrieval of top-K relevant chunks; tiered memory with recency + relevance scoring
Model capability vs. Latency	Largest most capable model for best quality	Smallest fastest model for best UX	Model routing: small model for planning, large model for synthesis; or step complexity detection
Tool breadth vs. Attack surface	Expose all available tools for maximum agent capability	Minimal tool set to reduce attack surface	Scope tools to task domain; policy engine enforces at runtime; review tool set quarterly

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Prompt injection via tool result	Medium	Critical — agent executes attacker-controlled instructions	Output validation on all tool results before re-injection into context; anomaly detection on plan changes	Abort task; log injection attempt; alert security team; sanitise source tool
Infinite loop (agent cannot make progress)	Medium	High — resource exhaustion; SLA breach	Step counter; stuck-detection (same tool + same args called 3× consecutively)	Termination Controller enforces step limit; return partial result
Memory store poisoning	Low	High — corrupted memory degrades all future tasks using that memory	Memory integrity checksums; outlier detection on retrieved memory quality scores	Quarantine suspect memory entries; rebuild from audit log
Tool credential leak via LLM output	Low	Critical — credentials exposed in agent response to caller	Output filtering for credential patterns (regex + entropy detection); no credential injection into context	Immediate secret rotation; incident declaration; audit log review
Model hallucination of tool argument	High	Medium — tool call fails with validation error; loop recovers	JSON Schema validation at dispatcher catches invalid arguments	Structured retry with explicit error fed back to LLM; max 2 retries
Sandbox escape	Very Low	Critical — arbitrary code execution on host	Seccomp profiles; eBPF syscall monitoring; network egress whitelisting	Immediate container kill; host isolation; P0 incident
Cost runaway	Medium	High — unbudgeted cloud spend	Cost Budget Monitor with per-task and per-day ceilings	Hard kill switch when daily budget exceeded; alert FinOps
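
The stuck-detection rule from the infinite-loop row (the same tool with the same arguments three times in a row) can be tracked with a bounded window. This sketch uses hypothetical names. Arguments are compared via canonical JSON so key order does not matter.

```python
import json
from collections import deque


class StuckDetector:
    """Flags a loop that repeats the same tool call with identical arguments N times in a row."""

    def __init__(self, repeat_limit: int = 3):
        self.repeat_limit = repeat_limit
        self.recent: deque = deque(maxlen=repeat_limit)

    def observe(self, tool_name: str, arguments: dict) -> bool:
        # Canonical JSON, so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = tool_name + ":" + json.dumps(arguments, sort_keys=True)
        self.recent.append(key)
        return len(self.recent) == self.repeat_limit and len(set(self.recent)) == 1


detector = StuckDetector()
print([detector.observe("search", {"q": "refund policy"}) for _ in range(3)])
# [False, False, True]
```

When `observe` returns True, the Termination Controller would abort with a partial result, as the Recovery column describes.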

Cascading Failure Scenarios

Memory Store Corruption → Degraded Context → Hallucination Cascade: If poisoned memories are retrieved at scale, multiple concurrent agents receive corrupted context, leading to a correlated hallucination event. Mitigation: memory store is write-audited; corruption triggers circuit breaker that switches all agents to zero-memory mode.
LLM Provider Brownout → Retry Storm → Quota Exhaustion: Partial LLM failures cause agents to retry, exhausting quota for all agents simultaneously. Mitigation: circuit breaker on LLM provider client; exponential backoff with jitter; queue-based retry with rate limiting.
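
Below is a minimal circuit breaker and a "full jitter" backoff helper, sketched under several assumptions. The failure threshold, reset interval, and delay caps are hypothetical defaults, and the clock is injectable for testing. The half-open state simply allows calls again, and any further failure reopens the circuit. A production client would also queue tasks while the circuit is open, as the mitigation above describes.

```python
import random
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    pass


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures; re-allows calls after `reset_after` s."""

    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], object]) -> object:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of adding to a retry storm.
                raise CircuitOpenError("provider circuit open; queue the task")
            self.opened_at = None  # half-open: failure count is kept, so one more failure reopens
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result


def backoff_with_jitter(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full jitter: a random delay between 0 and min(cap, base * 2**attempt) seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))
```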

14. Regulatory Considerations

APRA CPS 230 (Operational Resilience)

The agent is classified as a material business service component if it supports critical operations; requires BIA, RTO/RPO, and annual testing
Third-party LLM providers are likely to be material service providers where the agent supports critical operations; APRA notification and contract requirements then apply
Audit log retention and integrity controls satisfy the operational records requirement

APRA CPS 234 (Information Security)

Agent workload identity and tool access controls implement the information asset protection requirement
Sandboxed execution and network egress controls satisfy the capability containment requirement
Incident detection and response controls (Section 10) map to CPS 234 notification obligations

Privacy Act 1988 (Australia) / GDPR (EU)

PII detection and masking before LLM ingestion is mandatory
Episodic memory storing PII must honour access requests (APP 12; GDPR Art. 15) and, under GDPR, the right to erasure (Art. 17); a memory purge procedure is required
Cross-border data transfer restrictions apply to cloud-hosted inference and memory stores; data residency controls are required
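Here is a minimal sketch of masking before ingestion and per-subject purge. It assumes regular expressions as the detector and a hypothetical `EpisodicMemory` store keyed by data-subject identifier. Regexes catch only well-structured identifiers such as email addresses and phone numbers. Names, addresses and free-text health or financial details need a dedicated PII detection service. For erasure to be complete, any vector embeddings derived from the same records would also need to be deleted.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Illustrative patterns only: they miss names, street addresses and many ID
# formats, and PHONE may also over-match long numbers such as order IDs.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d ]{7,}\d"),
}


def mask_pii(text: str) -> str:
    """Replace detected PII spans with typed placeholders before LLM ingestion."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


@dataclass
class EpisodicMemory:
    """Hypothetical episodic store keyed by data-subject ID so records can be found and erased."""
    records: Dict[str, List[str]] = field(default_factory=dict)

    def remember(self, subject_id: str, text: str) -> None:
        self.records.setdefault(subject_id, []).append(mask_pii(text))

    def export(self, subject_id: str) -> List[str]:
        """Access request: return a copy of everything held about the subject."""
        return list(self.records.get(subject_id, []))

    def purge(self, subject_id: str) -> int:
        """Erasure request: delete all records for the subject; returns the count removed."""
        return len(self.records.pop(subject_id, []))
```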

EU AI Act

If the agent supports a high-risk use case (Art. 6, Annex III — e.g., employment, credit, healthcare), full Art. 9–17 obligations apply: risk management system, data governance, technical documentation, human oversight, accuracy/robustness requirements, and transparency to affected persons
Art. 14 (Human Oversight): the human approval gate for irreversible actions directly implements this requirement; must be documented in the technical file
Art. 17 (Quality Management): model risk management artefacts and performance monitoring satisfy this requirement
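General Purpose AI model obligations (Art. 51–55) apply to your organisation only if it places such a model on the market, for example by providing its own foundation model to third parties; when a third-party model is consumed, those obligations rest with the model provider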
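The human approval gate referenced under Art. 14 can be sketched as a wrapper around tool execution. This is a minimal illustration under several assumptions. `IRREVERSIBLE_TOOLS` is a hypothetical hard-coded set; in practice the reversibility flag would come from the tool registry. `request_approval` stands in for whatever queue or UI collects the human decision, and the in-memory `audit` list stands in for the audit log. Recording the named approver and reason is one way to produce evidence for the technical file that oversight was actually exercised.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List

# Hypothetical set; a real deployment would read reversibility from the tool registry.
IRREVERSIBLE_TOOLS = {"send_payment", "delete_record", "send_external_email"}


class ActionRejected(Exception):
    """Raised when a human declines an irreversible action; the agent must re-plan."""


@dataclass
class ApprovalDecision:
    approved: bool
    approver: str
    reason: str


@dataclass
class ApprovalGate:
    """Blocks irreversible tool calls until a named human approves them."""
    request_approval: Callable[[str, Dict[str, Any]], ApprovalDecision]
    audit: List[Dict[str, Any]] = field(default_factory=list)

    def execute(self, tool: str, args: Dict[str, Any], run: Callable[..., Any]) -> Any:
        if tool in IRREVERSIBLE_TOOLS:
            decision = self.request_approval(tool, args)  # waits for a human decision
            self.audit.append({
                "ts": datetime.now(timezone.utc).isoformat(),
                "tool": tool,
                "args": dict(args),
                "approved": decision.approved,
                "approver": decision.approver,
                "reason": decision.reason,
            })
            if not decision.approved:
                raise ActionRejected(f"{tool} rejected by {decision.approver}: {decision.reason}")
        return run(**args)
```

On rejection, the gate raises instead of returning a sentinel value. The agent then cannot mistake a declined action for a tool that legitimately returned nothing.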

ISO 42001 (AI Management System)

This pattern's governance artefacts (Section 9) map directly to ISO 42001 §6.1 (risk assessment) and §8.4 (AI system lifecycle)
Agent purpose statement satisfies the intended use documentation requirement
Execution trace archive satisfies the operational records requirement (§9.1)

NIST AI RMF

GOVERN: Agent purpose statement + risk assessment maps to GOVERN 1.1, 1.2
MAP: Industry applicability table + failure mode analysis maps to MAP 1.5, 2.2
MEASURE: Model performance report + SLO monitoring maps to MEASURE 1.1, 2.5
MANAGE: Human approval gates + incident response maps to MANAGE 2.2, 4.1

15. Reference Implementations

AWS

Component	AWS Service
LLM Inference	Amazon Bedrock (Claude, Titan, Llama)
Agent Orchestration	Amazon Bedrock Agents
Tool Execution Sandbox	AWS Lambda (isolated function per tool) + Firecracker
Episodic / Semantic Memory	Amazon OpenSearch Service (k-NN) or Aurora PostgreSQL + pgvector
Audit Log	AWS CloudTrail + S3 with Object Lock (WORM)
Secrets	AWS Secrets Manager
Policy Engine	AWS Verified Access + custom Lambda authoriser
Observability	AWS X-Ray + CloudWatch + Amazon Managed Grafana
Cost Monitor	AWS Cost Explorer API + custom Lambda alerting
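Billing data from the Cost Explorer API lags actual usage. An in-loop ceiling is therefore usually enforced on estimated spend (for example, token counts multiplied by unit prices) and reconciled against billing data later. That split is an assumption of this sketch, not a documented AWS pattern. The hypothetical `CostBudgetMonitor` below enforces the per-task and per-day ceilings from the Cost runaway row of the failure mode table. Its kill switch stays engaged until an operator resets it.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict


class BudgetExceeded(Exception):
    """Raised when a task or the whole agent fleet exceeds its spend ceiling."""


@dataclass
class CostBudgetMonitor:
    per_task_limit_usd: float
    per_day_limit_usd: float
    task_spend: Dict[str, float] = field(default_factory=dict)
    day_spend: Dict[date, float] = field(default_factory=dict)
    killed: bool = False

    def record(self, task_id: str, cost_usd: float, day: date) -> None:
        """Record estimated spend for one LLM or tool call, then enforce ceilings."""
        if self.killed:
            raise BudgetExceeded("kill switch engaged: daily budget exhausted")
        self.task_spend[task_id] = self.task_spend.get(task_id, 0.0) + cost_usd
        self.day_spend[day] = self.day_spend.get(day, 0.0) + cost_usd
        if self.day_spend[day] > self.per_day_limit_usd:
            self.killed = True  # hard stop for all tasks; alerting FinOps is out of scope here
            raise BudgetExceeded(
                f"daily spend {self.day_spend[day]:.2f} exceeds {self.per_day_limit_usd:.2f}")
        if self.task_spend[task_id] > self.per_task_limit_usd:
            raise BudgetExceeded(
                f"task {task_id} spend {self.task_spend[task_id]:.2f} "
                f"exceeds {self.per_task_limit_usd:.2f}")

    def reset_kill_switch(self) -> None:
        """Deliberate operator action, e.g. after FinOps review or at day rollover."""
        self.killed = False
```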

Azure

Component	Azure Service
LLM Inference	Azure OpenAI Service (GPT-4o, o-series)
Agent Orchestration	Azure AI Foundry Agent Service + Semantic Kernel
Tool Execution Sandbox	Azure Container Instances (per-invocation) + Azure Functions
Episodic / Semantic Memory	Azure AI Search (vector) + Azure Cosmos DB (episodic)
Audit Log	Azure Monitor Logs + Azure Immutable Blob Storage
Secrets	Azure Key Vault
Policy Engine	Azure Policy + custom APIM policy
Observability	Azure Monitor + Application Insights + OpenTelemetry

GCP

Component	GCP Service
LLM Inference	Vertex AI (Gemini 1.5 Pro / Flash)
Agent Orchestration	Vertex AI Agent Builder
Tool Execution Sandbox	Cloud Run (concurrency set to 1 for per-request isolation) + gVisor
Episodic / Semantic Memory	Vertex AI Vector Search + Cloud Spanner (episodic)
Audit Log	Cloud Audit Logs + Cloud Storage (WORM bucket)
Secrets	Secret Manager
Observability	Cloud Trace + Cloud Monitoring + Managed Prometheus

On-Premises / Private Cloud

Component	Technology
LLM Inference	Ollama + Llama 3.1 70B / Mistral Large on GPU cluster; or vLLM serving
Agent Orchestration	LangGraph or custom Python agent loop
Tool Execution Sandbox	Kubernetes with gVisor runtime class + NetworkPolicy
Semantic Memory	Weaviate or Qdrant on Kubernetes
Audit Log	Apache Kafka (event log) + Apache Iceberg on MinIO
Secrets	HashiCorp Vault
Observability	OpenTelemetry Collector + Grafana + Loki + Tempo
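Where LangGraph is not used, the "custom Python agent loop" option reduces to a bounded plan-act-observe cycle. The sketch below is a minimal illustration, not a production orchestrator. `plan` is a placeholder for the LLM call, `Action` and `run_agent` are hypothetical names, and memory, policy checks, sandboxing and audit logging are omitted. It shows two basic controls: a hard step limit against runaway execution and a closed tool set against scope creep.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Action:
    """One planned step: either call `tool` with `args`, or finish with `answer`."""
    tool: Optional[str] = None
    args: Optional[Dict[str, Any]] = None
    answer: Optional[str] = None


def run_agent(goal: str,
              plan: Callable[[str, List[str]], Action],
              tools: Dict[str, Callable[..., Any]],
              max_steps: int = 8) -> str:
    """Bounded agent loop; `plan` stands in for the LLM and sees all observations so far."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = plan(goal, observations)  # plan (and reflect on prior observations)
        if action.answer is not None:
            return action.answer  # explicit termination
        if action.tool not in tools:
            # Tool boundary: out-of-scope requests become observations, not executions.
            observations.append(f"error: unknown tool {action.tool!r}")
            continue
        try:
            result = tools[action.tool](**(action.args or {}))  # act
            observations.append(f"{action.tool} -> {result}")  # observe
        except Exception as exc:
            observations.append(f"{action.tool} failed: {exc}")
    raise RuntimeError(f"no answer after {max_steps} steps")  # runaway-execution guard
```

Tool failures are returned to the planner as observations rather than raised. The next `plan` call can then re-plan while the step budget still bounds the total work.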

16. Related Patterns

Pattern	ID	Relationship Type	Notes
Stateful Agent Memory	EAAPL-AGT002	Extends	This pattern references AGT002 for all memory tier implementations
Agent Tool Registry	EAAPL-AGT003	Depends On	Tool Dispatcher relies on the registry defined in AGT003
Agent Sandboxing	EAAPL-AGT004	Depends On	Sandboxed Executor implements the isolation defined in AGT004
Agent Checkpoint and Recovery	EAAPL-AGT005	Integrates With	State serialisation and recovery extends the loop defined here
Reflexive Agent	EAAPL-AGT006	Extends	Reflexive agent adds a self-critique step to the Reflect phase of this loop
Long-Running Agent	EAAPL-AGT007	Extends	Long-running agent externalises the loop state and adds async management
Event-Driven Agent	EAAPL-AGT008	Extends	Event-driven agent wraps this pattern with an event trigger and back-pressure layer
Agent Identity and Authorisation	EAAPL-AGT009	Depends On	Agent workload identity is a prerequisite for the Policy Engine
Agent Cost Governance	EAAPL-AGT010	Integrates With	Cost monitor is a component of the governance plane
Multi-Agent Orchestration	EAAPL-MAG001	Composes	Orchestrator deploys multiple instances of this pattern as worker agents
Supervisor Agent	EAAPL-MAG002	Composes	Worker agents are instances of this pattern
Human-in-the-Loop Agent	EAAPL-MAG003	Extends	Adds mandatory human approval queue to the Act phase of this pattern

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Evidence
Adoption Breadth	5	Offered as managed services by all major hyperscalers; underpins LangChain, Semantic Kernel, Bedrock Agents, Vertex AI Agent Builder
Tooling Ecosystem	5	Mature open-source frameworks (LangGraph, AutoGen, CrewAI); managed cloud services available
Operational Patterns	4	SRE runbooks established; observability tooling mature; DR patterns well understood
Security Hardening	4	OWASP LLM mitigations defined; sandbox technology mature; some novel attacks (indirect injection) still emerging
Regulatory Clarity	3	EU AI Act framework established; APRA guidance published; implementation guidance still evolving
Community Knowledge	5	Extensive published case studies, benchmarks, and failure post-mortems

18. Revision History

Version	Date	Author	Changes
1.0	2024-03-01	Architecture Board	Initial pattern publication
1.1	2024-06-15	AI Platform Team	Added OWASP LLM Top 10 section; updated tool registry reference
2.0	2024-11-01	Architecture Board	Major revision: added four-tier memory model; restructured data flow; added EU AI Act Art. 14 detail
2.1	2025-02-10	AI Risk Team	Added APRA CPS 230 mapping; updated cost table; added cascading failure scenarios
