EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-AGT001] Single Agent Pattern

Category: Agentic AI
Sub-category: Foundational Agent Architecture
Version: 2.1
Maturity: Proven
Tags: agent-loop, tool-use, memory, state-management, autonomy, safety-constraints
Regulatory Relevance: EU AI Act (Art. 9, 14, 17), ISO 42001 §6.1, NIST AI RMF (GOVERN 1.1, MANAGE 2.2), APRA CPS 234


1. Executive Summary

The Single Agent Pattern defines the canonical architecture for a single autonomous AI agent capable of perceiving its environment, planning multi-step responses, executing tool-mediated actions, and reflecting on outcomes to improve subsequent iterations. It serves as the foundational building block upon which all multi-agent architectures are composed.

For CIO/CTO audiences: this pattern codifies how your organisation deploys an AI system that operates with limited human intervention across a bounded task domain — processing customer enquiries, executing data pipelines, triaging incidents, or drafting structured documents. It defines the safety rails, operational controls, and governance artefacts required before a single agent can be trusted in production.

The pattern addresses the three most common failure modes organisations encounter when first deploying autonomous agents: runaway execution (no termination conditions), scope creep (no tool boundaries), and unauditable decisions (no structured logging). When implemented correctly, this pattern delivers a measurable reduction in task cycle time (typically 40–70%) while maintaining compliance with enterprise risk frameworks. Every multi-agent architecture in this library extends from or composes this pattern; get this right first.


2. Problem Statement

Business Problem

Organisations need to automate complex, multi-step knowledge work tasks that previously required human judgment (ticket resolution, document review, research synthesis, code generation) without assigning a person to every invocation. Existing RPA and rule-based automation cannot handle the variability and reasoning demands of these tasks. Generative AI models alone are stateless and cannot take actions beyond generating text.

Technical Problem

A large language model (LLM) in isolation is a stateless text-transformation function. It has no persistent memory, cannot call external APIs, cannot read or write data stores, and cannot self-correct. Connecting an LLM to tools without a structured execution loop creates an unpredictable, unauditable system with no defined termination semantics.

Symptoms of Absence

  • Tasks requiring tool use are handled by brittle prompt chains with no error recovery
  • Each tool integration is ad hoc, with no unified invocation or audit surface
  • Agents run indefinitely with no token budget or step limit enforcement
  • Errors propagate silently; the agent hallucinates success
  • No replay or forensic capability when something goes wrong
  • Compliance teams cannot produce evidence of what the agent decided or why

Cost of Inaction

  • Financial: Each unstructured agent deployment creates isolated technical debt; re-architecture costs grow super-linearly with scale
  • Risk: An agent without safety constraints can execute irreversible actions (deleting records, sending emails, initiating payments) incorrectly
  • Regulatory: APRA CPS 234 requires demonstrable information security controls over information assets, including those handled by automated systems; missing audit logs are likely to produce material audit findings
  • Competitive: Organisations that standardise on this pattern deploy new agent capabilities in days rather than months

3. Context

When to Apply

  • Automating a bounded, well-defined knowledge work task that requires ≥2 sequential tool calls
  • The task domain has a clear completion criterion (file produced, ticket closed, confirmation received)
  • A single reasoning model is sufficient; no specialised sub-agent is needed
  • Task duration is ≤30 minutes per invocation (see EAAPL-AGT007 for longer tasks)
  • The organisation is establishing its first production AI agent and needs a reference implementation

When NOT to Apply

  • Task requires simultaneous specialised expertise in multiple domains (use EAAPL-MAG001 Multi-Agent Orchestration)
  • Task duration routinely exceeds 30 minutes (use EAAPL-AGT007 Long-Running Agent)
  • Human approval is required at every step (use EAAPL-MAG003 Human-in-the-Loop Agent)
  • Task involves emergent coordination among peers (use EAAPL-MAG004 Agent Swarm)
  • The "agent" is purely reactive with no planning step (a simple function call suffices)

Prerequisites

  • A capable foundation model or fine-tuned model with function-calling support
  • A tool registry with at least one tool integration (see EAAPL-AGT003)
  • Observability infrastructure: structured logging, distributed tracing, metrics
  • A secrets management system (no credentials in agent context)
  • An identity system for the agent (see EAAPL-AGT009)

Industry Applicability

Industry | Use Case | Risk Level | Maturity
Financial Services | Credit memo drafting, regulatory query response, reconciliation triage | High | Proven
Healthcare | Clinical note summarisation, appointment scheduling, prior auth drafting | Very High | Emerging
Retail / E-commerce | Order exception handling, returns processing, product description generation | Medium | Proven
Legal / Professional Services | Contract clause extraction, due diligence research, citation verification | High | Proven
Technology / SaaS | Incident triage, code review, test generation, documentation synthesis | Medium | Proven
Government | Benefits eligibility pre-screening, FOI response drafting | High | Emerging

4. Architecture Overview

The Single Agent Pattern structures agent execution around four phases that execute in a repeating loop: Observe, Plan, Act, and Reflect. Each phase has explicit inputs, outputs, and guard conditions. The loop terminates when a completion criterion is satisfied, a safety limit is reached, or a human intervenes.

Why a loop rather than a single inference call? Real-world tasks require iterative refinement. A single LLM invocation cannot both plan a multi-step approach and execute it; the model must observe the result of each action before determining the next. The loop externalises this iteration explicitly, making it auditable and controllable.

Observe Phase

The agent receives a task instruction and assembles its current context: the task description, any relevant memory (in-context history, retrieved episodic memories, semantic search results from the knowledge store), and the current state of the world (tool results from the previous iteration). This phase is read-only; no mutations occur. The context window is deliberately bounded here: the Context Assembler enforces a token budget, retrieving only the most relevant memory chunks rather than injecting the full history, which would exhaust the context window on long tasks.
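
The budgeted retrieval step can be sketched as follows. This is a minimal illustration, assuming memory chunks arrive already scored by the retriever and approximating tokens by word count; `MemoryChunk`, `approx_tokens`, and `assemble_context` are hypothetical names, not part of any framework.

```python
from dataclasses import dataclass


@dataclass
class MemoryChunk:
    text: str
    relevance: float  # retriever similarity score; higher means more relevant


def approx_tokens(text: str) -> int:
    """Crude token estimate: one token per whitespace-separated word.

    A real Context Assembler would count with the target model's tokenizer.
    """
    return len(text.split())


def assemble_context(task: str, chunks: list[MemoryChunk], budget: int) -> str:
    """Return the task plus the most relevant memory chunks that fit in `budget` tokens."""
    used = approx_tokens(task)
    if used > budget:
        raise ValueError("task description alone exceeds the token budget")
    selected = [task]
    # Greedy pass: most relevant first; a chunk that would overflow is skipped,
    # but smaller, less relevant chunks may still fit afterwards.
    for chunk in sorted(chunks, key=lambda c: c.relevance, reverse=True):
        cost = approx_tokens(chunk.text)
        if used + cost <= budget:
            selected.append(chunk.text)
            used += cost
    return "\n\n".join(selected)
```

The greedy pass always keeps the task itself and then fills the remaining budget by relevance; a production assembler might also weight recency or reserve part of the budget for the latest tool result.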

Plan Phase

The LLM receives the assembled context and produces a structured plan: either a tool call specification (name, arguments, rationale) or a final answer. The planner does not execute actions directly; it emits a structured intent. This separation is critical for governance: the intent can be logged, validated against a policy engine, and subjected to human approval before execution. The plan typically consists of a chain-of-thought reasoning trace followed by either a specific tool invocation or a final synthesis.
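
A sketch of the structured-intent contract, assuming the planner is prompted to return a JSON object shaped like the tool call specification in the data flow (`{tool_name, arguments, rationale}` or `{final_answer, ...}`). The class and function names are illustrative only. Parsing into typed objects before anything else happens is what allows the intent to be logged and policy-checked independently of execution.

```python
import json
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class ToolCallIntent:
    tool_name: str
    arguments: dict[str, Any]
    rationale: str


@dataclass(frozen=True)
class FinalAnswer:
    final_answer: str


Intent = Union[ToolCallIntent, FinalAnswer]


class MalformedIntent(ValueError):
    """The planner output is not a valid intent; the loop can ask the model to reformat."""


def parse_intent(raw: str) -> Intent:
    """Turn raw model output into a typed intent before anything is executed."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise MalformedIntent(f"not valid JSON: {exc}") from exc
    if not isinstance(obj, dict):
        raise MalformedIntent("intent must be a JSON object")
    if "final_answer" in obj:
        return FinalAnswer(final_answer=str(obj["final_answer"]))
    if isinstance(obj.get("tool_name"), str) and isinstance(obj.get("arguments"), dict):
        return ToolCallIntent(obj["tool_name"], obj["arguments"], str(obj.get("rationale", "")))
    raise MalformedIntent("expected {tool_name, arguments, rationale} or {final_answer}")
```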

Act Phase

The Tool Dispatcher receives the plan's tool call specification, resolves the tool from the registry, validates inputs against the tool's JSON Schema, enforces access controls, and invokes the tool within an isolated sandbox (see EAAPL-AGT004). The result (a success payload or a structured error) is returned to the loop. Critically, all tool invocations are written to an immutable audit log before execution (intent record) and after execution (result record). This two-phase logging enables forensic reconstruction of the agent's full action history.
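
The two-phase logging described in the Act phase might look like this in a minimal dispatcher. It assumes an in-memory registry of Python callables and uses a simplified type check as a stand-in for full JSON Schema validation (a real dispatcher would use a JSON Schema validator); sandboxing and access control are omitted. `ToolDispatcher` and its record fields are hypothetical.

```python
import json
import time
import uuid
from typing import Any, Callable

# Hypothetical registry entry: a callable plus a minimal schema mapping required
# argument names to Python types (a stand-in for the tool's JSON Schema).
Tool = tuple[Callable[..., Any], dict[str, type]]


class ToolDispatcher:
    def __init__(self, registry: dict[str, Tool]):
        self.registry = registry
        self.audit_log: list[dict[str, Any]] = []  # stand-in for an append-only store

    def _append(self, record: dict[str, Any]) -> None:
        record["ts"] = time.time()
        self.audit_log.append(record)

    def dispatch(self, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if tool_name not in self.registry:
            return {"ok": False, "error": f"unknown tool: {tool_name}"}
        fn, schema = self.registry[tool_name]
        for name, expected in schema.items():
            if not isinstance(arguments.get(name), expected):
                return {"ok": False, "error": f"argument '{name}' must be {expected.__name__}"}
        call_id = str(uuid.uuid4())
        # Phase 1: the intent record is written BEFORE the side effect happens.
        self._append({"call_id": call_id, "phase": "intent", "tool": tool_name,
                      "args": json.dumps(arguments, sort_keys=True)})
        try:
            result = {"ok": True, "value": fn(**arguments)}
        except Exception as exc:  # tool failures become structured errors, not crashes
            result = {"ok": False, "error": repr(exc)}
        # Phase 2: the result record pairs with the intent through call_id.
        self._append({"call_id": call_id, "phase": "result", "ok": result["ok"]})
        return result
```

Because the intent record is written first, a crash inside the tool still leaves evidence that the call was attempted, which is what makes forensic reconstruction possible.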

Reflect Phase

After receiving a tool result, the agent evaluates whether the result advances the task goal. For agents configured with a self-critique capability (see EAAPL-AGT006), the reflection step can invoke an additional LLM call that evaluates quality before proceeding. In the baseline single-agent pattern, reflection is implemented as a structured prompt instruction that asks the model to assess whether the task is complete or whether another iteration is required.

Termination Logic

The loop terminates on any of: (a) the model emits a final answer with no pending tool calls, (b) the step limit is reached (default: 20 iterations), (c) the token budget is exhausted, (d) a safety constraint is violated, or (e) a human sends a stop signal. Termination on any condition other than (a) produces a partial result with a structured status code; the calling system can then decide whether to retry, escalate, or discard.
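
Termination conditions (a) to (e) can be expressed as a single check run after every iteration. This is a sketch under stated assumptions: the defaults of 20 steps and 50,000 tokens come from this pattern, while the status names other than `token_budget_exhausted` and the precedence order (safety violation and human stop checked before completion) are illustrative choices. `LoopState` and `check_termination` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(str, Enum):
    COMPLETE = "complete"
    STEP_LIMIT = "step_limit_reached"
    TOKEN_BUDGET = "token_budget_exhausted"
    SAFETY_VIOLATION = "safety_violation"
    HUMAN_STOP = "human_stop"


@dataclass
class LoopState:
    iteration: int = 0
    tokens_used: int = 0
    final_answer_emitted: bool = False
    pending_tool_calls: int = 0
    safety_violation: bool = False
    stop_requested: bool = False


def check_termination(s: LoopState, max_steps: int = 20,
                      token_budget: int = 50_000) -> Optional[Status]:
    """Return a terminal status, or None if the loop should continue.

    Only COMPLETE is a full result; every other status yields a partial result.
    """
    # Safety and human stop take precedence over a completion claim.
    if s.safety_violation:
        return Status.SAFETY_VIOLATION
    if s.stop_requested:
        return Status.HUMAN_STOP
    if s.final_answer_emitted and s.pending_tool_calls == 0:
        return Status.COMPLETE
    if s.iteration >= max_steps:
        return Status.STEP_LIMIT
    if s.tokens_used >= token_budget:
        return Status.TOKEN_BUDGET
    return None
```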

Memory Architecture

The agent maintains four memory tiers: in-context memory (the current conversation and tool results, constrained to the context window), episodic memory (a durable store of past task executions, retrieved by semantic similarity), semantic memory (a vector store of domain knowledge and past learnings), and a procedural skill store (cached tool call patterns for known task types). Memory consolidation (writing key learnings from a completed task back into the episodic and semantic stores) runs asynchronously after task completion to avoid blocking the main loop.

State Management

The agent's execution state is serialised at each loop iteration checkpoint (see EAAPL-AGT005), enabling recovery from mid-task failures without replaying actions that have already succeeded. The state object includes the task ID, current iteration number, tool call history, partial results, and memory references.
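
A minimal checkpoint sketch, assuming state is written as JSON to a local directory (a production deployment would use a durable store; see EAAPL-AGT005). The write-then-rename step means a crash mid-write leaves the previous checkpoint intact, and `already_done` shows how a resumed loop could skip tool calls that already succeeded. `AgentState` and the helper functions are hypothetical.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class AgentState:
    """Fields mirror the pattern's state object; names are illustrative."""
    task_id: str
    iteration: int = 0
    tool_call_history: list[dict[str, Any]] = field(default_factory=list)
    partial_results: list[Any] = field(default_factory=list)
    memory_refs: list[str] = field(default_factory=list)


def save_checkpoint(state: AgentState, directory: Path) -> Path:
    """Write to a temporary file, then rename it over the previous checkpoint."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{state.task_id}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state)))
    tmp.replace(target)  # os.replace semantics: atomic when both paths share a filesystem
    return target


def load_checkpoint(task_id: str, directory: Path) -> AgentState:
    return AgentState(**json.loads((directory / f"{task_id}.json").read_text()))


def already_done(state: AgentState, call_id: str) -> bool:
    """True if this tool call succeeded before the failure, so it must not be replayed."""
    return any(c.get("call_id") == call_id and c.get("ok") for c in state.tool_call_history)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        state = AgentState("task-42", iteration=3,
                           tool_call_history=[{"call_id": "c1", "tool": "crm_lookup", "ok": True}])
        save_checkpoint(state, Path(tmp_dir))
        resumed = load_checkpoint("task-42", Path(tmp_dir))
        print(resumed.iteration, already_done(resumed, "c1"))
```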


5. Architecture Diagram

flowchart TD
    subgraph Input["Task Input"]
        A[Task Request]
        B[(Memory Store)]
    end
    subgraph Core["Observe-Plan-Act-Reflect Loop"]
        C[Context Assembler]
        D[LLM Planner]
        E[Policy Engine]
        F[Tool Dispatcher]
    end
    subgraph Output["Output Layer"]
        G[Final Output]
        H[(Audit Log)]
    end
    A --> C
    B -->|retrieve context| C
    C --> D
    D -->|structured intent| E
    E -->|approved| F
    E -->|blocked| H
    F -->|sandboxed execution| F
    F -->|tool result| D
    D -->|complete| G
    F --> H
    G -->|write learnings| B
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#fef9c3,stroke:#eab308
    style C fill:#f0fdf4,stroke:#22c55e
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#f0fdf4,stroke:#22c55e
    style G fill:#d1fae5,stroke:#10b981
    style H fill:#fef9c3,stroke:#eab308

6. Components

Component | Type | Responsibility | Technology Options | Criticality
Context Assembler | Orchestration | Assembles LLM input from task, memory, and state; enforces token budget | LangChain, LlamaIndex, custom Python/TS | Critical
LLM Planner | AI Model | Generates structured plans and tool call intents from context | GPT-4o, Claude 3.5+, Gemini 1.5 Pro, Llama 3.1 70B (self-hosted) | Critical
Policy Engine | Safety/Governance | Validates tool call intent against allow-list, risk rules, and scope constraints | OPA (Rego), custom rule engine, AWS Bedrock Guardrails | Critical
Tool Dispatcher | Orchestration | Resolves tools from registry, validates inputs, manages invocation lifecycle | LangChain Tools, Semantic Kernel, custom dispatcher | Critical
Tool Registry | Service Catalogue | Stores tool definitions, capability metadata, health status, access controls | Redis, PostgreSQL, service mesh sidecar | High
Sandboxed Executor | Compute Isolation | Executes tool code in isolated environment with resource quotas | Docker + seccomp, Firecracker microVMs, AWS Lambda | Critical
Memory Store (Episodic) | Persistence | Stores conversation history and past task executions for retrieval | PostgreSQL + pgvector, MongoDB, Cosmos DB | High
Memory Store (Semantic) | Vector Store | Stores domain knowledge embeddings for semantic retrieval | Pinecone, Weaviate, Azure AI Search, pgvector | High
Procedural Skill Library | Knowledge Base | Caches tool-call patterns for known task classes | Object store (S3/Blob), Redis JSON | Medium
Result Handler | Orchestration | Normalises tool results, handles errors, feeds back into loop state | Custom, part of agent framework | High
Termination Controller | Safety | Enforces step limits, token budgets, time limits; triggers partial result | Custom, integrated into agent loop | Critical
Audit Log | Compliance | Immutable append-only log of all plans, tool calls, and results | AWS CloudTrail, Azure Monitor Logs, Kafka + Apache Iceberg on S3 | Critical
Observability Platform | Operations | Metrics, traces, and logs for agent execution monitoring | Datadog, OpenTelemetry + Grafana, Azure Monitor | High
Cost/Token Budget Monitor | Governance | Tracks token consumption and cost per task; triggers kill switch | Custom + LLM provider usage APIs | High

7. Data Flow

Primary Flow

Step | Actor | Action | Output
1 | Calling System | Submits task with task_id, instruction, context hints, priority | Task object in queue
2 | Context Assembler | Retrieves episodic memories by semantic similarity to task; loads in-context history; applies token budget | Assembled context document
3 | LLM Planner | Receives context; generates chain-of-thought reasoning; emits structured tool call intent or final answer | Tool call spec: {tool_name, arguments, rationale} or {final_answer, ...}
4 | Policy Engine | Validates tool call against: (a) agent permission scope, (b) argument safety rules, (c) rate limits | Approved intent or {blocked, reason, policy_id}
5 | Tool Dispatcher | Resolves tool definition from registry; validates arguments against JSON Schema; checks tool health | Validated invocation request
6 | Sandboxed Executor | Executes tool in isolated container with enforced CPU/memory/time/network quotas | Raw tool result or timeout/error
7 | Result Handler | Normalises result; writes to in-context window; appends to audit log | Structured result in agent state
8 | Termination Controller | Evaluates completion criteria, step count, token budget, time elapsed | Continue / Complete / Abort signal
9 | LLM Planner (next iteration) | If continuing: receives updated context including tool result; generates next plan | Next tool call or final answer
10 | Reflect + Consolidate | On completion: LLM generates memory consolidation summary; writes to episodic + semantic stores | Memory records; final output to caller
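
The primary flow can be condensed into a runnable loop skeleton. This is a sketch, not a reference implementation: the planner is any callable returning a structured intent, the policy engine is reduced to an allow-list check, and memory retrieval, sandboxing, audit logging, and token accounting are omitted. `run_agent` and the transcript format are hypothetical.

```python
from typing import Any, Callable

# A planner is any callable that reads the transcript and returns a structured intent:
# {"tool_name": ..., "arguments": {...}} or {"final_answer": ...}.
Planner = Callable[[list[dict[str, Any]]], dict[str, Any]]


def run_agent(task: str, planner: Planner, tools: dict[str, Callable[..., Any]],
              allowed: set[str], max_steps: int = 20) -> dict[str, Any]:
    transcript: list[dict[str, Any]] = [{"role": "task", "content": task}]
    for step in range(1, max_steps + 1):
        # Observe + Plan: the planner sees the task and every prior observation.
        intent = planner(transcript)
        transcript.append({"role": "plan", "content": intent})
        if "final_answer" in intent:  # termination condition (a)
            return {"status": "complete", "answer": intent["final_answer"], "steps": step}
        name = intent.get("tool_name")
        if name not in allowed or name not in tools:
            # Policy engine stand-in: the block reason becomes an observation.
            observation = {"blocked": True, "reason": f"tool {name!r} not permitted"}
        else:
            # Act: a real dispatcher would validate, sandbox, and audit-log here.
            try:
                observation = {"ok": True, "value": tools[name](**intent.get("arguments", {}))}
            except Exception as exc:
                observation = {"ok": False, "error": repr(exc)}
        # Reflect happens in the next planner call, which sees this observation.
        transcript.append({"role": "observation", "content": observation})
    return {"status": "step_limit_reached", "steps": max_steps, "transcript": transcript}
```

A blocked or failed call is fed back as an observation rather than raised, which mirrors the error flow: the model gets a chance to reformulate before the step limit ends the task.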

Error Flow

Error Condition | Detection Point | Recovery Action | Escalation
Tool invocation timeout | Sandboxed Executor | Retry with exponential backoff (max 3 attempts); on exhaustion, return structured error to loop | Log + alert if retries exhausted
Tool returns structured error | Result Handler | Pass error as observation to LLM; allow model to try alternative tool or revise plan | Escalate if same error recurs 3 times
Policy engine blocks intent | Policy Engine | Return block reason to loop as observation; LLM may reformulate | Human escalation queue if blocked 2+ times
Token budget exhausted | Termination Controller | Emit partial result with status: token_budget_exhausted; log consumption | Alert cost owner; trigger budget review
LLM returns malformed tool call | Tool Dispatcher | Structured prompt to LLM requesting reformatted call; max 2 retries | Abort task with status: parse_failure
Memory store unavailable | Context Assembler | Proceed with in-context memory only; log degraded mode | Alert; trigger memory store recovery SLA
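
The timeout-retry and repeated-error rules in the error flow can be sketched as two small helpers. Assumptions: the tool raises `TimeoutError` on timeout, the 0.5 s base delay is illustrative, and jitter uses the "full jitter" variant (a random sleep between zero and the exponential cap). `call_with_backoff`, `RetriesExhausted`, and `RepeatedErrorTracker` are hypothetical names.

```python
import random
import time
from collections import Counter
from typing import Any, Callable


class RetriesExhausted(Exception):
    """Raised once every attempt has failed; the loop turns this into a structured error."""


def call_with_backoff(
    fn: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying on `retry_on` errors with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise RetriesExhausted(f"failed after {max_attempts} attempts") from exc
            # The cap doubles each attempt (0.5 s, 1 s, ...); sleep a random fraction of it.
            sleep(random.uniform(0, base_delay * 2 ** (attempt - 1)))


class RepeatedErrorTracker:
    """Counts identical structured errors within one task; True means escalate."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.counts: Counter = Counter()

    def record(self, error_code: str) -> bool:
        self.counts[error_code] += 1
        return self.counts[error_code] >= self.threshold
```

Injecting `sleep` keeps the helper testable without real delays; jitter matters because synchronised retries from many agents are exactly what turns a provider brownout into a retry storm.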

8. Security Considerations

Authentication and Authorisation

  • The agent must authenticate with a workload identity (not a human user credential) — see EAAPL-AGT009
  • Each agent instance is assigned a scoped service account with least-privilege permissions
  • Tool invocations are authorised against the agent's permission scope, not inherited from the calling user
  • Dynamic permission escalation requires out-of-band human approval via the approval queue

Secrets Management

  • Zero secrets in agent context, prompt, or tool arguments
  • All credentials are retrieved at invocation time from a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault)
  • Secrets are injected as environment variables into the sandboxed executor — never passed through the LLM
  • Secret rotation must not require agent redeployment; use dynamic secrets where possible

Data Classification

  • Task input is classified on ingestion; the classification label constrains which tools can be called and which memory tiers can store results
  • PII must be detected (using a PII detection service) and either masked or flagged before entering the LLM context
  • Agent outputs are labelled with the highest classification of any input or tool result consumed
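
These classification rules can be sketched with an ordered enum: the output label is the high-water mark of everything consumed, and the input label gates which tools are callable. The four labels and the tool mapping are hypothetical examples; real labels come from the organisation's classification scheme.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Illustrative labels, ordered so a larger value is more sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical policy: which tools may be called once input carries a given label.
TOOLS_PERMITTED: dict[Classification, set[str]] = {
    Classification.PUBLIC: {"web_search", "send_email", "crm_lookup"},
    Classification.INTERNAL: {"send_email", "crm_lookup"},
    Classification.CONFIDENTIAL: {"crm_lookup"},
    Classification.RESTRICTED: set(),
}


def output_label(task_label: Classification, consumed: list[Classification]) -> Classification:
    """High-water mark: the output carries the most sensitive label of anything it consumed."""
    return max([task_label, *consumed])


def tool_allowed(tool: str, input_label: Classification) -> bool:
    return tool in TOOLS_PERMITTED[input_label]
```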

Encryption

  • All data at rest (episodic store, semantic store, audit log) is encrypted with customer-managed keys (CMK)
  • All data in transit uses TLS 1.3 minimum
  • Tool result payloads are encrypted whenever the in-context window is serialised and passed between processes (for example, when a session spans multiple processes)

Auditability

  • Every LLM inference call is logged with: timestamp, model ID, token counts, full prompt hash (not the prompt itself — to protect sensitive data), intent, and result classification
  • All tool calls are logged pre- and post-execution with: tool ID, version, input hash, output hash, latency, and success/failure
  • Audit logs are immutable (WORM) and retained per regulatory schedule (minimum 7 years for financial services)
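
A sketch of these logging rules, assuming SHA-256 for the prompt hash and a hash chain for tamper evidence. Each record embeds the hash of its predecessor, so editing, inserting, or deleting any record other than the newest breaks `verify()`. This provides tamper evidence only; WORM retention itself still needs storage-level controls such as object lock. Class, function, and field names are hypothetical.

```python
import hashlib
import json
import time
from typing import Any


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class HashChainedAuditLog:
    """Append-only log where each record stores the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.records: list[dict[str, Any]] = []

    @staticmethod
    def _digest(body: dict[str, Any]) -> str:
        return sha256_hex(json.dumps(body, sort_keys=True))

    def append(self, event: dict[str, Any]) -> dict[str, Any]:
        prev = self.records[-1]["hash"] if self.records else self.GENESIS
        body = {**event, "ts": time.time(), "prev_hash": prev}
        record = {**body, "hash": self._digest(body)}
        self.records.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain. Truncating the newest record is not detected here;
        that needs the head hash anchored in an external store."""
        prev = self.GENESIS
        for rec in self.records:
            body = {k: v for k, v in rec.items() if k != "hash"}
            if rec["prev_hash"] != prev or rec["hash"] != self._digest(body):
                return False
            prev = rec["hash"]
        return True


def inference_event(model_id: str, prompt: str, input_tokens: int, output_tokens: int) -> dict[str, Any]:
    """LLM call record: the prompt is stored only as a SHA-256 hash, never as text."""
    return {"type": "llm_inference", "model_id": model_id, "prompt_sha256": sha256_hex(prompt),
            "input_tokens": input_tokens, "output_tokens": output_tokens}
```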

OWASP LLM Top 10 Mitigations (2023 edition, v1.1)

OWASP LLM Risk | Applicability | Mitigation in This Pattern
LLM01 Prompt Injection | Critical | Input sanitisation layer on all user-supplied content; system prompt hardened and separated from user prompt; tool call arguments validated by JSON Schema before execution
LLM02 Insecure Output Handling | High | All LLM outputs are parsed as structured JSON before action; free-text outputs are never directly executed as code
LLM03 Training Data Poisoning | Medium | Model provenance tracked; fine-tuned models require security review of training data lineage
LLM04 Model Denial of Service | High | Token budget enforced per task; max concurrent agent instances enforced; inference cost anomaly detection alerts
LLM05 Supply Chain Vulnerabilities | High | Model artefacts verified via hash/signature; dependency SBOM for all agent framework packages; private model registry
LLM06 Sensitive Information Disclosure | Critical | PII detection on context assembly; output filtering before returning to caller; data classification enforcement
LLM07 Insecure Plugin Design | Critical | Tool registry enforces JSON Schema validation; all tools developed against hardened interface contract; no arbitrary code execution via tool arguments
LLM08 Excessive Agency | Critical | Tool allow-list per agent role; step limit (default 20) and token budget (default 50K); policy engine blocks out-of-scope actions; irreversible actions require human approval; all actions logged and alertable
LLM09 Overreliance | High | Confidence scores surfaced to callers; human review gate for high-stakes outputs; output metadata includes model uncertainty indicators
LLM10 Model Theft | Medium | Inference endpoints behind private networking; prompt content protected by access controls; no raw prompt logging in shared log stores
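
The output-filtering mitigation under LLM06 (and the credential-leak failure mode) can be sketched as a regex pass for known credential formats followed by a Shannon-entropy check on long opaque tokens. The two patterns and the 4.0 bits-per-character threshold are illustrative assumptions; a production filter would use a maintained secret-scanning ruleset and tuned thresholds.

```python
import math
import re
from collections import Counter

# Illustrative rules only; production filters use a maintained secret-scanning ruleset.
KNOWN_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
]
LONG_TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")      # long opaque strings worth scoring


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's own character distribution."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def redact_secrets(text: str, threshold: float = 4.0) -> tuple[str, int]:
    """Replace known credential formats and high-entropy tokens; return (text, hit count)."""
    hits = 0
    for pattern in KNOWN_PATTERNS:
        text, n = pattern.subn("[REDACTED]", text)
        hits += n

    def score(match: re.Match) -> str:
        nonlocal hits
        if shannon_entropy(match.group()) >= threshold:
            hits += 1
            return "[REDACTED]"
        return match.group()

    return LONG_TOKEN.sub(score, text), hits
```

A non-zero hit count is the signal the failure-mode table calls for: redact before returning to the caller, then treat the event as a potential leak (secret rotation and incident review).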

9. Governance Considerations

Responsible AI

  • The agent must have a documented purpose statement, capability boundary, and prohibited use list before production deployment
  • Outputs must be traceable to specific model version, tools used, and input context — enabling explanation of any decision
  • Bias monitoring is required for agents that influence decisions about individuals (see the EU AI Act Art. 9 risk management requirement and Art. 10 data governance, which covers examination for possible biases)

Model Risk Management

  • The agent's foundation model is treated as a material third-party model component under model risk management frameworks
  • Model change management: any model version upgrade requires regression testing on a validated task benchmark before production promotion
  • Model performance drift is monitored via weekly quality evaluations on a held-out task sample

Human Approval Gates

  • Irreversible actions (deletes, financial transactions, communications to external parties) require explicit human approval via the approval queue before execution, regardless of agent confidence
  • When the agent's confidence score falls below the configured threshold, the task is paused and routed to the human escalation queue with full context

Policy Enforcement

  • Agents operate within a declarative policy document (OPA Rego) that specifies: permitted tools, permitted data domains, permitted output channels, prohibited keywords, and escalation conditions
  • Policy changes go through a change management review cycle; once approved, agents reload the updated policies automatically, without redeployment
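
In this pattern the policy document would be written in Rego and evaluated by OPA; the Python below is only a stand-in that shows the decision logic under stated assumptions. It covers the tool allow-list, prohibited keywords, and the rule that irreversible actions always go to the approval queue; data-domain and output-channel rules are omitted. `Policy`, `Decision`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Policy:
    # Hypothetical fields mirroring part of the policy document's contents.
    permitted_tools: frozenset[str]
    irreversible_tools: frozenset[str]
    prohibited_keywords: frozenset[str] = frozenset()
    policy_id: str = "agent-policy-v1"


@dataclass(frozen=True)
class Decision:
    outcome: str  # "allow" | "needs_approval" | "block"
    reason: str = ""
    policy_id: str = ""


def evaluate(policy: Policy, tool_name: str, arguments: dict[str, Any]) -> Decision:
    if tool_name not in policy.permitted_tools:
        return Decision("block", f"tool '{tool_name}' not in allow-list", policy.policy_id)
    flat = " ".join(str(v) for v in arguments.values()).lower()
    for word in policy.prohibited_keywords:
        if word.lower() in flat:
            return Decision("block", f"prohibited keyword '{word}'", policy.policy_id)
    if tool_name in policy.irreversible_tools:
        # Irreversible actions always go to the approval queue, regardless of confidence.
        return Decision("needs_approval", "irreversible action", policy.policy_id)
    return Decision("allow", policy_id=policy.policy_id)
```

Returning a `policy_id` with every decision matches the `{blocked, reason, policy_id}` output in the data flow, so each block can be traced to the policy version that produced it.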

Traceability

  • Every agent task produces a structured execution trace: a directed acyclic graph (DAG) of observations, plans, tool calls, and results with timestamps
  • Execution traces are stored in the audit log and can be replayed in a simulation environment for forensic investigation

Governance Artefacts

Artefact | Owner | Frequency | Purpose
Agent Purpose Statement | Product Owner | At creation; on material change | Defines permitted scope, audience, and prohibited uses
Risk Assessment (EU AI Act Art. 9) | AI Risk Officer | At creation; annually | Documents risk classification, controls, and residual risk acceptance
Tool Capability Register | Platform Engineer | On each tool change | Lists all tools, their capabilities, data access, and risk classification
Execution Trace Archive | Platform Engineering | Continuous | Immutable record of all agent decisions for audit and forensics
Model Performance Report | ML Engineer | Weekly | Quality metrics on task benchmark; drift detection
Cost Consumption Report | FinOps | Weekly | Token usage, cost per task type, anomaly flags
Human Override Log | Operations | Continuous | Record of all human interventions, approvals, and overrides

10. Operational Considerations

Monitoring

  • Agent health is defined by four golden signals: throughput (tasks/hour), latency (p50/p95/p99 end-to-end task duration), error rate (tool failures + policy blocks + aborted tasks), and cost per task
  • Distributed tracing (OpenTelemetry) spans the full agent loop: each iteration of the loop is a child span of the parent task span
  • Structured JSON logs for every loop iteration, tool call, and memory operation — queryable in the observability platform

SLOs

SLO | Target | Measurement Window | Alert Threshold
Task completion rate | ≥ 95% | 24-hour rolling | < 92% triggers P2 alert
p95 task latency | ≤ 30s (interactive) / ≤ 5min (async) | 1-hour rolling | > 150% of target triggers P2
Policy violation rate | ≤ 1% of task attempts | 24-hour rolling | > 2% triggers P1 review
Audit log availability | 99.99% | Monthly | Any gap triggers P0 incident
Memory retrieval latency | ≤ 200ms p95 | 1-hour rolling | > 500ms triggers P3 alert

Logging Requirements

  • Log levels: INFO for each loop iteration summary; DEBUG for full prompt/response (gated by feature flag; disabled in production by default); ERROR for tool failures and policy blocks; AUDIT for all tool executions (always on)
  • Log correlation: every log line includes task_id, agent_id, iteration_number, and trace_id
  • Log retention: AUDIT logs 7 years; operational logs 90 days
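
The level and correlation rules can be sketched with the standard `logging` module: a JSON formatter emits one line per record carrying the four correlation fields, and a custom AUDIT level sits above INFO so an INFO threshold never filters it out. The level number 25, the logger name, and the helper names are assumptions.

```python
import json
import logging

AUDIT = 25  # assumed level number: between INFO (20) and WARNING (30)
logging.addLevelName(AUDIT, "AUDIT")


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line carrying the correlation fields."""

    CORRELATION = ("task_id", "agent_id", "iteration_number", "trace_id")

    def format(self, record: logging.LogRecord) -> str:
        payload = {"level": record.levelname, "logger": record.name, "msg": record.getMessage()}
        for key in self.CORRELATION:
            payload[key] = getattr(record, key, None)  # None if a caller forgot to bind it
        return json.dumps(payload)


def agent_logger(task_id: str, agent_id: str, trace_id: str) -> logging.LoggerAdapter:
    """Return an adapter that stamps correlation fields onto every record.

    Update `adapter.extra["iteration_number"]` at the start of each loop iteration.
    """
    logger = logging.getLogger("agent")
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logging.LoggerAdapter(logger, {"task_id": task_id, "agent_id": agent_id,
                                          "iteration_number": 0, "trace_id": trace_id})
```

Typical use inside the loop would be `log.info(...)` for the iteration summary and `log.log(AUDIT, ...)` for each tool execution, with DEBUG left off in production as the requirements state.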

Incident Response

Incident Type | Detection | Response | Escalation Path
Agent executing prohibited action | Policy engine block + alert | Automatic task abort; human review of policy gap | AI Risk Officer within 4 hours
Runaway agent (step limit hit) | Termination Controller | Automatic abort; partial result returned | Engineering on-call; root cause within 24h
Memory store unavailable | Heartbeat check | Degrade gracefully to in-context only; alert | Infra on-call; full restoration SLA 1 hour
LLM provider outage | Task failure spike | Failover to secondary model if configured; queue tasks | Engineering on-call; business notification
Audit log gap detected | Log integrity monitor | P0 incident; suspend affected agent | CISO + Legal notification within 1 hour

Capacity Planning

  • Agent instances are stateless (state is externalised to stores); horizontal scaling is straightforward
  • Token throughput is the primary capacity constraint; plan for 2× peak observed demand in LLM quota allocation
  • Memory store IOPS scales with concurrent agents × memory retrievals per iteration; provision accordingly
  • Sandbox executor pool: minimum 2× peak concurrent tasks to absorb burst without queuing

11. Cost Considerations

Cost Drivers

Cost Driver | Description | Scaling Behaviour | Control Lever
LLM inference tokens | Input + output tokens per loop iteration × iterations per task | Linear with task complexity and iterations (when the context token budget is enforced) | Token budget per task; model tier selection
Embedding API calls | Semantic memory retrieval generates embeddings for each query | Linear with tasks + unique queries | Embedding cache; batching
Sandbox compute | CPU/memory seconds per tool execution | Linear with tool call volume and complexity | Resource quotas; warm pool management
Memory store operations | Vector search, episodic read/write per task | Linear with tasks; sublinear with caching | Retrieval cache TTL; memory consolidation batching
Observability ingestion | Log + trace + metric volume | Linear with tasks and log verbosity | Log sampling (non-audit); metric aggregation
Audit log storage | Immutable log retention per regulatory schedule | Linear with task volume; accumulates over retention period | Compression; tiered storage (hot/cold/archive)

Scaling Risks

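  • Token cost scales quadratically if the Context Assembler does not strictly enforce the token budget: each iteration re-sends the growing history, so per-iteration input grows linearly and cumulative input grows with the square of the iteration count (longer context → more expensive inference → more iterations → compounding cost)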
  • Memory store costs compound if memory consolidation is not pruning stale or low-value entries
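
The quadratic claim follows from a short calculation. If iteration k re-sends a base prompt of b tokens plus the k − 1 earlier tool results of s tokens each, cumulative input over n iterations is n·b + s·n(n − 1)/2. The sketch assumes illustrative values (b = 2,000, s = 1,500, cap = 8,000) that are not taken from this pattern; `cumulative_input_tokens` is a hypothetical helper.

```python
from typing import Optional


def cumulative_input_tokens(iterations: int, base: int, per_step: int,
                            budget: Optional[int] = None) -> int:
    """Total input tokens billed over a task.

    Iteration k sends the base prompt plus the k - 1 earlier tool results, each
    assumed to add `per_step` tokens. If `budget` is set, the Context Assembler is
    assumed to cap every prompt at that many tokens.
    """
    total = 0
    for k in range(1, iterations + 1):
        prompt = base + (k - 1) * per_step
        total += prompt if budget is None else min(prompt, budget)
    return total


# Illustrative values only: 2,000-token base prompt, 1,500 tokens per tool result.
uncapped_20 = cumulative_input_tokens(20, 2_000, 1_500)
uncapped_40 = cumulative_input_tokens(40, 2_000, 1_500)
capped_20 = cumulative_input_tokens(20, 2_000, 1_500, budget=8_000)
```

With those assumed values, 20 uncapped iterations bill 325,000 input tokens and 40 bill 1,250,000 (about 3.8 times more for twice the iterations), whereas capping each prompt at 8,000 tokens brings the 20-iteration total down to 145,000, after which growth is linear.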

Optimisations

  • Use a smaller, faster model for planning steps and the full model only for complex reasoning or final synthesis (model routing)
  • Implement a task-level response cache: if an identical task instruction has been seen before, return the cached result, provided that result was previously validated by a human
  • Batch memory consolidation writes to reduce vector store write IOPS costs
  • Right-size sandbox containers per tool type: code execution needs more resources than API calls

Indicative Cost Range (USD, per 1,000 tasks)

Configuration | Model Tier | Est. Tokens/Task | Sandbox Cost (per 1K tasks) | Total (per 1K tasks)
Low complexity (≤5 tool calls) | GPT-4o-mini / Claude Haiku | ~5,000 | ~$0.10 | ~$3–8
Medium complexity (6–15 tool calls) | GPT-4o / Claude Sonnet | ~25,000 | ~$0.50 | ~$20–60
High complexity (16–20 tool calls) | GPT-4o / Claude Opus | ~80,000 | ~$2.00 | ~$120–300

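Costs vary significantly by provider, region, and negotiated rates. Model pricing changes frequently; validate against current provider pricing. Note that the high-complexity tier (~80,000 tokens per task) exceeds the default 50K per-task token budget listed under LLM08 in Section 8, so those tasks need an explicitly raised budget.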


12. Trade-Off Analysis

Architecture Options

Option | Description | Pros | Cons | Best For
A: ReAct Loop (Recommended) | Alternating reasoning and action steps in a single-model loop with structured tool calls | Auditable, predictable, well-supported by major frameworks; clean separation of planning and execution | Multiple LLM calls per task increase latency and cost; requires structured output support from model | Production deployments where auditability and control are paramount
B: Plan-and-Execute | Single planning call generates the full multi-step plan; executor runs steps sequentially without re-planning | Fewer LLM calls; lower latency for well-defined tasks; plan is fully auditable upfront | Cannot adapt to intermediate results; brittle when tool calls fail or return unexpected data | Tasks with highly predictable structure and reliable tools
C: Code-as-Plan (LLM generates executable code) | LLM generates Python/JS code that implements the full task; code is executed in a sandbox | Maximum flexibility; LLM's reasoning is embedded in code structure | Extremely high security risk if sandbox is not hardened; debugging is difficult; audit is harder | Research / experimental contexts with very strong sandboxing
D: Minimal Stateless Agent | No persistent memory; no episodic store; single-shot with tool calls | Simplest to implement and reason about; lowest cost; easiest to secure | Cannot learn from past tasks; poor performance on tasks that benefit from prior context | Commodity, repetitive tasks with no personalisation requirement

Architectural Tensions

Tension | Left Pole | Right Pole | Recommended Balance
Autonomy vs. Control | Fully autonomous (no human gates); maximum throughput | Human approval on every action; maximum safety | Risk-tiered gates: irreversible actions always require approval; reversible low-risk actions are autonomous
Context richness vs. Cost | Inject maximum context for best model performance | Minimal context for lowest token cost | Semantic retrieval of top-K relevant chunks; tiered memory with recency + relevance scoring
Model capability vs. Latency | Largest, most capable model for best quality | Smallest, fastest model for best UX | Model routing: small model for planning, large model for synthesis; or step complexity detection
Tool breadth vs. Attack surface | Expose all available tools for maximum agent capability | Minimal tool set to reduce attack surface | Scope tools to task domain; policy engine enforces at runtime; review tool set quarterly

13. Failure Modes

Failure Mode | Likelihood | Impact | Detection | Recovery
Prompt injection via tool result | Medium | Critical: agent executes attacker-controlled instructions | Output validation on all tool results before re-injection into context; anomaly detection on plan changes | Abort task; log injection attempt; alert security team; sanitise source tool
Infinite loop (agent cannot make progress) | Medium | High: resource exhaustion; SLA breach | Step counter; stuck detection (same tool + same args called 3× consecutively) | Termination Controller enforces step limit; return partial result
Memory store poisoning | Low | High: corrupted memory degrades all future tasks using that memory | Memory integrity checksums; outlier detection on retrieved memory quality scores | Quarantine suspect memory entries; rebuild from audit log
Tool credential leak via LLM output | Low | Critical: credentials exposed in agent response to caller | Output filtering for credential patterns (regex + entropy detection); no credential injection into context | Immediate secret rotation; incident declaration; audit log review
Model hallucination of tool argument | High | Medium: tool call fails with validation error; loop recovers | JSON Schema validation at dispatcher catches invalid arguments | Structured retry with explicit error fed back to LLM; max 2 retries
Sandbox escape | Very Low | Critical: arbitrary code execution on host | Seccomp profiles; eBPF syscall monitoring; network egress allow-listing | Immediate container kill; host isolation; P0 incident
Cost runaway | Medium | High: unbudgeted cloud spend | Cost/Token Budget Monitor with per-task and per-day ceilings | Hard kill switch when daily budget exceeded; alert FinOps
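
The stuck-detection rule in the infinite-loop row (same tool with the same arguments three times in a row) can be sketched with a fixed-length window. Arguments are canonicalised as sorted-key JSON so reordered keys still count as a repeat; the class name is hypothetical and the window of 3 comes from the table.

```python
import json
from collections import deque
from typing import Any


class StuckDetector:
    """Flags the loop as stuck when the last `window` tool calls are identical."""

    def __init__(self, window: int = 3):
        self.window = window
        self.recent: deque = deque(maxlen=window)

    def observe(self, tool_name: str, arguments: dict[str, Any]) -> bool:
        # Canonical JSON so that argument order does not hide a repeat.
        key = tool_name + ":" + json.dumps(arguments, sort_keys=True)
        self.recent.append(key)
        return len(self.recent) == self.window and len(set(self.recent)) == 1
```

A `True` result would be handed to the Termination Controller, which aborts with a partial result rather than waiting for the step limit.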

Cascading Failure Scenarios

  • Memory Store Corruption → Degraded Context → Hallucination Cascade: If poisoned memories are retrieved at scale, multiple concurrent agents receive corrupted context, leading to a correlated hallucination event. Mitigation: memory store is write-audited; corruption triggers circuit breaker that switches all agents to zero-memory mode.
  • LLM Provider Brownout → Retry Storm → Quota Exhaustion: Partial LLM failures cause agents to retry, exhausting quota for all agents simultaneously. Mitigation: circuit breaker on LLM provider client; exponential backoff with jitter; queue-based retry with rate limiting.
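
The circuit breaker named in the brownout scenario can be sketched as a small state machine: closed until a run of consecutive failures, then open (calls fail fast so tasks can be queued instead of retried), then half-open after a cool-down, when one probe call decides whether to close again. Thresholds are illustrative assumptions, the clock is injectable for testing, this version is not thread-safe, and the class names are hypothetical.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpen(Exception):
    """Raised instead of calling the provider while the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0                       # consecutive failures
        self.opened_at: Optional[float] = None  # None means closed

    def call(self, fn: Callable[[], Any]) -> Any:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            # Open: fail fast so the caller queues the task instead of adding to a retry storm.
            raise CircuitOpen("LLM provider circuit is open")
        # Closed, or half-open after the cool-down: let the call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # trip, or re-trip after a failed probe
            raise
        self.failures = 0
        self.opened_at = None
        return result
```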

14. Regulatory Considerations

APRA CPS 230 (Operational Resilience)

  • The agent is classified as a material business service component if it supports critical operations; this requires a business impact analysis (BIA), RTO/RPO targets, and annual testing
  • Third-party LLM providers are likely to be material service providers where the agent supports a critical operation; APRA notification and contractual requirements then apply
  • Audit log retention and integrity controls satisfy the operational records requirement

APRA CPS 234 (Information Security)

  • Agent workload identity and tool access controls implement the information asset protection requirement
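  • Sandboxed execution and network egress controls implement the requirement for information security controls commensurate with the threats to, and criticality of, the agent's information assets
  • Incident detection and response controls (Section 10) support the CPS 234 incident management and notification obligations, including notifying APRA no later than 72 hours after becoming aware of a material information security incident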
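Workload-identity-based tool access is typically enforced as a default-deny check in front of the Tool Dispatcher. A minimal sketch follows, assuming a hypothetical in-memory policy keyed by SPIFFE-style identity strings. A real deployment would load policy from the Policy Engine rather than from a module constant.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class ToolGrant:
    tool: str
    allowed_actions: FrozenSet[str]


# Hypothetical policy: workload identity -> tools and actions it may use.
POLICY: Dict[str, List[ToolGrant]] = {
    "spiffe://example.org/agent/ticket-triage": [
        ToolGrant("ticket_api", frozenset({"read", "comment"})),
        ToolGrant("kb_search", frozenset({"read"})),
    ],
}


def is_authorised(identity: str, tool: str, action: str) -> bool:
    """Default-deny check: unknown identities, tools, or actions are all refused."""
    for grant in POLICY.get(identity, []):
        if grant.tool == tool:
            return action in grant.allowed_actions
    return False
```

Granting at the level of tool plus action, rather than tool alone, keeps a triage agent that may comment on tickets from also closing or deleting them.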

Privacy Act 1988 (Australia) / GDPR (EU)

  • PII detection and masking before LLM ingestion is mandatory
  • Episodic memory storing PII must honour subject access requests and right to erasure; memory purge procedure required
  • Cross-border data transfer restrictions apply to cloud-hosted inference and memory stores; data residency controls are required
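A minimal sketch of regex-based masking applied before text is sent to the model is shown below. The patterns (email address, Australian phone number, payment card number) are illustrative assumptions. Regexes alone miss names, addresses, and free-text identifiers, so production pipelines usually pair them with an NER-based detector.

```python
import re
from typing import Dict, Tuple

# Illustrative patterns only (assumptions); applied in insertion order.
PII_PATTERNS: Dict[str, re.Pattern] = {
    "EMAIL": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    # 13-16 digits with optional single space/hyphen separators, no trailing separator.
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    # Australian numbers: +61 or leading 0, area/mobile digit, then 8 more digits.
    "AU_PHONE": re.compile(r"(?:\+61 ?|\b0)[2-478](?:[ -]?\d){8}\b"),
}


def mask_pii(text: str) -> Tuple[str, Dict[str, int]]:
    """Replace detected PII with typed placeholders; also return per-type match counts."""
    counts: Dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

For example, `mask_pii("Contact jane.doe@example.com or 0412 345 678, card 4111 1111 1111 1111.")` yields `"Contact [EMAIL] or [AU_PHONE], card [CARD]."`. Typed placeholders preserve enough context for the model to reason about the request, and the per-type counts can be written to the audit log without recording the values themselves.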

EU AI Act

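  • If the agent supports a high-risk use case (Art. 6, Annex III, e.g., employment, credit, healthcare), the full Art. 9–17 obligations apply: risk management system (Art. 9), data governance (Art. 10), technical documentation (Art. 11), record-keeping (Art. 12), transparency and provision of information to deployers (Art. 13), human oversight (Art. 14), accuracy, robustness and cybersecurity (Art. 15), and a quality management system (Art. 17)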
  • Art. 14 (Human Oversight): the human approval gate for irreversible actions directly implements this requirement; must be documented in the technical file
  • Art. 17 (Quality Management): model risk management artefacts and performance monitoring satisfy this requirement
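  • The general-purpose AI model provisions (Art. 51–55) apply to the organisation only if it provides a general-purpose AI model to third parties (for example, by placing its own or a modified foundation model on the market); consuming a vendor's model inside this agent does not by itself trigger them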
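The Art. 14 approval gate can be sketched as a wrapper around tool execution that routes irreversible actions to a reviewer and records the decision. The set of irreversible tools, the `ask_human` callback, and the in-memory audit list are hypothetical placeholders for the tool registry, approval queue, and audit log described in this pattern.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, List, Optional


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"


# Hypothetical classification; in practice this flag would live in the tool registry.
IRREVERSIBLE_TOOLS = {"send_payment", "delete_record", "send_external_email"}


@dataclass
class ActionRequest:
    tool: str
    args: Dict[str, Any]
    rationale: str  # the agent's stated reason, shown to the reviewer and logged


def execute_with_oversight(
    request: ActionRequest,
    run_tool: Callable[[str, Dict[str, Any]], Any],
    ask_human: Callable[[ActionRequest], Decision],
    audit: List[Dict[str, Any]],
) -> Optional[Any]:
    """Execute reversible actions directly; hold irreversible ones for human approval."""
    if request.tool in IRREVERSIBLE_TOOLS:
        decision = ask_human(request)  # hypothetical callback; a real queue would suspend here
        audit.append({"tool": request.tool, "args": request.args,
                      "rationale": request.rationale, "decision": decision.value})
        if decision is not Decision.APPROVED:
            return None  # rejected actions are never executed
    return run_tool(request.tool, request.args)
```

Logging the agent's rationale alongside the reviewer's decision gives the technical file concrete evidence that oversight was exercised, not merely configured.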

ISO 42001 (AI Management System)

  • This pattern's governance artefacts (Section 9) map directly to ISO 42001 §6.1 (risk assessment) and Annex A.6 (AI system life cycle controls)
  • Agent purpose statement satisfies the intended use documentation requirement
  • The execution trace archive supports the monitoring and measurement evidence required by §9.1

NIST AI RMF

  • GOVERN: Agent purpose statement + risk assessment maps to GOVERN 1.1, 1.2
  • MAP: Industry applicability table + failure mode analysis maps to MAP 1.5, 2.2
  • MEASURE: Model performance report + SLO monitoring maps to MEASURE 1.1, 2.5
  • MANAGE: Human approval gates + incident response maps to MANAGE 2.2, 4.1

15. Reference Implementations

AWS

Component | AWS Service
LLM Inference | Amazon Bedrock (Claude, Titan, Llama)
Agent Orchestration | Amazon Bedrock Agents
Tool Execution Sandbox | AWS Lambda (isolated function per tool, Firecracker microVM isolation)
Episodic / Semantic Memory | Amazon OpenSearch Service (k-NN) or Aurora PostgreSQL + pgvector
Audit Log | AWS CloudTrail + S3 with Object Lock (WORM)
Secrets | AWS Secrets Manager
Policy Engine | Amazon Verified Permissions + custom Lambda authoriser
Observability | AWS X-Ray + CloudWatch + Amazon Managed Grafana
Cost Monitor | AWS Cost Explorer API + custom Lambda alerting
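Whichever billing source feeds the Cost Monitor, the ceiling logic itself is small. A minimal sketch of per-task and per-day limits with a hard stop follows, assuming the estimated cost of each model or tool call is passed to `charge` before the call is made (a hypothetical integration point). The USD amounts and class names are illustrative.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


class BudgetExceeded(RuntimeError):
    """Raised to halt the agent loop when a cost ceiling would be breached."""


@dataclass
class CostBudget:
    per_task_usd: float
    per_day_usd: float
    _day: date = field(default_factory=date.today, init=False)
    _day_spend: float = field(default=0.0, init=False)
    _task_spend: Dict[str, float] = field(default_factory=dict, init=False)

    def charge(self, task_id: str, cost_usd: float, today: Optional[date] = None) -> None:
        """Record an estimated call cost, raising instead if either ceiling would be exceeded."""
        today = today or date.today()
        if today != self._day:  # reset the daily counter when the calendar date changes
            self._day, self._day_spend = today, 0.0
        task_total = self._task_spend.get(task_id, 0.0) + cost_usd
        if task_total > self.per_task_usd:
            raise BudgetExceeded(f"task {task_id} would reach ${task_total:.2f}")
        if self._day_spend + cost_usd > self.per_day_usd:
            # Daily ceiling reached: refuse the call (a FinOps alert would be raised here).
            raise BudgetExceeded("daily budget exhausted")
        self._task_spend[task_id] = task_total
        self._day_spend += cost_usd
```

Checking before the call, rather than reconciling afterwards, means a runaway loop is stopped at the ceiling instead of being discovered on the next billing export.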

Azure

Component | Azure Service
LLM Inference | Azure OpenAI Service (GPT-4o, o-series)
Agent Orchestration | Azure AI Foundry Agent Service + Semantic Kernel
Tool Execution Sandbox | Azure Container Instances (per-invocation) + Azure Functions
Episodic / Semantic Memory | Azure AI Search (vector) + Azure Cosmos DB (episodic)
Audit Log | Azure Monitor Logs + Azure Immutable Blob Storage
Secrets | Azure Key Vault
Policy Engine | Azure Policy + custom APIM policy
Observability | Azure Monitor + Application Insights + OpenTelemetry

GCP

Component | GCP Service
LLM Inference | Vertex AI (Gemini 1.5 Pro / Flash)
Agent Orchestration | Vertex AI Agent Builder
Tool Execution Sandbox | Cloud Run (per-request isolation) + gVisor
Episodic / Semantic Memory | Vertex AI Vector Search + Cloud Spanner (episodic)
Audit Log | Cloud Audit Logs + Cloud Storage (WORM bucket)
Secrets | Secret Manager
Observability | Cloud Trace + Cloud Monitoring + Managed Prometheus

On-Premises / Private Cloud

Component | Technology
LLM Inference | Ollama + Llama 3.1 70B / Mistral Large on GPU cluster; or vLLM serving
Agent Orchestration | LangGraph or custom Python agent loop
Tool Execution Sandbox | Kubernetes with gVisor runtime class + NetworkPolicy
Semantic Memory | Weaviate or Qdrant on Kubernetes
Audit Log | Apache Kafka (event log) + Apache Iceberg on MinIO
Secrets | HashiCorp Vault
Observability | OpenTelemetry Collector + Grafana + Loki + Tempo
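The on-premises option of a custom Python agent loop can stay small. The sketch below shows the plan, act, observe cycle with two of the guards this pattern requires: a hard step ceiling (termination condition) and a tool allowlist (scope boundary). `plan_next` is a hypothetical stand-in for a call to the locally served model; in practice it would parse a structured tool call out of the model's response.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    """One model decision: call a tool, or finish with an answer."""
    tool: Optional[str] = None
    args: Dict[str, Any] = field(default_factory=dict)
    final_answer: Optional[str] = None


@dataclass
class AgentResult:
    answer: Optional[str]
    stop_reason: str
    trace: List[Dict[str, Any]]


def run_agent(
    goal: str,
    plan_next: Callable[[str, List[Dict[str, Any]]], Step],
    tools: Dict[str, Callable[..., str]],
    max_steps: int = 8,
) -> AgentResult:
    """Plan -> act -> observe loop with a hard step ceiling and a tool allowlist."""
    trace: List[Dict[str, Any]] = []
    for _ in range(max_steps):  # termination condition: never loop unbounded
        step = plan_next(goal, trace)  # hypothetical LLM call returning a structured step
        if step.final_answer is not None:
            return AgentResult(step.final_answer, "completed", trace)
        if step.tool not in tools:
            # Scope boundary: unknown tools are refused and the refusal is fed back.
            trace.append({"tool": step.tool, "error": "tool not permitted"})
            continue
        observation = tools[step.tool](**step.args)
        trace.append({"tool": step.tool, "args": step.args, "observation": observation})
    return AgentResult(None, "max_steps_exceeded", trace)
```

Returning an explicit `stop_reason` lets the caller distinguish a completed task from one that hit the ceiling, which is the signal the runaway-execution alerting needs.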

16. Related Patterns

Pattern | ID | Relationship Type | Notes
Stateful Agent Memory | EAAPL-AGT002 | Extends | This pattern references AGT002 for all memory tier implementations
Agent Tool Registry | EAAPL-AGT003 | Depends On | Tool Dispatcher relies on the registry defined in AGT003
Agent Sandboxing | EAAPL-AGT004 | Depends On | Sandboxed Executor implements the isolation defined in AGT004
Agent Checkpoint and Recovery | EAAPL-AGT005 | Integrates With | State serialisation and recovery extends the loop defined here
Reflexive Agent | EAAPL-AGT006 | Extends | Reflexive agent adds a self-critique step to the Reflect phase of this loop
Long-Running Agent | EAAPL-AGT007 | Extends | Long-running agent externalises the loop state and adds async management
Event-Driven Agent | EAAPL-AGT008 | Extends | Event-driven agent wraps this pattern with an event trigger and back-pressure layer
Agent Identity and Authorisation | EAAPL-AGT009 | Depends On | Agent workload identity is a prerequisite for the Policy Engine
Agent Cost Governance | EAAPL-AGT010 | Integrates With | Cost monitor is a component of the governance plane
Multi-Agent Orchestration | EAAPL-MAG001 | Composes | Orchestrator deploys multiple instances of this pattern as worker agents
Supervisor Agent | EAAPL-MAG002 | Composes | Worker agents are instances of this pattern
Human-in-the-Loop Agent | EAAPL-MAG003 | Extends | Adds a mandatory human approval queue to the Act phase of this pattern

17. Maturity Assessment

Overall Maturity: Proven

Dimension | Score (1–5) | Evidence
Adoption Breadth | 5 | Deployed by all major hyperscalers; basis for LangChain, Semantic Kernel, Bedrock Agents, Vertex AI Agent Builder
Tooling Ecosystem | 5 | Mature open-source frameworks (LangGraph, AutoGen, CrewAI); managed cloud services available
Operational Patterns | 4 | SRE runbooks established; observability tooling mature; DR patterns well understood
Security Hardening | 4 | OWASP LLM mitigations defined; sandbox technology mature; some novel attacks (indirect injection) still emerging
Regulatory Clarity | 3 | EU AI Act framework established; APRA guidance published; implementation guidance still evolving
Community Knowledge | 5 | Extensive published case studies, benchmarks, and failure post-mortems

18. Revision History

Version | Date | Author | Changes
1.0 | 2024-03-01 | Architecture Board | Initial pattern publication
1.1 | 2024-06-15 | AI Platform Team | Added OWASP LLM Top 10 section; updated tool registry reference
2.0 | 2024-11-01 | Architecture Board | Major revision: added four-tier memory model; restructured data flow; added EU AI Act Art. 14 detail
2.1 | 2025-02-10 | AI Risk Team | Added APRA CPS 230 mapping; updated cost table; added cascading failure scenarios