EAAPL: Enterprise AI Architecture Pattern Library

[EAAPL-WRK001] ReAct Agent Loop

Category: Agentic Workflows
Sub-category: Iterative Reasoning Architecture
Version: 1.0
Maturity: Proven
Tags: react, reason-act-observe, iterative-reasoning, thought-action-observation, scratchpad, tool-calling
Regulatory Relevance: EU AI Act (Art. 13), ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1)


1. Executive Summary

The ReAct (Reasoning + Acting) Agent Loop Pattern defines an iterative execution architecture in which an AI agent alternates between explicit reasoning steps (Thought), tool-use actions (Action), and environmental observations (Observation) until it reaches a final answer. First formalised in the ReAct paper (Yao et al., 2022), this pattern has become a foundational execution model for production agentic systems, including those deployed for document understanding, regulatory Q&A, and enterprise knowledge retrieval.

For CIO/CTO audiences: ReAct is the thinking-out-loud loop that allows an AI to work through a complex problem step by step, using external tools (databases, APIs, search) at each step to ground its reasoning in real data. Unlike a single inference call, ReAct is transparent — every reasoning step and every tool call is recorded in a scratchpad that can be audited. The pattern directly addresses the explainability requirements of regulated industries: the chain of thought is the audit trail. The primary operational considerations are latency (each loop iteration adds an LLM inference call) and cost control (unbounded loops exhaust token budgets).


2. Problem Statement

Business Problem

Complex business queries — "What are our regulatory obligations under CPS 234 for this vendor?" or "What caused this transaction anomaly?" — cannot be answered by a single LLM inference call against a static knowledge base. The answer requires multiple information-gathering steps, intermediate reasoning, and dynamic tool use based on what each step reveals.

Technical Problem

A single-turn LLM inference cannot dynamically decide which tools to use in what order based on intermediate results. The model must commit to its answer in one pass, making it prone to hallucination when the required information is not in its context and unable to adapt its information-gathering strategy based on what it discovers.

Symptoms of Absence

  • Agent gives confident but hallucinated answers to multi-step queries because it cannot retrieve intermediate grounding facts
  • Tool calls are all made upfront in a fixed order regardless of what earlier results reveal
  • No intermediate reasoning is recorded; outputs are unexplainable and unauditable
  • Agent cannot recover gracefully when a tool call fails midway through a task

Cost of Inaction

  • Accuracy Risk: Multi-step queries without iterative reasoning produce unreliable outputs unsuitable for regulated decision support
  • Auditability: Absence of reasoning trace creates compliance gaps in regulated industries
  • Trust: Users reject agent outputs they cannot verify or understand

3. Context

When to Apply

  • Tasks require multiple sequential information-gathering steps where each step informs the next
  • The full set of required tool calls cannot be determined upfront without seeing intermediate results
  • Auditability of reasoning is a regulatory or operational requirement
  • Tool failures must be handled gracefully mid-task with alternative strategies

When NOT to Apply

  • Single-step tasks fully answerable from context without tool use
  • Hard real-time latency constraints (< 500ms) that are incompatible with multi-iteration inference
  • Tasks with a fully deterministic, pre-known execution path (use Sequential Chain, EAAPL-WRK002)
  • Tasks where the scratchpad reasoning would contain sensitive intermediate data that must not be persisted

Prerequisites

  • EAAPL-AGT001 (Single Agent Pattern) baseline
  • EAAPL-AGT003 (Agent Tool Registry) for tool discovery and invocation
  • Iteration limit and cost ceiling configuration
  • Structured scratchpad format agreed and documented

Industry Applicability

| Industry | Example Use Case | Why ReAct Fits |
|---|---|---|
| Financial Services | AML investigation: retrieve transaction → analyse → retrieve counterparty → analyse chain | Multi-step, each step informs next |
| Legal / Regulatory | CPS 234 obligation mapping: read clause → search precedent → cross-reference guidance | Reasoning must be traceable |
| Government | Grant eligibility assessment: check criteria → retrieve applicant data → reason eligibility | Multi-source, auditable reasoning required |
| Healthcare | Clinical pathway query: check diagnosis → retrieve guidelines → identify contraindications | Safety-critical; reasoning trace required |
| Resources | Site compliance check: retrieve permit conditions → check measurement data → identify breaches | Multi-source; auditable |

4. Architecture Overview

The ReAct Loop extends the base agent (EAAPL-AGT001) by making the Thought-Action-Observation cycle the primary execution model. The scratchpad is the central artefact: a running record of every thought, action, and observation in the current task execution.

Thought Phase

The LLM generates a reasoning step grounded in the current scratchpad context. The thought is free-form text that reasons about what is known, what is missing, and what action should be taken next. The thought is appended to the scratchpad. This explicit externalisation of reasoning is what makes ReAct more reliable than chain-of-thought alone: each reasoning step is grounded by the observations that precede it, rather than all steps being generated in one pass.

Action Phase

Based on the thought, the LLM generates a structured action: a tool name and parameters extracted from the reasoning. The action format is strict and validated against the Tool Registry (EAAPL-AGT003) before execution. Invalid action formats trigger a thought-level correction cycle rather than a hard error.
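
The validation step described above could be sketched as follows. This is a minimal illustration, not the EAAPL-AGT003 interface: the `TOOL_REGISTRY` dictionary, the `ParsedAction` type, and the regex-based `tool_name(param="value")` syntax are assumptions made for the example. A malformed action returns a correction message that can be injected as an observation instead of raising an error.

```python
import re
from dataclasses import dataclass

# Hypothetical registry: tool name -> allowed parameter names (assumption).
TOOL_REGISTRY = {
    "search_regulatory_db": {"query"},
    "lookup_vendor_classification": {"vendor_id"},
}

ACTION_RE = re.compile(r'^\s*(\w+)\((.*)\)\s*$', re.DOTALL)
PARAM_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')
CORRECTION = "Action format invalid. Use: tool_name(param=value)"


@dataclass
class ParsedAction:
    tool: str
    params: dict


def parse_action(text: str):
    """Return (ParsedAction, None) on success, or (None, correction_message)."""
    match = ACTION_RE.match(text)
    if not match:
        return None, CORRECTION
    tool, raw_params = match.group(1), match.group(2)
    if tool not in TOOL_REGISTRY:
        return None, f"Unknown tool '{tool}'. Available: {sorted(TOOL_REGISTRY)}"
    params = dict(PARAM_RE.findall(raw_params))
    # Reject empty parameter lists and parameters the tool does not declare.
    if not params or set(params) - TOOL_REGISTRY[tool]:
        return None, CORRECTION
    return ParsedAction(tool, params), None
```

In practice a structured output mode or JSON schema validation (both listed as options in the Components section) would likely be more robust than a regex; the regex keeps the sketch dependency-free.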

Observation Phase

The tool is executed and its result is injected into the scratchpad as an Observation. The observation is truncated to fit within the context window if necessary, with a summary flag indicating truncation. The next thought phase begins from the updated scratchpad.
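
A minimal sketch of observation truncation with a flag, under two assumptions: the budget is expressed in characters as a stand-in for tokens, and the `Observation` type and `make_observation` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    text: str
    truncated: bool  # the truncation flag described above


def make_observation(tool_result: str, max_chars: int = 2000) -> Observation:
    """Cut an oversized tool result to the budget and record that it was cut."""
    if len(tool_result) <= max_chars:
        return Observation(tool_result, False)
    return Observation(tool_result[:max_chars], True)
```

A production version would count tokens with the model's own tokenizer rather than characters, so that the budget matches the real context window.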

Termination

The loop terminates when: (a) the LLM produces a Final Answer action, (b) the maximum iteration limit is reached, or (c) the cost ceiling is hit. In cases (b) and (c), the best partial answer from the scratchpad is returned with a metadata flag indicating incomplete execution.
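
The three termination conditions can be illustrated with a small loop. This is a sketch under stated assumptions, not a reference implementation: `llm_step` is a hypothetical callable returning `(thought, action, cost)`, `tools` is a plain dictionary of callables, and the "best partial answer" is simplified to the most recent observation.

```python
from dataclasses import dataclass, field


@dataclass
class LoopResult:
    answer: str
    status: str  # final_answer | max_iterations_reached | cost_ceiling_hit
    scratchpad: list = field(default_factory=list)


def _partial(scratchpad):
    # Simplification (assumption): the last observation is the partial answer.
    observations = [value for kind, value in scratchpad if kind == "observation"]
    return observations[-1] if observations else ""


def run_react(query, llm_step, tools, max_iterations=10, cost_ceiling=0.50):
    """llm_step(query, scratchpad) -> (thought, action_dict, cost_of_call)."""
    scratchpad, spent = [], 0.0
    for _ in range(max_iterations):
        # (c) The cost ceiling is checked before each iteration.
        if spent >= cost_ceiling:
            return LoopResult(_partial(scratchpad), "cost_ceiling_hit", scratchpad)
        thought, action, cost = llm_step(query, scratchpad)
        spent += cost
        scratchpad.append(("thought", thought))
        # (a) The model signals completion with a Final Answer action.
        if action["type"] == "final_answer":
            return LoopResult(action["answer"], "final_answer", scratchpad)
        scratchpad.append(("action", action))
        try:
            result = tools[action["tool"]](**action["params"])
        except Exception as exc:
            # Tool failures become observations the next thought can reason about.
            result = f"ERROR: {exc}"
        scratchpad.append(("observation", result))
    # (b) The iteration limit was reached without a final answer.
    return LoopResult(_partial(scratchpad), "max_iterations_reached", scratchpad)
```

Returning a status alongside the answer lets callers distinguish a complete result from an incomplete one without parsing the answer text.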

Scratchpad Management

For long-running tasks, the scratchpad can exceed the context window. Context compression (EAAPL-WRK007) is applied: earlier thought-observation pairs are summarised and the summary replaces the raw text, preserving the reasoning trace while reclaiming context budget.
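
A rough sketch of the compression idea, not the EAAPL-WRK007 implementation. It assumes scratchpad entries are plain strings, measures the budget in characters rather than tokens, and takes a hypothetical `summarise` callable (in practice this would probably be an LLM call).

```python
def compress_scratchpad(entries, summarise, budget_chars, keep_recent=2):
    """Replace the oldest entries with a single summary once the budget is exceeded.

    entries: list of scratchpad strings, oldest first.
    summarise: callable taking a list of old entries and returning a summary string.
    """
    total = sum(len(entry) for entry in entries)
    if total <= budget_chars or len(entries) <= keep_recent:
        return entries
    split = len(entries) - keep_recent
    old, recent = entries[:split], entries[split:]
    # The most recent entries stay verbatim so the next thought sees full detail.
    return ["[summary] " + summarise(old)] + recent
```

Only the in-context working copy is compressed here; where the raw trace is needed for audit, it would presumably still be written unmodified to the Scratchpad Store.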


5. Architecture Diagram

flowchart TD
    subgraph Input["Task Input"]
        A[User Query]
    end
    subgraph Loop["ReAct Iteration Loop"]
        B[Reason]
        C{Action Type?}
        D[Tool Call]
        E[Observe]
        F[Iteration Guard]
    end
    subgraph Termination["Termination"]
        G[Final Answer]
        H[Partial Answer]
    end
    subgraph Artefacts["Artefacts"]
        I[(Scratchpad Store)]
        J[(Audit Log)]
    end
    A --> B
    B --> C
    C -->|Tool Use| D
    C -->|Final Answer| G
    D --> E
    E --> F
    F -->|continue| B
    F -->|limit hit| H
    B --> I
    D --> I
    E --> I
    G --> J
    H --> J

6. Components

| Component | Type | Responsibility | Technology Options | Criticality |
|---|---|---|---|---|
| Thought Generator | AI Component | Produces reasoning step from current scratchpad context | GPT-4o, Claude 3.5, Gemini 1.5 Pro via inference API | Critical |
| Action Parser | Logic Component | Parses structured action (tool name + params) from thought output | Regex; JSON schema validation; structured output mode | Critical |
| Tool Registry Interface | Integration | Dispatches validated action to registered tool; returns result | EAAPL-AGT003 implementation | Critical |
| Observation Injector | State Manager | Appends tool result to scratchpad; manages context budget | Custom; LangChain scratchpad; LlamaIndex memory buffer | High |
| Iteration Guard | Safety | Enforces max iteration limit and cost ceiling | Counter in loop state; configurable limits | Critical |
| Scratchpad Store | State | Persists full thought-action-observation chain for audit | Redis; PostgreSQL; S3 (async) | High |
| Context Compressor | Optimisation | Summarises early scratchpad entries when context budget is low | EAAPL-WRK007 implementation | Medium |
| Final Answer Extractor | Logic | Extracts and validates the final answer from the terminal thought | Structured output parser | Critical |

7. Data Flow

| Step | Actor | Action | Output |
|---|---|---|---|
| 1 | Caller | Submits query with task context and tool permissions | Task object: {query, context, allowed_tools, max_iterations, cost_ceiling} |
| 2 | Thought Generator | Generates Thought 1 from query context | Thought: I need to retrieve the current CPS 234 obligations for cloud services. |
| 3 | Action Parser | Parses tool call from thought | Action: search_regulatory_db(query="CPS 234 cloud obligations") |
| 4 | Tool Registry | Executes search tool; returns results | Observation: [3 relevant clauses returned] |
| 5 | Observation Injector | Appends observation to scratchpad | Updated scratchpad with T1-A1-O1 |
| 6 | Iteration Guard | Iteration 1 of 10; cost $0.02 of $0.50 ceiling; continue | Continue |
| 7 | Thought Generator | Generates Thought 2 from enriched scratchpad | Thought: Clause 3.4 references material service providers. I need to check if this vendor is classified as MSP. |
| 8 | Action Parser | Parses next tool call | Action: lookup_vendor_classification(vendor_id="VND-0421") |
| 9 | Tool Registry | Executes vendor lookup | Observation: Vendor VND-0421 classified as MSP; CPS 234 §3.4 applies |
| 10 | Thought Generator | Generates Final Answer thought | Final Answer: Under CPS 234, this vendor is a Material Service Provider. The following obligations apply: [...] |
| 11 | Scratchpad Store | Full trace persisted | Audit record with 2 iterations, 2 tool calls, final answer |

Error Flow

| Error | Detection | Recovery |
|---|---|---|
| Tool call returns error | Observation contains error payload | Next thought reasons about the error and selects alternative tool or rephrases |
| Action parse failure (malformed tool syntax) | Action Parser validation error | Inject correction observation: "Action format invalid. Use: tool_name(param=value)" |
| Max iterations reached | Iteration Guard | Return best partial answer from scratchpad with status: max_iterations_reached |
| Cost ceiling hit | Cost Monitor | Immediate termination; return partial answer with status: cost_ceiling_hit |

8. Security Considerations

Prompt Injection via Tool Observations

Tool observations are injected directly into the LLM context. Malicious content in tool results (e.g., a document containing "Ignore all previous instructions") can redirect agent behaviour.

  • Mitigation: Wrap all observations in explicit XML delimiters <observation>...</observation>; validate that the next thought does not deviate from the original task goal; implement thought-level anomaly detection.
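
The delimiter part of this mitigation could look like the sketch below. The `wrap_observation` helper is a hypothetical name; the escaping uses the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<`, and `>`. Escaping matters because an unescaped tool result could contain its own `</observation>` tag and break out of the delimited region.

```python
from xml.sax.saxutils import escape


def wrap_observation(tool_result: str) -> str:
    """Wrap a tool result in observation delimiters, escaping markup inside it."""
    # Escaping <, > and & prevents the payload from closing the tag early.
    return f"<observation>{escape(tool_result)}</observation>"
```

Delimiters alone do not stop a model from following injected instructions; they only make the boundary explicit, which is why the mitigation pairs them with goal-consistency checks on the next thought.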

OWASP LLM Top 10

| OWASP LLM Risk | ReAct Applicability | Mitigation |
|---|---|---|
| LLM01 Prompt Injection | Tool observations are attacker-controlled content injected into context | Observation delimiters; goal-consistency validation on each thought |
| LLM08 Excessive Agency | Iterative loop can take a long sequence of actions with cumulative side effects | Iteration limit; write-action approval gate (EAAPL-HITL001); cost ceiling |
| LLM04 Model DoS | Unbounded loops exhaust inference budget | Hard iteration limit; cost ceiling enforced before each iteration |
| LLM07 Insecure Plugin Design | Tool calls execute with agent's permission scope | Tool registry permission model per EAAPL-AGT003; read-only tools by default |

9. Governance Considerations

Scratchpad as Audit Trail

  • The full thought-action-observation scratchpad is the explainability artefact for regulated use cases
  • Scratchpads must be retained per the organisation's AI audit log retention policy
  • For EU AI Act Art. 13 (Transparency) compliance, the scratchpad must be available to human reviewers on request

Governance Artefacts

| Artefact | Owner | Frequency | Purpose |
|---|---|---|---|
| ReAct Scratchpad Archive | AI Platform | Per task; retained 7 years for regulated use | Explainability and audit evidence |
| Iteration Limit Policy | AI Governance Board | Quarterly review | Documents approved max iterations per task class |
| Tool Permission Matrix | Security + AI Platform | On tool addition | Documents which tools are available to which agents |
| Cost Ceiling Register | FinOps | Quarterly | Per-task-class cost ceilings and utilisation |

10. Operational Considerations

SLOs

| SLO | Target | Window | Alert |
|---|---|---|---|
| Task completion rate (Final Answer reached) | ≥ 95% | 24-hour rolling | < 90% triggers P2; check iteration limits |
| Average iterations per completed task | ≤ 5 | 24-hour rolling | > 8 avg triggers P3; review task complexity or tool quality |
| p95 task latency (end-to-end) | ≤ 30s | 1-hour rolling | > 60s triggers P2 |
| Tool call error rate | ≤ 2% | 1-hour rolling | > 5% triggers P2; check tool registry health |

Monitoring

  • Iteration count distribution per task type: long tail indicates tool reliability issues or ambiguous task framing
  • Final answer confidence (where model-reported): trending downward indicates context quality degradation
  • Scratchpad token length: p95 trending upward indicates context compression is needed

11. Cost Considerations

| Scenario | Iterations | Approx. Cost per Task (GPT-4o) | Notes |
|---|---|---|---|
| Simple lookup task | 1–2 | $0.01–0.05 | Single tool call; straightforward answer |
| Moderate multi-step task | 3–5 | $0.05–0.20 | Typical regulatory Q&A |
| Complex research task | 6–10 | $0.20–0.80 | Multi-source synthesis |
| Max iterations (defensive) | 10–15 | $0.80–2.00 | Cost ceiling should trigger before this |

Optimisations

  • Use smaller model for thought generation and reserve larger model only for synthesis steps
  • Cache tool results within session to avoid duplicate tool calls for the same parameters
  • Set per-task-class iteration limits based on observed average; adjust monthly
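
The session-level caching optimisation can be sketched as a small wrapper around tool invocation. `SessionToolCache` is a hypothetical class for illustration; it assumes tool parameters are JSON-serialisable and that the cached tools are read-only, since replaying a cached result for a write action would hide a side effect that never happened.

```python
import json


class SessionToolCache:
    """Memoise tool results for the lifetime of one task session."""

    def __init__(self):
        self._results = {}
        self.hits = 0

    def call(self, tool_name, tool_fn, **params):
        # Sorting keys makes the cache key independent of argument order.
        key = (tool_name, json.dumps(params, sort_keys=True))
        if key in self._results:
            self.hits += 1
            return self._results[key]
        result = tool_fn(**params)
        self._results[key] = result
        return result
```

Creating one cache instance per task keeps results from leaking across sessions or users.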

12. Trade-Off Analysis

| Option | Reasoning Quality | Auditability | Latency | Cost | Best For |
|---|---|---|---|---|---|
| A: ReAct with full scratchpad (Recommended) | High | Very High | Medium | Medium | Regulated, multi-step tasks |
| B: Single-pass with all tools upfront | Medium | Low | Low | Low | Simple, predictable tasks |
| C: Plan-and-Execute (EAAPL-WRK005) | Very High | High | High | High | Tasks with known, parallelisable subtasks |
| D: ReAct with context compression | High | High | Medium | Low–Medium | Long-running tasks; context-constrained |

Architectural Tensions

| Tension | Left Pole | Right Pole | Balance |
|---|---|---|---|
| Iteration depth vs. Latency | Many iterations for thorough reasoning | Few iterations for fast response | Risk-tier: async for thorough; 3-iteration cap for interactive |
| Transparency vs. Privacy | Full scratchpad retained | Scratchpad discarded post-task | Retain for regulated tasks; discard PII-containing scratchpads per policy |
| Tool autonomy vs. Human oversight | Agent calls any tool in registry | Every tool call requires approval | Write-action approval gate; read tools are autonomous |

13. Failure Modes

| Failure Mode | Likelihood | Impact | Detection | Recovery |
|---|---|---|---|---|
| Reasoning loop (agent repeats same thought-action) | Medium | High: infinite cost without guard | Iteration Guard; thought-similarity detection | Detect repeated action signatures; inject loop-break observation |
| Tool observation ignored (agent doesn't update reasoning) | Medium | Medium: stale reasoning | Thought references observation content check | Prompt engineering: require explicit observation acknowledgement in thought |
| Premature Final Answer (insufficient reasoning) | Medium | High: inaccurate output | Quality gate on Final Answer; confidence check | Confidence threshold before accepting Final Answer |
| Context overflow (scratchpad exceeds window) | Low–Medium | High: context truncation corrupts reasoning | Token counter before each thought | Apply EAAPL-WRK007 context compression; alert when > 70% context used |
| Tool result poisoning (injected instructions) | Low | High: agent hijacking | Thought-anomaly detection | Observation delimiters; goal-drift detection |
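
The recovery for the reasoning-loop failure mode (detect repeated action signatures, inject a loop-break observation) might be sketched like this. The `LoopDetector` class, the signature format, and the wording of the loop-break message are all assumptions made for illustration.

```python
import json
from collections import Counter


def action_signature(tool, params):
    """Canonical string for a tool call, independent of parameter order."""
    return f"{tool}:{json.dumps(params, sort_keys=True)}"


class LoopDetector:
    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self._seen = Counter()

    def check(self, tool, params):
        """Return a loop-break observation if the action repeated too often, else None."""
        signature = action_signature(tool, params)
        self._seen[signature] += 1
        if self._seen[signature] > self.max_repeats:
            return (
                f"You have already called {tool} with these parameters "
                f"{self._seen[signature] - 1} times. "
                "Try a different tool or different parameters."
            )
        return None
```

Exact-match signatures only catch literal repeats; near-duplicate actions with slightly rephrased parameters would need the thought-similarity detection mentioned in the table.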

14. Regulatory Considerations

EU AI Act

  • Art. 13 (Transparency): The scratchpad constitutes the reasoning trace required for high-risk AI system transparency. Must be retained and accessible to competent authorities.
  • Art. 14 (Human Oversight): Iteration limits and cost ceilings are technical measures implementing the requirement for human ability to intervene and override automated decision-making.

ISO 42001

  • §8.4: The thought-action-observation loop with iteration limits implements the operational controls required for AI system quality management.

NIST AI RMF

  • GOVERN 1.1: Documented iteration limits and tool permission matrices constitute the governance policies required for responsible AI deployment.

Australian Context

  • APRA CPS 230: Scratchpad retention supports the operational resilience evidence requirements for AI systems used in material business processes.
  • Australian Government AI Ethics Principles: Transparency (scratchpad audit trail) and Accountability (iteration limits, human oversight gate) are directly addressed.

15. Reference Implementations

AWS

| Component | Service |
|---|---|
| Thought + Action Generation | Amazon Bedrock (Claude 3.5 Sonnet) with structured output |
| Tool Registry | AWS Lambda functions registered via Bedrock Tool Use API |
| Scratchpad Store | Amazon DynamoDB (per-session); S3 for archival |
| Iteration Guard | AWS Step Functions state machine with MaxAttempts |
| Observability | Amazon CloudWatch + AWS X-Ray for per-iteration tracing |

Azure

| Component | Service |
|---|---|
| Thought + Action Generation | Azure OpenAI Service (GPT-4o) with function calling |
| Tool Execution | Azure Functions triggered by Action Parser |
| Scratchpad Store | Azure Cosmos DB (per-session) |
| Orchestration | Azure Durable Functions (entity functions for loop state) |
| Observability | Azure Monitor + Application Insights |

On-Premises

| Component | Technology |
|---|---|
| Thought + Action Generation | vLLM serving Llama 3.1 70B Instruct |
| ReAct Orchestration | LangGraph create_react_agent; or custom loop |
| Scratchpad Store | PostgreSQL with JSONB scratchpad column |
| Tool Registry | EAAPL-AGT003 on-prem implementation |

16. Related Patterns

| Pattern | ID | Relationship Type | Notes |
|---|---|---|---|
| Single Agent Pattern | EAAPL-AGT001 | Base Pattern | ReAct extends the base agent loop with explicit thought-action-observation structure |
| Agent Tool Registry | EAAPL-AGT003 | Depends On | All tool calls in the Action phase are dispatched via the Tool Registry |
| Tool Call Orchestration | EAAPL-WRK006 | Peer | WRK006 covers detailed tool-call execution mechanics within the action phase |
| Context Compression | EAAPL-WRK007 | Integrates With | Applied when scratchpad approaches context window limit |
| Plan-and-Execute | EAAPL-WRK005 | Alternative | Upfront planning with separate execution; prefer when subtasks are known in advance |
| Human Escalation | EAAPL-HITL001 | Integrates With | Write-action approval gate before destructive tool calls |

17. Maturity Assessment

Overall Maturity: Proven

| Dimension | Score (1–5) | Evidence |
|---|---|---|
| Research Foundation | 5 | ReAct paper (Yao et al., 2022) widely replicated; standard in academic literature |
| Production Deployment | 4 | Deployed at scale in enterprise chatbots, code assistants, research tools |
| Framework Support | 5 | LangChain, LlamaIndex, LangGraph all implement ReAct as first-class pattern |
| Tooling Maturity | 4 | Observability and scratchpad tooling maturing; standard iteration/cost controls established |
| Cost Optimisation | 3 | Model routing and caching patterns established but not yet universally standardised |

18. Revision History

| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category |