Proven

ReAct Agent Loop

Agentic WorkflowsEU AI ActISO/IEC 42001

[EAAPL-WRK001] ReAct Agent Loop

Category: Agentic Workflows Sub-category: Iterative Reasoning Architecture Version: 1.0 Maturity: Proven Tags: react, reason-act-observe, iterative-reasoning, thought-action-observation, scratchpad, tool-calling Regulatory Relevance: EU AI Act (Art. 13), ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1)

1. Executive Summary

The ReAct (Reason + Act + Observe) Agent Loop Pattern defines an iterative execution architecture in which an AI agent alternates between explicit reasoning steps (Thought), tool-use actions (Action), and environmental observations (Observation) until it reaches a final answer. First formalised in the ReAct paper (Yao et al., 2022), this pattern has become the foundational execution model for production agentic systems deployed across document understanding, regulatory Q&A, and enterprise knowledge retrieval.

For CIO/CTO audiences: ReAct is the thinking-out-loud loop that allows an AI to work through a complex problem step by step, using external tools (databases, APIs, search) at each step to ground its reasoning in real data. Unlike a single inference call, ReAct is transparent — every reasoning step and every tool call is recorded in a scratchpad that can be audited. The pattern directly addresses the explainability requirements of regulated industries: the chain of thought is the audit trail. The primary operational considerations are latency (each loop iteration adds an LLM inference call) and cost control (unbounded loops exhaust token budgets).

2. Problem Statement

Business Problem

Complex business queries — "What are our regulatory obligations under CPS 234 for this vendor?" or "What caused this transaction anomaly?" — cannot be answered by a single LLM inference call against a static knowledge base. The answer requires multiple information-gathering steps, intermediate reasoning, and dynamic tool use based on what each step reveals.

Technical Problem

A single-turn LLM inference cannot dynamically decide which tools to use in what order based on intermediate results. The model must commit to its answer in one pass, making it prone to hallucination when the required information is not in its context and unable to adapt its information-gathering strategy based on what it discovers.

Symptoms of Absence

Agent gives confident but hallucinated answers to multi-step queries because it cannot retrieve intermediate grounding facts
Tool calls are all made upfront in a fixed order regardless of what earlier results reveal
No intermediate reasoning is recorded; outputs are unexplainable and unauditable
Agent cannot recover gracefully when a tool call fails midway through a task

Cost of Inaction

Accuracy Risk: Multi-step queries without iterative reasoning produce unreliable outputs unsuitable for regulated decision support
Auditability: Absence of reasoning trace creates compliance gaps in regulated industries
Trust: Users reject agent outputs they cannot verify or understand

3. Context

When to Apply

Tasks require multiple sequential information-gathering steps where each step informs the next
The full set of required tool calls cannot be determined upfront without seeing intermediate results
Auditability of reasoning is a regulatory or operational requirement
Tool failures must be handled gracefully mid-task with alternative strategies

When NOT to Apply

Single-step tasks fully answerable from context without tool use
Hard real-time latency constraints incompatible with multi-iteration inference (< 500ms)
Tasks with a fully deterministic, pre-known execution path (use Sequential Chain, EAAPL-WRK002)
Tasks where the scratchpad reasoning would contain sensitive intermediate data that must not be persisted

Prerequisites

EAAPL-AGT001 (Single Agent Pattern) baseline
EAAPL-AGT003 (Agent Tool Registry) for tool discovery and invocation
Iteration limit and cost ceiling configuration
Structured scratchpad format agreed and documented

Industry Applicability

Industry	Example Use Case	Why ReAct Fits
Financial Services	AML investigation: retrieve transaction → analyse → retrieve counterparty → analyse chain	Multi-step, each step informs next
Legal / Regulatory	CPS 234 obligation mapping: read clause → search precedent → cross-reference guidance	Reasoning must be traceable
Government	Grant eligibility assessment: check criteria → retrieve applicant data → reason eligibility	Multi-source, auditable reasoning required
Healthcare	Clinical pathway query: check diagnosis → retrieve guidelines → identify contraindications	Safety-critical; reasoning trace required
Resources	Site compliance check: retrieve permit conditions → check measurement data → identify breaches	Multi-source; auditable

4. Architecture Overview

The ReAct Loop extends the base agent (EAAPL-AGT001) by making the Thought-Action-Observation cycle the primary execution model. The scratchpad is the central artefact: a running record of every thought, action, and observation in the current task execution.

Thought Phase The LLM generates a reasoning step grounded in the current scratchpad context. The thought is free-form text that reasons about what is known, what is missing, and what action should be taken next. The thought is appended to the scratchpad. This explicit externalisation of reasoning is what makes ReAct more reliable than chain-of-thought alone — the reasoning is grounded by subsequent observations rather than generating all steps in one pass.

Action Phase Based on the thought, the LLM generates a structured action: a tool name and parameters extracted from the reasoning. The action format is strict and validated against the Tool Registry (EAAPL-AGT003) before execution. Invalid action formats trigger a thought-level correction cycle rather than a hard error.

Observation Phase The tool is executed and its result is injected into the scratchpad as an Observation. The observation is truncated to fit within the context window if necessary, with a summary flag indicating truncation. The next thought phase begins from the updated scratchpad.

Termination The loop terminates when: (a) the LLM produces a Final Answer action, (b) the maximum iteration limit is reached, or (c) the cost ceiling is hit. In cases (b) and (c), the best partial answer from the scratchpad is returned with a metadata flag indicating incomplete execution.

Scratchpad Management For long-running tasks, the scratchpad can exceed the context window. Context compression (EAAPL-WRK007) is applied: earlier thought-observation pairs are summarised and the summary replaces the raw text, preserving the reasoning trace while reclaiming context budget.

5. Architecture Diagram

ARCHITECTURE DIAGRAM

flowchart TD subgraph Input["Task Input"] A[User Query] end subgraph Loop["ReAct Iteration Loop"] B[Reason] C{Action Type?} D[Tool Call] E[Observe] F[Iteration Guard] end subgraph Termination["Termination"] G[Final Answer] H[Partial Answer] end subgraph Artefacts["Artefacts"] I[(Scratchpad Store)] J[(Audit Log)] end A --> B B --> C C -->|Tool Use| D C -->|Final Answer| G D --> E E --> F F -->|continue| B F -->|limit hit| H B --> I D --> I E --> I G --> J H --> J

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Thought Generator	AI Component	Produces reasoning step from current scratchpad context	GPT-4o, Claude 3.5, Gemini 1.5 Pro via inference API	Critical
Action Parser	Logic Component	Parses structured action (tool name + params) from thought output	Regex; JSON schema validation; structured output mode	Critical
Tool Registry Interface	Integration	Dispatches validated action to registered tool; returns result	EAAPL-AGT003 implementation	Critical
Observation Injector	State Manager	Appends tool result to scratchpad; manages context budget	Custom; LangChain scratchpad; LlamaIndex memory buffer	High
Iteration Guard	Safety	Enforces max iteration limit and cost ceiling	Counter in loop state; configurable limits	Critical
Scratchpad Store	State	Persists full thought-action-observation chain for audit	Redis; PostgreSQL; S3 (async)	High
Context Compressor	Optimisation	Summarises early scratchpad entries when context budget is low	EAAPL-WRK007 implementation	Medium
Final Answer Extractor	Logic	Extracts and validates the final answer from the terminal thought	Structured output parser	Critical

7. Data Flow

Step	Actor	Action	Output
1	Caller	Submits query with task context and tool permissions	Task object: `{query, context, allowed_tools, max_iterations, cost_ceiling}`
2	Thought Generator	Generates Thought 1 from query context	`Thought: I need to retrieve the current CPS 234 obligations for cloud services.`
3	Action Parser	Parses tool call from thought	`Action: search_regulatory_db(query="CPS 234 cloud obligations")`
4	Tool Registry	Executes search tool; returns results	`Observation: [3 relevant clauses returned]`
5	Observation Injector	Appends observation to scratchpad	Updated scratchpad with T1-A1-O1
6	Iteration Guard	Iteration 1 of 10; cost $0.02 of $0.50 ceiling — continue	Continue
7	Thought Generator	Generates Thought 2 from enriched scratchpad	`Thought: Clause 3.4 references material service providers. I need to check if this vendor is classified as MSP.`
8	Action Parser	Parses next tool call	`Action: lookup_vendor_classification(vendor_id="VND-0421")`
9	Tool Registry	Executes vendor lookup	`Observation: Vendor VND-0421 classified as MSP — CPS 234 §3.4 applies`
10	Thought Generator	Generates Final Answer thought	`Final Answer: Under CPS 234, this vendor is a Material Service Provider. The following obligations apply: [...]`
11	Scratchpad Store	Full trace persisted	Audit record with 2 iterations, 2 tool calls, final answer

Error Flow

Error	Detection	Recovery
Tool call returns error	Observation contains error payload	Next thought reasons about the error and selects alternative tool or rephrases
Action parse failure (malformed tool syntax)	Action Parser validation error	Inject correction observation: "Action format invalid. Use: tool_name(param=value)"
Max iterations reached	Iteration Guard	Return best partial answer from scratchpad with `status: max_iterations_reached`
Cost ceiling hit	Cost Monitor	Immediate termination; return partial answer with `status: cost_ceiling_hit`

8. Security Considerations

Prompt Injection via Tool Observations

Tool observations are injected directly into the LLM context. Malicious content in tool results (e.g., a document containing "Ignore all previous instructions") can redirect agent behaviour.

Mitigation: Wrap all observations in explicit XML delimiters <observation>...</observation>; validate that the next thought does not deviate from the original task goal; implement thought-level anomaly detection.

OWASP LLM Top 10

OWASP LLM Risk	ReAct Applicability	Mitigation
LLM01 Prompt Injection	Tool observations are attacker-controlled content injected into context	Observation delimiters; goal-consistency validation on each thought
LLM08 Excessive Agency	Iterative loop can take a long sequence of actions with cumulative side effects	Iteration limit; write-action approval gate (EAAPL-HITL001); cost ceiling
LLM04 Model DoS	Unbounded loops exhaust inference budget	Hard iteration limit; cost ceiling enforced before each iteration
LLM07 Insecure Plugin Design	Tool calls execute with agent's permission scope	Tool registry permission model per EAAPL-AGT003; read-only tools by default

9. Governance Considerations

Scratchpad as Audit Trail

The full thought-action-observation scratchpad is the explainability artefact for regulated use cases
Scratchpads must be retained per the organisation's AI audit log retention policy
For EU AI Act Art. 13 (Transparency) compliance, the scratchpad must be available to human reviewers on request

Governance Artefacts

Artefact	Owner	Frequency	Purpose
ReAct Scratchpad Archive	AI Platform	Per task; retained 7 years for regulated use	Explainability and audit evidence
Iteration Limit Policy	AI Governance Board	Quarterly review	Documents approved max iterations per task class
Tool Permission Matrix	Security + AI Platform	On tool addition	Documents which tools are available to which agents
Cost Ceiling Register	FinOps	Quarterly	Per-task-class cost ceilings and utilisation

10. Operational Considerations

SLOs

SLO	Target	Window	Alert
Task completion rate (Final Answer reached)	≥ 95%	24-hour rolling	< 90% triggers P2; check iteration limits
Average iterations per completed task	≤ 5	24-hour rolling	> 8 avg triggers P3; review task complexity or tool quality
p95 task latency (end-to-end)	≤ 30s	1-hour rolling	> 60s triggers P2
Tool call error rate	≤ 2%	1-hour rolling	> 5% triggers P2; check tool registry health

Monitoring

Iteration count distribution per task type: long tail indicates tool reliability issues or ambiguous task framing
Final answer confidence (where model-reported): trending downward indicates context quality degradation
Scratchpad token length: p95 trending upward indicates context compression is needed

11. Cost Considerations

Scenario	Iterations	Approx. Cost per Task (GPT-4o)	Notes
Simple lookup task	1–2	$0.01–0.05	Single tool call; straightforward answer
Moderate multi-step task	3–5	$0.05–0.20	Typical regulatory Q&A
Complex research task	6–10	$0.20–0.80	Multi-source synthesis
Max iterations (defensive)	10–15	$0.80–2.00	Cost ceiling should trigger before this

Optimisations

Use smaller model for thought generation and reserve larger model only for synthesis steps
Cache tool results within session to avoid duplicate tool calls for the same parameters
Set per-task-class iteration limits based on observed average; adjust monthly

12. Trade-Off Analysis

Option	Reasoning Quality	Auditability	Latency	Cost	Best For
A: ReAct with full scratchpad (Recommended)	High	Very High	Medium	Medium	Regulated, multi-step tasks
B: Single-pass with all tools upfront	Medium	Low	Low	Low	Simple, predictable tasks
C: Plan-and-Execute (EAAPL-WRK005)	Very High	High	High	High	Tasks with known, parallelisable subtasks
D: ReAct with context compression	High	High	Medium	Low–Medium	Long-running tasks; context-constrained

Architectural Tensions

Tension	Left Pole	Right Pole	Balance
Iteration depth vs. Latency	Many iterations for thorough reasoning	Few iterations for fast response	Risk-tier: async for thorough; 3-iteration cap for interactive
Transparency vs. Privacy	Full scratchpad retained	Scratchpad discarded post-task	Retain for regulated tasks; discard PII-containing scratchpads per policy
Tool autonomy vs. Human oversight	Agent calls any tool in registry	Every tool call requires approval	Write-action approval gate; read tools are autonomous

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Reasoning loop (agent repeats same thought-action)	Medium	High — infinite cost without guard	Iteration Guard; thought-similarity detection	Detect repeated action signatures; inject loop-break observation
Tool observation ignored (agent doesn't update reasoning)	Medium	Medium — stale reasoning	Thought references observation content check	Prompt engineering: require explicit observation acknowledgement in thought
Premature Final Answer (insufficient reasoning)	Medium	High — inaccurate output	Quality gate on Final Answer; confidence check	Confidence threshold before accepting Final Answer
Context overflow (scratchpad exceeds window)	Low–Medium	High — context truncation corrupts reasoning	Token counter before each thought	Apply EAAPL-WRK007 context compression; alert when > 70% context used
Tool result poisoning (injected instructions)	Low	High — agent hijacking	Thought-anomaly detection	Observation delimiters; goal-drift detection

14. Regulatory Considerations

EU AI Act

Art. 13 (Transparency): The scratchpad constitutes the reasoning trace required for high-risk AI system transparency. Must be retained and accessible to competent authorities.
Art. 14 (Human Oversight): Iteration limits and cost ceilings are technical measures implementing the requirement for human ability to intervene and override automated decision-making.

ISO 42001

§8.4: The thought-action-observation loop with iteration limits implements the operational controls required for AI system quality management.

NIST AI RMF

GOVERN 1.1: Documented iteration limits and tool permission matrices constitute the governance policies required for responsible AI deployment.

Australian Context

APRA CPS 230: Scratchpad retention supports the operational resilience evidence requirements for AI systems used in material business processes.
AG's AI Ethics Principles: Transparency (scratchpad audit trail) and Accountability (iteration limits, human oversight gate) are directly addressed.

15. Reference Implementations

AWS

Component	Service
Thought + Action Generation	Amazon Bedrock (Claude 3.5 Sonnet) with structured output
Tool Registry	AWS Lambda functions registered via Bedrock Tool Use API
Scratchpad Store	Amazon DynamoDB (per-session); S3 for archival
Iteration Guard	AWS Step Functions state machine with MaxAttempts
Observability	Amazon CloudWatch + AWS X-Ray for per-iteration tracing

Azure

Component	Service
Thought + Action Generation	Azure OpenAI Service (GPT-4o) with function calling
Tool Execution	Azure Functions triggered by Action Parser
Scratchpad Store	Azure Cosmos DB (per-session)
Orchestration	Azure Durable Functions (entity functions for loop state)
Observability	Azure Monitor + Application Insights

On-Premises

Component	Technology
Thought + Action Generation	vLLM serving Llama 3.1 70B Instruct
ReAct Orchestration	LangGraph `create_react_agent`; or custom loop
Scratchpad Store	PostgreSQL with JSONB scratchpad column
Tool Registry	EAAPL-AGT003 on-prem implementation

Pattern	ID	Relationship Type	Notes
Single Agent Pattern	EAAPL-AGT001	Base Pattern	ReAct extends the base agent loop with explicit thought-action-observation structure
Agent Tool Registry	EAAPL-AGT003	Depends On	All tool calls in the Action phase are dispatched via the Tool Registry
Tool Call Orchestration	EAAPL-WRK006	Peer	WRK006 covers detailed tool-call execution mechanics within the action phase
Context Compression	EAAPL-WRK007	Integrates With	Applied when scratchpad approaches context window limit
Plan-and-Execute	EAAPL-WRK005	Alternative	Upfront planning with separate execution; prefer when task subtasks are known in advance
Human Escalation	EAAPL-HITL001	Integrates With	Write-action approval gate before destructive tool calls

17. Maturity Assessment

Overall Maturity: Proven

Dimension	Score (1–5)	Evidence
Research Foundation	5	ReAct paper (Yao et al., 2022) widely replicated; standard in academic literature
Production Deployment	4	Deployed at scale in enterprise chatbots, code assistants, research tools
Framework Support	5	LangChain, LlamaIndex, LangGraph all implement ReAct as first-class pattern
Tooling Maturity	4	Observability and scratchpad tooling maturing; standard iteration/cost controls established
Cost Optimisation	3	Model routing and caching patterns established but not yet universally standardised

18. Revision History

Version	Date	Author	Changes
1.0	2025-06-13	Architecture Board	Initial publication in Agentic Workflows category

Track this pattern for APRA/ASIC review

← Back to Library More Agentic Workflows →