[EAAPL-WRK001] ReAct Agent Loop
Category: Agentic Workflows
Sub-category: Iterative Reasoning Architecture
Version: 1.0
Maturity: Proven
Tags: react, reason-act-observe, iterative-reasoning, thought-action-observation, scratchpad, tool-calling
Regulatory Relevance: EU AI Act (Art. 13), ISO 42001 §8.4, NIST AI RMF (GOVERN 1.1)
1. Executive Summary
The ReAct (Reason + Act) Agent Loop Pattern defines an iterative execution architecture in which an AI agent alternates between explicit reasoning steps (Thought), tool-use actions (Action), and environmental observations (Observation) until it reaches a final answer. First formalised in the ReAct paper (Yao et al., 2022), this pattern has become a foundational execution model for production agentic systems deployed across document understanding, regulatory Q&A, and enterprise knowledge retrieval.
For CIO/CTO audiences: ReAct is the thinking-out-loud loop that allows an AI to work through a complex problem step by step, using external tools (databases, APIs, search) at each step to ground its reasoning in real data. Unlike a single inference call, ReAct is transparent — every reasoning step and every tool call is recorded in a scratchpad that can be audited. The pattern directly addresses the explainability requirements of regulated industries: the chain of thought is the audit trail. The primary operational considerations are latency (each loop iteration adds an LLM inference call) and cost control (unbounded loops exhaust token budgets).
2. Problem Statement
Business Problem
Complex business queries — "What are our regulatory obligations under CPS 234 for this vendor?" or "What caused this transaction anomaly?" — cannot be answered by a single LLM inference call against a static knowledge base. The answer requires multiple information-gathering steps, intermediate reasoning, and dynamic tool use based on what each step reveals.
Technical Problem
A single-turn LLM inference cannot dynamically decide which tools to use in what order based on intermediate results. The model must commit to its answer in one pass, making it prone to hallucination when the required information is not in its context and unable to adapt its information-gathering strategy based on what it discovers.
Symptoms of Absence
- Agent gives confident but hallucinated answers to multi-step queries because it cannot retrieve intermediate grounding facts
- Tool calls are all made upfront in a fixed order regardless of what earlier results reveal
- No intermediate reasoning is recorded; outputs are unexplainable and unauditable
- Agent cannot recover gracefully when a tool call fails midway through a task
Cost of Inaction
- Accuracy Risk: Multi-step queries without iterative reasoning produce unreliable outputs unsuitable for regulated decision support
- Auditability: Absence of reasoning trace creates compliance gaps in regulated industries
- Trust: Users reject agent outputs they cannot verify or understand
3. Context
When to Apply
- Tasks require multiple sequential information-gathering steps where each step informs the next
- The full set of required tool calls cannot be determined upfront without seeing intermediate results
- Auditability of reasoning is a regulatory or operational requirement
- Tool failures must be handled gracefully mid-task with alternative strategies
When NOT to Apply
- Single-step tasks fully answerable from context without tool use
- Hard real-time latency constraints incompatible with multi-iteration inference (< 500ms)
- Tasks with a fully deterministic, pre-known execution path (use Sequential Chain, EAAPL-WRK002)
- Tasks where the scratchpad reasoning would contain sensitive intermediate data that must not be persisted
Prerequisites
- EAAPL-AGT001 (Single Agent Pattern) baseline
- EAAPL-AGT003 (Agent Tool Registry) for tool discovery and invocation
- Iteration limit and cost ceiling configuration
- Structured scratchpad format agreed and documented
Industry Applicability
| Industry | Example Use Case | Why ReAct Fits |
| --- | --- | --- |
| Financial Services | AML investigation: retrieve transaction → analyse → retrieve counterparty → analyse chain | Multi-step, each step informs next |
| Legal / Regulatory | CPS 234 obligation mapping: read clause → search precedent → cross-reference guidance | Reasoning must be traceable |
| Government | Grant eligibility assessment: check criteria → retrieve applicant data → reason eligibility | Multi-source, auditable reasoning required |
| Healthcare | Clinical pathway query: check diagnosis → retrieve guidelines → identify contraindications | Safety-critical; reasoning trace required |
| Resources | Site compliance check: retrieve permit conditions → check measurement data → identify breaches | Multi-source; auditable |
4. Architecture Overview
The ReAct Loop extends the base agent (EAAPL-AGT001) by making the Thought-Action-Observation cycle the primary execution model. The scratchpad is the central artefact: a running record of every thought, action, and observation in the current task execution.
Thought Phase
The LLM generates a reasoning step grounded in the current scratchpad context. The thought is free-form text that reasons about what is known, what is missing, and what action should be taken next. The thought is appended to the scratchpad. Externalising the reasoning in this way is what makes ReAct more reliable than chain-of-thought alone: each reasoning step is grounded by the observations that precede it, rather than the model generating every step in a single pass with no new evidence.
Action Phase
Based on the thought, the LLM generates a structured action: a tool name and parameters extracted from the reasoning. The action format is strict and validated against the Tool Registry (EAAPL-AGT003) before execution. Invalid action formats trigger a thought-level correction cycle rather than a hard error.
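As an illustration of the strict action format, the following is a minimal sketch of an Action Parser. It assumes the `tool_name(param=value)` syntax quoted in the Error Flow table and string-valued parameters only; the names `parse_action`, `ParsedAction`, and the two regular expressions are hypothetical and not part of EAAPL-AGT003.

```python
import re
from dataclasses import dataclass

# Assumed action grammar: tool_name(param="value", other="value")
ACTION_RE = re.compile(r'^\s*(?:Action:\s*)?(\w+)\((.*)\)\s*$', re.DOTALL)
PARAM_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

CORRECTION = "Action format invalid. Use: tool_name(param=value)"


@dataclass
class ParsedAction:
    tool: str
    params: dict


def parse_action(text, allowed_tools):
    """Return (ParsedAction, None) on success, or (None, correction_observation)."""
    match = ACTION_RE.match(text)
    if match is None:
        return None, CORRECTION
    tool, raw_params = match.group(1), match.group(2)
    if tool not in allowed_tools:
        return None, f"Unknown tool '{tool}'. Allowed tools: {sorted(allowed_tools)}"
    params = dict(PARAM_RE.findall(raw_params))
    # Anything left after removing recognised param="value" pairs is malformed.
    leftover = PARAM_RE.sub("", raw_params).replace(",", "").strip()
    if leftover:
        return None, CORRECTION
    return ParsedAction(tool, params), None
```

On failure the second element of the tuple is text that can be injected as a correction observation, so the next thought can repair the action instead of the task aborting.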
Observation Phase
The tool is executed and its result is injected into the scratchpad as an Observation. The observation is truncated to fit within the context window if necessary, with a summary flag indicating truncation. The next thought phase begins from the updated scratchpad.
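A minimal sketch of observation truncation with a truncation flag follows. It uses a character count as a rough stand-in for a token budget; the function name, the marker text, and the default limit are assumptions for illustration.

```python
def truncate_observation(text, max_chars=2000):
    """Return the observation together with a flag saying whether it was cut."""
    if len(text) <= max_chars:
        return {"content": text, "truncated": False, "original_chars": len(text)}
    marker = "\n[observation truncated]"
    # Reserve room for the marker so the result never exceeds max_chars.
    kept = text[: max(0, max_chars - len(marker))]
    return {"content": kept + marker, "truncated": True, "original_chars": len(text)}
```

Keeping the flag and the original length alongside the content lets the next thought reason about the fact that data is missing, for example by issuing a narrower query.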
Termination
The loop terminates when: (a) the LLM produces a Final Answer action, (b) the maximum iteration limit is reached, or (c) the cost ceiling is hit. In cases (b) and (c), the best partial answer from the scratchpad is returned with a metadata flag indicating incomplete execution.
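The two limit-based termination conditions can be sketched as a small guard object. This is an illustrative sketch only: the class and method names are assumptions, while the status strings mirror those used in the Error Flow table.

```python
from dataclasses import dataclass


@dataclass
class IterationGuard:
    max_iterations: int
    cost_ceiling: float
    iterations: int = 0
    cost: float = 0.0

    def record(self, iteration_cost):
        """Account for one completed loop iteration."""
        self.iterations += 1
        self.cost += iteration_cost

    def check(self):
        """Return 'continue' or a terminal status string."""
        if self.cost >= self.cost_ceiling:
            return "cost_ceiling_hit"
        if self.iterations >= self.max_iterations:
            return "max_iterations_reached"
        return "continue"
```

For example, a guard created as `IterationGuard(max_iterations=10, cost_ceiling=0.50)` returns `"continue"` after one iteration costing $0.02.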
Scratchpad Management
For long-running tasks, the scratchpad can exceed the context window. Context compression (EAAPL-WRK007) is applied: earlier thought-observation pairs are summarised and the summary replaces the raw text, preserving the reasoning trace while reclaiming context budget.
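The compression step might look like the sketch below. It is not the EAAPL-WRK007 implementation, which this document does not specify: the character budget, the `keep_recent` parameter, and the placeholder summariser (which would be an LLM call in practice) are all assumptions.

```python
def compress_scratchpad(entries, budget_chars, keep_recent=2, summarise=None):
    """Replace the oldest entries with a single summary entry when over budget."""
    if summarise is None:
        # Placeholder summariser: keeps the first 60 characters of each entry.
        def summarise(old):
            return " | ".join(entry[:60] for entry in old)

    total = sum(len(entry) for entry in entries)
    split = len(entries) - keep_recent
    if total <= budget_chars or split <= 0:
        return list(entries)  # within budget, or nothing old enough to compress
    old, recent = entries[:split], entries[split:]
    return ["Summary of earlier steps: " + summarise(old)] + list(recent)
```

The most recent entries are kept verbatim so the next thought still sees the latest observation in full.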
5. Architecture Diagram
flowchart TD
subgraph Input["Task Input"]
A[User Query]
end
subgraph Loop["ReAct Iteration Loop"]
B[Reason]
C{Action Type?}
D[Tool Call]
E[Observe]
F[Iteration Guard]
end
subgraph Termination["Termination"]
G[Final Answer]
H[Partial Answer]
end
subgraph Artefacts["Artefacts"]
I[(Scratchpad Store)]
J[(Audit Log)]
end
A --> B
B --> C
C -->|Tool Use| D
C -->|Final Answer| G
D --> E
E --> F
F -->|continue| B
F -->|limit hit| H
B --> I
D --> I
E --> I
G --> J
H --> J
6. Components
| Component | Type | Responsibility | Technology Options | Criticality |
| --- | --- | --- | --- | --- |
| Thought Generator | AI Component | Produces reasoning step from current scratchpad context | GPT-4o, Claude 3.5, Gemini 1.5 Pro via inference API | Critical |
| Action Parser | Logic Component | Parses structured action (tool name + params) from thought output | Regex; JSON schema validation; structured output mode | Critical |
| Tool Registry Interface | Integration | Dispatches validated action to registered tool; returns result | EAAPL-AGT003 implementation | Critical |
| Observation Injector | State Manager | Appends tool result to scratchpad; manages context budget | Custom; LangChain scratchpad; LlamaIndex memory buffer | High |
| Iteration Guard | Safety | Enforces max iteration limit and cost ceiling | Counter in loop state; configurable limits | Critical |
| Scratchpad Store | State | Persists full thought-action-observation chain for audit | Redis; PostgreSQL; S3 (async) | High |
| Context Compressor | Optimisation | Summarises early scratchpad entries when context budget is low | EAAPL-WRK007 implementation | Medium |
| Final Answer Extractor | Logic | Extracts and validates the final answer from the terminal thought | Structured output parser | Critical |
7. Data Flow
| Step | Actor | Action | Output |
| --- | --- | --- | --- |
| 1 | Caller | Submits query with task context and tool permissions | Task object: {query, context, allowed_tools, max_iterations, cost_ceiling} |
| 2 | Thought Generator | Generates Thought 1 from query context | Thought: I need to retrieve the current CPS 234 obligations for cloud services. |
| 3 | Action Parser | Parses tool call from thought | Action: search_regulatory_db(query="CPS 234 cloud obligations") |
| 4 | Tool Registry | Executes search tool; returns results | Observation: [3 relevant clauses returned] |
| 5 | Observation Injector | Appends observation to scratchpad | Updated scratchpad with T1-A1-O1 |
| 6 | Iteration Guard | Iteration 1 of 10; cost $0.02 of $0.50 ceiling; continue | Continue |
| 7 | Thought Generator | Generates Thought 2 from enriched scratchpad | Thought: Clause 3.4 references material service providers. I need to check if this vendor is classified as MSP. |
| 8 | Action Parser | Parses next tool call | Action: lookup_vendor_classification(vendor_id="VND-0421") |
| 9 | Tool Registry | Executes vendor lookup | Observation: Vendor VND-0421 classified as MSP; CPS 234 §3.4 applies |
| 10 | Thought Generator | Generates Final Answer thought | Final Answer: Under CPS 234, this vendor is a Material Service Provider. The following obligations apply: [...] |
| 11 | Scratchpad Store | Full trace persisted | Audit record with 2 iterations, 2 tool calls, final answer |
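The data flow above can be sketched end to end as follows. This is a simplified illustration, not a reference implementation: the `Task` fields follow the task object in step 1, but `run_react`, the flat per-call cost, the `"completed"` status, and the scripted stand-in for the LLM are assumptions. Partial-answer extraction on limit hits is omitted.

```python
from dataclasses import dataclass


@dataclass
class Task:
    query: str
    allowed_tools: dict          # tool name -> callable
    max_iterations: int = 10
    cost_ceiling: float = 0.50
    context: str = ""


def run_react(task, model, cost_per_call=0.01):
    """Run Thought -> Action -> Observation until a final answer or a limit."""
    scratchpad = [f"Query: {task.query}"]
    cost = 0.0
    for _ in range(task.max_iterations):
        if cost + cost_per_call > task.cost_ceiling:
            return {"status": "cost_ceiling_hit", "answer": None, "scratchpad": scratchpad}
        step = model(scratchpad)  # one inference call per iteration
        cost += cost_per_call
        scratchpad.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            scratchpad.append(f"Final Answer: {step['final_answer']}")
            return {"status": "completed", "answer": step["final_answer"], "scratchpad": scratchpad}
        tool, params = step["tool"], step["params"]
        scratchpad.append(f"Action: {tool}({params})")
        if tool not in task.allowed_tools:
            observation = f"Error: tool '{tool}' is not permitted"
        else:
            try:
                observation = task.allowed_tools[tool](**params)
            except Exception as exc:  # a tool failure becomes an observation
                observation = f"Error: {exc}"
        scratchpad.append(f"Observation: {observation}")
    return {"status": "max_iterations_reached", "answer": None, "scratchpad": scratchpad}


def scripted_model(scratchpad):
    """Scripted stand-in for the LLM that replays the two-step trace above."""
    seen = sum(entry.startswith("Observation:") for entry in scratchpad)
    if seen == 0:
        return {"thought": "I need the CPS 234 obligations for cloud services.",
                "tool": "search_regulatory_db",
                "params": {"query": "CPS 234 cloud obligations"}}
    if seen == 1:
        return {"thought": "I need to check whether this vendor is an MSP.",
                "tool": "lookup_vendor_classification",
                "params": {"vendor_id": "VND-0421"}}
    return {"thought": "I have enough information.",
            "final_answer": "The vendor is a Material Service Provider."}


tools = {
    "search_regulatory_db": lambda query: "3 relevant clauses returned",
    "lookup_vendor_classification": lambda vendor_id: f"Vendor {vendor_id} classified as MSP",
}
result = run_react(Task("CPS 234 obligations for this vendor?", tools), scripted_model)
```

Because tool errors are written into the scratchpad rather than raised, the next thought can reason about the failure, which matches the recovery behaviour described in the Error Flow table.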
Error Flow
| Error | Detection | Recovery |
| --- | --- | --- |
| Tool call returns error | Observation contains error payload | Next thought reasons about the error and selects alternative tool or rephrases |
| Action parse failure (malformed tool syntax) | Action Parser validation error | Inject correction observation: "Action format invalid. Use: tool_name(param=value)" |
| Max iterations reached | Iteration Guard | Return best partial answer from scratchpad with status: max_iterations_reached |
| Cost ceiling hit | Iteration Guard (cost check) | Immediate termination; return partial answer with status: cost_ceiling_hit |
8. Security Considerations
Prompt Injection via Tool Observations
Tool observations are injected directly into the LLM context. Malicious content in tool results (e.g., a document containing "Ignore all previous instructions") can redirect agent behaviour.
- Mitigation: Wrap all observations in explicit XML delimiters (<observation>...</observation>); validate that the next thought does not deviate from the original task goal; implement thought-level anomaly detection.
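A minimal sketch of the delimiter mitigation follows. The function name is an assumption, and escaping with the standard library's `html.escape` is one possible way to stop tool output from closing the delimiter early. Delimiters alone do not prevent injection; they only make the boundary between instructions and data explicit to the model.

```python
from html import escape


def wrap_observation(content):
    """Wrap untrusted tool output in <observation> delimiters."""
    # Escaping &, < and > means the content cannot contain a literal
    # </observation> tag or open a fake delimiter of its own.
    safe = escape(content, quote=False)
    return f"<observation>\n{safe}\n</observation>"
```

The system prompt would then need to state that text inside the delimiters is data to be reasoned about and never instructions to follow.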
OWASP LLM Top 10
| OWASP LLM Risk | ReAct Applicability | Mitigation |
| --- | --- | --- |
| LLM01 Prompt Injection | Tool observations are attacker-controlled content injected into context | Observation delimiters; goal-consistency validation on each thought |
| LLM08 Excessive Agency | Iterative loop can take a long sequence of actions with cumulative side effects | Iteration limit; write-action approval gate (EAAPL-HITL001); cost ceiling |
| LLM04 Model DoS | Unbounded loops exhaust inference budget | Hard iteration limit; cost ceiling enforced before each iteration |
| LLM07 Insecure Plugin Design | Tool calls execute with agent's permission scope | Tool registry permission model per EAAPL-AGT003; read-only tools by default |
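The write-action approval gate can be sketched as a thin dispatch wrapper. The registry structure, the `read_only` flag, and the `approve` callback are hypothetical and do not describe the EAAPL-AGT003 or EAAPL-HITL001 interfaces; `update_vendor_record` in the usage note is likewise an invented tool name.

```python
def dispatch(tool_name, params, registry, approve):
    """Run a tool; anything not marked read-only needs explicit approval first."""
    spec = registry.get(tool_name)
    if spec is None:
        return f"Error: tool '{tool_name}' is not registered for this agent"
    # Safe default: a tool is treated as write-capable unless flagged read-only.
    if not spec.get("read_only", False) and not approve(tool_name, params):
        return f"Error: write action '{tool_name}' was not approved"
    return spec["func"](**params)
```

With a registry that marks `lookup_vendor_classification` as read-only and leaves a hypothetical `update_vendor_record` unflagged, the lookup runs autonomously while the update is blocked until `approve` returns `True`.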
9. Governance Considerations
Scratchpad as Audit Trail
- The full thought-action-observation scratchpad is the explainability artefact for regulated use cases
- Scratchpads must be retained per the organisation's AI audit log retention policy
- For EU AI Act Art. 13 (Transparency) compliance, the scratchpad must be available to human reviewers on request
Governance Artefacts
| Artefact | Owner | Frequency | Purpose |
| --- | --- | --- | --- |
| ReAct Scratchpad Archive | AI Platform | Per task; retained 7 years for regulated use | Explainability and audit evidence |
| Iteration Limit Policy | AI Governance Board | Quarterly review | Documents approved max iterations per task class |
| Tool Permission Matrix | Security + AI Platform | On tool addition | Documents which tools are available to which agents |
| Cost Ceiling Register | FinOps | Quarterly | Per-task-class cost ceilings and utilisation |
10. Operational Considerations
SLOs
| SLO | Target | Window | Alert |
| --- | --- | --- | --- |
| Task completion rate (Final Answer reached) | ≥ 95% | 24-hour rolling | < 90% triggers P2; check iteration limits |
| Average iterations per completed task | ≤ 5 | 24-hour rolling | > 8 avg triggers P3; review task complexity or tool quality |
| p95 task latency (end-to-end) | ≤ 30s | 1-hour rolling | > 60s triggers P2 |
| Tool call error rate | ≤ 2% | 1-hour rolling | > 5% triggers P2; check tool registry health |
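The first two SLOs can be computed from per-task records as in the sketch below. The record shape (`status`, `iterations`) and the function names are assumptions; the alert thresholds are the ones in the table above.

```python
from statistics import mean


def summarise_tasks(records):
    """Compute completion rate and average iterations over completed tasks."""
    completed = [r for r in records if r["status"] == "completed"]
    return {
        "completion_rate": len(completed) / len(records) if records else 0.0,
        "avg_iterations_completed": mean(r["iterations"] for r in completed) if completed else 0.0,
    }


def alerts(summary):
    """Return the alerts fired by the thresholds in the SLO table."""
    fired = []
    if summary["completion_rate"] < 0.90:
        fired.append("P2: completion rate below 90%")
    if summary["avg_iterations_completed"] > 8:
        fired.append("P3: average iterations above 8")
    return fired
```

In practice the records would be filtered to the rolling window stated for each SLO before being summarised.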
Monitoring
- Iteration count distribution per task type: long tail indicates tool reliability issues or ambiguous task framing
- Final answer confidence (where model-reported): trending downward indicates context quality degradation
- Scratchpad token length: p95 trending upward indicates context compression is needed
11. Cost Considerations
| Scenario | Iterations | Approx. Cost per Task (GPT-4o) | Notes |
| --- | --- | --- | --- |
| Simple lookup task | 1–2 | $0.01–0.05 | Single tool call; straightforward answer |
| Moderate multi-step task | 3–5 | $0.05–0.20 | Typical regulatory Q&A |
| Complex research task | 6–10 | $0.20–0.80 | Multi-source synthesis |
| Max iterations (defensive) | 10–15 | $0.80–2.00 | Cost ceiling should trigger before this |
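Cost rises faster than the iteration count because the growing scratchpad is re-sent as input on every iteration. The sketch below illustrates that effect only; every default (token counts and per-1k-token prices) is a hypothetical placeholder, not a quoted price or a measurement behind the table above.

```python
def estimate_task_cost(iterations, prompt_tokens=1000, tokens_added_per_iteration=500,
                       output_tokens=150, usd_per_1k_input=0.0025, usd_per_1k_output=0.01):
    """Estimate task cost when the whole scratchpad is re-sent each iteration."""
    total = 0.0
    for i in range(iterations):
        # Input grows by one thought-action-observation triple per iteration.
        input_tokens = prompt_tokens + i * tokens_added_per_iteration
        total += input_tokens / 1000 * usd_per_1k_input
        total += output_tokens / 1000 * usd_per_1k_output
    return round(total, 4)
```

Under these placeholder values the average cost per iteration roughly doubles between a 2-iteration and a 10-iteration task, which is why context compression and per-class iteration limits both affect cost.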
Optimisations
- Use a smaller model for thought generation and reserve a larger model for synthesis steps
- Cache tool results within session to avoid duplicate tool calls for the same parameters
- Set per-task-class iteration limits based on observed average; adjust monthly
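Session-level caching of tool results can be sketched as below. The class name and key scheme are assumptions, parameters are assumed to be JSON-serialisable, and the approach is only appropriate for read-only, idempotent tools.

```python
import json


class ToolResultCache:
    """Session-scoped cache keyed by tool name and canonicalised parameters."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(tool, params):
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} the same key.
        return (tool, json.dumps(params, sort_keys=True))

    def call(self, tool, func, params):
        """Return a cached result if present; otherwise run the tool and store it."""
        key = self._key(tool, params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = func(**params)
        self._store[key] = result
        return result
```

Creating one cache per task keeps results from leaking between sessions and bounds staleness to the lifetime of a single task.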
12. Trade-Off Analysis
| Option | Reasoning Quality | Auditability | Latency | Cost | Best For |
| --- | --- | --- | --- | --- | --- |
| A: ReAct with full scratchpad (Recommended) | High | Very High | Medium | Medium | Regulated, multi-step tasks |
| B: Single-pass with all tools upfront | Medium | Low | Low | Low | Simple, predictable tasks |
| C: Plan-and-Execute (EAAPL-WRK005) | Very High | High | High | High | Tasks with known, parallelisable subtasks |
| D: ReAct with context compression | High | High | Medium | Low–Medium | Long-running tasks; context-constrained |
Architectural Tensions
| Tension | Left Pole | Right Pole | Balance |
| --- | --- | --- | --- |
| Iteration depth vs. Latency | Many iterations for thorough reasoning | Few iterations for fast response | Risk-tier: async for thorough; 3-iteration cap for interactive |
| Transparency vs. Privacy | Full scratchpad retained | Scratchpad discarded post-task | Retain for regulated tasks; discard PII-containing scratchpads per policy |
| Tool autonomy vs. Human oversight | Agent calls any tool in registry | Every tool call requires approval | Write-action approval gate; read tools are autonomous |
13. Failure Modes
| Failure Mode | Likelihood | Impact | Detection | Recovery |
| --- | --- | --- | --- | --- |
| Reasoning loop (agent repeats same thought-action) | Medium | High: unbounded cost without guard | Iteration Guard; thought-similarity detection | Detect repeated action signatures; inject loop-break observation |
| Tool observation ignored (agent doesn't update reasoning) | Medium | Medium: stale reasoning | Check that each thought references the preceding observation | Prompt engineering: require explicit observation acknowledgement in thought |
| Premature Final Answer (insufficient reasoning) | Medium | High: inaccurate output | Quality gate on Final Answer; confidence check | Confidence threshold before accepting Final Answer |
| Context overflow (scratchpad exceeds window) | Low–Medium | High: context truncation corrupts reasoning | Token counter before each thought | Apply EAAPL-WRK007 context compression; alert when > 70% context used |
| Tool result poisoning (injected instructions) | Low | High: agent hijacking | Thought-anomaly detection | Observation delimiters; goal-drift detection |
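Detection of a reasoning loop via repeated action signatures can be sketched as follows. The function name, the `max_repeats` threshold, and the wording of the loop-break observation are assumptions, and parameter values are assumed to be hashable (for example, strings).

```python
from collections import Counter


def detect_action_loop(actions, max_repeats=2):
    """Return a loop-break observation if any (tool, params) signature repeats too often."""
    signatures = Counter(
        (tool, tuple(sorted(params.items()))) for tool, params in actions
    )
    for (tool, _), count in signatures.items():
        if count > max_repeats:
            return (f"Loop detected: '{tool}' was called {count} times with identical "
                    "parameters. Try a different tool or different parameters.")
    return None
```

Calling this before each Thought phase and appending any returned text to the scratchpad gives the model a chance to change strategy before the iteration limit is reached.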
14. Regulatory Considerations
EU AI Act
- Art. 13 (Transparency): The scratchpad provides a reasoning trace that supports the transparency obligations for high-risk AI systems. It should be retained and made accessible to competent authorities on request.
- Art. 14 (Human Oversight): Iteration limits and cost ceilings bound autonomous execution, and the write-action approval gate (EAAPL-HITL001) gives humans a point at which to intervene in or override automated actions.
ISO 42001
- §8.4: The thought-action-observation loop with iteration limits implements the operational controls required for AI system quality management.
NIST AI RMF
- GOVERN 1.1: Documented iteration limits and tool permission matrices constitute the governance policies required for responsible AI deployment.
Australian Context
- APRA CPS 230: Scratchpad retention supports the operational resilience evidence requirements for AI systems used in material business processes.
- Australian Government's AI Ethics Principles: Transparency (scratchpad audit trail) and Accountability (iteration limits, human oversight gate) are directly addressed.
15. Reference Implementations
AWS
| Component | Service |
| --- | --- |
| Thought + Action Generation | Amazon Bedrock (Claude 3.5 Sonnet) with structured output |
| Tool Registry | AWS Lambda functions registered via Bedrock Tool Use API |
| Scratchpad Store | Amazon DynamoDB (per-session); S3 for archival |
| Iteration Guard | AWS Step Functions state machine: Choice state on an iteration counter, with Retry MaxAttempts for transient tool failures |
| Observability | Amazon CloudWatch + AWS X-Ray for per-iteration tracing |
Azure
| Component | Service |
| --- | --- |
| Thought + Action Generation | Azure OpenAI Service (GPT-4o) with function calling |
| Tool Execution | Azure Functions triggered by Action Parser |
| Scratchpad Store | Azure Cosmos DB (per-session) |
| Orchestration | Azure Durable Functions (entity functions for loop state) |
| Observability | Azure Monitor + Application Insights |
On-Premises
| Component | Technology |
| --- | --- |
| Thought + Action Generation | vLLM serving Llama 3.1 70B Instruct |
| ReAct Orchestration | LangGraph create_react_agent; or custom loop |
| Scratchpad Store | PostgreSQL with JSONB scratchpad column |
| Tool Registry | EAAPL-AGT003 on-prem implementation |
16. Related Patterns
| Pattern | ID | Relationship Type | Notes |
| --- | --- | --- | --- |
| Single Agent Pattern | EAAPL-AGT001 | Base Pattern | ReAct extends the base agent loop with explicit thought-action-observation structure |
| Agent Tool Registry | EAAPL-AGT003 | Depends On | All tool calls in the Action phase are dispatched via the Tool Registry |
| Tool Call Orchestration | EAAPL-WRK006 | Peer | WRK006 covers detailed tool-call execution mechanics within the action phase |
| Context Compression | EAAPL-WRK007 | Integrates With | Applied when scratchpad approaches context window limit |
| Plan-and-Execute | EAAPL-WRK005 | Alternative | Upfront planning with separate execution; prefer when subtasks are known in advance |
| Human Escalation | EAAPL-HITL001 | Integrates With | Write-action approval gate before destructive tool calls |
17. Maturity Assessment
Overall Maturity: Proven
| Dimension | Score (1–5) | Evidence |
| --- | --- | --- |
| Research Foundation | 5 | ReAct paper (Yao et al., 2022) widely replicated; standard in academic literature |
| Production Deployment | 4 | Deployed at scale in enterprise chatbots, code assistants, research tools |
| Framework Support | 5 | LangChain, LlamaIndex, LangGraph all implement ReAct as first-class pattern |
| Tooling Maturity | 4 | Observability and scratchpad tooling maturing; standard iteration/cost controls established |
| Cost Optimisation | 3 | Model routing and caching patterns established but not yet universally standardised |
18. Revision History
| Version | Date | Author | Changes |
| --- | --- | --- | --- |
| 1.0 | 2025-06-13 | Architecture Board | Initial publication in Agentic Workflows category |