[EAAPL-RAG007] Agentic Retrieval-Augmented Generation

Category: Artificial Intelligence / Retrieval-Augmented Generation
Sub-category: Agentic and Multi-Hop RAG
Version: 1.2
Maturity: Proven
Tags: rag agentic multi-hop iterative-retrieval query-planning self-critique tool-use reasoning
Regulatory Relevance: EU AI Act Article 14 (Human oversight for autonomous systems), ISO/IEC 42001 Section 8.5 (AI system operation), NIST AI RMF (Govern 1.7: autonomous decision-making)

1. Executive Summary

Agentic RAG places a reasoning AI agent in the orchestration loop of the retrieval-generation pipeline, enabling it to plan multi-step retrieval strategies, evaluate the quality of retrieved evidence, run iterative searches to fill gaps, and critique its own draft answers before returning a final response. Unlike standard RAG, which executes a single fixed retrieval cycle per query, Agentic RAG can break a complex question into sub-questions, retrieve evidence for each, synthesise intermediate answers, identify what it still does not know, and retrieve again. It does this autonomously until it has sufficient grounded evidence to answer with confidence or reaches a configured iteration limit.

For enterprise leaders, Agentic RAG addresses a category of knowledge queries that single-cycle RAG fundamentally cannot handle: multi-hop questions that require chaining across multiple documents ("Which of our suppliers are located in regions affected by the sanctions announced in the regulatory alert published this week, and what are our contractual obligations for those suppliers?"), questions requiring synthesis across multiple evidence sources, and research-style queries that require iterative exploration of a knowledge domain. The business value is the automation of knowledge work that previously required a skilled analyst to retrieve, read, synthesise, and reason across multiple documents — turning a 2-hour research task into a 30-second AI-assisted workflow.

2. Problem Statement

Business Problem

Single-cycle RAG systems retrieve a fixed set of candidate documents and generate an answer from them in one pass. This architecture is adequate for factual lookup questions but insufficient for analytical questions that require multi-document synthesis, causal reasoning, or iterative hypothesis refinement. Analysts who use RAG systems for complex research tasks frequently report that the system misses relevant documents that would only be found if the initial answer had been used to formulate a better search query.

Technical Problem

Standard RAG has no mechanism for the system to recognise that its retrieved context is insufficient to answer the question, no ability to plan a multi-step retrieval strategy before retrieving, and no capacity to iteratively refine its retrieval based on partial answers. The single retrieval cycle architecture is fundamentally limited for complex, multi-hop knowledge questions.

Symptoms

RAG system answers complex analytical questions with shallow, incomplete responses that reference only one or two source documents
Users manually perform "follow-up searches" after receiving an initial RAG answer, indicating the system did not exhaust the relevant knowledge
High analyst override rate: AI-generated answers are frequently edited with additional context that the analyst had to find separately
The system answers "I don't have enough information" for questions that are answerable from the corpus but require multi-hop reasoning

Cost of Inaction

Complex knowledge work remains manual and unautomated, despite a capable knowledge corpus
Analysts treat the RAG system as a first-pass tool only, not as a research assistant capable of deep synthesis
Competitive disadvantage versus AI deployments that automate complex analytical workflows end-to-end

3. Context

When to Apply

Complex multi-hop questions requiring evidence from multiple, sequentially discovered documents
Research synthesis tasks: competitive analysis, regulatory impact assessment, risk analysis across multiple source documents
Question answering where the answer to one sub-question determines which documents to retrieve next (dependent retrieval chains)
Use cases where the appropriate retrieval strategy varies per question (some questions need one retrieval; others need five)
Scenarios where the agent should be able to answer "I cannot find sufficient evidence" after exhausting retrieval strategies, rather than hallucinating

When NOT to Apply

Simple factual lookups answerable in a single retrieval cycle (adds unnecessary latency and cost)
Latency-critical applications (agentic loops add 1–10 seconds of reasoning time per iteration)
Autonomous action-taking scenarios where the agent's retrieval findings would directly trigger writes or external API calls without human approval — this requires explicit Human-in-the-Loop governance (EAAPL-RAG007 combined with HITL patterns)
Corpora that are too small to benefit from multi-hop retrieval (< 10,000 documents)

Prerequisites

An underlying RAG retrieval capability (EAAPL-RAG001 or EAAPL-RAG005)
An LLM with reliable tool-calling / function-calling capability (GPT-4o, Claude 3.5, Gemini 1.5 Pro)
A defined set of retrieval tools the agent can invoke (search, filter, summarise, compare)
A maximum iteration limit to prevent infinite loops
Observability instrumentation that traces each agent decision, tool call, and its output

Industry Applicability

Industry	Complex Query Example	Multi-Hop Depth
Financial Services	"Which loan products in our portfolio have covenant conditions that may be triggered by the RBA rate change announced today?"	3–4 hops
Legal	"Summarise all cases in our case database where a precedent from Smith v Jones was applied in an employment context"	3–5 hops
Procurement	"Identify all active suppliers in our approved supplier list that are headquartered in sanctioned jurisdictions per this week's OFAC update"	2–3 hops
Healthcare	"Are there any patients in our clinical notes who have been prescribed Drug X and also have contraindicated Condition Y per this month's formulary update?"	2–4 hops
Strategy	"Summarise all internal research reports that reference our top 5 competitors and were published in the last 12 months"	2–3 hops

4. Architecture Overview

Agentic RAG wraps the retrieval-generation pipeline in a Reason-Act-Observe loop (a specialisation of the ReAct framework). The agent is an LLM with access to a defined set of retrieval tools and a scratchpad for recording its reasoning steps.

Query Planning

When a complex question is received, the agent's first step is query planning: it reasons about what information is needed to answer the question and decomposes the original query into a set of sub-queries with dependencies. For example, "Which of our suppliers are in sanctioned regions per this week's alert?" decomposes into: (1) retrieve this week's sanctions alert, (2) extract the list of sanctioned regions, (3) retrieve the supplier list, (4) cross-reference supplier regions against the sanctioned list. The query plan is recorded in the agent scratchpad and drives the retrieval sequence.

Query planning can be explicit (the agent writes a multi-step plan before executing any retrieval) or implicit (the agent uses tool calling in sequence, with each tool output informing the next call). Explicit planning produces more predictable and auditable behaviour; implicit sequential tool calling is more flexible but harder to trace.
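
An explicit plan of this kind can be represented as a small dependency graph. The sketch below is illustrative only: the `SubQuery` structure and `execution_order` helper are hypothetical names, not part of any framework named in this pattern. It uses the Python standard library's `graphlib` to order the four sub-queries from the sanctions example so that every step runs after the steps it depends on.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class SubQuery:
    """One step of an explicit query plan (hypothetical structure)."""
    step_id: int
    description: str
    depends_on: list[int] = field(default_factory=list)


def execution_order(plan: list[SubQuery]) -> list[int]:
    """Return step IDs in an order that respects every dependency."""
    graph = {sq.step_id: set(sq.depends_on) for sq in plan}
    # static_order() raises graphlib.CycleError if the plan is circular.
    return list(TopologicalSorter(graph).static_order())


plan = [
    SubQuery(1, "Retrieve this week's sanctions alert"),
    SubQuery(2, "Extract the list of sanctioned regions", depends_on=[1]),
    SubQuery(3, "Retrieve the supplier list"),
    SubQuery(4, "Cross-reference supplier regions against the sanctioned list",
             depends_on=[2, 3]),
]
order = execution_order(plan)
```

Steps 1 and 3 have no dependencies, so an orchestrator could run them in parallel; only step 4 must wait for both branches. Recording the plan in this form also gives the audit trace a stable reference for which sub-query each tool call served.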

Retrieval Tools

The agent is given a defined set of retrieval tools as callable functions:

search(query: str, filters: dict) → List[Chunk]: standard RAG retrieval
get_document(doc_id: str) → Document: retrieve a full document by ID (for follow-up reading)
summarise_results(chunks: List[Chunk]) → str: summarise a set of retrieved chunks
compare_documents(doc_ids: List[str], aspect: str) → str: compare specific aspects across multiple documents
extract_entities(chunks: List[Chunk], entity_type: str) → List[str]: extract named entities for use as next-query parameters

Tools are strictly typed and validated — the agent cannot invoke arbitrary code, only the defined tool set. Tool definitions include descriptions that guide the agent's tool selection.
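
One way to enforce the defined tool set is a registry that the orchestrator consults before executing anything the agent requests. The following is a minimal sketch under stated assumptions: the `TOOLS` registry, `dispatch` function, and `ToolCallRejected` exception are hypothetical, the two stub implementations stand in for real read-only retrieval calls, and the type check is deliberately shallow (top-level argument types only).

```python
from typing import Any, Callable

# Hypothetical registry: tool name -> (required argument types, implementation).
# The lambdas are stubs standing in for real read-only retrieval calls.
TOOLS: dict[str, tuple[dict[str, type], Callable[..., Any]]] = {
    "search": ({"query": str, "filters": dict}, lambda query, filters: []),
    "get_document": ({"doc_id": str}, lambda doc_id: {"id": doc_id, "text": ""}),
}


class ToolCallRejected(Exception):
    """Raised when the agent requests an unknown tool or malformed arguments."""


def dispatch(name: str, args: dict[str, Any]) -> Any:
    """Execute a tool call only if it matches a registered tool exactly."""
    if name not in TOOLS:
        raise ToolCallRejected(f"unknown tool: {name!r}")
    schema, impl = TOOLS[name]
    if set(args) != set(schema):
        raise ToolCallRejected(
            f"{name}: expected arguments {sorted(schema)}, got {sorted(args)}"
        )
    for key, expected in schema.items():
        if not isinstance(args[key], expected):
            raise ToolCallRejected(
                f"{name}: argument {key!r} must be {expected.__name__}"
            )
    return impl(**args)
```

Because the agent can only reach implementations through `dispatch`, a request for a tool such as `run_shell` fails before any code runs. A production system would likely replace the hand-written checks with full JSON Schema validation.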

Iterative Retrieval and Self-Critique

After each retrieval cycle, the agent evaluates its current evidence against the original question using a self-critique prompt: "Given the question and the evidence retrieved so far, what information is still missing? What additional searches would improve the answer?" If the self-critique identifies gaps, the agent formulates and executes additional retrieval calls. The loop continues until one of three stopping conditions is met: (1) the agent's self-critique determines the evidence is sufficient, (2) the maximum iteration limit is reached, or (3) additional retrieval is producing diminishing returns (identical or near-identical results to previous retrieval steps).
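
The three stopping conditions can be sketched as a bounded loop. This is an illustration, not a reference implementation: `next_query`, `search`, and `is_sufficient` are hypothetical callables standing in for the agent's LLM reasoning step, the search tool, and the self-critique prompt. Diminishing returns is approximated here by exact-match deduplication; detecting near-identical results would need a similarity measure that this sketch omits.

```python
from typing import Callable


def run_retrieval_loop(
    question: str,
    next_query: Callable[[str, list[str]], str],
    search: Callable[[str], list[str]],
    is_sufficient: Callable[[str, list[str]], bool],
    max_iterations: int = 5,
) -> tuple[list[str], str]:
    """Collect evidence until one of the three stopping conditions fires.

    Returns (evidence, stop_reason).
    """
    evidence: list[str] = []
    for _ in range(max_iterations):
        query = next_query(question, evidence)
        results = search(query)
        new_items = [r for r in results if r not in evidence]
        if not new_items:
            # Condition 3: retrieval returned nothing not already seen.
            return evidence, "diminishing_returns"
        evidence.extend(new_items)
        if is_sufficient(question, evidence):
            # Condition 1: self-critique judged the evidence sufficient.
            return evidence, "sufficient"
    # Condition 2: hard iteration cap reached.
    return evidence, "max_iterations"
```

Returning the stop reason alongside the evidence lets the synthesis step flag answers produced under `max_iterations` or `diminishing_returns` as potentially incomplete.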

Grounded Final Synthesis

When the retrieval loop concludes, the agent synthesises a final answer from the accumulated evidence. The synthesis step is more complex than in standard RAG: the agent must integrate evidence from multiple retrieval rounds, handle potentially conflicting evidence, attribute each claim to its source, and structure the answer appropriately (summary, list, table, or prose depending on the question type). The synthesis prompt explicitly instructs the agent to cite each claim and to acknowledge when evidence is incomplete or conflicting.

Human Oversight Gate

For high-stakes agentic tasks (those whose answers will directly inform consequential decisions), a human oversight gate is inserted before final response delivery. The gate presents the agent's reasoning trace, the evidence sources used, and the draft answer to a human reviewer, who can approve, edit, or reject the answer. The gate is optional for lower-risk use cases, but for use cases classified as high-risk under the EU AI Act, human oversight is a regulatory requirement (Article 14) and the gate should be treated as mandatory.

5. Architecture Diagram

flowchart TD
    subgraph Ingress["Query Ingress"]
        A[User Query]
        B[Query Planner]
    end
    subgraph AgentLoop["ReAct Agent Loop"]
        C{Reason + Act}
        D[Retrieval Tools]
        E[Self-Critique]
    end
    subgraph Backend["RAG Infrastructure"]
        F[Vector + BM25 Index]
        G[Document Store]
    end
    subgraph Output["Synthesis + Delivery"]
        H[Grounded Synthesis]
        I[Audit Log]
    end
    A --> B --> C
    C -->|select tool| D
    D --> F
    D --> G
    D -->|observations| E
    E -->|insufficient| C
    E -->|sufficient| H
    H --> A
    H --> I
    style A fill:#dbeafe,stroke:#3b82f6
    style B fill:#f0fdf4,stroke:#22c55e
    style C fill:#f3e8ff,stroke:#a855f7
    style D fill:#f0fdf4,stroke:#22c55e
    style E fill:#f3e8ff,stroke:#a855f7
    style F fill:#fef9c3,stroke:#eab308
    style G fill:#fef9c3,stroke:#eab308
    style H fill:#d1fae5,stroke:#10b981
    style I fill:#fef9c3,stroke:#eab308

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Query Planner	NLP / LLM	Decompose complex query into sub-queries with dependencies	LLM function calling; LangChain Plan-and-Execute; LlamaIndex ReAct	High
ReAct Agent Orchestrator	Orchestration	Drive the Reason-Act-Observe loop; manage scratchpad; enforce iteration limits	LangChain AgentExecutor; LlamaIndex ReActAgent; AutoGen; custom	Critical
Retrieval Tool: search	Retrieval	Execute RAG retrieval (delegates to EAAPL-RAG001/005)	LangChain retriever tool; custom tool wrapper	Critical
Retrieval Tool: get_document	Retrieval	Fetch full document for deep reading	Document store SDK as tool	High
Retrieval Tool: summarise	LLM	Summarise retrieved chunks into concise evidence	LLM call within tool	Medium
Retrieval Tool: compare	LLM	Compare specific aspects across multiple documents	LLM call within tool	Medium
Self-Critique Module	LLM	Evaluate current evidence sufficiency; identify gaps	Structured LLM prompt with stopping criteria	High
Scratchpad / Working Memory	Storage	Record agent reasoning, tool calls, and observations per session	In-memory dict (short sessions); Redis (long sessions)	High
Human Oversight Gate	Workflow	Present agent reasoning to human reviewer for high-stakes decisions	Custom UI; Slack workflow; ServiceNow integration	High (regulated use)
Iteration Limiter	Safety	Enforce maximum loop iterations; prevent infinite loops	Hard counter in orchestrator; token budget guard	Critical
Agentic Audit Logger	Compliance	Record full reasoning trace, all tool calls, and all sources used	Langfuse, Arize AI, custom structured logger	Critical

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	User	Submit complex multi-hop query	Query string
2	Query Planner	Decompose query into sub-queries; record in scratchpad	Sub-query list + dependency graph
3	Agent (Reason)	Select next action based on scratchpad state	Tool call specification
4	Tool Executor	Execute selected tool (search / get_document / summarise)	Tool output (chunks / document / summary)
5	Agent (Observe)	Record tool output in scratchpad	Updated scratchpad
6	Self-Critique	Evaluate: "Is the evidence sufficient? What is still missing?"	Continue signal OR Stop signal
7	Loop (if Continue)	Return to step 3 with updated scratchpad context	New tool call based on gaps identified
8	Synthesis (if Stop)	Integrate all scratchpad evidence into final answer with citations	Draft answer + full citation list
9	Human Oversight Gate (if required)	Present reasoning trace and draft to human reviewer	Approved / Edited / Rejected decision
10	Audit Logger	Record complete reasoning trace, tool call sequence, all source IDs	Immutable audit record
11	Response Delivery	Return final answer with citations and (optionally) reasoning trace	Final response

Error Flow

Error Condition	Detection	Recovery
Maximum iteration limit reached without sufficient evidence	Iteration counter	Return best-available answer with "Incomplete evidence" flag; log for quality review
Tool call returns no results (empty retrieval)	Tool output validation	Agent reasons about the empty result; may reformulate query or acknowledge knowledge gap
Agent enters circular retrieval loop (same query repeated)	Query deduplication in scratchpad	Detect repeated tool calls; break loop; proceed to synthesis with available evidence
LLM tool-calling error (malformed function call)	JSON schema validation	Retry with re-prompted clarification; max 3 retries; escalate to error state
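
The recovery path for malformed tool calls (validate, re-prompt, cap at 3 retries, then escalate) might look like the sketch below. The `ask_model` callable, the `{"tool": ..., "args": ...}` call shape, and the `ToolCallError` exception are assumptions made for illustration; real tool-calling APIs return structured objects whose shape depends on the provider.

```python
import json
from typing import Callable


class ToolCallError(Exception):
    """Escalation state: the model never produced a valid tool call."""


def parse_tool_call(raw: str, allowed_tools: set[str]) -> dict:
    """Validate the minimal shape {"tool": <allowed name>, "args": {...}}."""
    call = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    tool = call.get("tool") if isinstance(call, dict) else None
    if not isinstance(tool, str) or tool not in allowed_tools:
        raise ValueError("unknown or missing tool name")
    if not isinstance(call.get("args"), dict):
        raise ValueError("'args' must be an object")
    return call


def call_with_retries(
    ask_model: Callable[[str], str],
    prompt: str,
    allowed_tools: set[str],
    max_retries: int = 3,
) -> dict:
    """Re-prompt on malformed output; escalate once retries are exhausted."""
    for _ in range(1 + max_retries):  # one initial attempt plus the retries
        raw = ask_model(prompt)
        try:
            return parse_tool_call(raw, allowed_tools)
        except ValueError as exc:
            prompt = (
                f"{prompt}\n\nYour previous tool call was invalid ({exc}). "
                f"Reply with JSON using one of: {sorted(allowed_tools)}."
            )
    raise ToolCallError(f"no valid tool call after {max_retries} retries")
```

Feeding the validation error back into the prompt gives the model a concrete correction target, while the bounded loop ensures a persistently malformed response cannot consume unbounded LLM calls.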

8. Security Considerations

Tool Boundary Enforcement

The most critical security control in Agentic RAG is ensuring the agent cannot invoke tools outside its defined tool set. All retrieval tools are read-only — the agent must have no access to write, update, or delete operations. Tool definitions must be validated against a schema; any tool invocation not matching a defined schema must be rejected. The agent runtime must run in a sandboxed environment with no outbound network access beyond the defined tool API endpoints.

Prompt Injection in Multi-Hop Context

Multi-hop retrieval creates a compounded prompt injection risk: an adversarial document retrieved in hop 1 could inject instructions into the agent's scratchpad that influence hop 2 onwards. The agent system prompt must explicitly instruct the model that retrieved content is data, not instructions, and the orchestrator must sanitise tool outputs before inserting them into the agent context window.
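
As a hedged illustration of tool output sanitisation, the sketch below wraps retrieved text in explicit delimiters before it enters the agent context. The tag names and the `wrap_tool_output` helper are hypothetical. Escaping angle brackets prevents a document from closing the data block early, but delimiting of this kind only reduces prompt injection risk; it does not eliminate it, and it should be combined with the system prompt instruction described above.

```python
import html
import re

# Hypothetical delimiter tags; any unambiguous marker would serve.
OPEN_TAG, CLOSE_TAG = "<retrieved_data>", "</retrieved_data>"
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def wrap_tool_output(text: str, source_id: str, max_chars: int = 4000) -> str:
    """Mark retrieved text as data before it enters the agent context."""
    cleaned = _CONTROL_CHARS.sub("", text)[:max_chars]
    # Escape &, < and > so the content cannot contain a literal closing tag
    # and smuggle text outside the data block.
    cleaned = html.escape(cleaned, quote=False)
    return f"{OPEN_TAG}\n[source: {source_id}]\n{cleaned}\n{CLOSE_TAG}"
```

The `source_id` is assumed to come from the retrieval layer rather than from document content, so it is inserted without escaping; if that assumption does not hold, it needs the same treatment.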

OWASP LLM Top 10 Mitigations

OWASP LLM Risk	Agentic-Specific Concern	Mitigation
LLM01: Prompt Injection	Adversarial content retrieved in hop N influences agent reasoning for hop N+1	Tool output sanitisation; treat all tool outputs as untrusted data in system prompt
LLM07: Insecure Plugin Design	Agentic tools are equivalent to plugins; must be strictly typed and scoped	Read-only tools only; strict JSON schema for tool calls; no shell or code execution tools
LLM08: Excessive Agency	Agent autonomously executes many retrieval steps; scope creep risk	Iteration limit; read-only tools; human oversight gate for consequential outputs
LLM04: Model Denial of Service	Runaway agent loops with expensive LLM calls	Hard iteration limit; token budget guard; per-user session cost limit

9. Governance Considerations

Autonomous Decision Boundary

Agentic RAG must have a clearly defined boundary between autonomous retrieval (permitted) and autonomous action-taking (not permitted in this pattern). The agent may search, read, summarise, and synthesise — but the final answer must be delivered to a human, not used to trigger downstream automated actions without explicit human approval.

Reasoning Trace as Governance Artefact

Every agentic session must produce a complete, immutable reasoning trace: the initial query, each reasoning step, each tool call with its parameters, each tool output, and the final synthesis. This trace is the primary governance artefact for post-hoc review of agentic decisions. Regulators reviewing an AI-assisted compliance analysis must be able to trace every claim back to the retrieved document that supported it.
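
One possible way to make a reasoning trace tamper-evident is to hash-chain its events, so that each record commits to the one before it. This is a sketch of that general technique, not a mechanism prescribed by this pattern; the event fields and function names are hypothetical. It detects edited or reordered events and removals from the middle of the chain. Detecting truncation of the final events would additionally require storing the latest hash somewhere the writer cannot modify.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64


def _digest(body: dict) -> str:
    """Stable SHA-256 over the event body (keys sorted for determinism)."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_event(trace: list[dict], kind: str, payload: dict) -> dict:
    """Append an event whose hash covers its content and the previous hash."""
    prev_hash = trace[-1]["hash"] if trace else GENESIS_HASH
    body = {"seq": len(trace), "kind": kind, "payload": payload, "prev_hash": prev_hash}
    event = {**body, "hash": _digest(body)}
    trace.append(event)
    return event


def verify_trace(trace: list[dict]) -> bool:
    """Recompute every hash and check that each event links to its predecessor."""
    prev_hash = GENESIS_HASH
    for seq, event in enumerate(trace):
        body = {k: v for k, v in event.items() if k != "hash"}
        if (
            event["seq"] != seq
            or event["prev_hash"] != prev_hash
            or event["hash"] != _digest(body)
        ):
            return False
        prev_hash = event["hash"]
    return True
```

A session would call `append_event` for the initial query, each reasoning step, each tool call with its parameters, each tool output, and the final synthesis, then persist the list to write-once storage.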

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Agentic Session Trace	AI Operations	Per session	Full audit trail of all reasoning steps and tool calls
Tool Call Audit	Security	Weekly	Review tool call patterns for anomalies or scope creep
Human Override Rate Report	AI Governance	Monthly	Track rate at which human reviewers edit or reject agentic outputs
Iteration Distribution Report	AI Operations	Weekly	Monitor average and P99 iteration counts; identify expensive query types

10. Operational Considerations

Monitoring

Metric	Alert Threshold	Notes
Average iterations per session	> 8 (investigate)	May indicate query types too complex for available corpus
Max iteration cap hit rate	> 10% of sessions	Corpus coverage or agent capability issue
Agentic session P95 latency	> 30 seconds	Optimise tool call latency; increase parallelism
Tool call error rate	> 2%	Tool API health issue
Human override rate	> 30%	Agent quality degradation; review self-critique prompts
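
The alert thresholds in the table above can be expressed as a simple configuration check. The sketch below copies the threshold values from the table; the metric key names and the `breached_alerts` function are hypothetical, and rates are assumed to be expressed as fractions (0.10 for 10%).

```python
# Thresholds copied from the monitoring table; metric keys are hypothetical.
ALERT_THRESHOLDS = {
    "avg_iterations_per_session": 8,
    "iteration_cap_hit_rate": 0.10,
    "session_p95_latency_seconds": 30,
    "tool_call_error_rate": 0.02,
    "human_override_rate": 0.30,
}


def breached_alerts(metrics: dict[str, float]) -> list[str]:
    """Return the names of reported metrics that exceed their alert threshold."""
    return [
        name
        for name, limit in ALERT_THRESHOLDS.items()
        if name in metrics and metrics[name] > limit
    ]
```

Metrics missing from the input are skipped rather than treated as breaches, so a monitoring job would also need to alert separately when an expected metric stops reporting.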

Service Level Objectives

SLO	Target	Notes
Agentic session completion P95	≤ 20 seconds	Depends on corpus and query complexity
Iteration cap hit rate	< 5%	Measure of query/corpus fit
Reasoning trace completeness	100%	Every session must have a complete audit trace

11. Cost Considerations

Cost Drivers

Cost Driver	Notes	Optimisation
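LLM inference per iteration	Each reasoning step is an LLM call; N iterations cost at least N+1 LLM calls per session (N reasoning steps plus final synthesis), and more when self-critique runs as a separate call	Use smaller model for self-critique; use premium model only for final synthesis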
Tool call overhead (embedding per search)	Each search tool call re-embeds a potentially modified sub-query	Cache embeddings for sub-queries that are identical to previous iterations
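Context window growth	Scratchpad grows with each iteration, so per-call LLM input cost rises roughly linearly with iteration count and cumulative input cost across a session rises faster than linearly	Summarise and compress scratchpad after every 3 iterations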
Human oversight gate	Human reviewer time for high-stakes queries	Reserve HITL for designated high-stakes query types only
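
The scratchpad compression optimisation can be sketched as follows. The `compress_scratchpad` function and its `summarise` callable (which would be an LLM call in practice) are hypothetical; the sketch keeps the most recent entries verbatim and folds everything older into a single summary entry.

```python
from typing import Callable


def compress_scratchpad(
    entries: list[str],
    summarise: Callable[[list[str]], str],
    keep_recent: int = 3,
) -> list[str]:
    """Fold all but the most recent entries into a single summary entry."""
    if len(entries) <= keep_recent:
        return list(entries)
    cut = len(entries) - keep_recent
    older, recent = entries[:cut], entries[cut:]
    return [summarise(older)] + recent
```

An orchestrator following the table's guidance might call this whenever the iteration count is a multiple of 3. Note that summarisation loses detail, so source IDs should be preserved in the summary or kept separately to keep citations traceable.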

Indicative Cost Range

Use Case	Sessions/Day	Average Iterations	Cost per Session	Monthly Cost
Research assistant (analysts)	100	4	$0.50–$2.00	$1,500–$6,000
Compliance Q&A	500	3	$0.30–$1.00	$4,500–$15,000
Complex legal research	50	6	$1.00–$4.00	$1,500–$6,000
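
The monthly figures in the table follow from sessions per day multiplied by cost per session over a month. The short sketch below reproduces that arithmetic, assuming a 30-day month (an assumption inferred from the table's numbers, not stated in it); the function name is hypothetical.

```python
def monthly_cost_range(
    sessions_per_day: int,
    cost_low: float,
    cost_high: float,
    days: int = 30,
) -> tuple[float, float]:
    """Return the (low, high) monthly cost in dollars, rounded to cents."""
    return (
        round(sessions_per_day * cost_low * days, 2),
        round(sessions_per_day * cost_high * days, 2),
    )


research = monthly_cost_range(100, 0.50, 2.00)    # research assistant row
compliance = monthly_cost_range(500, 0.30, 1.00)  # compliance Q&A row
legal = monthly_cost_range(50, 1.00, 4.00)        # complex legal research row
```

Because cost per session scales with iteration count, the same function can be used to test how a change in average iterations shifts the monthly range.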

12. Trade-Off Analysis

Agentic Orchestration Framework Comparison

Framework	Flexibility	Observability	Production Readiness	Recommendation
LangChain AgentExecutor + ReAct	High	Good (LangSmith integration)	Proven	Strong choice; large ecosystem
LlamaIndex ReActAgent	High	Good (built-in trace logging)	Proven	Strong for document-heavy use cases
AutoGen (Microsoft)	Very High (multi-agent)	Moderate	Emerging	Complex multi-agent scenarios
Custom orchestrator	Maximum	Custom	Depends	When framework limitations are binding

Query Planning Strategy

Strategy	Predictability	Flexibility	Auditability
Explicit plan-then-execute	High	Low (plan may not adapt to unexpected retrieval results)	Highest
Implicit sequential tool calling	Medium	High	Medium
Hybrid (explicit plan, adaptive execution)	High	High	Highest

Architectural Tensions

Tension	Trade-off	Recommendation
Iteration depth vs. latency	Deep iteration: complete answer; shallow: fast	Configure max iterations per query type; user-selectable "quick" vs. "thorough" modes
Self-critique verbosity vs. cost	Verbose critique: better gap identification; concise: cheaper	Structured JSON self-critique with fixed fields; not free-form prose

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Circular retrieval loop (agent retrieves same content repeatedly)	Medium	High (cost + latency)	Query deduplication in scratchpad; loop detection	Detect repeated tool calls; break loop; proceed to synthesis
Hallucinated tool calls (agent invents tool that doesn't exist)	Low	High	Tool call schema validation	Reject invalid tool calls; re-prompt with valid tool list
Scratchpad context overflow (exceeds LLM context window)	Medium	High	Token count monitoring	Compress scratchpad after N iterations; summarise older evidence
Agent misattributes evidence (cites wrong source)	Medium	High	Citation validation post-synthesis	Cross-reference every cited source ID against tool call outputs in audit trace
Self-critique always returns "continue" (runaway optimism)	Low	High	Iteration cap hit rate monitoring	Hard iteration cap; self-critique must use structured stopping criteria
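
The citation validation recovery step can be illustrated with a simple cross-reference of cited source IDs against the IDs that actually appeared in tool outputs. The bracketed citation format (for example `[doc-12]`) and the function name are assumptions for this sketch. It confirms only that a cited source was retrieved during the session, not that the source supports the claim attached to it.

```python
import re

# Assumed citation format for this sketch: source IDs in square brackets.
_CITATION = re.compile(r"\[([A-Za-z0-9_-]+)\]")


def unsupported_citations(answer: str, retrieved_source_ids: set[str]) -> set[str]:
    """Return cited source IDs that never appeared in any tool output."""
    cited = set(_CITATION.findall(answer))
    return cited - retrieved_source_ids


draft = "Supplier A is in region X [doc-1]. Clause 4 applies [doc-9]."
missing = unsupported_citations(draft, {"doc-1", "doc-2"})
```

A non-empty result indicates the draft cites something the agent never retrieved, which is a reasonable trigger for blocking delivery or routing the answer to the human oversight gate.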

14. Regulatory Considerations

Regulation	Requirement	Agentic RAG Response
EU AI Act Article 14	Human oversight capability for high-risk AI systems	Human oversight gate for agentic outputs used in consequential decisions
EU AI Act Article 13	Transparency: users must understand how AI-generated outputs were produced	Reasoning trace available on request; session scratchpad as explainability artefact
ISO/IEC 42001 Section 8.5	AI system operation includes monitoring autonomous behaviours	Iteration monitoring; tool call audit; human override rate tracking
NIST AI RMF Govern 1.7	Document and manage AI system autonomy levels	Autonomy boundary documented: retrieval is autonomous; final answer requires human review for high-stakes use cases

15. Reference Implementations

AWS

Agent: Bedrock Agents (native multi-hop retrieval) or LangChain on Lambda
Retrieval tool: Amazon Kendra or OpenSearch k-NN
Scratchpad: Amazon ElastiCache (Redis) for session state
Audit: CloudWatch Logs with structured JSON; X-Ray for trace

Azure

Agent: Azure AI Studio Prompt Flow with agent orchestration, or LangChain on Azure Functions
Retrieval tool: Azure AI Search
Scratchpad: Azure Cache for Redis
Audit: Azure Monitor + Application Insights

GCP

Agent: Vertex AI Agent Builder (Grounding with Search) or LlamaIndex on Cloud Run
Retrieval tool: Vertex AI Vector Search
Scratchpad: Cloud Memorystore (Redis)
Audit: Cloud Trace + Cloud Logging

16. Related Patterns

Pattern ID	Pattern Name	Relationship
EAAPL-RAG001	Enterprise RAG	Agentic RAG wraps and repeatedly invokes the RAG001 retrieval layer
EAAPL-RAG005	Hybrid RAG	Recommended retrieval strategy for each agent search tool call
EAAPL-RAG009	Graph RAG	Agent may invoke graph traversal as an additional tool alongside vector search
EAAPL-RAG003	Secure RAG	ACL enforcement applies to every tool call within the agentic loop

17. Maturity Assessment

Overall Maturity: Proven — Agentic RAG is deployed in production for research and compliance use cases; ReAct and Plan-and-Execute frameworks are mature; the primary ongoing challenges are iteration cost management and reasoning trace audit quality.

Dimension	Score (1–5)	Rationale
Technology Readiness	4	LLM tool calling is GA and reliable; orchestration frameworks are production-grade
Tooling Ecosystem	4	LangChain, LlamaIndex, Bedrock Agents, Azure AI Studio support agentic patterns
Operational Guidance	3	Loop management and cost optimisation require tuning expertise
Security & Compliance	3	Prompt injection in multi-hop contexts and tool boundary enforcement require careful implementation
Scalability Evidence	3	Session-based; horizontal scaling straightforward; cost per session grows with complexity
Cost Predictability	2	Iteration count variability makes cost highly query-dependent; monitoring and alerting essential

18. Revision History

Version	Date	Author	Changes
1.0	2024-07-01	EAAPL Working Group	Initial publication
1.1	2024-10-15	EAAPL Working Group	Self-critique module formalised; circular loop detection added
1.2	2025-04-01	EAAPL Working Group	Human oversight gate added; EU AI Act Article 14 mapping; scratchpad compression strategy
