[EAAPL-RAG004] Federated Retrieval-Augmented Generation

Category: Artificial Intelligence / Retrieval-Augmented Generation
Sub-category: Distributed and Federated Retrieval
Version: 1.1
Maturity: Emerging
Tags: rag federated distributed data-sovereignty privacy-preserving cross-organisation score-normalisation
Regulatory Relevance: GDPR Chapter V (Cross-border transfers), Privacy Act 1988 Part IIIA, EU AI Act Article 10, APRA CPS234 §55 (third-party data), Australian Data Sovereignty requirements

1. Executive Summary

Federated RAG enables retrieval across multiple organisationally or geographically distributed knowledge bases without requiring data to be centralised in a single repository. Each participating node maintains its own vector index and document store, enforces its own access controls, and responds to retrieval requests without exposing raw documents to the federation coordinator. A central orchestrator fans out queries to participating nodes, collects ranked result sets, normalises scores across heterogeneous indexes, and assembles the final context for the language model.

For CIOs navigating cross-border data regulations, multi-entity corporate structures, or public-private data sharing partnerships, Federated RAG provides a principled architecture for enabling AI-powered knowledge synthesis across organisational boundaries without violating data residency, sovereignty, or confidentiality obligations. The pattern is relevant to government agency networks (where each agency maintains its own data under separate legislative mandates), healthcare provider networks (where patient data cannot leave jurisdictions), joint ventures (where each party contributes knowledge without disclosing the full corpus to the other), and multi-cloud enterprise architectures (where data must remain within specific cloud regions due to contractual or regulatory constraints).

2. Problem Statement

Business Problem

Organisations increasingly need to answer questions that span multiple independent knowledge bases, but the data owners of those bases have legitimate legal, commercial, or regulatory reasons not to share raw content. A government department asked to answer a cross-agency compliance question should not need to copy another agency's classified data into its own systems. A hospital network should not centralise patient records from all member hospitals into a single repository to enable AI search. A joint venture should not require each partner to surrender commercial-in-confidence documents to a shared index.

Technical Problem

Standard centralised RAG requires all documents to be ingested into a single vector index. This is architecturally incompatible with data sovereignty requirements, cross-border transfer restrictions, organisational trust boundaries, and contractual non-disclosure obligations. Naive distributed solutions (running independent RAG systems and aggregating answers) compound the problem: scores from different indexes are not comparable, different indexes may use different embedding models, and there is no principled mechanism for cross-node relevance ranking.

Symptoms

Stalled AI initiatives because legal and compliance teams cannot approve centralised data aggregation
Separate, siloed AI assistants per business unit with no ability to answer cross-unit questions
Manual "copy and paste from multiple systems" workflows for staff who need cross-domain answers
Joint venture or consortium participants rejecting shared data infrastructure proposals

Cost of Inaction

Inability to leverage AI for cross-organisational knowledge synthesis, leaving manual research workflows in place
Missed regulatory intelligence: compliance teams unable to get cross-jurisdiction answers from distributed regulatory corpora
Competitive disadvantage versus organisations that have successfully federated knowledge infrastructure

3. Context

When to Apply

Cross-organisation AI knowledge sharing where raw data cannot be centralised (government agencies, healthcare networks, industry consortia)
Multi-jurisdiction deployments where data must remain within specific geographic boundaries
Joint ventures and public-private partnerships where each party retains data sovereignty
Large enterprises with independent subsidiaries or business units that have separate data governance regimes
Regulatory constraint: the organisation is subject to data localisation requirements (EU GDPR Chapter V, Australian data sovereignty) that prohibit centralised indexing

When NOT to Apply

Data can be legally and contractually centralised (use EAAPL-RAG001 for significantly better retrieval quality)
Single-tenant deployment with no cross-boundary requirements
Latency requirements preclude network round-trips to remote nodes (federated retrieval adds roughly 50–200 ms of network latency; with parallel fan-out the added delay is set by the slowest responding node)
All nodes use incompatible embedding models and re-embedding is not feasible (score normalisation quality degrades severely)

Prerequisites

Each participating node must expose a standardised retrieval API (REST or gRPC) that accepts a query vector and metadata filters and returns ranked result sets with scores and metadata
Score normalisation requires nodes to use compatible (ideally identical) embedding models, or to implement score calibration
A federation coordination layer (either a dedicated service or the primary node acting as coordinator)
Inter-node network connectivity with appropriate security (mTLS, API keys, VPN tunnels between trusted nodes)
Data sharing agreement or federation protocol specifying what metadata each node may expose

Industry Applicability

Industry	Federation Participants	Data Sovereignty Driver	Example Use Case
Government	Federal + State agencies	Legislative jurisdiction separation	Whole-of-government regulatory knowledge assistant
Healthcare	Hospital networks + GP clinics	Patient data cannot leave jurisdiction	Clinical decision support across care network
Financial Services	Group entities + subsidiaries	Separate legal entities; intra-group data transfer rules	Group-wide risk and compliance knowledge base
Education	Universities in national consortium	FERPA/Privacy Act per institution	National research knowledge assistant
Defence & Intelligence	Allied nation agencies	National security classification regimes	Coalition knowledge sharing (lowest-classification tier)

4. Architecture Overview

Federated RAG decomposes into three functional layers: local nodes (which own their data and serve retrieval requests), the federation coordinator (which orchestrates cross-node retrieval and normalises results), and the generation layer (which assembles the federated context and invokes the LLM).

Local Node Architecture

Each local node is a complete, independently operated RAG stack: document ingestion pipeline, chunking engine, embedding model, vector database, and a retrieval API endpoint. The node enforces its own access controls — the coordinator has no ability to bypass the node's ACL enforcement. The node exposes a retrieval endpoint that:

Accepts a query vector (already embedded by the coordinator) and optional metadata filters
Enforces local ACL for the requesting entity (the coordinator's identity, not the end user's)
Returns a ranked result set: [(chunk_text, metadata, score, chunk_id)] — crucially, never raw document bytes
Applies rate limiting per coordinator identity to prevent denial-of-service
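The node contract above can be sketched as follows. This is a minimal illustration, not a reference implementation: it assumes a brute-force in-memory index in place of a vector database, a hypothetical `trusted_coordinators` allowlist and a hypothetical per-coordinator request limit, and it leaves out the transport (FastAPI or gRPC) entirely.

```python
import math
import time
from dataclasses import dataclass, field

DECLINED = "DECLINED"          # protocol value for an ACL rejection (surfaced as HTTP 403)
RATE_LIMITED = "RATE_LIMITED"  # hypothetical value for a throttled request


@dataclass
class Chunk:
    chunk_id: str
    text: str
    metadata: dict
    vector: list


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


@dataclass
class NodeRetriever:
    chunks: list
    trusted_coordinators: set            # hypothetical allowlist of coordinator identities
    max_requests_per_minute: int = 60    # hypothetical per-coordinator rate limit
    request_log: dict = field(default_factory=dict)

    def _throttled(self, coordinator_id, now):
        # Keep a sliding 60-second window of request timestamps per coordinator.
        window = [t for t in self.request_log.get(coordinator_id, []) if now - t < 60.0]
        window.append(now)
        self.request_log[coordinator_id] = window
        return len(window) > self.max_requests_per_minute

    def retrieve(self, coordinator_id, query_vector, top_k=5, filters=None, now=None):
        now = time.monotonic() if now is None else now
        # Local ACL is enforced against the coordinator's identity, not the end user's.
        if coordinator_id not in self.trusted_coordinators:
            return DECLINED
        if self._throttled(coordinator_id, now):
            return RATE_LIMITED
        filters = filters or {}
        matches = [c for c in self.chunks
                   if all(c.metadata.get(k) == v for k, v in filters.items())]
        ranked = sorted(((c, cosine(query_vector, c.vector)) for c in matches),
                        key=lambda pair: pair[1], reverse=True)
        # Chunk text, metadata, score and id only; raw document bytes never leave the node.
        return [(c.text, c.metadata, score, c.chunk_id) for c, score in ranked[:top_k]]
```

In a real node, `coordinator_id` would be taken from the authenticated mTLS client certificate rather than from the request body.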

The node's governance principle is: the node owner controls what the node reveals. The node may return only metadata and scores (no chunk text) for sensitive documents, requiring the coordinator to resolve context from an approved escrow, or to simply note "relevant classified content exists but cannot be included in context."
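One way to express the node owner's disclosure decision is a filter applied to results just before they leave the node. The classification labels, the `DisclosureLevel` enum and the rule mapping labels to levels are all hypothetical; in practice they would be set by the data sharing agreement.

```python
from enum import Enum


class DisclosureLevel(Enum):
    FULL_TEXT = "full_text"          # chunk text may be returned
    METADATA_ONLY = "metadata_only"  # only metadata, score and an opaque chunk reference


SENSITIVE = {"protected", "secret"}  # hypothetical classification labels


def disclosure_for(metadata):
    label = str(metadata.get("classification", "")).lower()
    return DisclosureLevel.METADATA_ONLY if label in SENSITIVE else DisclosureLevel.FULL_TEXT


def apply_disclosure_policy(results):
    """Redact chunk text for sensitive results; results are (text, metadata, score, chunk_id)."""
    shaped = []
    for text, metadata, score, chunk_id in results:
        if disclosure_for(metadata) is DisclosureLevel.METADATA_ONLY:
            # The coordinator learns that relevant content exists, but not the content itself.
            shaped.append((None, metadata, score, chunk_id))
        else:
            shaped.append((text, metadata, score, chunk_id))
    return shaped
```

The coordinator can then render any result whose text is `None` as "relevant classified content exists but cannot be included in context."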

Federation Coordinator

The coordinator is the query orchestration layer. Upon receiving a user query, the coordinator:

Embeds the query using the shared embedding model
Determines which nodes to query based on the query's topic scope and the user's inter-node authorisation
Fans out the query vector in parallel to all relevant nodes (with a configurable timeout per node)
Collects ranked result sets from responding nodes
Applies score normalisation to make scores from different nodes comparable
Re-ranks the normalised result set to produce a unified top-K list
Assembles context from the returned chunk texts (subject to each node's disclosure level)
Records which nodes were queried, which responded, and which declined (for audit and transparency)
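The fan-out and collection steps can be sketched with `asyncio`, assuming each node is wrapped in a hypothetical async `retrieve(query_vector)` callable (the mTLS HTTP client is omitted). Each node gets its own timeout, and the outcome records which nodes responded, declined, timed out or failed, which is what the final audit step needs.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FanOutResult:
    results: dict = field(default_factory=dict)   # node_id -> ranked result list
    declined: list = field(default_factory=list)
    timed_out: list = field(default_factory=list)
    failed: list = field(default_factory=list)


async def fan_out(query_vector, nodes, timeout_s=2.0):
    """Query all nodes in parallel; `nodes` maps node_id -> async retrieve(query_vector)."""

    async def call(node_id, retrieve):
        try:
            return node_id, await asyncio.wait_for(retrieve(query_vector), timeout_s)
        except asyncio.TimeoutError:
            return node_id, "TIMEOUT"
        except Exception:  # transport or node error: treat the node as unavailable
            return node_id, "ERROR"

    outcome = FanOutResult()
    replies = await asyncio.gather(*(call(n, r) for n, r in nodes.items()))
    for node_id, reply in replies:
        if reply == "TIMEOUT":
            outcome.timed_out.append(node_id)
        elif reply == "ERROR":
            outcome.failed.append(node_id)
        elif reply == "DECLINED":
            outcome.declined.append(node_id)
        else:
            outcome.results[node_id] = reply
    return outcome
```

The 2-second default matches the per-node timeout in the error flow; a slow node is cancelled rather than holding up the whole response.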

Score Normalisation

Score normalisation is the most technically complex component of Federated RAG. When nodes use the same embedding model, cosine similarity scores are on the same scale and can be compared directly. When models differ (which should be avoided but may be unavoidable), the coordinator must send the query text rather than a pre-computed vector so that each node can embed it with its own model, and score normalisation is required: each node's score distribution is modelled (mean and standard deviation) using calibration queries, and scores are Z-normalised before cross-node ranking. Reciprocal Rank Fusion (RRF) provides an alternative that is model-agnostic and rank-based, requiring only a ranked list from each node rather than raw similarity scores.
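Both normalisation options can be sketched in a few lines. RRF scores each document as the sum over nodes of 1 / (k + rank), using the conventional constant k = 60; Z-normalisation rescales a node's raw score as (score - mean) / std using that node's calibration statistics. The function names and data shapes are illustrative assumptions.

```python
from statistics import mean, pstdev


def rrf_fuse(ranked_lists, k=60):
    """Reciprocal Rank Fusion over per-node rankings.

    ranked_lists maps node_id -> list of document ids, best first. Returns
    [(doc_id, fused_score, contributing_nodes)] sorted by fused score.
    """
    scores, provenance = {}, {}
    for node_id, doc_ids in ranked_lists.items():
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
            provenance.setdefault(doc_id, []).append(node_id)
    ordered = sorted(scores, key=scores.get, reverse=True)
    return [(d, scores[d], provenance[d]) for d in ordered]


def calibrate(calibration_scores):
    """Estimate one node's (mean, std) from scores it returned for shared calibration queries."""
    return mean(calibration_scores), pstdev(calibration_scores)


def z_normalise(score, stats):
    mu, sigma = stats
    return (score - mu) / sigma if sigma > 0 else 0.0
```

When node corpora are disjoint, most documents appear in only one node's list, so RRF largely interleaves nodes by rank; the summation matters when the same document is indexed at more than one node.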

Privacy-Preserving Retrieval Modes

For maximum privacy preservation, the coordinator can operate in a query vector obfuscation mode using techniques from privacy-preserving machine learning: the query vector is perturbed with calibrated noise before being sent to nodes (analogous to differential privacy), preventing nodes from reconstructing the original query. This mode trades some retrieval quality for stronger privacy guarantees — the node cannot infer the full semantic content of the query. This mode is recommended for inter-organisation federation where the coordinator does not fully trust all nodes.
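A minimal sketch of query vector perturbation, assuming unit-normalised embeddings and Gaussian noise with a hypothetical `noise_scale` parameter. It illustrates the quality/privacy trade-off only; it is not a calibrated differential privacy mechanism, which would require bounding sensitivity and deriving the noise scale from a privacy budget.

```python
import math
import random


def obfuscate_query_vector(vector, noise_scale, rng=None):
    """Add isotropic Gaussian noise to a query embedding, then re-normalise to unit length.

    noise_scale is a hypothetical tuning knob, NOT a calibrated epsilon: this function
    alone gives no formal differential privacy guarantee.
    """
    rng = rng or random.Random()
    noisy = [x + rng.gauss(0.0, noise_scale) for x in vector]
    norm = math.sqrt(sum(x * x for x in noisy))
    return [x / norm for x in noisy] if norm else noisy


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
```

Larger `noise_scale` values push the cosine similarity between the original and perturbed query towards zero, which is the retrieval-quality cost described above.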

5. Architecture Diagram


flowchart TD
  subgraph Nodes["Federated Nodes"]
    A[Node A API]
    B[Node B API]
    C[Node C API]
  end
  subgraph Coordinator["Federation Coordinator"]
    D[User Query]
    E[Node Router]
    F[Score Normaliser]
    G[LLM Generation]
  end
  subgraph Audit["Governance"]
    H[Federation Audit Log]
  end
  D --> E
  E -->|fan-out mTLS| A
  E -->|fan-out mTLS| B
  E -->|fan-out mTLS| C
  A -->|ranked results| F
  B -->|ranked results| F
  C -->|declined or results| F
  F --> G --> D
  E --> H
  F --> H
  style D fill:#dbeafe,stroke:#3b82f6
  style E fill:#f0fdf4,stroke:#22c55e
  style A fill:#fef9c3,stroke:#eab308
  style B fill:#fef9c3,stroke:#eab308
  style C fill:#fef9c3,stroke:#eab308
  style F fill:#f0fdf4,stroke:#22c55e
  style G fill:#d1fae5,stroke:#10b981
  style H fill:#fef9c3,stroke:#eab308

6. Components

Component	Type	Responsibility	Technology Options	Criticality
Local Node Retrieval API	Integration	Serve retrieval requests to the coordinator; enforce local ACL	FastAPI / gRPC service; cloud API Gateway	Critical
Local Vector Database	Storage	Node-local vector index; never exposed directly to coordinator	Weaviate, Qdrant, pgvector, OpenSearch	Critical
Shared Embedding Model	ML Inference	Ensure all nodes embed in the same vector space	Agreed embedding model deployed at each node; identical version	Critical
Federation Coordinator	Orchestration	Fan out queries; collect and normalise results; assemble context	Custom Python service; LangChain with custom retriever; LlamaIndex federation	Critical
Node Router	Business Logic	Select which nodes to query for a given query type	Custom routing rules + topic classifier	High
Score Normaliser	Algorithm	Normalise heterogeneous scores to a comparable scale	RRF (rank-based, model-agnostic) or Z-normalisation (requires per-node calibration statistics)	High
Global Re-ranker	Ranking	Re-rank normalised cross-node results	Cross-encoder re-ranker; Cohere Rerank API	Medium
Federation Audit Logger	Compliance	Record which nodes were queried, which responded, which declined	Append-only log store; tamper-evident	High
Inter-Node Security (mTLS)	Security	Authenticate coordinator to each node; encrypt retrieval traffic	mTLS certificates (SPIFFE/SPIRE for zero-trust); API key + TLS fallback	Critical
Query Vector Obfuscator (optional)	Privacy	Add calibrated noise to query vector before sending to untrusted nodes	Custom differential privacy library; RAPPOR-style perturbation	Medium
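The Federation Audit Logger is described as append-only and tamper-evident. A hash chain is one common way to get tamper evidence; this sketch assumes JSON-serialisable audit events and an in-memory list in place of the log store. Tampering is only detectable if the latest hash is also held somewhere the coordinator cannot rewrite, for example by each participating node.

```python
import hashlib
import json


class HashChainedAuditLog:
    """Append-only audit log in which each record commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.records = []

    @staticmethod
    def _digest(prev_hash, event):
        body = json.dumps({"prev": prev_hash, "event": event}, sort_keys=True)
        return hashlib.sha256(body.encode("utf-8")).hexdigest()

    def append(self, event):
        prev_hash = self.records[-1]["hash"] if self.records else self.GENESIS
        digest = self._digest(prev_hash, event)
        self.records.append({"prev": prev_hash, "event": event, "hash": digest})
        return digest

    def verify(self):
        # Recompute every link; any edited, removed or reordered record breaks the chain.
        prev_hash = self.GENESIS
        for record in self.records:
            if record["prev"] != prev_hash or self._digest(prev_hash, record["event"]) != record["hash"]:
                return False
            prev_hash = record["hash"]
        return True
```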

7. Data Flow

Primary Flow

Step	Actor	Action	Output
1	Each local node	Ingest documents from local sources; build local vector index	Independent local vector indexes per node
2	User	Submit query to federation coordinator	Query string + user identity
3	Query Embedder	Embed query using shared embedding model	Query vector
4	Node Router	Determine relevant nodes based on query topic scope and user's inter-node authorisation	List of target node endpoints
5	Federation Coordinator	Fan out query vector in parallel to all target nodes with inter-node auth credentials	Parallel retrieval requests
6	Each local node	Validate coordinator identity; enforce local ACL; execute local ANN search; return ranked results	`[(chunk_text, metadata, score, chunk_id)]` per node, or `DECLINED`
7	Score Normaliser	Apply RRF or Z-normalisation to align scores across nodes	Normalised, unified candidate list with node provenance
8	Global Re-ranker	Re-rank unified candidate list	Top-N candidates
9	Context Assembler	Assemble prompt with node provenance labels on each chunk	Assembled prompt
10	LLM	Generate answer with cross-node citations	Raw response
11	Response	Return answer with attribution to specific nodes	Answer + `[Source: Node A - Doc X]` citations
12	Federation Audit Logger	Record complete query execution: nodes queried, response times, scores, declinations	Immutable audit record
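The context assembly step (step 9) can be sketched as below. The `<retrieved>` delimiter and the candidate dictionary keys are hypothetical; the `[Source: Node A - Doc X]` label follows the citation format in step 11. Wrapping each chunk as delimited data also supports treating all retrieved content as untrusted.

```python
def assemble_context(candidates, top_n=5):
    """Build the context block from re-ranked candidates, best first.

    Each candidate is a dict with hypothetical keys: node, doc_title, text
    (text is None when the node returned metadata only).
    """
    blocks = []
    for c in candidates[:top_n]:
        label = f"[Source: {c['node']} - {c['doc_title']}]"
        if c["text"] is None:
            body = "Relevant content exists at this source but is not disclosable."
        else:
            # Neutralise any closing delimiter inside retrieved text so it cannot break out.
            body = c["text"].replace("</retrieved>", "&lt;/retrieved&gt;")
        # Delimit retrieved text as data; the system prompt should state it is not instructions.
        blocks.append(f"<retrieved>\n{label}\n{body}\n</retrieved>")
    return "\n\n".join(blocks)
```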

Error Flow

Error Condition	Detection	Recovery
Node timeout (node unavailable or slow)	Per-node timeout (configurable, default 2s)	Proceed with results from available nodes; note unavailable node in response metadata
Node declines query (ACL rejection for coordinator identity)	Node returns 403	Log declination; note in response: "Some sources unavailable due to access restrictions"
Score normalisation failure (node uses different embedding model)	Score distribution anomaly detection	Fall back to RRF (rank-based); flag model mismatch for node operator
All nodes decline or timeout	No results assembled	Return "No accessible content found across federated sources"; do not generate from LLM
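The recovery rules in the error flow table can be sketched as a final gate before generation. The fan-out dictionary shape and the `generate` callable are assumptions; the user-facing messages are taken from the table.

```python
def finalise_response(fan_out, generate):
    """fan_out: dict with 'results', 'declined', 'timed_out'; generate: callable(results) -> str."""
    notes = []
    if fan_out["timed_out"]:
        notes.append("Unavailable sources: " + ", ".join(fan_out["timed_out"]))
    if fan_out["declined"]:
        notes.append("Some sources unavailable due to access restrictions")
    if not any(fan_out["results"].values()):
        # Never call the LLM without retrieved context.
        return {"answer": "No accessible content found across federated sources", "notes": notes}
    return {"answer": generate(fan_out["results"]), "notes": notes}
```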

8. Security Considerations

Inter-Node Trust Model

The federation coordinator authenticates to each node using mutual TLS (mTLS). Each node maintains an allowlist of coordinator certificate fingerprints it trusts, which prevents unauthorised coordinators from querying nodes. Nodes do not trust the coordinator's claim about end-user identity; they enforce only the coordinator's identity against the inter-node access policy.
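On the node side, this trust model can be sketched with Python's standard `ssl` module: a server context that requires a client certificate, plus a check of the peer certificate's SHA-256 fingerprint against the node's allowlist. The certificate file paths and allowlist contents are placeholders, and a SPIFFE/SPIRE deployment could authorise on the SPIFFE ID instead of pinning fingerprints.

```python
import hashlib
import ssl


def node_server_context(cert_file, key_file, ca_file):
    """TLS context for a node that requires a client certificate from the coordinator."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH, cafile=ca_file)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject clients that present no valid certificate
    return ctx


def cert_fingerprint(der_cert):
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def coordinator_is_trusted(tls_socket, allowed_fingerprints):
    """Check the peer certificate of an accepted mTLS connection against the allowlist."""
    der = tls_socket.getpeercert(binary_form=True)
    return der is not None and cert_fingerprint(der) in allowed_fingerprints
```

Pinning fingerprints means the allowlist must be updated whenever the coordinator's certificate is rotated.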

Data Minimisation in Federation Protocol

The retrieval API response should include the minimum data necessary: chunk text (if the node's data sharing agreement permits), score, and metadata (document title, date, classification). Raw document bytes are never returned in the retrieval protocol. For highly sensitive nodes, chunk text may be replaced with a summary or an opaque reference, with the coordinator noting "relevant content exists but is not disclosable."

OWASP LLM Top 10 Mitigations

OWASP LLM Risk	Federated Specific Concern	Mitigation
LLM01: Prompt Injection	Malicious content injected in one node's documents propagates through federation to the coordinator's LLM	Node-side content sanitisation; coordinator treats all retrieved content as untrusted data
LLM06: Sensitive Information Disclosure	Coordinator may inadvertently reveal to node B that node A has a specific document (via query vectors)	Query vector obfuscation for inter-organisation federation; query vector is not logged at nodes
LLM09: Overreliance	User assumes all nodes were queried; a declined or timed-out node is silently excluded	Explicitly surface node availability status in every response

9. Governance Considerations

Federation Governance Framework

Each federation must have a formal data sharing agreement that specifies: which nodes participate, what metadata each node may expose, the agreed embedding model version, the score normalisation method, audit log sharing obligations, and the dispute resolution process when a node incorrectly declines a coordinator request.

Governance Artefacts

Artefact	Owner	Frequency	Purpose
Federation Data Sharing Agreement	Legal / All Participating Organisations	Per federation; reviewed annually	Legal basis for cross-node retrieval
Node Directory	Federation Coordinator	Continuous	Maintain registry of participating nodes, their data domains, and availability SLAs
Federation Audit Log	Coordinator + Each Node	Per query	Immutable record of inter-node queries for compliance and dispute resolution
Embedding Model Version Agreement	All Nodes	Per model upgrade	Ensure all nodes upgrade embedding model atomically to maintain score comparability
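The Embedding Model Version Agreement can be enforced at registration time, as the failure mode table recommends. This sketch assumes a hypothetical `NodeDirectory` that records one agreed (model, version) pair, rejects any node registering with a different one, and drops nodes that have not yet moved to a newly agreed version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeRegistration:
    node_id: str
    data_domains: tuple
    embedding_model: str
    embedding_model_version: str


class NodeDirectory:
    """Hypothetical registry of participating nodes for one federation."""

    def __init__(self, agreed_model, agreed_version):
        self.agreed = (agreed_model, agreed_version)
        self.nodes = {}

    def register(self, reg):
        # Reject nodes whose embedding model would break score comparability.
        declared = (reg.embedding_model, reg.embedding_model_version)
        if declared != self.agreed:
            raise ValueError(f"{reg.node_id}: embedding model {declared} != agreed {self.agreed}")
        self.nodes[reg.node_id] = reg

    def begin_upgrade(self, new_model, new_version):
        """Move the federation to a new model; stale nodes must re-register before being queried."""
        self.agreed = (new_model, new_version)
        stale = [n for n, r in self.nodes.items()
                 if (r.embedding_model, r.embedding_model_version) != self.agreed]
        for node_id in stale:
            del self.nodes[node_id]
        return stale
```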

10. Operational Considerations

Monitoring

Metric	Alert Threshold	Notes
Node availability (per node)	< 99% over 1 hour	Alert node operator; degrade response noting unavailable node
Cross-node query latency P95	> 3 seconds	Check slowest node; consider reducing timeout
Node declination rate	> 20% of queries to a given node	Investigate ACL configuration; may indicate access policy change
Score normalisation quality (RRF rank correlation)	Significant correlation drop	Indicates embedding model version mismatch between nodes
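The per-node thresholds in the monitoring table can be evaluated from the federation audit log. This sketch assumes one hour of hypothetical audit entries with a `status` of `ok`, `timeout` or `declined`, and uses the share of non-timed-out requests as a proxy for node availability.

```python
def node_alerts(records, availability_min=0.99, declination_max=0.20):
    """records: one hour of audit entries, each {'node': id, 'status': 'ok'|'timeout'|'declined'}."""
    by_node = {}
    for r in records:
        by_node.setdefault(r["node"], []).append(r["status"])
    alerts = []
    for node, statuses in sorted(by_node.items()):
        total = len(statuses)
        # A declined request still got an answer, so it counts as available.
        available = sum(s != "timeout" for s in statuses) / total
        declined = statuses.count("declined") / total
        if available < availability_min:
            alerts.append((node, "availability", round(available, 3)))
        if declined > declination_max:
            alerts.append((node, "declination_rate", round(declined, 3)))
    return alerts
```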

Service Level Objectives

SLO	Target	Notes
Federated query response P95	≤ 4 seconds	Longer than centralised RAG due to network fan-out
Node availability (per node)	≥ 99.5%	Per federation agreement
Federation coordinator availability	≥ 99.9%	Coordinator is the single point of failure; deploy multi-AZ

11. Cost Considerations

Cost Drivers

Cost Driver	Notes
Per-node infrastructure	Each node is a full RAG stack; N nodes = N × single-node infrastructure cost
Cross-node network egress	Network egress costs for result transmission between nodes and coordinator
Federation coordinator compute	Fan-out, score normalisation, and re-ranking are compute-intensive at scale
Embedding model agreement enforcement	Coordinating atomic embedding model upgrades across nodes requires change management overhead

Indicative Cost Range

Federation Scale	Monthly Cost Range
3–5 nodes, small corpora	$3,000 – $10,000 per node + $2,000–$5,000 coordinator
5–10 nodes, medium corpora	$5,000 – $20,000 per node + $5,000–$15,000 coordinator
10+ nodes, large corpora	Custom pricing; infrastructure-as-code essential
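The N × single-node cost model can be made concrete with a small calculation. The figures are the table's own; the helper is purely illustrative and assumes costs scale linearly with node count.

```python
def federation_cost_range(n_nodes, per_node, coordinator):
    """Monthly (low, high) estimate: N x per-node cost plus the coordinator cost."""
    low = n_nodes * per_node[0] + coordinator[0]
    high = n_nodes * per_node[1] + coordinator[1]
    return low, high


# Four nodes priced at the "3-5 nodes, small corpora" tier.
print(federation_cost_range(4, per_node=(3_000, 10_000), coordinator=(2_000, 5_000)))
# (14000, 45000)
```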

12. Trade-Off Analysis

Centralised vs. Federated RAG

Dimension	Centralised RAG (RAG001)	Federated RAG (RAG004)
Retrieval Quality	Highest (unified index, no score normalisation noise)	Lower (score normalisation introduces noise)
Data Sovereignty	Low (data must be centralised)	High (data never leaves node)
Latency	Lowest	Higher (network fan-out adds 50–200ms)
Operational Complexity	Medium	High (N independent stacks + coordinator)
Cost	Lower at scale	Higher (N × node cost)
Recommended For	Single-org, no sovereignty constraints	Multi-org, data residency requirements

Architectural Tensions

Tension	Trade-off	Recommendation
Node response completeness vs. privacy	Full chunk text in response: best quality; metadata-only: private	Data sharing agreement governs; default to full text for trusted nodes, metadata-only for untrusted
Query timeout per node vs. completeness	Short timeout: fast but may miss slow nodes; long timeout: complete but slow	Async fan-out with 2s timeout; include results from slow nodes in "extended" mode if user requests

13. Failure Modes

Failure Mode	Likelihood	Impact	Detection	Recovery
Embedding model version drift between nodes	Medium	High (score normalisation fails)	Score distribution anomaly detection	Enforce embedding model version in node registration; alert on version mismatch
Coordinator as single point of failure	Low	Critical	Health check monitoring	Multi-AZ coordinator deployment; circuit breaker per node
Node A reveals existence of document to Node B via query pattern	Low	Medium	Privacy audit of query logs at each node	Query vector obfuscation; prohibit node-side query logging beyond request metadata
Federation data sharing agreement expires	Low	High	Automated agreement expiry monitoring	Alert 30/60/90 days before expiry; suspend node access on expiry
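The expiry monitoring recovery (alerts at 90, 60 and 30 days, suspension on expiry) can be sketched as a daily check. Treating the expiry date as the first day the agreement is no longer in force is an assumption; the agreement itself should define this.

```python
from datetime import date

ALERT_DAYS = (90, 60, 30)


def agreement_status(expiry, today):
    """Return 'suspend' once expired, the most urgent alert tier within 90 days, else None."""
    days_left = (expiry - today).days
    if days_left <= 0:
        return "suspend"
    for threshold in sorted(ALERT_DAYS):  # check the most urgent tier first
        if days_left <= threshold:
            return f"alert-{threshold}d"
    return None
```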

14. Regulatory Considerations

Regulation	Requirement	Federated RAG Response
GDPR Chapter V	Personal data cannot be transferred outside the EEA without adequate protections	Source documents and indexes never leave the node; only result sets are transmitted, and these may still contain personal data unless redacted at the node, in which case a Chapter V transfer can still occur
Privacy Act 1988 APP 8	Cross-border disclosure of personal information	Node-level PII redaction (EAAPL-RAG003) applied before result sets leave the node's jurisdiction
Australian Data Sovereignty	Government data must remain in Australia	Node hosted on Australian infrastructure; returned result sets stay onshore only if the coordinator is also hosted in Australia
EU AI Act Article 10	Data governance for AI training and operation	Data governance remains with each node; federation protocol does not create new data aggregation

15. Reference Implementations

AWS (Multi-Region Federation)

Local nodes: OpenSearch k-NN in each AWS region; Lambda-based retrieval API
Coordinator: ECS Fargate service in primary region; Step Functions for fan-out
Inter-node security: AWS PrivateLink or VPC peering + mTLS; IAM cross-account roles
Score normalisation: Lambda function with RRF implementation

Azure (Cross-Tenant Federation)

Local nodes: Azure AI Search per tenant; Azure Functions as retrieval API
Coordinator: Azure Container Apps; Logic Apps for fan-out orchestration
Inter-node security: Azure AD B2B + managed identity; Private Endpoints
Score normalisation: Azure Function with RRF

On-Premises + Cloud Hybrid

Local nodes: Weaviate or Qdrant on-premises + cloud nodes
Coordinator: Kubernetes deployment with custom federation service
Inter-node security: WireGuard VPN between on-premises and cloud; mTLS with SPIFFE/SPIRE

16. Related Patterns

Pattern ID	Pattern Name	Relationship
EAAPL-RAG001	Enterprise RAG	Foundation for each local node; RAG004 federates multiple RAG001 instances
EAAPL-RAG003	Secure RAG	Applied at each local node before transmitting results; prevents PII leakage through federation
EAAPL-RAG005	Hybrid RAG	Can be applied within each local node for better per-node retrieval quality
EAAPL-KNW004	Vector Database Management	Governs each local node's vector database independently

17. Maturity Assessment

Overall Maturity: Emerging — The core retrieval federation mechanics are proven; score normalisation at scale and privacy-preserving query obfuscation are active research areas; production deployments are limited to well-resourced organisations.

Dimension	Score (1–5)	Rationale
Technology Readiness	3	Core components available; federation coordination and score normalisation lack turnkey tooling
Tooling Ecosystem	2	No mature federation framework; custom orchestration required
Operational Guidance	2	Limited production guidance; each deployment is largely custom
Security & Compliance	3	mTLS and local ACL enforcement are well-understood; query privacy obfuscation is experimental
Scalability Evidence	2	Small-scale federations (3–5 nodes) proven; large federations (20+ nodes) mostly theoretical
Cost Predictability	2	N × single-node cost model; coordination overhead highly variable

18. Revision History

Version	Date	Author	Changes
1.0	2024-06-01	EAAPL Working Group	Initial publication
1.1	2025-01-15	EAAPL Working Group	Score normalisation section expanded; RRF formalised; privacy-preserving query obfuscation added
