[EAAPL-RAG004] Federated Retrieval-Augmented Generation
Category: Artificial Intelligence / Retrieval-Augmented Generation
Sub-category: Distributed and Federated Retrieval
Version: 1.1
Maturity: Emerging
Tags: rag federated distributed data-sovereignty privacy-preserving cross-organisation score-normalisation
Regulatory Relevance: GDPR Chapter V (Cross-border transfers), Privacy Act 1988 Part IIIA, EU AI Act Article 10, APRA CPS234 §55 (third-party data), Australian Data Sovereignty requirements
1. Executive Summary
Federated RAG enables retrieval across multiple organisationally or geographically distributed knowledge bases without requiring data to be centralised in a single repository. Each participating node maintains its own vector index and document store, enforces its own access controls, and responds to retrieval requests without exposing raw documents to the federation coordinator. A central orchestrator fans out queries to participating nodes, collects ranked result sets, normalises scores across heterogeneous indexes, and assembles the final context for the language model.
For CIOs navigating cross-border data regulations, multi-entity corporate structures, or public-private data sharing partnerships, Federated RAG provides a principled architecture for enabling AI-powered knowledge synthesis across organisational boundaries without violating data residency, sovereignty, or confidentiality obligations. The pattern is relevant to government agency networks (where each agency maintains its own data under separate legislative mandates), healthcare provider networks (where patient data cannot leave jurisdictions), joint ventures (where each party contributes knowledge without disclosing the full corpus to the other), and multi-cloud enterprise architectures (where data must remain within specific cloud regions due to contractual or regulatory constraints).
2. Problem Statement
Business Problem
Organisations increasingly need to answer questions that span multiple independent knowledge bases, but the data owners of those bases have legitimate legal, commercial, or regulatory reasons not to share raw content. A government department asked to answer a cross-agency compliance question should not need to copy another agency's classified data into its own systems. A hospital network should not centralise patient records from all member hospitals into a single repository to enable AI search. A joint venture should not require each partner to surrender commercial-in-confidence documents to a shared index.
Technical Problem
Standard centralised RAG requires all documents to be ingested into a single vector index. This is architecturally incompatible with data sovereignty requirements, cross-border transfer restrictions, organisational trust boundaries, and contractual non-disclosure obligations. Naive distributed solutions (running independent RAG systems and aggregating answers) compound the problem: scores from different indexes are not comparable, different indexes may use different embedding models, and there is no principled mechanism for cross-node relevance ranking.
Symptoms
- Stalled AI initiatives because legal and compliance teams cannot approve centralised data aggregation
- Separate, siloed AI assistants per business unit with no ability to answer cross-unit questions
- Manual "copy and paste from multiple systems" workflows for staff who need cross-domain answers
- Joint venture or consortium participants rejecting shared data infrastructure proposals
Cost of Inaction
- Inability to leverage AI for cross-organisational knowledge synthesis, leaving manual research workflows in place
- Missed regulatory intelligence: compliance teams unable to get cross-jurisdiction answers from distributed regulatory corpora
- Competitive disadvantage versus organisations that have successfully federated knowledge infrastructure
3. Context
When to Apply
- Cross-organisation AI knowledge sharing where raw data cannot be centralised (government agencies, healthcare networks, industry consortia)
- Multi-jurisdiction deployments where data must remain within specific geographic boundaries
- Joint ventures and public-private partnerships where each party retains data sovereignty
- Large enterprises with independent subsidiaries or business units that have separate data governance regimes
- Regulatory constraint: the organisation is subject to data localisation requirements (EU GDPR Chapter V, Australian data sovereignty) that prohibit centralised indexing
When NOT to Apply
- Data can be legally and contractually centralised (use EAAPL-RAG001 for significantly better retrieval quality)
- Single-tenant deployment with no cross-boundary requirements
- Latency requirements preclude network round-trips to remote nodes (federated retrieval adds 50–200ms per node)
- All nodes use incompatible embedding models and re-embedding is not feasible (score normalisation quality degrades severely)
Prerequisites
- Each participating node must expose a standardised retrieval API (REST or gRPC) that accepts a query vector and metadata filters and returns ranked result sets with scores and metadata
- Score normalisation requires nodes to use compatible (ideally identical) embedding models, or to implement score calibration
- A federation coordination layer (either a dedicated service or the primary node acting as coordinator)
- Inter-node network connectivity with appropriate security (mTLS, API keys, VPN tunnels between trusted nodes)
- Data sharing agreement or federation protocol specifying what metadata each node may expose
Industry Applicability
| Industry |
Federation Participants |
Data Sovereignty Driver |
Example Use Case |
| Government |
Federal + State agencies |
Legislative jurisdiction separation |
Whole-of-government regulatory knowledge assistant |
| Healthcare |
Hospital networks + GP clinics |
Patient data cannot leave jurisdiction |
Clinical decision support across care network |
| Financial Services |
Group entities + subsidiaries |
Separate legal entities; intra-group data transfer rules |
Group-wide risk and compliance knowledge base |
| Education |
Universities in national consortium |
FERPA/Privacy Act per institution |
National research knowledge assistant |
| Defence & Intelligence |
Allied nation agencies |
National security classification regimes |
Coalition knowledge sharing (lowest-classification tier) |
4. Architecture Overview
Federated RAG decomposes into three functional layers: local nodes (which own their data and serve retrieval requests), the federation coordinator (which orchestrates cross-node retrieval and normalises results), and the generation layer (which assembles the federated context and invokes the LLM).
Local Node Architecture
Each local node is a complete, independently operated RAG stack: document ingestion pipeline, chunking engine, embedding model, vector database, and a retrieval API endpoint. The node enforces its own access controls — the coordinator has no ability to bypass the node's ACL enforcement. The node exposes a retrieval endpoint that:
- Accepts a query vector (already embedded by the coordinator) and optional metadata filters
- Enforces local ACL for the requesting entity (the coordinator's identity, not the end user's)
- Returns a ranked result set:
[(chunk_text, metadata, score, chunk_id)] — crucially, never raw document bytes
- Applies rate limiting per coordinator identity to prevent denial-of-service
The node's governance principle is: the node owner controls what the node reveals. The node may return only metadata and scores (no chunk text) for sensitive documents, requiring the coordinator to resolve context from an approved escrow, or to simply note "relevant classified content exists but cannot be included in context."
Federation Coordinator
The coordinator is the query orchestration layer. Upon receiving a user query, the coordinator:
- Embeds the query using the shared embedding model
- Determines which nodes to query based on the query's topic scope and the user's inter-node authorisation
- Fans out the query vector in parallel to all relevant nodes (with a configurable timeout per node)
- Collects ranked result sets from responding nodes
- Applies score normalisation to make scores from different nodes comparable
- Re-ranks the normalised result set to produce a unified top-K list
- Assembles context from the returned chunk texts (subject to each node's disclosure level)
- Records which nodes were queried, which responded, and which declined (for audit and transparency)
Score Normalisation
Score normalisation is the most technically complex component of Federated RAG. When nodes use the same embedding model, cosine similarity scores are on the same scale and can be compared directly. When models differ (which should be avoided but may be unavoidable), score normalisation is required: each node's score distribution is modelled (mean and standard deviation) using calibration queries, and scores are Z-normalised before cross-node ranking. Reciprocal Rank Fusion (RRF) provides an alternative that is model-agnostic and rank-based, requiring only a ranked list from each node rather than raw similarity scores.
Privacy-Preserving Retrieval Modes
For maximum privacy preservation, the coordinator can operate in a query vector obfuscation mode using techniques from privacy-preserving machine learning: the query vector is perturbed with calibrated noise before being sent to nodes (analogous to differential privacy), preventing nodes from reconstructing the original query. This mode trades some retrieval quality for stronger privacy guarantees — the node cannot infer the full semantic content of the query. This mode is recommended for inter-organisation federation where the coordinator does not fully trust all nodes.
5. Architecture Diagram
flowchart TD
subgraph Nodes["Federated Nodes"]
A[Node A API]
B[Node B API]
C[Node C API]
end
subgraph Coordinator["Federation Coordinator"]
D[User Query]
E[Node Router]
F[Score Normaliser]
G[LLM Generation]
end
subgraph Audit["Governance"]
H[Federation Audit Log]
end
D --> E
E -->|fan-out mTLS| A
E -->|fan-out mTLS| B
E -->|fan-out mTLS| C
A -->|ranked results| F
B -->|ranked results| F
C -->|declined or results| F
F --> G --> D
E --> H
F --> H
style D fill:#dbeafe,stroke:#3b82f6
style E fill:#f0fdf4,stroke:#22c55e
style A fill:#fef9c3,stroke:#eab308
style B fill:#fef9c3,stroke:#eab308
style C fill:#fef9c3,stroke:#eab308
style F fill:#f0fdf4,stroke:#22c55e
style G fill:#d1fae5,stroke:#10b981
style H fill:#fef9c3,stroke:#eab308
6. Components
| Component |
Type |
Responsibility |
Technology Options |
Criticality |
| Local Node Retrieval API |
Integration |
Serve retrieval requests to the coordinator; enforce local ACL |
FastAPI / gRPC service; cloud API Gateway |
Critical |
| Local Vector Database |
Storage |
Node-local vector index; never exposed directly to coordinator |
Weaviate, Qdrant, pgvector, OpenSearch |
Critical |
| Shared Embedding Model |
ML Inference |
Ensure all nodes embed in the same vector space |
Agreed embedding model deployed at each node; identical version |
Critical |
| Federation Coordinator |
Orchestration |
Fan out queries; collect and normalise results; assemble context |
Custom Python service; LangChain with custom retriever; LlamaIndex federation |
Critical |
| Node Router |
Business Logic |
Select which nodes to query for a given query type |
Custom routing rules + topic classifier |
High |
| Score Normaliser |
Algorithm |
Normalise heterogeneous scores to a comparable scale |
RRF (rank-based, model-agnostic) or Z-normalisation (same model required) |
High |
| Global Re-ranker |
Ranking |
Re-rank normalised cross-node results |
Cross-encoder re-ranker; Cohere Rerank API |
Medium |
| Federation Audit Logger |
Compliance |
Record which nodes were queried, which responded, which declined |
Append-only log store; tamper-evident |
High |
| Inter-Node Security (mTLS) |
Security |
Authenticate coordinator to each node; encrypt retrieval traffic |
mTLS certificates (SPIFFE/SPIRE for zero-trust); API key + TLS fallback |
Critical |
| Query Vector Obfuscator (optional) |
Privacy |
Add calibrated noise to query vector before sending to untrusted nodes |
Custom differential privacy library; RAPPOR-style perturbation |
Medium |
7. Data Flow
Primary Flow
| Step |
Actor |
Action |
Output |
| 1 |
Each local node |
Ingest documents from local sources; build local vector index |
Independent local vector indexes per node |
| 2 |
User |
Submit query to federation coordinator |
Query string + user identity |
| 3 |
Query Embedder |
Embed query using shared embedding model |
Query vector |
| 4 |
Node Router |
Determine relevant nodes based on query topic scope and user's inter-node authorisation |
List of target node endpoints |
| 5 |
Federation Coordinator |
Fan out query vector in parallel to all target nodes with inter-node auth credentials |
Parallel retrieval requests |
| 6 |
Each local node |
Validate coordinator identity; enforce local ACL; execute local ANN search; return ranked results |
[(chunk_text, metadata, score)] per node or DECLINED |
| 7 |
Score Normaliser |
Apply RRF or Z-normalisation to align scores across nodes |
Normalised, unified candidate list with node provenance |
| 8 |
Global Re-ranker |
Re-rank unified candidate list |
Top-N candidates |
| 9 |
Context Assembler |
Assemble prompt with node provenance labels on each chunk |
Assembled prompt |
| 10 |
LLM |
Generate answer with cross-node citations |
Raw response |
| 11 |
Response |
Return answer with attribution to specific nodes |
Answer + [Source: Node A - Doc X] citations |
| 12 |
Federation Audit Logger |
Record complete query execution: nodes queried, response times, scores, declinations |
Immutable audit record |
Error Flow
| Error Condition |
Detection |
Recovery |
| Node timeout (node unavailable or slow) |
Per-node timeout (configurable, default 2s) |
Proceed with results from available nodes; note unavailable node in response metadata |
| Node declines query (ACL rejection for coordinator identity) |
Node returns 403 |
Log declination; note in response: "Some sources unavailable due to access restrictions" |
| Score normalisation failure (node uses different embedding model) |
Score distribution anomaly detection |
Fall back to RRF (rank-based); flag model mismatch for node operator |
| All nodes decline or timeout |
No results assembled |
Return "No accessible content found across federated sources"; do not generate from LLM |
8. Security Considerations
Inter-Node Trust Model
The federation coordinator authenticates to each node using mTLS mutual authentication. Each node maintains a whitelist of coordinator certificate fingerprints it trusts. This prevents unauthorised coordinators from querying nodes. Nodes do not trust the coordinator's claim about end-user identity — nodes enforce only the coordinator's identity against the inter-node access policy.
Data Minimisation in Federation Protocol
The retrieval API response should include the minimum data necessary: chunk text (if the node's data sharing agreement permits), score, and metadata (document title, date, classification). Raw document bytes are never returned in the retrieval protocol. For highly sensitive nodes, chunk text may be replaced with a summary or an opaque reference, with the coordinator noting "relevant content exists but is not disclosable."
OWASP LLM Top 10 Mitigations
| OWASP LLM Risk |
Federated Specific Concern |
Mitigation |
| LLM01: Prompt Injection |
Malicious content injected in one node's documents propagates through federation to the coordinator's LLM |
Node-side content sanitisation; coordinator treats all retrieved content as untrusted data |
| LLM06: Sensitive Information Disclosure |
Coordinator may inadvertently reveal to node B that node A has a specific document (via query vectors) |
Query vector obfuscation for inter-organisation federation; query vector is not logged at nodes |
| LLM09: Overreliance |
User assumes all nodes were queried; a declined or timed-out node is silently excluded |
Explicitly surface node availability status in every response |
9. Governance Considerations
Federation Governance Framework
Each federation must have a formal data sharing agreement that specifies: which nodes participate, what metadata each node may expose, the agreed embedding model version, the score normalisation method, audit log sharing obligations, and the dispute resolution process when a node incorrectly declines a coordinator request.
Governance Artefacts
| Artefact |
Owner |
Frequency |
Purpose |
| Federation Data Sharing Agreement |
Legal / All Participating Organisations |
Per federation; reviewed annually |
Legal basis for cross-node retrieval |
| Node Directory |
Federation Coordinator |
Continuous |
Maintain registry of participating nodes, their data domains, and availability SLAs |
| Federation Audit Log |
Coordinator + Each Node |
Per query |
Immutable record of inter-node queries for compliance and dispute resolution |
| Embedding Model Version Agreement |
All Nodes |
Per model upgrade |
Ensure all nodes upgrade embedding model atomically to maintain score comparability |
10. Operational Considerations
Monitoring
| Metric |
Alert Threshold |
Notes |
| Node availability (per node) |
< 99% over 1 hour |
Alert node operator; degrade response noting unavailable node |
| Cross-node query latency P95 |
> 3 seconds |
Check slowest node; consider reducing timeout |
| Node declination rate |
> 20% of queries to a given node |
Investigate ACL configuration; may indicate access policy change |
| Score normalisation quality (RRF rank correlation) |
Significant correlation drop |
Indicates embedding model version mismatch between nodes |
Service Level Objectives
| SLO |
Target |
Notes |
| Federated query response P95 |
≤ 4 seconds |
Longer than centralised RAG due to network fan-out |
| Node availability (per node) |
≥ 99.5% |
Per federation agreement |
| Federation coordinator availability |
≥ 99.9% |
Coordinator is the single point of failure; deploy multi-AZ |
11. Cost Considerations
Cost Drivers
| Cost Driver |
Notes |
| Per-node infrastructure |
Each node is a full RAG stack; N nodes = N × single-node infrastructure cost |
| Cross-node network egress |
Network egress costs for result transmission between nodes and coordinator |
| Federation coordinator compute |
Fan-out, score normalisation, and re-ranking are compute-intensive at scale |
| Embedding model agreement enforcement |
Coordinating atomic embedding model upgrades across nodes requires change management overhead |
Indicative Cost Range
| Federation Scale |
Monthly Cost Range |
| 3–5 nodes, small corpora |
$3,000 – $10,000 per node + $2,000–$5,000 coordinator |
| 5–10 nodes, medium corpora |
$5,000 – $20,000 per node + $5,000–$15,000 coordinator |
| 10+ nodes, large corpora |
Custom pricing; infrastructure-as-code essential |
12. Trade-Off Analysis
Centralised vs. Federated RAG
| Dimension |
Centralised RAG (RAG001) |
Federated RAG (RAG004) |
| Retrieval Quality |
Highest (unified index, no score normalisation noise) |
Lower (score normalisation introduces noise) |
| Data Sovereignty |
Low (data must be centralised) |
High (data never leaves node) |
| Latency |
Lowest |
Higher (network fan-out adds 50–200ms) |
| Operational Complexity |
Medium |
High (N independent stacks + coordinator) |
| Cost |
Lower at scale |
Higher (N × node cost) |
| Recommended For |
Single-org, no sovereignty constraints |
Multi-org, data residency requirements |
Architectural Tensions
| Tension |
Trade-off |
Recommendation |
| Node response completeness vs. privacy |
Full chunk text in response: best quality; metadata-only: private |
Data sharing agreement governs; default to full text for trusted nodes, metadata-only for untrusted |
| Query timeout per node vs. completeness |
Short timeout: fast but may miss slow nodes; long timeout: complete but slow |
Async fan-out with 2s timeout; include results from slow nodes in "extended" mode if user requests |
13. Failure Modes
| Failure Mode |
Likelihood |
Impact |
Detection |
Recovery |
| Embedding model version drift between nodes |
Medium |
High (score normalisation fails) |
Score distribution anomaly detection |
Enforce embedding model version in node registration; alert on version mismatch |
| Coordinator as single point of failure |
Low |
Critical |
Health check monitoring |
Multi-AZ coordinator deployment; circuit breaker per node |
| Node A reveals existence of document to Node B via query pattern |
Low |
Medium |
Privacy audit of query logs at each node |
Query vector obfuscation; prohibit node-side query logging beyond request metadata |
| Federation data sharing agreement expires |
Low |
High |
Automated agreement expiry monitoring |
Alert 30/60/90 days before expiry; suspend node access on expiry |
14. Regulatory Considerations
| Regulation |
Requirement |
Federated RAG Response |
| GDPR Chapter V |
Personal data cannot be transferred outside EEA without adequate protections |
Data never leaves the node — federated retrieval transmits result sets, not raw data; result sets may still contain personal data if not redacted |
| Privacy Act 1988 APP 8 |
Cross-border disclosure of personal information |
Node-level PII redaction (EAAPL-RAG003) applied before result sets leave the node's jurisdiction |
| Australian Data Sovereignty |
Government data must remain in Australia |
Node hosted on Australian infrastructure; coordinator queries never cause data to cross border |
| EU AI Act Article 10 |
Data governance for AI training and operation |
Data governance remains with each node; federation protocol does not create new data aggregation |
15. Reference Implementations
AWS (Multi-Region Federation)
- Local nodes: OpenSearch k-NN in each AWS region; Lambda-based retrieval API
- Coordinator: ECS Fargate service in primary region; Step Functions for fan-out
- Inter-node security: AWS PrivateLink or VPC peering + mTLS; IAM cross-account roles
- Score normalisation: Lambda function with RRF implementation
Azure (Cross-Tenant Federation)
- Local nodes: Azure AI Search per tenant; Azure Functions as retrieval API
- Coordinator: Azure Container Apps; Logic Apps for fan-out orchestration
- Inter-node security: Azure AD B2B + managed identity; Private Endpoints
- Score normalisation: Azure Function with RRF
On-Premises + Cloud Hybrid
- Local nodes: Weaviate or Qdrant on-premises + cloud nodes
- Coordinator: Kubernetes deployment with custom federation service
- Inter-node security: WireGuard VPN between on-premises and cloud; mTLS with SPIFFE/SPIRE
| Pattern ID |
Pattern Name |
Relationship |
| EAAPL-RAG001 |
Enterprise RAG |
Foundation for each local node; RAG004 federates multiple RAG001 instances |
| EAAPL-RAG003 |
Secure RAG |
Applied at each local node before transmitting results; prevents PII leakage through federation |
| EAAPL-RAG005 |
Hybrid RAG |
Can be applied within each local node for better per-node retrieval quality |
| EAAPL-KNW004 |
Vector Database Management |
Governs each local node's vector database independently |
17. Maturity Assessment
Overall Maturity: Emerging — The core retrieval federation mechanics are proven; score normalisation at scale and privacy-preserving query obfuscation are active research areas; production deployments are limited to well-resourced organisations.
| Dimension |
Score (1–5) |
Rationale |
| Technology Readiness |
3 |
Core components available; federation coordination and score normalisation lack turnkey tooling |
| Tooling Ecosystem |
2 |
No mature federation framework; custom orchestration required |
| Operational Guidance |
2 |
Limited production guidance; each deployment is largely custom |
| Security & Compliance |
3 |
mTLS and local ACL enforcement are well-understood; query privacy obfuscation is experimental |
| Scalability Evidence |
2 |
Small-scale federations (3–5 nodes) proven; large federations (20+ nodes) mostly theoretical |
| Cost Predictability |
2 |
N × single-node cost model; coordination overhead highly variable |
18. Revision History
| Version |
Date |
Author |
Changes |
| 1.0 |
2024-06-01 |
EAAPL Working Group |
Initial publication |
| 1.1 |
2025-01-15 |
EAAPL Working Group |
Score normalisation section expanded; RRF formalised; privacy-preserving query obfuscation added |